At Infura, we always strive for 100% data accuracy and service availability. When we fall short of that, we want our users to be aware and to understand how we are learning from previous incidents so that the quality of our service continues to improve.
During the 33 minutes of this incident, our API returned stale block data, with requests for most data stalled at block 8795342.
We use a high-performance datastore for near-head data to optimize data retrieval latency and consistency. This datastore is a common backend for answering frequently used request methods such as eth_blockNumber and eth_getBlockByNumber, as well as state-accessing calls such as eth_call. If our primary datastore degrades, we have hot-standby infrastructure in place, but the current process for promoting the standby requires manual approval steps. Because of this, our time to recovery is not as quick as it could be.
Going forward, we are going to prioritize work on automating this failover process. While failures occur in any system, automated remediation should shield our users from being affected when these events occur.
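To make the idea of automated remediation concrete, here is a minimal sketch of a monitor that promotes a standby once the head block stops advancing. It is purely illustrative and does not describe Infura's actual failover implementation: the class name FailoverMonitor, the 60-second threshold, and the "primary"/"standby" labels are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Promotes a standby once the primary's head block stops advancing.

    Every name and threshold here is hypothetical and chosen for illustration.
    """

    stale_after_seconds: float = 60.0  # assumed threshold, not from the post
    active: str = "primary"
    last_head: int = -1
    last_advance_at: float = 0.0

    def observe(self, head_block: int, now: float) -> str:
        """Record the head block served at time `now`; return the active datastore."""
        if head_block > self.last_head:
            # The head advanced, so the active datastore is keeping up.
            self.last_head = head_block
            self.last_advance_at = now
        elif (
            self.active == "primary"
            and now - self.last_advance_at >= self.stale_after_seconds
        ):
            # The head has been stuck too long: promote without manual approval.
            self.active = "standby"
        return self.active


monitor = FailoverMonitor(stale_after_seconds=60.0)
monitor.observe(8795341, now=0.0)   # head advancing
monitor.observe(8795342, now=15.0)  # head advancing
monitor.observe(8795342, now=45.0)  # stuck for 30 s, primary stays active
monitor.observe(8795342, now=80.0)  # stuck for 65 s, standby is promoted
```

A real system would also need to confirm that the standby itself is healthy before promoting it, and to guard against flapping between the two datastores. The sketch omits both for brevity.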
Additionally, we will strive to share updates in real time while an incident is occurring, so you have as much information as possible to keep your team and your users informed.
As we roll out improvements to our infrastructure, we will keep you informed via our blog.