Solana Goes Down Again After a Year: A Review of Its Historical Outage Records
Colin Wu . 2024-02-07 . Data

By GaryMa

On February 6, the Solana network suffered another outage, its first since around February 25, 2023. According to Matthew Sigel, head of digital asset research at VanEck, the outage was caused by a failure of the BPF (Berkeley Packet Filter) loader, the mechanism for deploying, upgrading, and executing programs on Solana. It may be related to an earlier SIMD proposal, 0093, which added an interceptor to block the use of metadata in BPF because it was no longer needed. That upgrade contained a bug, which was found on the testnet; a fix was created but had not yet been deployed. It is assumed that someone manually triggered the bug, bringing Solana down.

Solana’s “outage” problem has often been criticized by the community. Although the network was largely stable over the past year, it has gone down several times before. WuBlockchain summarizes the incidents as follows:

1. On February 6, 2024, the BPF (Berkeley Packet Filter) loader failed, and the network was down for 4 hours and 46 minutes.

2. On February 25, 2023, the Solana mainnet suffered performance problems and was unable to process user transactions. Solana later released a plan to improve network upgrades, including measures to improve the upgrade process, build a countermeasure team, and improve the restart process.

3. On October 1, 2022, the network was down due to a node configuration error.

4. Around August 3, 2022, funds were stolen on a large scale from Solana wallets; the cause was eventually traced to a vulnerability involving a centralized Sentry server.

5. Around June 1, 2022, a bug in the durable nonce transaction feature forced a network restart; the outage lasted about 4.5 hours.

6. Around May 1, 2022, the minting of a new NFT project triggered a flood of bot transactions, causing mainnet nodes to lose consensus; block production was halted for 7 hours.

7. Around January 21, 2022, amid large market fluctuations, the network was flooded with transactions submitted by arbitrage bots, putting it under heavy load; the disruption lasted up to 30 hours. At the time, however, it was officially classified as Degraded Performance, and the Solana community subsequently updated Mainnet to 1.8.14 in an attempt to improve the network.

8. Around September 14, 2021, during a popular IDO for a decentralized social networking protocol on the Raydium platform, many users sent large numbers of transactions through scripts they had written. This caused a “memory overflow” that crashed validator nodes, and eventually the entire network could not produce blocks. The outage lasted up to 17 hours.

9. Around September 3, 2021, the network was unstable and performance degraded for about 1 hour.

10. Around May 4, 2021, network performance degraded, leaving a large number of transactions unable to complete.

Looking back at these incidents, floods of transactions are the main cause of Solana’s historical network disruptions. According to Hu Zhiwei, president of the Boundary Intelligence Research Institute, this may be related to Solana’s design: consensus messages between validator nodes are themselves passed as a special type of transaction. When a large backlog of messages builds up, consensus messages cannot be delivered normally, so consensus cannot proceed. Some of Solana’s features have also been targeted to cause network downtime. For example, Solana processes transactions concurrently, but when transactions take write locks on many important addresses, they must execute sequentially rather than concurrently, which greatly reduces processing capacity. Likewise, to handle forks, nodes retain information about possible forks, which can lead to memory overflow.
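The write-lock effect described above can be illustrated with a toy model. The sketch below is not Solana's scheduler; the `Tx` type, the `schedule` function, and the account names are hypothetical and exist only for illustration. It assumes a simplified rule in which two transactions cannot run in the same parallel batch if either one writes an account the other reads or writes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tx:
    """Hypothetical transaction: a name plus the accounts it locks."""
    name: str
    writes: frozenset = field(default_factory=frozenset)
    reads: frozenset = field(default_factory=frozenset)


def conflicts(a: Tx, b: Tx) -> bool:
    """Two transactions conflict if either writes an account the other touches."""
    return bool(a.writes & (b.writes | b.reads)) or bool(b.writes & (a.writes | a.reads))


def schedule(txs):
    """Greedily pack transactions into batches.

    Transactions inside one batch do not conflict, so they could run in
    parallel; separate batches must run one after another. This toy packing
    ignores transaction ordering and fees.
    """
    batches = []
    for tx in txs:
        for batch in batches:
            if not any(conflicts(tx, other) for other in batch):
                batch.append(tx)
                break
        else:
            batches.append([tx])
    return batches


# Eight transactions that each write only their own account.
independent = [Tx(f"t{i}", writes=frozenset({f"acct{i}"})) for i in range(8)]

# Eight transactions that additionally write-lock one shared "hot" account.
hot = [Tx(f"t{i}", writes=frozenset({f"acct{i}", "hot_account"})) for i in range(8)]

parallel_batches = schedule(independent)
serial_batches = schedule(hot)
print(len(parallel_batches), len(serial_batches))
```

In this model the independent transactions fit into a single parallel batch, while the transactions sharing a write lock each need their own batch, so execution degrades to one transaction at a time.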

Faced with spam transactions that frequently degrade network performance or even cause downtime, Solana co-founder Anatoly Yakovenko previously acknowledged the problem and said that “actual flow control” would be introduced to solve it. As for outages with other causes, such as the transaction nonce bug and node configuration errors, Solana officials quickly released fixed versions for node upgrades after each incident.

This outage after a year of stability may be both good news and bad news, but above all it is a warning: with interest in the Solana ecosystem steadily heating up, network stability remains a key concern.

