On April 14, Tencent Cloud reported that on April 8 a large number of users had experienced service disruptions, including being unable to log in to the Tencent Cloud console.
Today, Tencent Cloud published a retrospective analysis of the incident along with an overview of what happened. Troubleshooting showed that customers could not log in to the console because of abnormalities in the cloud API. The same fault affected public cloud services that depend on the cloud API, such as cloud functions, text recognition, the microservices platform, audio content security, and captcha.
According to the official statement, the outage lasted approximately 87 minutes, during which 1,957 customers reported issues.
The root cause was attributed to two factors: the new version of the cloud API service was not sufficiently backward compatible, and the gray release (staged rollout) mechanism for configuration data was inadequate.
During the API upgrade, the new version changed the interface protocol, so its processing logic mishandled data sent by the old-version frontend and generated erroneous configuration data. Because the gray release mechanism was insufficient, the abnormal data spread quickly across the entire network and caused API calls to fail broadly.
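The gray release idea described above can be illustrated with a minimal sketch. It is not Tencent Cloud's implementation: the function `staged_rollout`, the `apply` and `healthy` callbacks, and the stage layout are all hypothetical names chosen for illustration. The sketch pushes a configuration to one small group of nodes at a time and rolls everything back as soon as a group fails its health check, so a bad configuration never reaches the later groups.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]


def staged_rollout(
    new_config: Config,
    old_config: Config,
    stages: List[List[str]],
    apply: Callable[[str, Config], None],
    healthy: Callable[[str], bool],
) -> bool:
    """Push new_config one stage at a time (hypothetical helper).

    Returns True if every stage passed its health check. On the first
    unhealthy stage, every node updated so far is reverted to old_config
    and the rollout stops, so later stages are never touched.
    """
    updated: List[str] = []
    for stage in stages:
        for node in stage:
            apply(node, new_config)
            updated.append(node)
        # Gate: do not continue until the whole stage reports healthy.
        if not all(healthy(node) for node in stage):
            for node in updated:
                apply(node, old_config)
            return False
    return True
```

In practice the first stage would be a small canary group, and the health check would validate the generated configuration data itself, not only whether the process is running.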
Once the outage began, Tencent Cloud started its standard rollback plan to revert both the service backend and the configuration data to the old version. However, the container platform hosting the API service itself relied on the API service for its scheduling capabilities. This circular dependency prevented the service from recovering automatically.
Operations staff had to restart the API service manually to complete the recovery.
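A circular dependency of this kind can be found ahead of time by checking the service dependency graph for cycles. The following is a minimal sketch using depth-first search; the service names in `deps` are illustrative labels based on the description above, not real Tencent Cloud component names, and `find_cycle` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: a cycle closes here.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in deps:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Illustrative graph: each service maps to the services it depends on.
deps = {
    "cloud_api": ["container_platform"],
    "container_platform": ["cloud_api"],
    "console": ["cloud_api"],
}
print(find_cycle(deps))
```

A check like this could run as part of change review, so that a recovery path depending on the component being recovered is flagged before an incident.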
The incident was reviewed as follows:
To prevent similar incidents, Tencent Cloud said it will take improvement measures focused on three areas: enhancing system resilience, strengthening change management and protection measures, and improving fault response and communication.
These measures include:
Some netizens have commended Tencent Cloud's incident review and explanation for their transparency and commitment to improvement.