At around 23:35 UTC on July 9th, we received an alert that our Core API was not responding. As a result, our SaaS customers were unable to use the Flagsmith dashboard (app.flagsmith.com). SDKs serving flags were not impacted for customers using the Edge API. Please note that any customers still using our Core API to serve flags were also impacted. This number is limited, as we have been advising customers to migrate to the Edge API since June 2022.
Our team resolved the issue at 03:06 UTC on July 10th, at which point the Core API was fully responsive again. The root cause was a database running at maximum CPU, caused by requests to an endpoint that triggered an inefficient query. In addition, our load balancer was continually recycling unhealthy API tasks, which further strained the system by opening unnecessary database connections. Together, these two factors left the Core API unresponsive.
We recovered the database by dropping all traffic and terminating all open connections. This allowed the database to recover and resume processing traffic normally.
We are mitigating future issues like this by doing the following: