At 13:15 UTC on 03 Feb 2022, we began deploying a routine release of the Flagsmith application to our production SaaS environment. The release included a database migration that added a new unique index to the table holding information about multivariate values for features. When the migration ran in our other environments, we noticed no ill effects from adding the index. In production, however, where we have substantially more data, the index took longer to build than anticipated and required a full table lock for that period.
Our monitoring shows that the application was unresponsive for a period of just under 2 minutes while the migration was running.
To prevent a repeat, we are planning to upgrade our version of Django so that we can easily add indexes concurrently. We will also review future index additions more carefully and check whether they will require a table lock. Finally, we will be looking at making our staging environment more representative of production in terms of data volume, so that we can catch issues like this before they reach production.
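
As an illustration of what checking index additions for a table lock could look like, here is a minimal sketch. It is not Flagsmith's actual tooling. It assumes a PostgreSQL database and that the SQL for a migration is available as text, for example from Django's `python manage.py sqlmigrate <app_label> <migration_name>`. The function name `find_blocking_index_statements` is hypothetical. The sketch only flags `CREATE INDEX` statements that lack `CONCURRENTLY`; it does not cover unique constraints added through `ALTER TABLE`, which also build an index.

```python
import re
from typing import List

# Matches CREATE INDEX / CREATE UNIQUE INDEX and captures the optional
# CONCURRENTLY keyword (PostgreSQL syntax is assumed).
_CREATE_INDEX = re.compile(
    r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\s+(CONCURRENTLY\b)?",
    re.IGNORECASE,
)


def find_blocking_index_statements(sql: str) -> List[str]:
    """Return the CREATE INDEX statements in `sql` that omit CONCURRENTLY.

    Statements are split naively on ';', so a semicolon inside a string
    literal would confuse this sketch.
    """
    flagged = []
    for statement in sql.split(";"):
        statement = " ".join(statement.split())  # collapse whitespace
        match = _CREATE_INDEX.search(statement)
        if match and match.group(1) is None:
            flagged.append(statement)
    return flagged


if __name__ == "__main__":
    # Hypothetical migration SQL, for illustration only.
    example_sql = """
        CREATE UNIQUE INDEX idx_example ON example_table (col_a, col_b);
        CREATE INDEX CONCURRENTLY idx_other ON example_table (col_c);
    """
    for stmt in find_blocking_index_statements(example_sql):
        print("Index built without CONCURRENTLY:", stmt)
```

A check like this could run in CI against each new migration so that a flagged statement prompts a manual review before deployment.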