Slowness in Builder Prime app
Incident Report for Builder Prime
Postmortem

Builder Prime began experiencing performance issues at approximately 10:30am Eastern Time on July 14, 2021.  In many cases, these issues made the app unusable for periods of time while we worked to restore the service.  Service was fully restored at approximately 2:30pm Eastern Time.  No data was lost from the database during this time.  We strongly encourage you to subscribe to updates on our status page so that you are notified immediately about any issues or maintenance affecting the Builder Prime service.  You can subscribe here: https://builderprime.statuspage.io.

What happened?

The Builder Prime database went through a relatively routine upgrade between 1:00am and 2:00am on July 14.  All indications were that the upgrade was successful, and our tests confirmed this as well.  The upgrade increased the database memory to better accommodate the increased load we were seeing on the app, so it was expected to improve performance.  As the load on the database increased during the morning of July 14, the upgraded database was unable to handle it adequately, even though the load never exceeded what the previous configuration had handled.

What did we do to fix it?

Builder Prime support was alerted to the issue at approximately 10:30am and began investigating at that time.  We also engaged our service provider for support.  Ultimately, the only conclusion we could draw at the time was that the upgraded database could not keep up with the number of queries and updates running against it.  We made the decision to perform another upgrade on the database and move it to infrastructure with significantly larger resources.  We cut over to the new database at approximately 2:30pm and quickly saw load and response times return to normal.  In fact, we observed performance that was much better than it had been before the first upgrade.

What are we doing to prevent this in the future?

We take this type of issue incredibly seriously.  We understand that many of you depend on this application to be up and running, especially during the business day.  There are several measures that we are taking to prevent this type of issue from occurring in the future.

1) We have improved our alerting to be notified of potential issues with database performance sooner, so that action can be taken sooner.

2) We are changing our maintenance window for this type of change and will perform database upgrades only on weekends if at all possible.

3) We have strengthened our escalation procedures with our service provider to get them engaged with us more quickly when an infrastructure issue arises.

4) We are immediately beginning work to further tune database performance so that the database can continue to scale with the increasing load.

5) We have engaged with our service provider's solutions architecture team to ensure we are doing everything possible to strengthen the reliability and performance of the application, and especially the database.

6) We will be evaluating tools to assist with load testing to better ensure that these types of database upgrades do not cause similar issues in the future.

7) We are following up with our service provider to understand why the upgraded database had significantly worse performance than the previous database, when the opposite should have been true.
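
For technically minded readers, the kind of early-warning check described in item 1 can be sketched roughly as follows.  This is only an illustration, not Builder Prime's actual monitoring: the class name `LatencyAlert`, the 500 ms threshold, and the window sizes are all hypothetical, and a real setup would more likely rely on the alerting features of the monitoring or hosting provider.

```python
from collections import deque
from statistics import quantiles


class LatencyAlert:
    """Hypothetical rolling-window check on database query latency."""

    def __init__(self, threshold_ms: float, window: int = 100, min_samples: int = 20):
        self.threshold_ms = threshold_ms      # assumed alert threshold, in milliseconds
        self.min_samples = min_samples        # do not alert until enough data is collected
        self.samples = deque(maxlen=window)   # keeps only the most recent `window` samples

    def record(self, latency_ms: float) -> bool:
        """Store one sample; return True when the window's p95 exceeds the threshold."""
        self.samples.append(latency_ms)
        if len(self.samples) < self.min_samples:
            return False
        # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
        p95 = quantiles(self.samples, n=20)[-1]
        return p95 > self.threshold_ms


# Example: feed in query timings as they are measured.
alert = LatencyAlert(threshold_ms=500.0)
for latency in [120.0] * 30:
    if alert.record(latency):
        print("Database latency is elevated; notify the on-call engineer.")
```

Watching a high percentile rather than the average is one common choice, since a slowdown that affects only some queries can be hidden by an average.  The threshold and window would need tuning against real traffic.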

We value the trust that you place in us to help you run and grow your businesses, and we are treating this with the utmost urgency.  We sincerely apologize for the difficulty this has caused and we are doing everything possible to ensure we do not see this type of issue or any similar issues again.

Posted Jul 15, 2021 - 11:30 EDT

Resolved
We will be sending a post mortem for this issue by tomorrow with what happened and how we will prevent this type of issue in the future.
Posted Jul 14, 2021 - 17:23 EDT
Monitoring
We have applied a fix and performance looks to be improving. We are closely monitoring.
Posted Jul 14, 2021 - 15:15 EDT
Update
We will be taking the app offline for a minute or two to try to correct the issues permanently.
Posted Jul 14, 2021 - 14:27 EDT
Update
We are still actively working on this issue and are engaged with our service providers.
Posted Jul 14, 2021 - 14:01 EDT
Update
We are still working to resolve this issue.
Posted Jul 14, 2021 - 13:06 EDT
Update
We are seeing some improvement, but there is still slowness and we continue to work on the issue.
Posted Jul 14, 2021 - 11:44 EDT
Investigating
We are currently investigating the cause of slowness being experienced in the app.
Posted Jul 14, 2021 - 11:12 EDT
This incident affected: Web and Mobile Apps.