By Carina Duffy

Apr 9, 2019

Topics: HubSpot

A Breakdown of HubSpot’s Outage Retrospective for the Non-Technical User

If you’re a HubSpot user, unless you were on vacation from March 29 until now (if you were, I’m jealous!), you recently experienced one of the most catastrophic outages HubSpot has ever had.

Honestly, it was a mess. Most of the time during product outages you’ll have one or two tools that aren’t working or are having bugs, and it’s resolved within minutes or hours.

This time it was just about everything: emails, form submissions, workflows, lists, sales tools, CRM, imports, analytics. It’s hard to find any pieces of the tool that WEREN’T affected during that time.

What’s even more troubling is that this outage took about 36 hours to be mostly resolved, and the backlog of data from those 36 hours was still being processed at the time this article was published.

Thankfully, during all of this, HubSpot’s crisis communication was solid. Updates on status.hubspot.com were regular and timely (although are they ever frequent enough??) given the scale of the situation.

JD Sherman (HubSpot’s COO) released an article on March 29 with an apology and an outline of next steps for the team -- namely, doing an in-depth retrospective on the cause of the issue and how they’ll make sure it won’t happen again.

That retrospective was delivered on April 4. You can read the full article here. There’s a lot of detail in there about how their systems are structured and what exactly happened. If you’re not into all of the “geek-speak,” we’ve got you covered.

A Quick Review of How HubSpot’s Infrastructure Works

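HubSpot uses a combination of software systems -- Kafka and ZooKeeper -- that allow all of the HubSpot tools to talk to each other and all of the data to be processed effectively. Roughly speaking, Kafka moves messages and data between the tools, while ZooKeeper keeps the servers behind them coordinated.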

Both these software systems have redundancies and safeguards built into them so that if some servers crash, other servers can pick up the slack, and end users don’t experience any issues.
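If you’re curious what “other servers can pick up the slack” looks like in practice, here is a tiny Python sketch of the general idea. It is a toy model only: the `Server` class, the `route` function, and the server names are made up for illustration and are not how Kafka or ZooKeeper are actually built.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


def route(servers):
    """Return the name of the first healthy server, or None if all are down."""
    for server in servers:
        if server.healthy:
            return server.name
    return None


# Hypothetical three-server cluster.
cluster = [Server("a"), Server("b"), Server("c")]

cluster[0].healthy = False       # one server crashes
handled_by = route(cluster)      # another server picks up the slack

cluster[1].healthy = False       # the remaining servers crash too
cluster[2].healthy = False
all_down = route(cluster)        # nothing is left to take over
```

As long as at least one server is healthy, the request still gets handled and the end user never notices. The trouble starts only when too many servers go down at once.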

So What Broke?

It’s a bit difficult to explain without getting super technical, but think about it like a series of unfortunate events.

High strain was put onto ZooKeeper, causing parts of it to crash. Typically, ZooKeeper recovers quickly, but in this case, it took several minutes. The delay in recovery then broke the communication between ZooKeeper and Kafka, causing Kafka to crash.

Even though the team was able to restore ZooKeeper, the damage was done in Kafka and it wasn’t able to recover. What made things worse was a second outage in ZooKeeper, which, combined with the attempts to restart Kafka, started to cause data corruption.
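The first link in that chain, a recovery that takes longer than the dependent system can tolerate, can be sketched in a few lines of Python. This is a simplified illustration, not HubSpot’s actual setup: the function name and the 30-second timeout are invented, since the real values aren’t covered here.

```python
def dependent_stays_up(coordinator_outage_seconds, timeout_seconds=30):
    """A dependent system survives only if its coordinator is back before the timeout.

    The 30-second default is a made-up number used purely for illustration.
    """
    return coordinator_outage_seconds <= timeout_seconds


quick_blip = dependent_stays_up(5)      # the usual case: recovery is quick
long_outage = dependent_stays_up(300)   # this time: recovery took several minutes
```

In the quick case the dependent system never notices. In the slow case it gives up waiting, which is the point where one problem turns into two.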

Why Did It Take So Long to Fix?

Corrupted data? That sounds bad. And well, it is. This is actually why some things took so long to come back online.

When the HubSpot team realized that the server recovery was starting to corrupt data, they had a decision to make: either focus on recovering data (and safeguarding against corrupted data) or focus on restoring the tools.

They decided to focus on recovering data to ensure that there would be no gaps in historical data for customers (which in the long run they believe to be the right decision, and I’d personally agree!). This is the reason that the affected tools took almost 36 hours to be restored.

So, in the name of protecting customer data, HubSpot manually recovered a whoooole bunch of our data, and then was able to restore the affected tools.

This is also why you’re still seeing (at the time this article was published) the “continuing to process data from March 28 & 29” status message from HubSpot.

What Now?

Now that we know exactly what happened, HubSpot’s got a plan to make sure this never happens again. An interesting note in all of this is that HubSpot’s own teams use many of their tools across different parts of the business, so this affected not only their customers but also their own business (even more motivation to make sure it never happens again!).

They’re making changes in a few different areas to protect against another outage: technical/infrastructure, reliability, testing, and communication.

Technical / Infrastructure

As is to be expected, HubSpot will be doing some restructuring of their server clusters to make sure it’s not even possible to have an outage this large again. By doing this, any outage that does happen should be restricted to a small piece of the platform, and the recovery time for issues should be significantly quicker.
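To see why splitting things up limits the damage, here is a small Python sketch. The cluster names and the way tools are assigned to them are hypothetical, chosen only to show the idea, and do not reflect HubSpot’s real layout.

```python
def affected_tools(layout, failed_cluster):
    """Return the tools that go down when one cluster fails."""
    return sorted(layout.get(failed_cluster, []))


# Hypothetical layout: everything depends on one big cluster.
one_big = {"cluster-1": ["CRM", "emails", "forms", "workflows"]}

# Hypothetical layout: each tool depends on its own smaller cluster.
split_up = {
    "cluster-1": ["emails"],
    "cluster-2": ["forms"],
    "cluster-3": ["workflows"],
    "cluster-4": ["CRM"],
}

before = affected_tools(one_big, "cluster-1")    # all four tools go down
after = affected_tools(split_up, "cluster-1")    # only emails go down
```

The same failure takes out everything in the first layout and only one tool in the second, which is the kind of containment the restructuring is aiming for.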

Reliability

HubSpot does have a team of people who test and upgrade their systems, but it hasn’t been as high a priority as it should have been. Now, they’ll have a dedicated team of people who will “oversee new standards, frequencies, and resources to ensure that we're consistently evaluating our key infrastructure systems for code fixes and critical patches without gaps.”

Testing

Along with investing much more heavily in the reliability of their platform, HubSpot is also increasing how often and how deeply they test their systems. Again, it’s not that these processes didn’t exist before, but this outage uncovered some gaps in how often they test for massive failures, as well as how comprehensively they test these systems.

Communication

Lastly, HubSpot is committing to making their communication during any major incident more frequent and helpful, specifically in the minutes and hours immediately following an issue.

Their status updates will now include more detailed explanations of what is going on, as well as when the next update can be expected.

In Conclusion

No one here is pretending that this outage wasn’t bad. Not even HubSpot. But one of the things I appreciate the most about HubSpot as an organization is their transparency and willingness to admit when they’ve messed up.

They know the impact this had on their customers, and on their own business, and they’re actively seeking to make sure it never happens again.

So, even if you’re a little rattled by this outage, know that improvements are being made, fixes are being implemented, and HubSpot will continue to make their product the best it can be. Okay -- HubSpot lovefest over!
