Twitter thrives on shares, not just within the social media platform but from partner links all over the Internet. On Monday, though, most of those links stopped working.
For approximately an hour, anyone trying to share recently published articles on Twitter was met with an error message clearly intended for developers.
It was almost as if Twitter was informing publishers that they didn’t pay their water bill and, as such, couldn’t publish links on the social network.
What went wrong?
We didn’t have to wait too long for Twitter CEO Elon Musk to explain. In response to a tweet from Netscape co-founder and well-known venture capitalist Marc Andreessen pointing out that four of the top five Twitter trends were about Twitter, Musk tweeted, “A small API change had massive ramifications. The code stack is extremely brittle for no good reason. Will ultimately need a complete rewrite.”
(Elon Musk on Twitter, March 6, 2023)
This seemingly clear-headed tweet, though, should be cause for alarm. Musk claims the code stack (basically a massive stack of programs that all work together to create the Twitter whole) is brittle and needs a rewrite. What he fails to mention is that among the thousands of Twitter employees he has laid off since November, a good number were engineers and, it’s safe to assume, some worked in what’s known as QA, or quality assurance.
Typically, if you plan on making any kind of code change to a website, online service, or app, QA tests it on an offline copy of the platform. In this way, they ensure that the updates, no matter how small, won’t adversely impact the live environment.
The concept is known as “production,” the live site or service, versus “staging,” an environment that’s identical to live but cannot be seen or touched by users. You run your new code or feature through staging, a group of QA testers applies a set of known scenarios (maybe they throw in an edge case or two), and as long as there are no red flags, the update gets pushed from staging to production.
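For readers who want to see the idea in miniature, here is a rough Python sketch of a staging gate. Everything in it is hypothetical: the `Environment` and `Scenario` classes, the `link_card` endpoint, and the two handler functions are invented for illustration and say nothing about how Twitter's actual pipeline or API works. It shows only the general pattern described above: run known scenarios against staging, and copy the change to production only if none of them fail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: an environment maps endpoint names to handler functions.
Handler = Callable[[str], dict]


@dataclass
class Environment:
    name: str
    handlers: Dict[str, Handler] = field(default_factory=dict)


@dataclass
class Scenario:
    description: str
    endpoint: str
    payload: str
    check: Callable[[dict], bool]  # returns True when the response looks right


def run_scenarios(env: Environment, scenarios: List[Scenario]) -> List[str]:
    """Run every QA scenario against an environment and collect failures."""
    failures = []
    for s in scenarios:
        handler = env.handlers.get(s.endpoint)
        if handler is None:
            failures.append(f"{s.description}: endpoint {s.endpoint!r} is missing")
            continue
        try:
            result = handler(s.payload)
        except Exception as exc:  # a crash in staging is also a red flag
            failures.append(f"{s.description}: raised {exc!r}")
            continue
        if not s.check(result):
            failures.append(f"{s.description}: unexpected result {result!r}")
    return failures


def promote(staging: Environment, production: Environment,
            scenarios: List[Scenario]) -> List[str]:
    """Push staging to production only if there are no red flags."""
    failures = run_scenarios(staging, scenarios)
    if not failures:
        production.handlers = dict(staging.handlers)
    return failures


# Invented handlers: the current behavior and a "small API change" that breaks it.
def link_card_v1(url: str) -> dict:
    return {"status": 200, "card": url}


def link_card_v2(url: str) -> dict:
    return {"status": 403, "error": "access denied"}


SCENARIOS = [
    Scenario("share a fresh article", "link_card", "https://example.com/story",
             lambda r: r.get("status") == 200),
    Scenario("edge case: empty URL still gets a response", "link_card", "",
             lambda r: "status" in r),
]

production = Environment("production", {"link_card": link_card_v1})
staging = Environment("staging", {"link_card": link_card_v2})

failures = promote(staging, production, SCENARIOS)
print(failures)
```

In this toy run the broken handler fails the "share a fresh article" scenario, so `promote` leaves production untouched and returns the list of failures for someone to look at. The point of the design is that the gate, not a person's confidence in the change, decides whether anything goes live.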
Twitter, which has seen its overall reliability drop (from going offline to having features appear and disappear unexpectedly) since Musk took over, may be getting its updates in a different way.
Musk likes to test features on production (the live site). As a result, he keeps running into unintended consequences.
There is some disagreement on whether or not there is a Twitter QA team.
Some argue one exists but Musk grows impatient and then pushes untested code live.
Others insist that Elon Musk arrived at Twitter and discovered that the company had no QA team and had long been in the practice of pushing untested code live. That, though, seems highly unlikely.
I asked Musk directly on Twitter if the API update was tested on staging before being pushed live and will update this post if he responds.
Never assume
The assumption he made here, that a small API change would have little impact on the site, was a poor one. And yet, Musk still doesn’t understand that he’s doing it wrong.
Testing features of any kind on a live version of a complex platform like Twitter will inevitably result in bugs and crashes.
Will rewriting the code stack solve all this? Maybe, but very few platforms stay as clean as they were on launch and even if the rewrite is robust and perfect, frequent updates and fresh features will test that stability.
As long as Musk refuses to fully test what he launches before he launches it, there is no scenario in which Twitter escapes regular downtime.
This is a simple fix, Elon: make QA an inescapable part of the development pipeline and save yourself and us a lot of headaches. Or keep doing it your way because that’s working out so, so well.