Technical debt and airplanes are a rough combination
By Paul Witman and Scott Mackelprang
In late 2022, winter storms and technical problems caused Southwest Airlines to cancel thousands of flights, disrupting travel for 2 million passengers and complicating life for the airline’s flight crews and other staff. In early April 2023, Southwest released an action plan to guide public perception of the updates and changes they plan to make to prevent similar issues in the future.
On April 18, Lynn (a friend whose name has been changed to protect her anonymity) was on her way from New Orleans to Dallas, then to California for a brief visit with old friends. She, along with thousands of other Southwest passengers, was stuck waiting to board, or on the tarmac, because a “vendor-supplied” component failed. The component failure blocked access to “operational data,” so Southwest was forced to halt takeoffs for about an hour. Many have blamed both the winter breakdown and the April 18 failure on “technical debt.”
For consumers, “debt” is a word that can carry a lot of baggage, along with opportunities. Without debt, most of us would be hard-pressed to save enough to buy a car or a home.
But if used poorly, debt can get us into a lot of trouble and cost a lot of money. Much of this is true for business debt as well: loans are key to making investments and to maintaining needed cash flow. Technical debt, on the other hand, is more like “deferred maintenance,” akin to potholes in the street that may damage cars but don’t have substantial direct consequences until they cause an accident and someone gets hurt.
Technical debt for technology, like Southwest’s aircraft and crew management systems, isn’t always so obvious. It can arise from business growth unmatched by growth in technical capacity, from process issues (e.g., requiring flight crews to make a phone call to report their location and readiness status, rather than using an app on their phone), or from prioritizing other IT projects that are customer-facing and generate customer value and new revenue, among other causes.
Southwest made a point of saying, in their action plan, that the airline would spend over $1 billion on “technology projects” in 2023. However, as in prior years, they don’t specify how much of that spending goes to customer-facing projects, versus how much is focused on improving the stability and performance of the “back end” operational systems required to get planes in the air and back to the ground safely.
In the April 18 case, the problem was reported to be a third-party component, specifically, a firewall. Calling it “third party” appears on its face to be a blame diversion: it was “their (the firewall maker’s) fault.” A firewall is not even remotely likely to be something an airline would build for itself in any case, and Southwest chose which device to use, so the airline can’t just blame the vendor. Further, to say that the failure of “a firewall” caused this outage is to say that there was only one of these devices in place, where one would expect redundant components to prevent exactly this scenario.
It is easy to second-guess these issues and to make judgments after the fact. In Lynn’s case, although the takeoff from New Orleans was delayed by an hour and her layover in Dallas shrank from two hours to 40 minutes, the rescheduling system worked in her favor.
She wound up right back on the same plane for the trip to California, and that leg of the journey landed on time. We can only hope and trust that Southwest is going to get rid of some of that old technical debt and prioritize upgrading the capacity of their scheduler, as well as the reliability of their flight operations systems. It is more fun and more immediately profitable to build tools for new revenue — but dragging that old infrastructural baggage along can be really expensive as well.
Paul Witman is an information technology management professor at Cal Lutheran University, and Scott Mackelprang is a consultant in cybersecurity.