Incident support

March 17, 2026


Last night, a 4+ hour Railway incident created a Kernel incident. Their status page tells us that the incident lasted 37m. The incident reframed the value of support for me.

screenshot of a Slack conversation during incident  Before this, I thought of support as working on customer issues. Things like collecting bugs and feedback, helping deploy and debug. That’s still true, but I now realize that’s like saying firefighters exist to teach “stop, drop and roll” to kids.

Customers understand that incidents happen. When they do, as fixes are being worked on, most customers are patient, but only insofar as they’re kept in the loop. Because incidents aren’t just “I can’t get this to work”. They’re“my CEO and customers are demanding updates.” Because all your customers have their own customers, so when a vendor goes down, communication lets the entire dependency chain provide downstream communications. If AWS has an issue that affects Railway that affects us (Kernel) that affects Rye, Rye needs to be able to provide continuous updates for their customers. When your service goes down, communication briefly becomes the service.

This is where customer engineering, as a practice, really shows how it diverges from customer support. Instead of simply triaging and passing off customer tickets, customer engineering serves as the frontline of incident management: actually testing whether our systems are down, creating repro cases if possible, and communicating with infrastructure engineering and the customer during the incident. It’s testament to the fact that business is increasingly run on and increasing by software, business processes must align with software processes.