Software Engineering Institute | Carnegie Mellon University



Fixing Under Fire

  • This presentation was created for a conference series or symposium and does not necessarily reflect the positions and views of the Software Engineering Institute.
  • Abstract

    We've discovered that 2 a.m. is a bad time to fix your service. Over the past year, we learned that some of our design decisions actively harmed visibility, which made late-night pages more painful than they needed to be. In response, we evolved our architecture to make our microservices easier to observe and diagnose, and to ease our immediate (sleepless) pain.

    By the end of this talk, you'll know the questions to ask when designing a highly available, cloud-based, fine-grained SOA system to improve visibility, diagnosability, and developer happiness. You'll also have a starting point for answering these questions, based on our experience: How do you get good information that points to the source of a problem across microservices? How do you weigh the tradeoff between resolving the problem quickly and collecting the information needed to address its root cause? And how do you capture enough information to come back later and fix the underlying problem?


Part of a Collection

SATURN 2018 Presentations