We've discovered that 2 a.m. is a bad time to fix your service. Over the past year, we learned that some of our design decisions actively harmed visibility, making late-night pages more painful than they had to be. In response, we evolved our architecture to make our microservices easier to diagnose and to ease our immediate (sleepless) pain.
By the end of this talk, you'll know the questions to ask when designing a highly available, cloud-based, fine-grained SOA system to improve visibility, diagnosability, and developer happiness. You'll also have a starting point, based on our experiences, for answering them: How do you get good information that points to the source of a problem across microservices? How do you weigh the tradeoff between quickly resolving the problem and collecting the information needed to address the root cause? And how do you provide enough information to come back later and fix the underlying problem?