Evaluating Distributed Systems Architectures for Fault-Tolerant Applications (SATURN 2008)

April 2008 Presentation
James Scott

Presentation for the 2008 SATURN workshop held in Pittsburgh


Software Engineering Institute



A large body of experience has been developed within the telecommunications industry with regard to fault-tolerant distributed systems architecture. This presentation focuses on key topics to consider in evaluating a proposed architecture for use in asynchronous, event-driven applications whose system quality attributes include stringent requirements for availability, reliability, and evolvability. A representative list of such topics includes

- The processing model
- Interprocess Communication
- Redundancy Model
- Fault Management and Recovery
- Graceful Degradation Under Load
- Operational Management and Maintenance
- System Debugging Environment

Architecture and design patterns derived from best practices emerging from the telecommunications industry will be discussed in order to provide additional insight into proven architecture and design practices being used in deployed fault-tolerant commercial systems. In addition, there will be discussion about how these topics and patterns can be applied within the context of the SEI Architecture Tradeoff Analysis Method (ATAM) of software architecture evaluation.