A reliable CORBA-based network management system
Network management serves as the central nervous system of a telecommunications provider's network. A telco's network management system (NMS) must provide uninterrupted management of complex networks, and its reliability directly affects the quality of service (QoS) delivered to customers. Even a short NMS downtime may cause customer dissatisfaction and revenue loss, and may even endanger lives. The telecommunications industry is adopting CORBA as an underlying architecture for two reasons: to speed up turning technological capabilities into services, and to shorten NMS development cycles. However, neither the CORBA specifications nor the currently available CORBA services directly support fault-tolerant objects. Consequently, NMS developers using CORBA must build their own fault-tolerance mechanisms for mission-critical objects. This paper reviews fault-tolerance approaches in the research literature, presents the architecture of GTE's next-generation NMS, discusses the reliability issues involved in such systems, and describes our approaches to solving them. Specifically, we present in detail our fault-tolerance approaches for the naming server, event channels, and other critical business objects built in-house. We also briefly compare our approaches with others.