Unreliable intrusion detection in distributed computations
Distributed coordination is difficult, especially when the system may suffer intrusions that corrupt some component processes. In this paper we introduce the abstraction of a failure detector that a process can use to (imperfectly) detect the corruption (Byzantine failure) of another process. In general, our failure detectors can be unreliable, both by reporting a correct process to be faulty or by reporting a faulty process to be correct. However, we show that if these detectors satisfy certain plausible properties, then the well-known distributed consensus problem can be solved. We also present a randomized protocol using failure detectors that solves the consensus problem if either the requisite properties of failure detectors hold or if certain highly probable events eventually occur. This work can be viewed as a generalization of benign failure detectors popular in the distributed computing literature.