SWIM Protocol

SWIM "Outsourced Heartbeats"

The Scalable Weakly Consistent Infection-style Process Group Membership (SWIM) Protocol is a group membership protocol for distributed systems based on "outsourced heartbeats".^[1] It was first introduced by Abhinandan Das, Indranil Gupta and Ashish Motivala in 2002.^[2]^[3] It is a hybrid algorithm that combines failure detection with group membership dissemination.

Protocol

The protocol has two components, the Failure Detector Component and the Dissemination Component.

The Failure Detector Component functions as follows:

Every T' time units (one protocol period), each node ($N_{1}$) sends a ping to a random other node ($N_{2}$) in its membership list.
If $N_{1}$ receives a response from $N_{2}$, it considers $N_{2}$ healthy and updates its "last heard from" timestamp for $N_{2}$ to the current time.
If $N_{1}$ does not receive a response, it contacts k other nodes on its list ($\{N_{3}, \ldots, N_{2 + k}\}$) and requests that they ping $N_{2}$ on its behalf.
If no successful response, direct or relayed, has been received by the end of the T' time units, $N_{1}$ marks $N_{2}$ as failed.
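
The four steps above can be sketched as a single function that runs one protocol period for $N_{1}$. This is a minimal illustration rather than a reference implementation: the function name `probe_once` and the callback `can_reach(a, b)` (a stand-in for "a message from a to b arrived in time") are assumptions made for the example, and timestamp bookkeeping is left out.

```python
import random
from typing import Callable, Iterable, Optional, Tuple


def probe_once(
    n1: str,
    members: Iterable[str],
    can_reach: Callable[[str, str], bool],
    k: int = 3,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], bool]:
    """Run one protocol period for node n1.

    Returns (target, alive): the probed node and whether n1 still considers
    it alive at the end of the period. `can_reach` is a hypothetical
    stand-in for real network I/O with time-outs.
    """
    rng = rng or random.Random()
    others = [m for m in members if m != n1]
    if not others:
        return None, True  # nobody to probe

    n2 = rng.choice(others)  # step 1: pick a random target

    # Step 2: the direct ping must arrive and the ack must come back.
    if can_reach(n1, n2) and can_reach(n2, n1):
        return n2, True

    # Step 3: ask up to k other members to ping n2 on n1's behalf.
    helpers = [m for m in others if m != n2]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        request_ok = can_reach(n1, helper)
        indirect_ok = can_reach(helper, n2) and can_reach(n2, helper)
        relay_ok = can_reach(helper, n1)
        if request_ok and indirect_ok and relay_ok:
            return n2, True

    # Step 4: no direct or relayed ack within the period.
    return n2, False
```

In this sketch a node is only reported as failed when both the direct path and every sampled indirect path fail, which is what makes the heartbeat "outsourced": a single bad link between $N_{1}$ and $N_{2}$ is not enough to declare $N_{2}$ dead.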

The Dissemination Component functions as follows:

Upon $N_{1}$ detecting a failed node $N_{2}$ , $N_{1}$ sends a multicast message to the rest of the nodes in its membership list, with information about the failed node.
Voluntary requests for a node to enter/leave the group are also sent via multicast.
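
As a rough sketch of the multicast-based dissemination described above, the following models each node's membership list as a set and delivers an update to every member the sender knows about. The `Node` class, the `multicast` helper and the update kinds `"join"`, `"leave"` and `"failed"` are names invented for this example; a real deployment would use network multicast rather than a dictionary lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    members: Set[str] = field(default_factory=set)

    def apply(self, kind: str, subject: str) -> None:
        """Apply one membership update to the local membership list."""
        if kind == "join":
            self.members.add(subject)
        elif kind in ("leave", "failed"):
            self.members.discard(subject)
        else:
            raise ValueError(f"unknown update kind: {kind}")


def multicast(sender: Node, cluster: Dict[str, Node], kind: str, subject: str) -> List[str]:
    """Send (kind, subject) to every node in the sender's membership list.

    `cluster` maps names to nodes and stands in for the network; names
    missing from it model unreachable peers. Returns the names reached.
    """
    sender.apply(kind, subject)
    delivered = []
    for name in sorted(sender.members):
        if name in (sender.name, subject):
            continue  # do not send to ourselves or to the node in question
        peer = cluster.get(name)
        if peer is not None:
            peer.apply(kind, subject)
            delivered.append(name)
    return delivered
```

For example, after `multicast(n1, cluster, "failed", "n2")` every node that `n1` could reach no longer lists `n2` as a member.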

Properties

The protocol provides the following guarantees:

Strong Completeness: Full completeness is guaranteed (i.e. the crash-failure of any node in the group is eventually detected by all live nodes).
Detection Time: The expected value of detection time (from node failure to detection) is $T' \cdot \frac{1}{1 - e^{-q_{f}}}$, where $T'$ is the length of the protocol period, and $q_{f}$ is the fraction of non-faulty nodes in the group.^[3]
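
To get a feel for the detection-time formula, it can be evaluated for a few values of $q_{f}$. The helper below is only a direct transcription of the expression above; the function name `expected_detection_time` is an assumption for this example. Intuitively, for large groups the chance that a given node is picked as a ping target in one period approaches $1 - e^{-q_{f}}$, and the expected number of periods until the first such pick is the reciprocal of that probability.

```python
import math


def expected_detection_time(period: float, q_f: float) -> float:
    """Expected time from failure to first detection: T' / (1 - e^(-q_f)).

    period -- length of the protocol period T' (any time unit)
    q_f    -- fraction of non-faulty nodes, in the range (0, 1]
    """
    if not 0.0 < q_f <= 1.0:
        raise ValueError("q_f must be in (0, 1]")
    return period / (1.0 - math.exp(-q_f))


if __name__ == "__main__":
    for q_f in (1.0, 0.9, 0.5):
        periods = expected_detection_time(1.0, q_f)
        print(f"q_f={q_f}: {periods:.3f} protocol periods")
```

With every node healthy ($q_{f} = 1$) the expected detection time is about 1.58 protocol periods, and it grows as the fraction of non-faulty nodes shrinks. Note that the expression does not depend on the group size.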

Extensions

The original SWIM paper lists the following extensions to make the protocol more robust:^[2]

Suspicion: Nodes that are unresponsive to ping messages are not immediately marked as failed. Instead, they are marked as "suspicious"; a node that discovers a "suspicious" node still multicasts this information to all other nodes. If a "suspicious" node responds to a ping before a time-out threshold expires, an "alive" message is sent via multicast to remove the "suspicious" label from the node. If the time-out expires first, the node is declared failed.
Infection-Style Dissemination: Instead of propagating node failure information via multicast, protocol messages are piggybacked on the ping messages used to determine node liveness. This is equivalent to gossip dissemination.
Round-Robin Probe Target Selection: Instead of randomly picking a node to probe during each protocol period, each node selects its probe targets in round-robin order. This bounds the worst-case detection time of the protocol without degrading the average detection time.
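
The three extensions can be sketched together as one node's local view. Everything here is illustrative: the class `ExtendedMember`, its method names, the state labels and the `max_updates` limit are assumptions made for the example, only the sending side of piggybacking is shown, and each update is attached to a single ping, whereas a practical implementation would typically repeat each update on several messages.

```python
import random
from typing import Dict, List, Optional, Tuple

ALIVE, SUSPECT, FAILED = "alive", "suspect", "failed"


class ExtendedMember:
    """One node's local state with suspicion, piggybacking and round-robin."""

    def __init__(self, name: str, peers: List[str], suspect_timeout: float,
                 rng: Optional[random.Random] = None) -> None:
        self.name = name
        self.rng = rng or random.Random()
        self.suspect_timeout = suspect_timeout
        self.state: Dict[str, str] = {p: ALIVE for p in peers}
        self.suspected_at: Dict[str, float] = {}
        self.outbox: List[Tuple[str, str]] = []  # updates awaiting piggyback
        self._order: List[str] = []              # targets left in this pass

    def next_target(self) -> Optional[str]:
        """Round-robin: visit every non-failed peer once per pass."""
        if not self._order:
            self._order = [p for p, s in self.state.items() if s != FAILED]
            self.rng.shuffle(self._order)  # fresh random order for each pass
        return self._order.pop() if self._order else None

    def on_probe_result(self, peer: str, acked: bool, now: float) -> None:
        """Suspicion: a missed probe marks the peer suspect, not failed."""
        if self.state[peer] == FAILED:
            return
        if acked:
            if self.state[peer] == SUSPECT:
                self.suspected_at.pop(peer, None)
                self.outbox.append((ALIVE, peer))  # clear the suspicion
            self.state[peer] = ALIVE
        elif self.state[peer] == ALIVE:
            self.state[peer] = SUSPECT
            self.suspected_at[peer] = now
            self.outbox.append((SUSPECT, peer))

    def expire_suspects(self, now: float) -> None:
        """Declare suspects failed once the time-out has elapsed."""
        for peer, since in list(self.suspected_at.items()):
            if now - since >= self.suspect_timeout:
                del self.suspected_at[peer]
                self.state[peer] = FAILED
                self.outbox.append((FAILED, peer))
                if peer in self._order:
                    self._order.remove(peer)

    def make_ping(self, max_updates: int = 3) -> Dict[str, object]:
        """Infection-style: attach pending updates to the next ping."""
        updates = self.outbox[:max_updates]
        self.outbox = self.outbox[max_updates:]
        return {"type": "ping", "from": self.name, "updates": updates}
```

A node using this sketch would call `next_target()` once per protocol period, report the outcome with `on_probe_result()`, call `expire_suspects()` periodically, and build outgoing pings with `make_ping()` so that membership changes travel with the failure-detection traffic instead of through a separate multicast.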

