A Fault-tolerant Distributed Library for Embedded Real-time Systems
Type
Master's thesis
Program
Computer systems and networks (MPCSN), MSc
Published
2020
Authors
Gudmandsen, Johanna
Hashem, Hashem
Abstract
A distributed embedded control system (DECS) may have safety-critical and
time-sensitive functionality, meaning that a malfunction could have devastating
consequences. To meet these requirements, a system must fulfill real-time
constraints and guarantee correct functionality even in the presence of faults.
In this thesis we present a software library providing clock synchronization,
real-time scheduling and fault-tolerant decision making. It is intended for use
with DECSs communicating over a controller area network (CAN). To achieve
fault-tolerant decision making, we propose an early-stopping fault-tolerance
algorithm that tolerates up to t faults in a system of 2t + 1 nodes. We further
propose an adaptation of this algorithm for real-world applications, where there
may be an interval of correct values instead of the single correct value assumed
in the base solution.
The result is a lightweight and efficient library. The clock synchronization
requires only one message and has a precision comparable to other known solutions,
but it is not fault-tolerant. The scheduler runs in O(n²) time and uses a
non-preemptive rate-monotonic policy. It can handle up to 63 user-defined tasks
and has a worst-case task delay of 2.5 ms for the lowest-priority task in a
system with 60 tasks, assuming zero task execution time. Its drawback is that it
cannot handle mixed-criticality task sets. Our proposed algorithm uses the
properties inherent in CAN to rectify faults in the value domain efficiently.
Due to the early-stopping property of the algorithm, the bus utilization
increases linearly with the number of faults.
We conclude that while the library is practical and efficient, fault-tolerant clock
synchronization and fault handling in the time domain are necessary improvements
before the library can be used in production systems.
Subject/keywords
Byzantine fault tolerance, Real-time scheduling, CAN, Distributed systems, Embedded control systems