Software-Based Fault-Tolerance

David Holdt

Abstract: This Master's thesis investigates the possibility of using software-based Error Detection And Correction (EDAC) to protect the storage of space-borne computers from Single Event Upsets (SEUs). Two software-based fault-injectors are implemented, which simulate the effects of SEUs as stochastic processes, and a number of experiments are performed on programs running on prototypes of the onboard computer of a small satellite, DTUSat. The faults are injected continuously until a failure is detected (i.e. test until destruction), after which the state of the target is examined. One of the target programs is the DTUSat boot-software, while the other is a program resembling the application software for the DTUSat. In order to perform a large number of experiments, two harnesses have been implemented, which allow batches of experiments to be carried out automatically. The results of the experiments are analyzed, and a number of metrics are measured (time of first error, runtime before failure, cause of failure, etc.). The injectors are designed to be quite flexible, in that the user can control the fault-rates and injection model. A number of experiments are also performed with different fault-rates, and it is found that the results obtained are representative over a large interval of injection-rates. In order to carry out the experiments in a reasonable amount of time, the injection-rates used are somewhat higher than what can be expected in a low Earth orbit (LEO).
Keywords: Fault-tolerance, fault-injection, stochastic simulation, SEU, LEO, EDAC, Hamming code
Type: Master's thesis [Academic thesis]
Year: 2004
Publisher: Informatics and Mathematical Modelling, Technical University of Denmark, DTU
Address: Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby
Series: IMM-Thesis-2004-22
IMM Group(s): Computer Science & Engineering