Previously I wrote about safe-software requirements and software architecture. Here in Part 3 of this series I describe MCU-hardware testing. To assure proper operation of safe software, you must periodically test parts of the controller (and therefore of the MCU) or rely on built-in features that assure safe operation (CRC checking, for example).
For Class B software, the following parts of an MCU must undergo testing:
- CPU registers
- Program counter
- Interrupt handling and execution
- Clock
- Invariable memory
- Variable memory
- Memory addressing
- Internal buses, data, and addressing
- External communication
- Digital and analog I/Os
CPU registers are tested for stuck-at faults, which means bits in a register remain stuck at logical 0 or 1 regardless of the value written to that register. Suppose a register holds the binary value 00000000. As a test, a new value is written, for example, binary 11111111. If one or more bits in the register remain in the logic-0 state, the MCU register has a stuck-at fault. You can detect this fault through a functional test or by a periodic self-test using a static-memory test or single-bit protection.
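As a rough illustration of the write-and-read-back idea, here is a minimal sketch in C. It assumes the register under test is reached through two access hooks (`write_reg` and `read_reg` are hypothetical; on a real MCU they would typically be short assembly routines). The `sim_*` helpers are host-side stand-ins added only so the sketch can run on a PC and so a stuck bit can be injected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical access hooks for the register under test. */
typedef void    (*reg_write_fn)(uint8_t value);
typedef uint8_t (*reg_read_fn)(void);

/* Returns true when every test pattern reads back unchanged,
   that is, no bit appears stuck at 0 or at 1. */
bool reg_stuck_at_test(reg_write_fn write_reg, reg_read_fn read_reg)
{
    static const uint8_t patterns[] = { 0x00u, 0xFFu, 0x55u, 0xAAu };
    unsigned i;

    for (i = 0; i < sizeof patterns / sizeof patterns[0]; ++i) {
        write_reg(patterns[i]);
        if (read_reg() != patterns[i]) {
            return false;   /* at least one bit did not take the written value */
        }
    }
    return true;
}

/* Host-side stand-ins (assumptions for demonstration only). */
uint8_t sim_reg;
uint8_t sim_stuck_low_mask;   /* bits set here are forced to 0 on every write */

void sim_write(uint8_t value)
{
    sim_reg = (uint8_t)(value & (uint8_t)~sim_stuck_low_mask);
}

uint8_t sim_read(void)
{
    return sim_reg;
}
```

With `sim_stuck_low_mask` set to 0x08, writing 0xFF reads back as 0xF7 and the test reports a fault. A production test would also save and restore the register contents, which this sketch leaves out.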
In a similar way you test the MCU's program counter for a stuck-at fault by using logical monitoring of program sequence, or by independent time-slot monitoring.
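Logical monitoring of the program sequence can be sketched as a set of checkpoints that must report in a fixed order. The step names and the `seq_*` functions below are hypothetical and only illustrate the idea; a real application would define its own steps and its own reaction to a fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checkpoints, one per major step of the main loop. */
enum {
    STEP_READ_INPUTS   = 0,
    STEP_CONTROL       = 1,
    STEP_WRITE_OUTPUTS = 2,
    STEP_COUNT         = 3
};

typedef struct {
    uint8_t expected;   /* next checkpoint that is allowed to report */
    bool    fault;      /* latched once a step reports out of order */
} seq_monitor_t;

void seq_init(seq_monitor_t *m)
{
    m->expected = 0;
    m->fault = false;
}

/* Call at the start of each step. */
void seq_checkpoint(seq_monitor_t *m, uint8_t step)
{
    if (step != m->expected) {
        m->fault = true;
    }
    m->expected = (uint8_t)((step + 1u) % STEP_COUNT);
}

/* Call at the end of each loop pass: the pass is valid only if every
   step reported in order and the sequence wrapped back to step 0. */
bool seq_cycle_ok(const seq_monitor_t *m)
{
    return !m->fault && m->expected == 0;
}
```

If the program counter jumps over a step or lands in the wrong routine, the next checkpoint does not match `expected` and the fault is latched.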
Interrupt handling and execution is tested for lack of interrupts or too frequent interrupts. This can be done by functional test or time-slot monitoring.
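One way to picture time-slot monitoring of interrupts is a counter that the interrupt routine increments and that an independent time base checks at the end of each slot. The sketch below assumes hypothetical limits `min_per_slot` and `max_per_slot` chosen by the designer; the type and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t count;   /* incremented by the monitored interrupt */
    uint32_t min_per_slot;     /* fewer than this means missing interrupts */
    uint32_t max_per_slot;     /* more than this means too-frequent interrupts */
} irq_monitor_t;

/* Call from the monitored interrupt service routine. */
void irq_monitor_tick(irq_monitor_t *m)
{
    m->count++;
}

/* Call from an independent time base at the end of each time slot.
   On a real MCU the read-and-clear below would need to be made atomic,
   for example by briefly masking the monitored interrupt. */
bool irq_monitor_check(irq_monitor_t *m)
{
    uint32_t seen = m->count;

    m->count = 0;
    return seen >= m->min_per_slot && seen <= m->max_per_slot;
}
```

A slot with no interrupts at all and a slot flooded with interrupts both fail the same check, which matches the two fault modes named above.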
The clock requires testing for an incorrect frequency. In the case of a quartz-crystal-based clock, you must also detect oscillation at subharmonic or harmonic frequencies. To test the clock, you monitor the frequency or use time-slot monitoring.
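Frequency monitoring usually comes down to counting main-clock ticks during a period set by an independent reference and comparing the count with the nominal value. The sketch below assumes such a count is already available; `clock_in_tolerance` and its parameters are hypothetical names, and the tolerance is a design choice, not a figure from any standard.

```c
#include <stdbool.h>
#include <stdint.h>

/* measured_ticks:    main-clock ticks counted during one reference period
   nominal_ticks:     count expected at the correct frequency
   tolerance_percent: allowed deviation in percent (a design assumption) */
bool clock_in_tolerance(uint32_t measured_ticks,
                        uint32_t nominal_ticks,
                        uint32_t tolerance_percent)
{
    uint64_t margin = ((uint64_t)nominal_ticks * tolerance_percent) / 100u;
    uint64_t low    = (margin > nominal_ticks) ? 0u
                                               : (uint64_t)nominal_ticks - margin;
    uint64_t high   = (uint64_t)nominal_ticks + margin;

    return measured_ticks >= low && measured_ticks <= high;
}
```

With a nominal count of 1000 and a 5 percent tolerance, a crystal running at half frequency yields about 500 ticks and a doubled frequency about 2000, so both fall outside the 950 to 1050 window.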
For invariable memory (in most cases flash memory) all single-bit faults must be detected. You can accomplish this by word protection with single-bit redundancy, periodic modified checksum, or multiple checksums.
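A software checksum over the flash image might look like the following sketch. It uses CRC-16 with the CCITT polynomial 0x1021 and an initial value of 0xFFFF purely as an example; the choice of algorithm, the function names, and the way the reference value is stored are all assumptions, and an MCU with a hardware CRC unit would normally use that instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 (polynomial 0x1021, initial value 0xFFFF, no reflection). */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; ++i) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Compares the CRC of the image with a reference value computed at
   build time and stored outside the checked range. */
bool flash_image_ok(const uint8_t *image, size_t len, uint16_t stored_crc)
{
    return crc16_ccitt(image, len) == stored_crc;
}
```

Because a CRC detects every single-bit error, flipping any one bit in the image changes the result and `flash_image_ok` returns false. On a large flash, the calculation is often split into small slices spread over many loop passes.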
Variable memory should be checked for a DC fault that acts like a short circuit. That is, two memory signals are connected when they should not be. This connection might result in wired-OR or wired-AND logic functions. Detection is provided by periodic static memory test or single-bit redundancy.
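A periodic static memory test is often built as a march test: walk up the memory writing and checking one background, then walk down with the inverse. The sketch below is a simplified march sequence with all-0 and all-1 backgrounds, reached through hypothetical accessors so that a short between two cells can be simulated on a PC. It is an assumption-laden illustration, not a certified test, and it does not cover shorts between bits inside the same byte, which would need extra backgrounds such as 0x55 and 0xAA.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessors for the memory cells under test. */
typedef void    (*cell_write_fn)(size_t index, uint8_t value);
typedef uint8_t (*cell_read_fn)(size_t index);

/* Destructive test: a real routine would save and restore the contents. */
bool ram_march_test(cell_write_fn wr, cell_read_fn rd, size_t cells)
{
    size_t i;

    for (i = 0; i < cells; ++i) {          /* ascending: write 0 */
        wr(i, 0x00u);
    }
    for (i = 0; i < cells; ++i) {          /* ascending: read 0, write 1 */
        if (rd(i) != 0x00u) return false;
        wr(i, 0xFFu);
    }
    for (i = cells; i-- > 0; ) {           /* descending: read 1, write 0 */
        if (rd(i) != 0xFFu) return false;
        wr(i, 0x00u);
    }
    for (i = 0; i < cells; ++i) {          /* ascending: read 0 */
        if (rd(i) != 0x00u) return false;
    }
    return true;
}

/* Host-side stand-ins (assumptions for demonstration only). */
#define SIM_RAM_SIZE 8u
uint8_t sim_ram[SIM_RAM_SIZE];
int sim_short_a = -1;   /* cells a and b behave as wired-AND; -1 disables */
int sim_short_b = -1;

void sim_ram_write(size_t index, uint8_t value)
{
    sim_ram[index] = value;
}

uint8_t sim_ram_read(size_t index)
{
    if (sim_short_a >= 0 && sim_short_b >= 0 &&
        (index == (size_t)sim_short_a || index == (size_t)sim_short_b)) {
        return (uint8_t)(sim_ram[sim_short_a] & sim_ram[sim_short_b]);
    }
    return sim_ram[index];
}
```

With cells 2 and 5 shorted as wired-AND, the descending pass clears cell 5 first, so cell 2 then reads 0x00 instead of 0xFF and the test fails.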
Memory addressing is tested for stuck-at faults. This can be done by using a testing pattern, a periodic CRC, or word protection with multi-bit redundancy that includes the address.
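A common testing pattern for address lines writes a marker at every power-of-two offset and checks that a write to one offset never shows up at another. The sketch below follows that idea through hypothetical accessors; the `sim_*` helpers simulate an address bit stuck at 0 on a PC and are not part of any real MCU library. The test overwrites the probed locations, and `size` is assumed to be a power of two no larger than 2^31.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessors for the memory region under test. */
typedef void    (*mem_write_fn)(uint32_t offset, uint8_t value);
typedef uint8_t (*mem_read_fn)(uint32_t offset);

bool address_lines_ok(mem_write_fn wr, mem_read_fn rd, uint32_t size)
{
    const uint8_t pattern = 0xAAu;
    const uint8_t anti    = 0x55u;
    uint32_t off, probe;

    /* Mark offset 0 and every power-of-two offset. */
    wr(0u, pattern);
    for (off = 1u; off < size; off <<= 1) {
        wr(off, pattern);
    }

    /* A write to offset 0 must not appear at any power-of-two offset. */
    wr(0u, anti);
    for (off = 1u; off < size; off <<= 1) {
        if (rd(off) != pattern) return false;
    }
    wr(0u, pattern);

    /* A write to one power-of-two offset must not disturb the others. */
    for (probe = 1u; probe < size; probe <<= 1) {
        wr(probe, anti);
        if (rd(0u) != pattern) return false;
        for (off = 1u; off < size; off <<= 1) {
            if (off != probe && rd(off) != pattern) return false;
        }
        wr(probe, pattern);
    }
    return true;
}

/* Host-side stand-ins (assumptions for demonstration only). */
#define SIM_MEM_SIZE 16u
uint8_t  sim_mem[SIM_MEM_SIZE];
uint32_t sim_addr_stuck_low;   /* address bits set here are forced to 0 */

void sim_mem_write(uint32_t offset, uint8_t value)
{
    sim_mem[offset & ~sim_addr_stuck_low & (SIM_MEM_SIZE - 1u)] = value;
}

uint8_t sim_mem_read(uint32_t offset)
{
    return sim_mem[offset & ~sim_addr_stuck_low & (SIM_MEM_SIZE - 1u)];
}
```

When address bit 2 is stuck at 0, offset 4 aliases to offset 0, so the write of the inverse pattern to offset 0 is seen at offset 4 and the test fails.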
Internal data paths should also be tested for stuck-at faults. If the MCU is equipped with single-bit data protection, that suffices to comply with Class B requirements. Otherwise, use a protocol test or a testing-pattern method. Internal addressing must be tested for wrong addresses. You can do this with single-bit redundancy or with a testing pattern that includes addresses.
For external communication, Hamming distance is an essential concept. The Hamming distance between two strings of equal length is the number of bit positions in which they differ, that is, the minimum number of bit errors that could transform one string into the other. For example, the Hamming distance between binary 00000111 and binary 00000000 is 3. For external communications, Class B software requires a Hamming distance of 3. If hardware protection is not available for external communication, you can run a protocol test to check communications. You can also use hardware protection, such as CRC or multi-bit redundancy.
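To make the definition concrete, the Hamming distance of two bytes can be computed by XORing them and counting the 1 bits in the result. The function name below is illustrative only.

```c
#include <stdint.h>

/* Number of bit positions in which a and b differ. */
unsigned hamming_distance8(uint8_t a, uint8_t b)
{
    uint8_t diff = (uint8_t)(a ^ b);   /* 1 bits mark differing positions */
    unsigned count = 0;

    while (diff != 0u) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}
```

For the example in the text, `hamming_distance8(0x07, 0x00)` returns 3, because 0x07 is binary 00000111.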
For digital and analog I/Os, you must test addressing and timing: first by using a protocol test, for example, and second by using time-slot monitoring or scheduled transmission tests.
You are now familiar with the MCU internal circuits that require testing to comply with Class B requirements. Class C software adds another set of requirements. If an MCU includes hardware CRC or redundancy, it becomes easier to implement safety checks; however, you could also use software routines.