When designing robust software, we must determine if each task is running, and also check that the interrupts work properly, especially if an interrupt comes from a "tick" timer.
Detection is made easier if each task is executed in turn and is only executed once through the full sequence of tasks. In most implementations, this is the case, whether you are using an RTOS or a simple "round-robin" scheduler.
I am a little concerned by the comment in Massimo Manca's blog Low-Power Design: MCU Software, in which he says that some RTOSs are moving away from this approach to conserve power. However, I won't consider this possibility at the moment.
Techniques to detect the correct sequence of operation that I have used include assigning a numeric identifier to each task, and each time the task is invoked checking that the calling process has the correct identifier. "Bumping" the identifier to the next task takes place within the current task. Other techniques include checking that the stack level is correct on starting a task, and a return from a single point in all subroutines so the return details can be verified. The last two items are difficult to implement in high-level languages.
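The identifier technique can be sketched roughly as follows. This is only an illustration: the three task names, the `NUM_TASKS` count, and the `task_enter` helper are all hypothetical and not taken from any real project.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_TASKS 3u                 /* assumed number of tasks in the cycle */

static uint8_t expected_task = 0;    /* identifier of the task that should run next */
static bool sequence_fault = false;  /* latched on the first out-of-order entry */

/* Restart the sequence check, e.g. after reset. */
void seq_reset(void)
{
    expected_task = 0;
    sequence_fault = false;
}

/* Called at the top of every task with that task's own identifier. */
void task_enter(uint8_t my_id)
{
    if (my_id != expected_task) {
        sequence_fault = true;       /* a task was skipped, repeated, or entered by accident */
    }
    /* "Bump" the identifier inside the current task so the next task can check it. */
    expected_task = (uint8_t)((my_id + 1u) % NUM_TASKS);
}

/* Hypothetical tasks; the real work would follow the call to task_enter(). */
void task_measure(void) { task_enter(0); }
void task_display(void) { task_enter(1); }
void task_uart(void)    { task_enter(2); }
```

Calling the three tasks in order leaves `sequence_fault` clear; calling `task_uart()` directly after `task_measure()` sets it, and a housekeeping routine could then refuse to service the watchdog.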
I thought more about how to add confidence that tasks execute consecutively. The idea of a Linear Feedback Shift Register (LFSR) popped into my head, and although I have not yet implemented this idea, I share my thoughts so perhaps together we can come up with a new approach. An LFSR is the core element of a CRC (cyclic redundancy check). It also found use in an early microprocessor debugging technique known as "Signature Analysis," in which monitoring a bit stream through a particular node during a fixed period results in a digital "signature." Figure 1 shows the register setup used in a Hewlett Packard HP5004A signature analyzer. I propose we use a 16-bit LFSR for our thought project, because the 65,536 possibilities of a 16-bit register are far more likely to reveal an error than the 256 possibilities of an 8-bit register, despite the extended execution time.
Depending on how we arrange the feedback taps, the sequence can go through every binary combination except all zeros (the lock-up state of an XOR-feedback register) before repeating. The number of stages in the shift register sets how many clock pulses it takes for the register to cycle: at most 2^n - 1 for n stages. The initial setting of the shift register (the seed), coupled with the input data (if there are input data), determines the result.
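To make the cycling behavior concrete, here is a minimal sketch of a free-running 16-bit LFSR with no data input. The tap mask `0xB400` (polynomial x^16 + x^14 + x^13 + x^11 + 1, in Galois form) is an assumption chosen because it is a commonly cited maximal-length configuration; it is not the tap arrangement of the HP5004A in Figure 1. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed taps: x^16 + x^14 + x^13 + x^11 + 1 (Galois form). */
#define LFSR_TAPS 0xB400u

/* Advance the register by one clock pulse. */
uint16_t lfsr16_step(uint16_t state)
{
    unsigned lsb = state & 1u;   /* bit that is about to be shifted out */
    state >>= 1;
    if (lsb) {
        state ^= LFSR_TAPS;      /* feed it back into the tapped stages */
    }
    return state;
}

/* Count the clock pulses until the register returns to its seed. */
uint32_t lfsr16_period(uint16_t seed)
{
    uint16_t s = seed;
    uint32_t n = 0;
    do {
        s = lfsr16_step(s);
        n++;
    } while (s != seed && n < 70000u);  /* bound guards against a bad tap choice */
    return n;
}
```

With these taps any non-zero seed should come back to itself after 65,535 steps, while a seed of zero never leaves zero, which is why the seed table must avoid it.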
I propose five different seed values as part of a lookup table, and for each of them a set of results at fixed clock intervals. There would also be five data words to shift into the LFSR. To start a sequence we load a seed value into the LFSR and load a data word into a public register. Each task reads and rotates the public register one bit and shifts the LSB onto the LFSR. After a full cycle of tasks (or multiple cycles) we compare the value of the LFSR to the expected result. This sequence could extend through several more task cycles. Then we could change the seed and the input data and repeat the process.
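A rough sketch of this scheme follows, under several assumptions: five tasks per cycle, the same assumed `0xB400` tap mask, and placeholder seed and data values. In a real design the expected results would be constants generated offline and stored in the lookup table; here the hypothetical `sig_expected` function stands in for that table by running the same shift in a loop.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_TASKS 5u        /* assumed number of tasks in one full cycle */
#define LFSR_TAPS 0xB400u   /* assumed feedback taps, not from Figure 1 */

/* Placeholder lookup table: one seed and one data word per row. */
static const uint16_t seeds[5]      = { 0xACE1u, 0x1234u, 0xBEEFu, 0x0F0Fu, 0x5A5Au };
static const uint16_t data_words[5] = { 0x8001u, 0x3C3Cu, 0x00FFu, 0xA5A5u, 0x7E7Eu };

static uint16_t lfsr;        /* running signature */
static uint16_t public_reg;  /* data word shared by all tasks */

/* Shift one input bit into the signature register (CRC-style step). */
uint16_t sig_shift(uint16_t state, unsigned in_bit)
{
    unsigned fb = (state ^ in_bit) & 1u;
    state >>= 1;
    if (fb) {
        state ^= LFSR_TAPS;
    }
    return state;
}

/* Load a seed into the LFSR and a data word into the public register. */
void sig_start(unsigned row)
{
    lfsr = seeds[row];
    public_reg = data_words[row];
}

/* Called once at the start of every task. */
void task_checkpoint(void)
{
    unsigned bit = public_reg & 1u;
    public_reg = (uint16_t)((public_reg >> 1) | (bit << 15)); /* rotate right one bit */
    lfsr = sig_shift(lfsr, bit);
}

/* Stand-in for the precomputed table of expected results. */
uint16_t sig_expected(unsigned row, unsigned cycles)
{
    uint16_t s = seeds[row];
    uint16_t d = data_words[row];
    for (unsigned i = 0; i < cycles * NUM_TASKS; i++) {
        unsigned bit = d & 1u;
        d = (uint16_t)((d >> 1) | (bit << 15));
        s = sig_shift(s, bit);
    }
    return s;
}

/* Housekeeping check after the given number of full task cycles. */
bool sig_check(unsigned row, unsigned cycles)
{
    return lfsr == sig_expected(row, cycles);
}
```

After `sig_start(0)` and five calls to `task_checkpoint()`, `sig_check(0, 1)` returns true; with only four calls it returns false for this seed and data word. As written, every task performs the same shift, so the sketch catches a missed or repeated task rather than two tasks trading places; mixing a task identifier into the input bit might be one way to cover that case.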
Let's extend this a bit further. If we can figure out a fixed timing ratio between the tick interrupt and the cycle of tasks we can get the timer interrupt to execute a rotation cycle for the LFSR as well, thereby confirming its operation and timing. This is somewhat akin to an external windowed watchdog. If everything was fine then you could clock a simple external watchdog. I don't believe that this is superior to some external windowed watchdogs -- you might remember my disillusionment with one particular blog -- because it all boils down to one section of code, and if it executes erroneously, the watchdog would get clocked. Nevertheless, I believe the approach has much merit. What say you?
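One possible shape for the timing check is sketched below. Note that it swaps in a simpler technique than the one described above: instead of letting the tick interrupt shift the LFSR (where the result would depend on exactly where each tick lands between tasks), it counts ticks per task cycle and requires the count to fall inside a window. The window limits, `tick_isr`, `kick_watchdog`, and `end_of_cycle_check` are all hypothetical names and values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed window: a healthy task cycle spans roughly 10 ticks. */
#define TICKS_PER_CYCLE_MIN 9u
#define TICKS_PER_CYCLE_MAX 11u

static volatile uint16_t tick_count;  /* incremented by the tick interrupt */
static unsigned wdt_kicks;            /* stand-in for pulsing the external watchdog pin */

/* Body of the timer tick interrupt service routine. */
void tick_isr(void)
{
    tick_count++;
}

/* Placeholder for clocking a simple external watchdog. */
void kick_watchdog(void)
{
    wdt_kicks++;
}

/* Called from housekeeping once per full task cycle. */
bool end_of_cycle_check(bool signature_ok)
{
    /* On a real MCU this read-and-clear would need interrupts disabled. */
    uint16_t ticks = tick_count;
    tick_count = 0;

    bool timing_ok = (ticks >= TICKS_PER_CYCLE_MIN) &&
                     (ticks <= TICKS_PER_CYCLE_MAX);

    /* Clock the watchdog only when both sequence and timing look right. */
    if (signature_ok && timing_ok) {
        kick_watchdog();
        return true;
    }
    return false;
}
```

If the interrupt stops, the count falls below the window; if the task loop stalls, it rises above it. Either way the watchdog is not clocked, although the caveat above still applies: this is a single section of code, and if it executes erroneously the watchdog could still get clocked.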
Sources:
"The Ouroboros of the Digital Consciousness: Linear-Feedback-Shift Registers" by Clive "Max" Maxfield, EDN January 4, 1996.
"Implementing CRCs," by Jack Crenshaw, Embedded Systems Programming, 1993.
"Pseudo-random connections for 2 through 16 stages," Don Lancaster, The TTL Cookbook, 3rd printing 1975.
"AN222-4: Guidelines For Signature Analysis- Understanding the Signature Measurement," Hewlett-Packard.
I recall googling for watchdogs some time back and reading a paper that mentioned a watchdog methodology called Limp-Home recovery. Is it something like a lift having just enough battery backup to reach the nearest floor in the event of a power failure? Could it be implemented in a small MCU embedded system?
It would be great if anyone in the team could enlighten us further.
Signature analysis is a very helpful technique for debugging hardware. I have used it a few times with great success. You can break all the hardware 'loops' (in my case all I needed to do was gate off the Program Counter from 'jumping' and it would just count sequentially through the microcode). I could put a probe on any input or output of a device, and since the signature of a working node was known it was easy to find any errors.
I think extending this to software is a great idea. In particular if the 'calling sequence' can be retained it would hopefully make it much easier to see where the error started. Maybe a runaway pointer in a calling routine trashed some data that the called routine needed. Now you can trace back to see which routine was the one that messed things up.
raimond 2/23/2013 1:07:40 PM User Rank Word wizard
Re: Home Brewn
The "classic" cooperative multitasking is very much the same as preemptive multitasking, with the single difference that it is ... not preemptive. Every task is responsible for yielding and letting the other tasks run. No task can interrupt other tasks. Only interrupts interrupt tasks. This is exactly like "no rtos" systems, only that the main program is organized as a cooperative multitasking.
Personally I'm not a fan of preemptive rtoses. I think it's just insane to transform every single task into an interrupt. The result is a very hard to control (and program) system where almost every task can interrupt any task. That's why those rtoses need complex services like semaphores, message queues and the like. You need to protect every single shared variable, or need to use a semaphore or queue instead of just a bit or byte variable.
I was forced some time ago to develop an uC/OS-II system. The hardware was not "friendly" either, having lots of multiplexed GPIO pins for many devices on board. It was just a nightmare, the firmware became full of rtos function calls, it was like somebody punished me to use the rtos :D I was almost 90% of the time focused on how to solve rtos integration with the application instead of the application itself.
My conclusion is: the preemptive rtoses are good business for their creators, they make the users captive.
Davidmicro 2/23/2013 12:13:38 PM User Rank Program Manager
Re: Home Brewn
->The problem is that we have no check yet that the interrupts are actually running. In almost every project I have worked with, the timer interrupt is asynchronous with the round robin tasks and so using it to clock the CRC register to get predictable results makes it difficult for me to see how it can be implemented,
The Timer Control Register and Timer Flag Register can be monitored to see the status. Secondly, the architecture of each MCU describes the timer interrupt process in detail. It sounds like the project that you worked on uses a customized RTOS.
antedeluvian 2/23/2013 11:18:02 AM User Rank Blogger
Re: Home Brewn
jk
Although you could probably implement this as external hardware, it wasn't my intention. What I am aiming to do is to be so sure of the operation of the microcontroller that a simple watchdog is sufficient.
JK & Curt
It appears that I hadn't made my description clear enough. Let me try to clarify. Let us assume a simple device that measures an analog input, displays the value on an external LCD and also generates a serial output on a UART. The software can be broken up into several tasks. 1. Measure elapsed time; 2. A/D conversion; 3. display driver; 4. UART driver; 5. Housekeeping. There are of course the interrupts, but let's ignore them for the moment.
In a simple system I employ a kind of home-grown round-robin approach where each task is executed in sequence. You could use a timer as part of the interrupts that ensures that each task has an equal amount of time. Another approach, which I believe is called cooperative multitasking, breaks each task into subtasks, and control is passed to the next task when the executing task is ready; the time spent in one task is determined by how the programmer has subdivided it.
What I am proposing would work in either case. If the micro goes "nuts," execution of the code can jump anywhere, so what we are trying to detect is when the tasks are not executed in sequence. At the start of every task would be a small software routine that shifts the CRC register. Part of the housekeeping task is to ensure that the CRC has the intended value.
The problem is that we have no check yet that the interrupts are actually running. In almost every project I have worked with, the timer interrupt is asynchronous with the round robin tasks and so using it to clock the CRC register to get predictable results makes it difficult for me to see how it can be implemented, but it seems to be possible since this is what windowed watchdogs do. This is kind of where I was looking for suggestions.
At first sight, this idea looks like a home-brewed external watchdog. However, as always, simple things are interesting and can come in handy. This idea is novel for me as this watchdog, as I understand it, is purely hardware and is independent of what happens inside the MCU.
However, same questions that CC asked linger in my mind too. I am sure we are up for an interesting discussion as we develop this idea.
I think it's an interesting idea and worth exploring. Three questions that spring to mind on a first reading are a) what agent reads the shift register? b) When? and c) what are the underlying assumptions that have to be made about tasks and valid task execution paths?
I'll have to think about this some more tomorrow, and will watch the discussion here with interest.