Coffee
Joined: 10 Feb 2015 Posts: 9
|
Failure Mode in Unused Code Space with FILL_ROM and CHECKSUM |
Posted: Thu Sep 24, 2015 7:52 am |
|
|
I'm using a PIC18LF8722 with version 5.049 of the compiler.
My requirement is that the system shuts down when a failure is detected.
I'm looking at the unused code space and want to control what happens if the micro inadvertently vectors into it.
If it does, I want the micro to execute a "safe" instruction all the way to the end of memory, 0x1FFFFF, and then expect the program counter to wrap around to location 0x000000. So I fill the unused code space with a "safe" instruction. As I see it, there are two candidates: 0x00 (NOP) and 0xFF (ADDLW). The NOP is obvious. The ADDLW takes a little more reasoning: as I understand it, it adds the next value, 0xFF, to the W (accumulator) register, and that's all. I don't see anything wrong with letting the micro execute ADDLW until it wraps back to 0x000000 and starts over. Repeatedly adding 0xFF to the accumulator won't affect my outputs, and at reset I initialize everything, so the code is back in control.
Also, at reset, the first thing the code does is run a checksum calculation on the entire code space memory and if it doesn't match the value calculated at compile time and stored in the micro's ID Location, I go into a safe failure mode.
I welcome comments especially if there is a problem with this execution...
Thanks in advance for your comments. _________________ Thank you,
Bill |
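The checksum-at-reset idea described above can be sketched on a host PC in plain C. This is only an illustration under stated assumptions: the image is treated as an array of 16-bit words, the check is a simple wrapping sum, and the names `image_sum16` and `image_is_intact` are hypothetical, not part of the CCS compiler or the poster's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum a program image as 16-bit words, wrapping modulo 2^16. */
uint16_t image_sum16(const uint16_t *words, size_t count)
{
    assert(words != NULL || count == 0);
    uint16_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum = (uint16_t)(sum + words[i]);
    }
    return sum;
}

/* Return 1 if the image matches the reference stored at build time,
 * 0 otherwise. On a mismatch the caller would enter its safe state. */
int image_is_intact(const uint16_t *words, size_t count, uint16_t stored_sum)
{
    return image_sum16(words, count) == stored_sum;
}
```

On the target, the reference value would be read from wherever it was stored at build time (the ID locations in the scheme above) and the comparison would run before any outputs are enabled.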
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19589
|
|
Posted: Thu Sep 24, 2015 8:32 am |
|
|
The problem with both is that they don't guarantee that the stack is rebalanced. Assuming the code has gone wrong, the stack could be half full, and then the 'main' code could get a stack overflow later. So the first problem creates a second one.
This is why CCS use the sleep, which, assuming you have the watchdog enabled, then results in a watchdog restart.
Given you are on a PIC18, which has a reset instruction (which will clear the stack), why not use this? It encodes as 0000 0000 1111 1111. CCS went their way because this instruction doesn't exist on the older PICs.
#FILL_ROM 0x00FF |
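To compare the candidate fill values mentioned so far, here is a small host-side sketch that classifies a 16-bit word. The encodings are my reading of the PIC18 instruction set summary and should be checked against the device datasheet; the type and function names (`fill_kind`, `classify_fill_word`, `fill_kind_name`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    FILL_NOP,     /* 0x0000, or any word of the form 1111 xxxx xxxx xxxx */
    FILL_SLEEP,   /* 0x0003 */
    FILL_RESET,   /* 0x00FF */
    FILL_ADDLW,   /* 0x0Fkk, k = 8-bit literal */
    FILL_OTHER,
    FILL_KIND_COUNT
} fill_kind;

/* Classify one 16-bit program word among the fill candidates. */
fill_kind classify_fill_word(uint16_t word)
{
    if (word == 0x0000u)              return FILL_NOP;
    if ((word & 0xF000u) == 0xF000u)  return FILL_NOP;
    if (word == 0x0003u)              return FILL_SLEEP;
    if (word == 0x00FFu)              return FILL_RESET;
    if ((word & 0xFF00u) == 0x0F00u)  return FILL_ADDLW;
    return FILL_OTHER;
}

/* Human-readable name, e.g. for a build-time report. */
const char *fill_kind_name(fill_kind kind)
{
    static const char *const names[FILL_KIND_COUNT] = {
        "NOP", "SLEEP", "RESET", "ADDLW", "other"
    };
    assert((unsigned)kind < FILL_KIND_COUNT);
    return names[kind];
}
```

Under these assumed encodings, an erased word (0xFFFF) classifies as a NOP rather than ADDLW, and 0x00FF is the RESET word suggested above.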
|
|
Coffee
Joined: 10 Feb 2015 Posts: 9
|
|
Posted: Thu Sep 24, 2015 8:40 am |
|
|
Ah, the stack... Excellent point!
Yes, SLEEP would be a much better instruction and let my WDT do its thing. _________________ Thank you,
Bill |
|
|
RF_Developer
Joined: 07 Feb 2011 Posts: 839
|
|
Posted: Thu Sep 24, 2015 8:52 am |
|
|
With these schemes, it's important to implement a restart_cause check to determine why the processor restarted and act accordingly. If the PIC does, by some near-miracle, execute the NOPs or whatever (and leaving them at 0xFF would be my choice), then it will wrap round to the reset vector and the code will start again. But, as Ttelmah says, it'll be a warm start from an unknown state. Don't forget that the peripherals and external hardware may also not be in a sensible state.
So, whenever your code does reset, it pays to find out why. |
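As a rough illustration of a restart-cause check, the sketch below classifies a saved copy of the PIC18 RCON register. It is not the CCS `restart_cause()` implementation. The bit positions (RI bit 4, TO bit 3, POR bit 1, BOR bit 0, all active low) are my reading of the PIC18F8722 datasheet and should be verified; the enum and function names are hypothetical. It also assumes firmware sets the POR and BOR flags back to 1 after each start so that later resets can be told apart.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed RCON flag positions; each flag reads 0 when its event occurred. */
#define RCON_NOT_RI   (1u << 4)  /* 0 = RESET instruction executed */
#define RCON_NOT_TO   (1u << 3)  /* 0 = watchdog time-out occurred */
#define RCON_NOT_POR  (1u << 1)  /* 0 = power-on reset occurred    */
#define RCON_NOT_BOR  (1u << 0)  /* 0 = brown-out reset occurred   */

typedef enum {
    CAUSE_POWER_ON,
    CAUSE_BROWN_OUT,
    CAUSE_RESET_INSTRUCTION,
    CAUSE_WATCHDOG_TIMEOUT,
    CAUSE_MCLR_OR_UNKNOWN,   /* no flag set: MCLR, or e.g. a wrap to 0 */
    CAUSE_COUNT
} reset_cause_t;

/* Classify a snapshot of RCON taken as early as possible after start. */
reset_cause_t classify_reset(uint8_t rcon)
{
    if (!(rcon & RCON_NOT_POR)) return CAUSE_POWER_ON;
    if (!(rcon & RCON_NOT_BOR)) return CAUSE_BROWN_OUT;
    if (!(rcon & RCON_NOT_RI))  return CAUSE_RESET_INSTRUCTION;
    if (!(rcon & RCON_NOT_TO))  return CAUSE_WATCHDOG_TIMEOUT;
    return CAUSE_MCLR_OR_UNKNOWN;
}

/* Name for logging. */
const char *reset_cause_name(reset_cause_t cause)
{
    static const char *const names[CAUSE_COUNT] = {
        "power-on", "brown-out", "RESET instruction",
        "watchdog time-out", "MCLR or unknown"
    };
    assert((unsigned)cause < CAUSE_COUNT);
    return names[cause];
}
```

Note that a program counter wrap to address 0 leaves these flags untouched, so with this check alone it would look the same as the last category.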
|
|
Coffee
Joined: 10 Feb 2015 Posts: 9
|
|
Posted: Thu Sep 24, 2015 1:09 pm |
|
|
Yes, I absolutely agree. A hard restart is a much better resolution.
I started this discussion because I was forced to leave the unused code space at its default, which I observed to be 0xFF. I cannot get the #ID checksum_program directive to work properly when I use the #FILL_ROM directive. There apparently was an issue with this, which CCS fixed in version 5.046. I'm using version 5.049 and it is still not fixed. I have an issue report with CCS support right now.
In the meantime, your responses motivated me to come up with an interim solution. I don't use the #FILL_ROM directive, and I let the unused code space remain at 0xFF. At the end of code space I insert the SLEEP instruction, 0x0003. See the following code:
Code: |
// Place the SLEEP opcode (0x0003) in the last word of program memory
#ORG 0x1FFFE
ROM int16 CONST sleep_inst = 0x0003;
|
Errant code that vectors into the unused code space will now run on to the inserted sleep_inst and be caught by my watchdog timer.
This is not perfect but it will do until I can get the Fill_Rom directive working. Thanks to your comments, I think I have a working plan... _________________ Thank you,
Bill |
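The interim layout above (erased words followed by a single SLEEP word at the end) can be checked with a small host-side model. This is a sketch only: it assumes erased words (0xFFFF) and 0x0000 behave as NOPs, models flash as an array of 16-bit words, and uses hypothetical names (`is_nop`, `first_non_nop`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed PIC18 NOP encodings: 0x0000 and 1111 xxxx xxxx xxxx. */
static int is_nop(uint16_t word)
{
    return word == 0x0000u || (word & 0xF000u) == 0xF000u;
}

/* Starting at word index `start`, skip NOPs and return the index of the
 * first word that is not a NOP, or `count` if only NOPs remain. */
size_t first_non_nop(const uint16_t *flash, size_t count, size_t start)
{
    assert(flash != NULL && start <= count);
    size_t i = start;
    while (i < count && is_nop(flash[i])) {
        i++;
    }
    return i;
}
```

If the returned index is the last word and that word is 0x0003, a stray jump into the unused area would reach the SLEEP word in this model. A return value equal to `count` means nothing would stop execution before the end of the modelled memory.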
|
|
RF_Developer
Joined: 07 Feb 2011 Posts: 839
|
|
Posted: Fri Sep 25, 2015 1:46 am |
|
|
My point is really that no matter what scheme you adopt: filling with NOPs or other safe code (and you accept that the default state of program memory is "safe") and/or using the internal or an external watchdog; the heavy lifting has to be done at reset. It's all about what you do once your code has regained control. Why did the processor reset: Because the watchdog timed out? Problem. Unexpected soft restart? Problem. Hardware reset? Not a problem - that's a normal start, though be careful about how you implement any external hardware watchdog to make sure the PIC can distinguish a watchdog generated reset from a normal hardware start.
In practice a lot of this sort of effort is wasted and can be very difficult to test. You may be trying to meet some specification that says thou shalt do this. But you have to wonder if the treatment is worse than the disease. I run PICs in high RF environments. Stuff happens. I have had them reset, and no, I don't have a watchdog enabled. I have had analogue inputs being disturbed, and CAN comms getting messed about with. But I've never felt the need to use any special "safety" techniques... I do checksum my code, however, and have never had a failure. I'd have to assume the PIC was dead in the water if I did: no amount of restarting's going to solve that one. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19589
|
|
Posted: Fri Sep 25, 2015 2:34 am |
|
|
The reason #FILL_ROM is not changing the checksum is that it is not working at all.
A few versions ago it was, but I just tried it, and it merrily leaves the memory unchanged.
There is also a slight 'thing to remember': the value it expects is a 16-bit instruction word, not a byte. You talk about instructions like '0x00' and '0xFF', but these are only half instructions on a PIC18.
The memory erases to 0xFFFF, which is carefully left as a NOP, so realistically stick with this. For myself, I have never used the #ID checksum ability, instead just using the ability to store the checksum as the last word in ROM using #ROM. This offers choices between CRC, and simple sum checksums, and it is far smaller to code, just to have the whole of memory sum to a fixed value. Much smaller code in your testing, and still likely to find any major error.
So if you have:
Code: |
#ROM 0x1FFFC = {0x0003}
#ROM 0x1FFFE = checksum
|
You get both a checksum in the ROM and a sleep just before it.
With this form, the whole of memory simply sums, as 16-bit words, to 0x1248. |
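The "sum to a fixed value" idea can be sketched on a host in plain C. The target 0x1248 comes from the post above; everything else is an assumption for illustration: the image is summed as wrapping 16-bit words, the balancing value lives in the last word, and the names `seal_image` and `image_sums_to_target` are hypothetical. How the compiler actually computes its value is not verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TARGET_SUM 0x1248u

/* Wrapping 16-bit sum over `count` words. */
static uint16_t sum16(const uint16_t *words, size_t count)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum = (uint16_t)(sum + words[i]);
    }
    return sum;
}

/* Build-time step: overwrite the last word so the whole image sums to
 * TARGET_SUM. */
void seal_image(uint16_t *words, size_t count)
{
    assert(words != NULL && count > 0);
    uint16_t partial = sum16(words, count - 1);
    words[count - 1] = (uint16_t)(TARGET_SUM - partial);
}

/* Run-time step: a single sum and one compare against a constant. */
int image_sums_to_target(const uint16_t *words, size_t count)
{
    return sum16(words, count) == TARGET_SUM;
}
```

The appeal of this arrangement is that the run-time test needs no stored reference to look up: it sums everything, including the balancing word, and compares against one constant.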
|
|
Coffee
Joined: 10 Feb 2015 Posts: 9
|
|
Posted: Fri Sep 25, 2015 3:41 pm |
|
|
Point taken, RF_Developer. My designs run in high ESD environments and I have had experiences similar to those you described. But requirements are requirements, and the regulatory folks are more comfortable with the status quo. So we do what we must and focus on delivering high-quality products. My code does a lot of checking on reset and will only proceed if everything is in order. It also does a lot of I/O monitoring in real time and is equally unforgiving if something goes wrong. Fortunately, in this application, if the code sees anything amiss it can simply shut things down.
I don't recall ever seeing a field return with a code checksum error. I have seen the error come up in production. Though that was a while ago.
Ttelmah: Thanks for pointing out the 16-bit instruction word. Also, thanks for your comment about #FILL_ROM not working again. I'm waiting to hear what CCS say about it being broken again.
Yeah, I'm Ok with the 0xFFFF memory, then landing in the 0x0003 Sleep instruction.
I like the idea about the smaller code. I'm going to look into stuffing the checksum into the last location and just not using Fill_Rom. Good Tip.
Many thanks to both of you for your insight and comments. _________________ Thank you,
Bill |
|
|