Reset mystery

 
guy



Joined: 21 Oct 2005
Posts: 297

Posted: Sun Jan 13, 2019 1:14 am

I need your help guys. Somewhat complicated:
Chip: PIC24FJ64GA308, CCS v. 5.078
I have occasional resets in a cellular gateway that I designed. Normally it connects to the server every 4 hours via a cellular modem, but once every few days, after a successful connection, it experiences a few consecutive resets (anywhere between 4 and 10), then recovers back to normal.
In an attempt to identify the reason for the resets I now send the contents of the RCON & RCON2 registers to the server's log.
The problem is that I don't see any reset bit being set.
Could you please help me review this?
Code:

#include <24FJ64GA308.h>
#word RCON_REG = 0x0740
#word RCON2_REG = 0x0762
...
fprintf(MODEM,
 "AT^SISS=1,\"address\",\"http://*****.com/****.aspx?site=%s&filename=%s&CRC=%Lu&myID=%Lx&retry=%u&rsn=%u&RCON=%Lu,%Lu&V=%3.2w\"\r",SITE,uploadFilename,modemCRC,myID,modemRetryCntr,connRsn,RCON_REG,(RCON2_REG&0x0F),VER);
// after I print out RCON & RCON2 I clear some of the bits for the next log:
 RCON_REG&=0x3620;   // clear flag bits (hopefully the correct ones)
 RCON2_REG&=0xFFF0;

What I see in the log after powerup is:
Quote:
site=ISR_PGS&filename=&CRC=0&myID=6&retry=0&rsn=4&RCON=163,10&V=0.11

So RCON=163 decimal means MCLR reset, software WDT enabled, POR+BOR. Good. RCON2=10 decimal means VDDBOR, VBPOR. Somewhat strange but acceptable? BTW, VBATBOR is disabled in the config bits.

Then I clear the bits and the next iteration shows:
RCON=32,0 - only SWDT enabled - good.
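
For anyone decoding these values by hand, here is a minimal sketch in plain C that turns the low byte of RCON into flag names. The bit positions assume the usual PIC24F low-byte layout (POR in bit 0 up to EXTR in bit 7), which is consistent with the reading of 163 above but should be checked against the datasheet. `rcon_describe` and the macro names are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Low-byte RCON flags. ASSUMPTION: usual PIC24F layout; verify in the datasheet. */
#define RCON_POR    (1u << 0)  /* power-on reset */
#define RCON_BOR    (1u << 1)  /* brown-out reset */
#define RCON_IDLE   (1u << 2)  /* wake from idle */
#define RCON_SLEEP  (1u << 3)  /* wake from sleep */
#define RCON_WDTO   (1u << 4)  /* watchdog time-out */
#define RCON_SWDTEN (1u << 5)  /* software WDT enable (not a reset cause) */
#define RCON_SWR    (1u << 6)  /* software reset instruction */
#define RCON_EXTR   (1u << 7)  /* external (MCLR) reset */

/* Writes a space-separated list of the set flag names into buf (hypothetical helper). */
void rcon_describe(uint16_t rcon, char *buf, size_t len)
{
    static const struct { uint16_t mask; const char *name; } flags[] = {
        { RCON_EXTR, "EXTR" },   { RCON_SWR, "SWR" },
        { RCON_SWDTEN, "SWDTEN" }, { RCON_WDTO, "WDTO" },
        { RCON_SLEEP, "SLEEP" }, { RCON_IDLE, "IDLE" },
        { RCON_BOR, "BOR" },     { RCON_POR, "POR" },
    };
    size_t i;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (rcon & flags[i].mask) {
            if (buf[0] != '\0')
                strncat(buf, " ", len - strlen(buf) - 1);
            strncat(buf, flags[i].name, len - strlen(buf) - 1);
        }
    }
}
```

With this layout, 163 (0xA3) decodes to EXTR, SWDTEN, BOR and POR, and 32 (0x20) to SWDTEN alone, so a watchdog time-out would be expected to show up as bit 4 (a value of 48 if SWDTEN stays set).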

Then after a day or two I see the reset phenomenon:
Quote:
rsn=2&RCON=32,0
This is from the successful attempt: reason=2 means normal timed operation.
After about 20 seconds (strange, almost too quick for a reconnection) I see
Quote:
rsn=4&RCON=32,0
rsn=4&RCON=32,0
rsn=4&RCON=32,0

reason=reset, RCON unchanged!!!
Any ideas? Thanks to anyone who goes through all this.
Ttelmah



Joined: 11 Mar 2010
Posts: 19551

Posted: Sun Jan 13, 2019 2:10 am

Obvious question. Why not just read 'restart_cause()'?
That having been said, we need to see your fuses: the whole 'configuration',
clock, fuses etc.

It is possible to get to address 0 without a reset actually happening. For
instance, if code overwrites the stack, you could pop a 'return' to address
zero. Similarly, if you get an interrupt enabled for which there
is no handler, this will result in a 'call' to an invalid entry in the interrupt
table, and again you can arrive at the reset location without a physical reset.
A 'goto 0' also gives the same effect.
These are not actually 'resets', but (of course) give the same effect....

That Bit3 is set in RCON2 at boot, suggests a rather slow rise time on
the supply.

You probably really need to be thinking of diagnosing 'where' the code
is actually getting to. Declare a simple global variable (so it won't be
initialised at boot), and at points in your code write a 'debug' number
to this (declare a debug macro that writes a value to the variable). Then
at the entry to every section of the code put a new number into this.
At the start, print this. Then if it contains 123, for example, you know that
the failure occurred between wherever this is loaded with 123 and where
it is loaded with 124. As a result you can narrow down massively where the
failure is occurring.
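
A minimal sketch of that breadcrumb idea in plain C. All names here (`debug_marker`, `DEBUG_MARK`, `connect_modem`, `upload_log`) are hypothetical. On the target the variable has to sit in RAM that the startup code does not clear, which is the point of the "won't be initialised at boot" remark; a hosted C compiler zeroes globals at startup, so this only models the bookkeeping.

```c
#include <stdint.h>

/* Breadcrumb left behind by the last code section that was entered.
 * ASSUMPTION: on the real target this is placed in RAM that survives
 * a restart and is not zeroed by the startup code. */
volatile uint16_t debug_marker;

/* Hypothetical debug macro: record which section we are entering. */
#define DEBUG_MARK(n) (debug_marker = (uint16_t)(n))

/* Read this at the very start of main() and log it before any DEBUG_MARK. */
uint16_t debug_last_marker(void)
{
    return debug_marker;
}

/* Hypothetical code sections, each tagged with its own number. */
void connect_modem(void)
{
    DEBUG_MARK(123);
    /* ... modem connection code ... */
}

void upload_log(void)
{
    DEBUG_MARK(124);
    /* ... upload code ... */
}
```

If the value logged after a restart is 123, the failure happened after `connect_modem()` was entered and before `upload_log()` was reached.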
I have had an issue with some arithmetic operations leading to completely
inexplicable failures on the PIC24. CCS supplied a new DLL for the compiler,
following a long trail of debugging. You might be hitting the same problem.
guy



Joined: 21 Oct 2005
Posts: 297

Posted: Sun Jan 13, 2019 2:35 am

Thank you Ttelmah!
Quote:
Obvious question. Why not just read 'restart_cause()'?.

I'm never quite sure what code is behind this specific function for each MCU. I prefer to turn to the datasheet and study each register/bit...
Quote:
That having been said, we need to see your fuses. The whole 'configuration',
clock, fuses etc..

While copy-pasting I already see NOWDT + WDT_SW... Oops.
Code:
#fuses XT,PR_PLL,BROWNOUT,NOWDT,ICSP1,SOSC_SEL,WDT_SW,WPCFG,PROTECT,WPFP,NOVBATBOR

Thanks!

Re 'Restart' vs. 'Reset': it would seem so, unless I made a mistake in the logging, the RCON addresses, etc. I will go over the list you sent and take another look.
Quote:
That Bit3 is set in RCON2 at boot, suggests a rather slow rise time on the supply.

Maybe it's a power supply issue? The modem disconnection creates a spike? Then again, why would it happen 4 times in a row (20 seconds between resets) and then disappear?

Ttelmah, before I go into 'where' the bug is, I'd like to find out what causes the reset/restart. It would significantly narrow down the options (lots of code...)

Quote:
I have had an issue with some arithmetic operations leading to completely
inexplicable failures on the PIC24. CCS supplied a new DLL for the compiler,
following a long trail of debugging. You might be hitting the same problem.

This reminds me of a nasty printf() of a floating point value I once had, which caused a similar issue. Maybe this is the key? Do you have more info about your case (e.g. the version number in which it was solved)?
THANK YOU!
Ttelmah



Joined: 11 Mar 2010
Posts: 19551

Posted: Sun Jan 13, 2019 3:22 am

I'd suspect 5.082 is the first with it fixed. Not sure, though, when it first
appeared (I only started this code with, I think, 5.074).
I hit it a few months ago, and they tried a lot of fixes, but I was also having
a memory allocation problem on the same chips. A really basic arithmetic
operation (involving quite a bit of casting, though) made the system 'restart'.
Exactly the same sum, split into a couple of operations, worked
fine...
temtronic



Joined: 01 Jul 2010
Posts: 9246
Location: Greensville,Ontario

Posted: Sun Jan 13, 2019 6:05 am

Whenever I hear/read about 'random resets' I think of hardware first. Consider that your product does the same thing every 4 hrs (6 times a day), then 'once in a while' fails. A program generally does the same thing, over and over, so ask what's different when it fails. Perhaps the PCB doesn't have enough filter caps, maybe the PSU is marginal, could be RF from the cell xcvr? Possibly a bad solder joint? The last one drove me nuts for 3 months. One ADC reading 'usually worked'... it turned out a trim pot wiper hadn't been soldered! It was only a 'push fit' contact, so temperature changes disconnected the 'joint'.
I can't say it IS a hardware problem, but I'd look there first.
guy



Joined: 21 Oct 2005
Posts: 297

Posted: Sun Jan 13, 2019 7:57 am

temtronic you hit the nail on the head!
Unbelievable: I have a 32.768 kHz watch crystal connected to the RTC. The specific board I was testing exhibited excellent clock accuracy UNTIL I DISCOVERED that holding my finger AN INCH from the crystal would already affect the RTC frequency. A little closer and it would exceed the 2 s WDT and cause a reset. I assume this was happening due to random capacitance/interference/wiring...
I still don't know why I didn't see the WDT flag set, but I will start from here.
Thank you so much guys!
temtronic



Joined: 01 Jul 2010
Posts: 9246
Location: Greensville,Ontario

Posted: Sun Jan 13, 2019 8:19 am

Gee, the old guy (me...) got ONE right!!
You also have to be sure to suppress ANY RF that might 'upset' things...
so lots of bypass caps, proper grounding and a high current PSU. Every cell transmitter takes a LOT of current and that can drop the VDD just enough to cause random 'funny' things.
Sounds like you have more than 1 unit, so if only 1 is acting up, look at wiring, shielding, PSU... anything that's different from the other units.

Jay
Ttelmah



Joined: 11 Mar 2010
Posts: 19551

Posted: Sun Jan 13, 2019 11:44 am

Interestingly, the errata sheet for the PIC24FJ128GA310 family, says that
"POR and BOR bits will get set after Reset." (in the case of resetting from
deep sleep), so it looks as if these bits may well have an issue.

Nothing there on WDT not being indicated in the RCON/RCON2 registers.

Ah. I know what is happening. :D

This chip has the windowed WDT.
In the event that you have a windowed WDT, and do a restart_wdt
instruction outside the window, this triggers a chip reset.
This is not a watchdog reset, but is flagged as a normal reset.
The data sheet claims it will be a watchdog reset, but I've seen on
other PICs that this is not the case....

So it is not executing a watchdog reset, but the restart_wdt, is going
outside of the window time, and this is causing a device reset....
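
To make the timing rule concrete, here is a small model in plain C of what a windowed watchdog does with a clear attempt. It is only a sketch: the names are hypothetical, the window length is a parameter because the exact window for this chip should be taken from the datasheet, and it says nothing about which RCON flag the hardware sets afterwards.

```c
#include <stdint.h>

/* Possible results of trying to clear a windowed watchdog (hypothetical names). */
typedef enum {
    WDT_CLEARED,        /* clear accepted, the timer restarts */
    WDT_RESET_EARLY,    /* clear attempted while the window was still closed */
    WDT_RESET_TIMEOUT   /* no clear arrived before the period expired */
} wdt_outcome_t;

/* elapsed_ms: time since the last accepted clear when the clear is attempted.
 * period_ms:  full watchdog period.
 * window_ms:  length of the open window at the end of the period; passing
 *             window_ms >= period_ms models a standard (non-windowed) WDT. */
wdt_outcome_t wdt_clear_attempt(uint32_t elapsed_ms,
                                uint32_t period_ms,
                                uint32_t window_ms)
{
    if (elapsed_ms >= period_ms)
        return WDT_RESET_TIMEOUT;
    if (window_ms < period_ms && elapsed_ms < period_ms - window_ms)
        return WDT_RESET_EARLY;
    return WDT_CLEARED;
}
```

For example, with a 2000 ms period and an illustrative 500 ms window, a clear at 100 ms lands in the closed part of the period and counts as an early clear, while a clear at 1600 ms is accepted.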
guy



Joined: 21 Oct 2005
Posts: 297

Posted: Mon Jan 14, 2019 2:34 am

Thanks for looking into this but I see that WINDIS is set to Standard WDT, not Windowed.

I see a special note for the SOSC crystal, but I meet the requirements. I guess the driving circuit is weaker on this PIC...? And there is no High Power option.
Ttelmah



Joined: 11 Mar 2010
Posts: 19551

Posted: Mon Jan 14, 2019 3:22 am

RTC crystals are normally low power.

Let's try something 'lateral'. Keep your diagnostic code, and disable
just about everything else. Have the code just start, enable the watchdog,
and sit in a delay loop till it triggers. If the diagnostic then gives the
same result, you have an 'interesting' behaviour from the PIC. Watchdog
not being reported!...
Is there any other code being called before your diagnostic? There is
an issue commonly met with 'restart_cause': quite a few things
clear bits in the RCON registers, so this needs to be called right at the
start of the main code. You can't call anything like a setup_wdt function
before this, or you 'lose' the real data. I wonder if this is what is happening
here?
guy



Joined: 21 Oct 2005
Posts: 297

Posted: Mon Jan 14, 2019 3:36 am

Quote:
You can't call anything like a setup_wdt function
before this, or you 'lose' the real data.

Hmmmm. I found:
Code:
setup_wdt(WDT_2S);

and also
Code:
printf(writeRFlog,"MDM RESET ERR=%u RSTCOZ=%Lu\r\n",modemError,restart_cause());

so I guess my debugging code was irrelevant altogether...
Ttelmah



Joined: 11 Mar 2010
Posts: 19551

Posted: Mon Jan 14, 2019 4:23 am

Eureka!... :D

Yes. The call to restart_cause will clear the bits, as your code does.
The setup_wdt will also change a couple of the bits.
So the diagnostic wasn't giving the actual boot information.... :(

If using restart_cause, I always just save the value into a temporary variable
as the first instruction in the code. This way the 'boot' value is stored to
avoid this type of problem. Same would apply if wanting to directly access
the registers.
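
A rough sketch of that "snapshot first" pattern in plain C, for the direct-register case. `RCON_REG` and `RCON2_REG` are ordinary variables here, standing in for the `#word` mapped registers from the first post, and `boot_flags` / `capture_boot_flags` are hypothetical names. The clearing masks are simply the ones used in the first post, not verified against the datasheet.

```c
#include <stdint.h>

/* Stand-ins for the hardware registers. ASSUMPTION: on the target these are
 * mapped with #word as in the first post rather than declared as variables. */
volatile uint16_t RCON_REG;
volatile uint16_t RCON2_REG;

/* Snapshot of the reset flags as they were at boot (hypothetical type). */
typedef struct {
    uint16_t rcon;
    uint16_t rcon2;
} boot_flags_t;

boot_flags_t boot_flags;

/* Call as the very first statement of main(), before setup_wdt(),
 * restart_cause() or anything else that may modify the flags. */
void capture_boot_flags(void)
{
    boot_flags.rcon  = RCON_REG;            /* keep the real boot value */
    boot_flags.rcon2 = RCON2_REG & 0x000F;  /* low nibble, as logged in the first post */

    RCON_REG  &= 0x3620;  /* clear flags for the next boot (mask from the first post) */
    RCON2_REG &= 0xFFF0;
}
```

Later logging code then reports `boot_flags.rcon` and `boot_flags.rcon2` instead of re-reading the registers, so it no longer matters what has touched RCON in between.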

A caveat....
All times are GMT - 6 Hours