julienm
Joined: 28 Jul 2020 Posts: 31
|
PIC18F6722 stuck in i2c call |
Posted: Tue Feb 23, 2021 3:35 am |
|
|
Hi everyone,
I have a couple boards (out of >300) that stopped booting properly after being shipped and used daily.
Power cycling them does nothing but (re)flashing the same firmware that they already have will fix them.
The firmware being read protected, I have no way to check if it was altered.
The Microchip AN1310 bootloader should be present on those boards, but I'm not able to enter it.
I was not able to replicate the issue on my side, but I received 2 of them that are "stuck" and in both cases it seems the firmware hangs just before the first write on the i2c bus.
Monitoring SDA/SCL, they both jump to 5V on boot and stay there.
My guess is that it's stuck somewhere in i2c_start(), otherwise I should see written bytes on SDA.
Code:
#use i2c(master, sda = PIN_C4, scl = PIN_C3, FORCE_SW)
I'm not sure what to do now. I can fix them by reflashing the fw, but I'd like to understand the problem and prevent it from showing up again.
The 18F6722 errata mentions a bug with clock stretching, but looking at the ASM code generated by i2c_start, I can't see any loop I could be stuck in.
What else could I do to diagnose those 2 µC? |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9238 Location: Greensville,Ontario
|
|
Posted: Tue Feb 23, 2021 6:22 am |
|
|
I don't use that PIC or 'bootloader' but have some general comments...
What environment were these 2 boards in ? Any chance an EMI event occurred?
Same or similar location inside plant ? Same power supply feed ?
ANYTHING 'common' with the two units ?
Bootloader. No access ? Seems 'strange' to me. Do the other 598 work the same way ?? I 'think' you'd press a button and an option to 'download' would appear on a screen ? Obviously all bootloaders are not the same...
What's the I2C peripheral that the PIC is controlling ? If attached to the outside World, any chance it has been damaged by EMI ?
Someone ( PCMP or Mr. T ? ) posted a 'reset I2C bus' code fragment, might be worth installing, especially since you're bitbanging the I2C. |
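The 'reset I2C bus' fragment mentioned above is not quoted in this thread, so the following is only a minimal sketch of the usual bus-clear idea for a bit-banged master: release both lines, pulse SCL up to nine times until the slave lets go of SDA, then send a STOP. The `i2c_pins_t` hooks and the `sim_*` helpers are hypothetical names; on CCS C the hooks could wrap `output_float()`, `output_low()` and `input()` on PIN_C3/PIN_C4. It only helps when SDA is held low, which is not what was measured on the stuck boards here.

```c
#include <stdbool.h>

/* Hypothetical pin hooks; none of these names come from the thread. */
typedef struct {
    void (*scl_release)(void);    /* let the pull-up raise SCL */
    void (*scl_low)(void);        /* drive SCL low */
    void (*sda_release)(void);    /* let the pull-up raise SDA */
    void (*sda_low)(void);        /* drive SDA low */
    bool (*sda_read)(void);       /* true when SDA reads high */
    void (*delay_half_bit)(void); /* half an SCL period */
} i2c_pins_t;

/* Clock SCL until the slave releases SDA, then issue a STOP.
 * Returns true if SDA ends up high (bus idle). */
static bool i2c_bus_recover(const i2c_pins_t *p)
{
    p->sda_release();
    p->scl_release();
    p->delay_half_bit();

    /* Nine pulses cover one data byte plus its ACK slot. */
    for (int i = 0; i < 9 && !p->sda_read(); i++) {
        p->scl_low();
        p->delay_half_bit();
        p->scl_release();
        p->delay_half_bit();
    }
    if (!p->sda_read())
        return false;             /* still held low: likely a hardware fault */

    /* STOP condition: SDA rises while SCL is high. */
    p->scl_low();
    p->delay_half_bit();
    p->sda_low();
    p->delay_half_bit();
    p->scl_release();
    p->delay_half_bit();
    p->sda_release();
    p->delay_half_bit();
    return p->sda_read();
}

/* Host-side stand-in for the pins, so the routine can be tried off-target.
 * The simulated slave releases SDA after sim_clocks_needed rising SCL edges. */
static int  sim_clocks_needed;
static bool sim_scl = true, sim_master_sda = true;

static void sim_scl_release(void)
{
    if (!sim_scl && sim_clocks_needed > 0)
        sim_clocks_needed--;      /* count a rising edge */
    sim_scl = true;
}
static void sim_scl_low(void)     { sim_scl = false; }
static void sim_sda_release(void) { sim_master_sda = true; }
static void sim_sda_low(void)     { sim_master_sda = false; }
static bool sim_sda_read(void)    { return sim_master_sda && sim_clocks_needed == 0; }
static void sim_delay(void)       { }

static const i2c_pins_t sim_pins = {
    sim_scl_release, sim_scl_low, sim_sda_release,
    sim_sda_low, sim_sda_read, sim_delay
};
```

Calling `i2c_bus_recover()` once at boot, before the first `i2c_start()`, is the typical placement; a `false` return is worth logging because it points at the bus hardware rather than the firmware.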
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19535
|
|
Posted: Tue Feb 23, 2021 6:35 am |
|
|
I'd be looking very carefully at the connections to the MCLR pin, and
also check whether the faulty units possibly have a different chip revision.
It sounds to me as if the program flash memory has been corrupted. A
voltage spike on MCLR above Vdd can trigger this.
Is there anything that actually tells you if any code is running at all?
Otherwise, the I2C pins would be set as inputs on boot, so they should
always go high even if no code is running at all. |
|
|
julienm
Joined: 28 Jul 2020 Posts: 31
|
|
Posted: Tue Feb 23, 2021 10:15 am |
|
|
Thanks to both of you
temtronic wrote: |
What environment were these 2 boards in ? Any chance an EMI event occurred?
ANYTHING 'common' with the two units ?
|
Nothing really common as far as I can tell, but I don't have much information. EMI is an option indeed, but 2 cards are stuck at the same spot (see below).
temtronic wrote: |
What's the I2C peripheral that the PIC is controlling ? If attached to the outside World, any chance it has been damaged by EMI ?
|
I have 3: an eeprom 24LC1025, a temperature sensor SA56004 and a current/charge monitor LTC2946.
They could be damaged, but then why would it work again if I flash the pic?
Ttelmah wrote: | I'd be looking very carefully at the connections to the MCLR pin, and also check whether the faulty units possibly have a different chip revision.
It sounds to me as if the program flash memory has been corrupted. A voltage spike on MCLR above Vdd can trigger this. |
We checked the connections and pullups. However, if the units were faulty, why would they fix themselves when flashing the firmware with the ICD device?
If the flash was corrupted with EMI, what are the odds that both boards are stuck at the exact same spot?
Ttelmah wrote: |
Is there anything that actually tells you if any code is running at all?
Otherwise, the I2C pins would be set as inputs on boot, so they should always go high even if no code is running at all. |
They are set as input by CCS macros and also by our code.
I know that a large portion of the boot sequence is running, because I'm moving some motors, steppers, etc. I can trace that and confirm everything is executed until we reach the i2c code. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19535
|
|
Posted: Tue Feb 23, 2021 10:47 am |
|
|
If there wasn't an issue with the code in the chip, then reflashing
should make no difference.....
Conversely if the I2C code had a problem, then it'd be showing in the
other units.
It suggests perhaps a specific cell in the flash is losing its contents,
which is why my question about the chip version. I'd be suspicious that
maybe both are from the same (faulty) batch. |
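One way to tell whether flash contents have changed, without reading the chip externally, is to have the firmware checksum its own image at boot and compare it with a value recorded at build time. The sketch below uses CRC-16/CCITT-FALSE as an example; `firmware_image_ok` and its parameters are hypothetical, and it assumes the firmware is still allowed to read its own program memory with the protection fuses in use (on CCS C the bytes could come from `read_program_memory()`).

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Returns nonzero when the image matches the CRC recorded at build time.
 * `image` stands for the application flash region; how it is read on the
 * target is an assumption left to the caller. */
static int firmware_image_ok(const uint8_t *image, size_t len,
                             uint16_t expected_crc)
{
    return crc16_ccitt(image, len) == expected_crc;
}
```

If the check fails, signalling it in some visible way (an LED pattern, a motor that does not move) would at least distinguish "flash changed" from "I2C hang" on a returned board.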
|
|
julienm
Joined: 28 Jul 2020 Posts: 31
|
|
Posted: Tue Feb 23, 2021 10:55 am |
|
|
Ok understood. I will check batch numbers.
Could something weird occur with registers (set by the i2c components?) which would persist on power cycle but would be cleared when flashing the firmware? |
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19535
|
|
Posted: Wed Feb 24, 2021 12:41 am |
|
|
julienm wrote: | Ok understood. I will check batch numbers.
Could something weird occur with registers (set by the i2c components?) which would persist on power cycle but would be cleared when flashing the firmware? |
No, and yes.
All 'registers' are lost when you power cycle.
However if the power does not go off for long enough for the gates to
discharge, if a FET was actually in a locked state, this might not clear.
A FET can become locked on, if it is reverse biased, which then comes
back to EMI.
It is possibly interesting that the person having potential program loss
in the thread PCM points to, was also using PROTECT. |
|
|
julienm
Joined: 28 Jul 2020 Posts: 31
|
|
Posted: Wed Feb 24, 2021 1:45 pm |
|
|
Very interesting! These guys have the same random issues, which can only be fixed by a firmware reflash.
So from what I understand, no brownout handling (my case) + a bootloader (also my case) could result in the pic executing arbitrary code when the voltage drops.
This code could potentially be the bootloader code in charge of writing/erasing the flash.
Is that correct?
Now my question is: since the very first line of code is the GOTO in charge of jumping to the main() address, how is it possible that my corrupted boards still execute main() partially? |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19535
|
|
Posted: Thu Feb 25, 2021 3:04 am |
|
|
I think the phrase "answers on a postcard to", probably applies...
However my guess would be that the 'arbitrary' address, depends on
what value the program counter just happens to go to on these particular
chips. So will depend on the actual way that the memory cells lose their
contents as the voltage drops. So the same chip may always go to the
same address, but what that address actually is is completely unpredictable. |
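A common mitigation for a rogue program counter landing in flash-write code is to make the erase/write routine refuse to run unless a RAM key was armed immediately beforehand by the legitimate update path. The sketch below shows the idea only; every name in it is hypothetical, it is not how AN1310 is written, and the raw erase is a stand-in for the real EECON2 0x55/0xAA unlock sequence.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_UNLOCK_KEY 0xA55Au        /* arbitrary magic value (assumption) */

static volatile uint16_t flash_unlock;  /* 0 except during a real update */
static unsigned erase_count;            /* stand-in for the hardware effect */

/* Hypothetical low-level erase; on target this would hold the unlock sequence. */
static void flash_erase_block_raw(uint32_t addr)
{
    (void)addr;
    erase_count++;
}

/* Erases only when the key was armed first; always disarms afterwards. */
static bool flash_erase_block(uint32_t addr)
{
    bool armed = (flash_unlock == FLASH_UNLOCK_KEY);
    flash_unlock = 0;
    if (!armed)
        return false;   /* reached without going through the update path */
    flash_erase_block_raw(addr);
    return true;
}

/* The legitimate update path arms the key right before the call. */
static bool bootloader_erase(uint32_t addr)
{
    flash_unlock = FLASH_UNLOCK_KEY;
    return flash_erase_block(addr);
}
```

This narrows the window rather than closing it: a jump straight into the raw erase still gets through, so it complements the brownout reset rather than replacing it.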
|
|
julienm
Joined: 28 Jul 2020 Posts: 31
|
|
Posted: Thu Feb 25, 2021 10:45 am |
|
|
Ttelmah wrote: | I think the phrase "answers on a postcard to", probably applies... |
I googled for that idiom but I'm still unsure what you meant.
Do you mean that rogue execution during brownout is so random that it is possible the erased sector is also random, not just the first one? |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19535
|
|
Posted: Thu Feb 25, 2021 1:15 pm |
|
|
Yes.
"Answers on a postcard", was a phrase used in a lot of TV shows, when
they were asking for answers that could be almost innumerable, and almost
any one could be right. |
|
|
asmallri
Joined: 12 Aug 2004 Posts: 1635 Location: Perth, Australia
|
|
Posted: Thu Feb 25, 2021 10:47 pm |
|
|
PCM Programmer's link to the old thread pretty well covers it. Yes, there is a VERY REMOTE chance there is a bug in the silicon causing this issue, but seeing as your symptoms match up with the standard issues with bootloaders for this class of PIC, the villain is almost certainly either the brown out setting or the absolutely daft LVP enable config setting. If the programming pins are left floating and LVP is enabled, chaos rules.
_________________
Regards, Andrew
http://www.brushelectronics.com/software
Home of Ethernet, SD card and Encrypted Serial Bootloaders for PICs!! |
|
|
julienm
Joined: 28 Jul 2020 Posts: 31
|
|
Posted: Fri Feb 26, 2021 9:16 am |
|
|
LVP is disabled.
Why do you think this is specific to this class of PIC?
I've been trying to generate brownout issues by lowering the voltage on the 230V/12V supply or directly on the 12V/5V converter, but it's pretty robust.
Is there a way I can reliably enter the "rogue" mode so that I can test my pic with and without the BROWNOUT fuse? |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9238 Location: Greensville,Ontario
|
|
Posted: Fri Feb 26, 2021 9:53 am |
|
|
Remove power....Vdd drops.....see what happens ?
If the main PSU filter cap is big enough, it'll give you time to see on a scope what happens ?? |
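When running this kind of power-removal test, it can help to have the firmware record why it last reset. The sketch below decodes the PIC18 RCON flag bits, which are active low; the enum and function names are hypothetical, and CCS C also offers `restart_cause()` for the same purpose. The BOR flag is only meaningful when the brown-out reset is enabled in the fuses.

```c
#include <stdint.h>

/* RCON flag bits on PIC18 (active low: 0 means the event happened). */
#define RCON_BOR 0x01u  /* brown-out reset */
#define RCON_POR 0x02u  /* power-on reset */
#define RCON_TO  0x08u  /* watchdog time-out */
#define RCON_RI  0x10u  /* RESET instruction */

typedef enum {
    RESET_POWER_ON,
    RESET_BROWNOUT,
    RESET_WATCHDOG,
    RESET_INSTRUCTION,
    RESET_OTHER         /* e.g. MCLR or stack over/underflow */
} reset_cause_t;

/* Classify the last reset from a copy of RCON taken early in main(). */
static reset_cause_t classify_reset(uint8_t rcon)
{
    if (!(rcon & RCON_POR)) return RESET_POWER_ON;  /* POR also clears BOR */
    if (!(rcon & RCON_BOR)) return RESET_BROWNOUT;
    if (!(rcon & RCON_TO))  return RESET_WATCHDOG;
    if (!(rcon & RCON_RI))  return RESET_INSTRUCTION;
    return RESET_OTHER;
}

/* Value to write back to RCON after logging, so that the next reset can be
 * told apart. TO is read-only and is left untouched. */
static uint8_t rearm_reset_flags(uint8_t rcon)
{
    return (uint8_t)(rcon | RCON_POR | RCON_BOR | RCON_RI);
}
```

Storing the classified cause somewhere that survives the reset, such as the external EEPROM, would let a returned board report whether brown-outs were actually happening in the field.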
|
|
|