Can a CPU wear out?

picprogrammer · Joined: 10 Sep 2003 Posts: 35

Of course, you would think that's impossible.

But here is what happened. I have a big CAN network that has been running for 8 years now. New modules are added occasionally, but most have been unchanged for years.
A few months ago one unit stopped receiving any CAN data. A bad driver or wiring was suspected, but after some debugging it turned out that after receiving one CAN message, no further messages were received. I replaced the 18F2680, loaded the original old software, and it was OK. I'm running the internal oscillator.
This week I have 2 modules with an LCD connected via I2C that have problems. They collect data from CAN and show it on the LCD, with new data every 3 seconds. Again the 18F2680. Both, a few days apart, stopped updating the data on the display.
After a power cycle they worked again, but after a few hours they stopped again.
On the serial debug port I saw a repeating watchdog timeout.
After a lot of debugging, it seems execution stops at the i2c_write(data) command: a crash, and then the watchdog kicks in. After the reboot this repeats itself, and the only way to escape is a power cycle. The I2C slave does not seem to be the problem: I replaced the I2C LCD, and in the reboot situation the I2C data line was high.
With a new 18F2680, both have been working fine for some days now.
Do these CPUs wear out?

Ttelmah · Joined: 11 Mar 2010 Posts: 19538

The commonest components to 'wear out' are capacitors. Especially electrolytics. Try looking very carefully at the supplies. You may find that ripple has got worse and this is causing a CPU failure that the 'old' chip just happens to be particularly susceptible to....
Semiconductors do degrade, particularly if run 'hot'. At normal PIC temperatures, though, other failure mechanisms are generally far more probable, such as humidity causing corrosion on the legs that then penetrates along the legs into the package. This can then result in semiconductor failures. Older PICs with larger die sizes have longer expected lives than more modern types. However, even a 'forty year' expected lifetime, if you have a thousand devices deployed, may still have you finding individual devices failing after 'only' ten years. The most likely component to fail in the PIC is the program memory.
Depending on the environment, you may well be suffering from an oxidation failure propagating along the legs into the package. At this timescale you are unlikely to be experiencing an electromigration failure.
Other things that can accelerate failures are if the device is close to something giving small levels of atomic radiation. Even an old 'radium' luminous dial can give a quite significant acceleration in IC failures...
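To put rough numbers on the 'forty year lifetime, thousand devices' point above, here is a minimal sketch. It assumes a constant yearly failure chance of 1/40 for every surviving device, which is a simplification chosen for illustration (real wear-out does not follow this model), and the function name is made up.

```c
/* Expected number of failed devices after `years`, ASSUMING every surviving
 * device has the same chance of failing each year (1 / expected_life_years).
 * This constant-hazard model is a hypothetical simplification. */
double expected_failures(int devices, double expected_life_years, int years)
{
    double surviving = (double)devices;
    double yearly_failure_prob = 1.0 / expected_life_years;

    for (int y = 0; y < years; y++) {
        /* Each year a fixed fraction of the survivors fails. */
        surviving *= (1.0 - yearly_failure_prob);
    }
    return (double)devices - surviving;
}
```

Under that assumption, expected_failures(1000, 40.0, 10) comes out at a couple of hundred devices, so seeing a few individual failures in a large installed base after ten years is not surprising.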

newguy · Joined: 24 Jun 2004 Posts: 1909

Did I read correctly that you have a network connected via CAN and the processors are running from their internal oscillators? While it's true that modern PICs have really quite good internal oscillator accuracy, I still wouldn't feel comfortable using it on a CAN network.

Can part of your issue be explained by clock drift?

Ttelmah · Joined: 11 Mar 2010 Posts: 19538

I hope he is not using the internal oscillator for CAN. Even the best PIC internal oscillator is an order of magnitude outside the required specs for even quite low speed CAN, even more so if the bus frequency is higher...

temtronic · Posted: Mon Jul 30, 2018 7:21 am

yup....
"I'm running the internal oscillator."

Even I (the old guy) use a real xtal/caps for high speed (9600+) serial.

CAN is real darn 'fussy' about its clock !!!

Ttelmah · Joined: 11 Mar 2010 Posts: 19538

Yes. Even at 125 kHz, the required tolerance is just under 1.25%. Even at 25°C, the internal oscillator on this PIC only manages +/-2%, while over a reasonable temperature range it is +/-5%.....
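The comparison above can be sketched in code. The two limits below are the commonly quoted oscillator tolerance conditions for CAN bit timing. The struct, the function names and any segment lengths you feed in are hypothetical, not the poster's actual configuration, so the exact limit will differ from setup to setup.

```c
/* Bit-timing segments in time quanta (TQ). Values are supplied by the
 * caller; nothing here is taken from the original poster's setup. */
typedef struct {
    int sync_seg;    /* always 1 TQ */
    int prop_seg;
    int phase_seg1;
    int phase_seg2;
    int sjw;         /* synchronisation jump width */
} can_bit_timing;

static double min_d(double a, double b) { return a < b ? a : b; }

/* Nominal bit time in TQ. */
int can_nbt(const can_bit_timing *t)
{
    return t->sync_seg + t->prop_seg + t->phase_seg1 + t->phase_seg2;
}

/* Maximum oscillator tolerance as a fraction (0.01 == 1%), taken as the
 * tighter of the two commonly quoted conditions:
 *   df <= SJW / (20 * NBT)
 *   df <= min(PS1, PS2) / (2 * (13 * NBT - PS2))                        */
double can_max_osc_tolerance(const can_bit_timing *t)
{
    double nbt = (double)can_nbt(t);
    double ps_min = min_d((double)t->phase_seg1, (double)t->phase_seg2);
    double limit_resync = (double)t->sjw / (20.0 * nbt);
    double limit_error = ps_min / (2.0 * (13.0 * nbt - (double)t->phase_seg2));

    return min_d(limit_resync, limit_error);
}

/* Returns 1 if an oscillator with the given +/- tolerance fits the limit. */
int osc_is_within_tolerance(const can_bit_timing *t, double osc_tolerance)
{
    return osc_tolerance <= can_max_osc_tolerance(t);
}
```

With an illustrative 16 TQ bit (1 + 7 + 4 + 4, SJW 4), the limit works out at or below 1.25%, so a +/-2% or +/-5% oscillator fails the check while a typical crystal passes easily.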

picprogrammer · Joined: 10 Sep 2003 Posts: 35

Yes, I'm running 125 kHz CAN on the internal oscillator, with >20 different units. This can be problematic, I know, but it cannot explain why, after exactly 1 message, no more are ever received.

Ttelmah · Joined: 11 Mar 2010 Posts: 19538

Long term drift in the oscillator....

Remember this is resistor/capacitor based. It derives from an internal gate that is laser trimmed at manufacture. It sounds to me as if this is now drifting off significantly as the chip warms up. So it works for just a few minutes after boot, then gets a massive failure rate.

picprogrammer · Joined: 10 Sep 2003 Posts: 35

The internal oscillator may not be a smart idea, but if drift were the problem, I would expect either no messages to be received or sometimes a few.
This CPU stops after exactly 1 message. The I2C problem, with the CPU locking up in i2c_write, has nothing to do with a drifting oscillator. Even a watchdog reboot does not solve this. If the slave device were the problem, i2c_write would return no ACK.

Ttelmah · Joined: 11 Mar 2010 Posts: 19538

An I2C device that holds SDA low can hang the master. This has always been an issue with I2C, which is why sensible coders check that the SDA line is high before starting an I2C transaction. The recovery for this is to pulse SCL repeatedly, which is meant to trigger a recovery of the slave.
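The SDA check and SCL pulsing described above can be sketched like this. It is a host-side simulation, not CCS driver code: the i2c_bus_sim struct and both helper functions are hypothetical stand-ins for real pin access, and the limit of nine pulses that is usually recommended is passed in by the caller.

```c
#include <stdbool.h>

/* Simulated bus (hypothetical): stands in for reading/driving real pins. */
typedef struct {
    int clocks_until_release; /* SCL pulses the stuck slave still needs */
} i2c_bus_sim;

/* SDA reads high only once the slave has finished shifting out its bits. */
static bool sda_is_high(const i2c_bus_sim *bus)
{
    return bus->clocks_until_release <= 0;
}

/* One SCL pulse. On hardware this would drive the pin low, wait,
 * release it high, and wait again. */
static void pulse_scl(i2c_bus_sim *bus)
{
    if (bus->clocks_until_release > 0) {
        bus->clocks_until_release--;
    }
}

/* Clock SCL until the slave lets go of SDA, giving up after max_pulses.
 * Returns true if SDA is high (bus free) afterwards. */
bool i2c_bus_recover(i2c_bus_sim *bus, int max_pulses)
{
    for (int i = 0; i < max_pulses && !sda_is_high(bus); i++) {
        pulse_scl(bus);
    }
    return sda_is_high(bus);
}
```

On real hardware the idea would be to call something like this before i2c_start whenever SDA reads low, then issue a STOP condition. If it returns false, the slave is still holding the line and a transaction should not be attempted.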

On receiving only one message, the question is how long the client takes to reach its error limit. It is possible that this is being reached after one message.

picprogrammer · Joined: 10 Sep 2003 Posts: 35

As in my first message, the SDA is high. Otherwise it would not pass the i2c_start. Checked with an oscilloscope. After replacing the CPU on both units, no more problems.

temtronic · Posted: Tue Jul 31, 2018 6:30 am

comment.
since replacing the PIC 'cured' the problem....

I'd check the PCB for a bad solder connection. Look under a good light with a mag lens.
I got 'bit' by a similar problem and had missed soldering ONE connection. The pin was making physical contact until the temp/humidity changed. Under the right conditions the PCB shrunk just enough to open the circuit. It had worked for months...sigh

I've got 30 year old PICs still running today so they don't 'wear out'....

Jay