alevyvs
Joined: 06 Jun 2016 Posts: 6
|
address fault ISR |
Posted: Mon May 21, 2018 9:40 am |
|
|
Hi. I'm using PCD 5.078 with a dsPIC33FJ128GP706A.
I'm trying to figure out why my system resets from time to time. The interval is inconsistent and can be as long as two weeks. Some builds of the code don't exhibit the problem at all, while others reset more often. I've been unable to isolate the problem to a small sample program, but I finally have a build that reliably resets within a few seconds.
Per the debugging I've already done, the reset happens when my code causes an address error trap. When this happens, I intentionally wait around until the WDT trips. Here's the ISR, mostly written by my predecessor, which is able to give the offending program counter:
Code: |
uint32_t AddrErrProgCntr;
#locate AddrErrProgCntr=0x2000
#INT_ADDRERR
void isrAddrFail()
{
unsigned int *puwStack = NULL;
unsigned int uwPcAddrLo, uwPcAddrHi; // assumed declarations; not shown in the posted snippet
#asm
mov W15, puwStack;
#endasm
puwStack -= 0x13u;
uwPcAddrLo = puwStack[0u];
uwPcAddrHi = puwStack[1u] & 0x00FFu;
AddrErrProgCntr = ((uint32_t) uwPcAddrHi << 16u) | uwPcAddrLo;
output_low(IO_HANG_PIN);
// Wait for WDT trip
for(;;) {
int i;
for(i = 0; i < 5; i++) {
output_high(IO_HANG_PIN);
delay_ms(1u);
output_low(IO_HANG_PIN);
delay_ms(1u);
}
delay_ms(3u);
}
}
|
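As a rough illustration of the last step of the ISR, the sketch below rebuilds a program counter from the two stack words in the same way the posted code does (low word, plus the low byte of the high word shifted up by 16). The helper name `decode_trap_pc` is hypothetical, and the sketch says nothing about whether the `0x13` stack offset is right for a given compiler version; it only shows the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the posted code.
 * Combines the two 16-bit words read from the stack into one
 * program counter value, using the same 0x00FF mask as the ISR. */
uint32_t decode_trap_pc(uint16_t lo_word, uint16_t hi_word)
{
    /* Keep only the low byte of the high word, as the ISR does. */
    uint32_t hi = (uint32_t)(hi_word & 0x00FFu);

    /* High bits go above the 16-bit low word. */
    return (hi << 16) | (uint32_t)lo_word;
}
```

For example, a low word of `0x07DE` and a high word whose low byte is `0x01` decode to `0x107DE`; any bits in the upper byte of the high word are discarded by the mask.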
After the WDT trips, I output AddrErrProgCntr to a CAN bus, which tells me the address is 0x107DE. Here's the listing file around that location:
Code: |
.................... case I2C_MSG_FW_HW_VER:
.................... if (index > 0u) {
107BC: MOV 2408,W0
107BE: CLR.B 1
107C0: CP0.B W0L
107C2: BRA Z,108A2
.................... txData[index - 1u].versionInfo.fwMajor = qmsg->data[0];
107C4: MOV 2408,W4
107C6: CLR.B 9
107C8: SUB W4,#1,W5
107CA: MOV #AA,W4
107CC: MUL.UU W5,W4,W0
107CE: MOV W0,W5
107D0: ADD #22,W5
107D2: MOV #A52,W4
107D4: ADD W5,W4,W6
107D6: MOV 240E,W0
107D8: MOV #3,W4
107DA: ADD W4,W0,W0
107DC: MOV.B [W0],[W6]
.................... txData[index - 1u].versionInfo.fwMinor = qmsg->data[1];
107DE: MOV 2408,W4
107E0: CLR.B 9
107E2: SUB W4,#1,W5
107E4: MOV #AA,W4
107E6: MUL.UU W5,W4,W0
107E8: MOV W0,W5
107EA: ADD #22,W5
107EC: ADD W5,#1,W0
107EE: MOV #A52,W4
107F0: ADD W0,W4,W5
107F2: MOV 240E,W0
107F4: MOV #4,W4
107F6: ADD W4,W0,W0
107F8: MOV.B [W0],[W5]
|
It looks to me like a copy of memory address 2408 (per the .sym file, that's the index variable of this function) into W4. How could that possibly cause an address error, when that exact same operation was performed just one C line ago, at 0x107C4?
What's really "interesting" to me is that if I change the following code, I can get completely different behavior out of the offending part of the system. Basically I'm forcing the baud rate variable that I declare to be either 1, 8, or 16 bits, which causes everything to shift around in memory.
Code: |
#define FAULT_FAST_1 0
#define FAIL_TO_BUILD 1
#define WORKS_AT_LEAST_30_MINUTES 2
#define FAULT_FAST_2 3
#define ENUM_IS_16_BIT 4
#define ACTIVE_I2C_CONFIG FAULT_FAST_1
#if (ACTIVE_I2C_CONFIG == FAULT_FAST_1)
typedef enum {
I2C_BAUD_SLOW,
I2C_BAUD_FAST
} I2cBaudRate_t;
#elif (ACTIVE_I2C_CONFIG == FAIL_TO_BUILD)
typedef enum {
I2C_BAUD_SLOW,
I2C_BAUD_FAST,
INVALID_I2C_BAUD
} I2cBaudRate_t;
#elif (ACTIVE_I2C_CONFIG == WORKS_AT_LEAST_30_MINUTES)
typedef int I2cBaudRate_t;
#define I2C_BAUD_SLOW 0
#define I2C_BAUD_FAST 1
#elif (ACTIVE_I2C_CONFIG == FAULT_FAST_2)
typedef enum {
I2C_BAUD_SLOW = 0x100,
I2C_BAUD_FAST = 0x101
} I2cBaudRate_t;
#elif (ACTIVE_I2C_CONFIG == ENUM_IS_16_BIT)
typedef enum {
I2C_BAUD_SLOW,
I2C_BAUD_FAST,
INVALID_I2C_BAUD_0,
INVALID_I2C_BAUD_1,
// cut for space
INVALID_I2C_BAUD_257
} I2cBaudRate_t;
#endif
|
Changing the ACTIVE_I2C_CONFIG above results in the behavior described by the definitions:
FAULT_FAST_1 and _2 each fail within a few seconds, which corresponds with the timing of the I2C messages that this chip is receiving.
FAIL_TO_BUILD results in a "No overload function matches" compilation error. If I use 5.073 instead of 5.078, it compiles just fine. The line is shown at the end of this post with some further discussion.
WORKS_AT_LEAST_30_MINUTES and ENUM_IS_16_BIT both output identical .hex files, and do seem to work.
If I compare the listing files between FAULT_FAST_1 and WORKS_AT_LEAST_30_MINUTES, there are no differences (aside from the program addresses) in the listing file shown earlier. That suggests to me that the address I'm looking at is not really what's causing the trap to occur. Is the ISR above correctly decoding the source of the trap?
Regarding the build that fails to compile:
The line which causes the compilation error is the function call in the if statement in the following code. The error is "*** Error 165 "C:\projects\etx8\etx8-mgt-dsp\i2c_msg.c" Line 219(37,40): No overload function matches". There is no overload for that function. However, we also get a warning on that line for all other builds: ">>> Warning 240 "C:\projects\etx8\etx8-mgt-dsp\i2c_msg.c" Line 219(30,34): Pointer types do not match"
So, for the error it complains about the size_t* pointer, but the warning is issued about the uint8_t**. I've been ignoring this warning because I know the pointer types do match (no warnings were generated when using 5.073), but I wonder if there's something more here that I'm missing.
Code: |
// Only showing the prototype:
vsBool_t VsQueueRead(VsQueue_t *q, uint8_t **itemPtr, size_t *itemBufferLen);
// declared at file-scope
VsQueue_t *_msgRXQ = NULL;
// declared local to the function
size_t len;
uint8_t *qptr;
if (VsQueueRead(_msgRXQ, &qptr, &len) == VS_TRUE) {
|
Can anyone offer any help? Thanks in advance. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19538
|
|
Posted: Mon May 21, 2018 10:41 am |
|
|
The address saved on the stack is the return address, so the faulting instruction is the one before it:
107DC: MOV.B [W0],[W6]
Add some debugging and see what 'index' contains. You have it limited to being >0, but could it be getting much too large and trying to address beyond the RAM?
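A minimal sketch of that check, assuming the constants visible in the listing: `MOV #A52` as the base of `txData`, `MOV #AA` as the element size, and `ADD #22` as the offset of `versionInfo.fwMajor`. The function names and the `entry_count` parameter are hypothetical (the real array length is not shown in the post), and `index` is treated as 8 bits because the listing clears the upper byte after each load.

```c
#include <stdint.h>
#include <stdbool.h>

/* Values read from the posted listing; treat them as assumptions. */
#define TXDATA_BASE    0x0A52u  /* MOV #A52,W4 */
#define TXDATA_STRIDE  0x00AAu  /* MOV #AA,W4  */
#define FWMAJOR_OFFSET 0x0022u  /* ADD #22,W5  */

/* Mirrors the address arithmetic in the listing:
 * (index - 1) * stride + field offset + base, truncated to 16 bits. */
uint16_t fwmajor_dest_addr(uint8_t index)
{
    uint16_t element = (uint16_t)(index - 1u);
    return (uint16_t)(element * TXDATA_STRIDE + FWMAJOR_OFFSET + TXDATA_BASE);
}

/* Hypothetical guard: accept only 1..entry_count before indexing. */
bool index_in_range(uint8_t index, uint8_t entry_count)
{
    return index >= 1u && index <= entry_count;
}
```

With these constants, index 1 writes to 0x0A74 and index 255 would write to 0xB320, which is worth comparing against the top of implemented RAM on the device. Logging `index` (or rejecting it with a guard like `index_in_range`) before the assignment should show whether it is ever out of range.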
As for your build: earlier versions of CCS did not complain when pointer types did not match, so the fact that it built with the older version does not prove the types match. The current version is quite strict on typing, so you need to look carefully at how the types are actually declared. However, it is only a warning, since the compiler will automatically cast the mismatched types to work. |
|
|
alevyvs
Joined: 06 Jun 2016 Posts: 6
|
|
Posted: Mon May 21, 2018 11:37 am |
|
|
Wow, I can't believe I didn't think of that before. Thanks, I'll take a closer look at that.
And seriously, thanks for taking the time to read through all of that, I know it was a lot. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19538
|
|
Posted: Tue May 22, 2018 3:11 am |
|
|
I actually have to say 'well done' on posting some real information. The value from the return stack, and assembler listings at this point made it possible to help. |
|
|