|
|
View previous topic :: View next topic |
Author |
Message |
jeremiah
Joined: 20 Jul 2010 Posts: 1362
|
promotion of literals when compared to a object of type char |
Posted: Tue Jun 21, 2011 7:40 am |
|
|
I was curious as the method the CCS compiler is supposed to be using when comparing a variable to a #define'd literal.
Take for example:
Code: |
/*******************************************************************************
* FILENAME: wf.c
* DESC: This is the definition file for main program. It contains the
* main() entry point and interrupt routinse (as well as some utility
* functions.
* COMPILER: CCS PCWHD Revision 4.121
*
* REVISION:
* 2011/06/21 - JHB - Revision 1.0.0.0
*
*******************************************************************************/
#case
#include <24FJ256GA106.h>
//16 bit pointers
#device *=16
//Fuses for this project
#FUSES NOWDT //No Watch Dog Timer
#FUSES NOJTAG //JTAG disabled
#FUSES NOPROTECT //Code not protected from reading
#FUSES NOWRT //Program memory not write protected
#FUSES NODEBUG //No Debug mode for ICD
#FUSES ICSP1 //ICD uses PGC1/PGD1 pins
#FUSES NOIOL1WAY //Allows multiple reconfigurations of peripheral pins
#FUSES WINDIS //Watch Dog Timer in non-Window mode
#FUSES WDT128 //Watch Dog Timer PreScalar 1:128
#FUSES WPOSTS16 //Watch Dog Timer PostScalar 1:32768
#FUSES NOIESO //Internal External Switch Over mode enabled
#FUSES PR //Pimary oscillaotr enabled
#FUSES NOCKSFSM //Clock Switching is disabled, fail Safe clock monitor is disabled
#FUSES NOOSCIO //OSC2 is clock output
#FUSES HS
#define TEST_CHAR 0xB5
//Pin selects for reprogrammable pins
#pin_select U1TX = PIN_B6 //output serial
#pin_select U1RX = PIN_B7
//#use statements for uart
#use delay(clock=22118400)
#use rs232(UART1,baud=9600,parity=N,bits=8,DISABLE_INTS,ERRORS)
////////////////////////////////////////////////////////////////////////////////
// NAME: main()
// DESC: The entry point into this program. Devices and interrupts are setup.
// IN: NONE
// OUT: NONE
////////////////////////////////////////////////////////////////////////////////
void main(){
char char1 = 0xB5;
unsigned char char2 = 0xB5;
rs232_errors = 0; //removes warning
//Power up and see reason for power up
printf("\r\n ******** START ******** \r\n");
if(TEST_CHAR == char1){
printf("char1 match\r\n");
}else{
printf("char1 no match\r\n");
}
if(TEST_CHAR == char2){
printf("char2 match\r\n");
}else{
printf("char2 no match\r\n");
}
while(TRUE){}
}
|
In compiler revision 4.114, this would print a resultant match for both checks.
I recently upgraded to 4.121 and it changed the behavior of this type of check such that the first one does not show a match but the second one shows a match.
Looking at the LST output, I can see that what it appears to do is promote both the character and the unsigned character to 16 bit values to compare with the #define value. In 4.114, it treated the char as unsigned, but in 4.121 it uses sign extension on the char (I believe I remember a change note about this). This results in a comparison of 0xFFB5 to 0x00B5 (char to #define).
Code: |
.................... if(TEST_CHAR == char1){
002FE: MOV.B 802,W0L
00300: SE W0,W0
00302: MOV #B5,W4
00304: CP W4,W0
00306: BRA NZ,326
|
Code: |
.................... if(TEST_CHAR == char2){
00340: MOV 804,W4
00342: XOR #B5,W4
00344: BRA NZ,364
|
4.114 uses the 2nd method for both.
I am guessing a compiler upgrade changed how chars are approached by default (based on the change notes), but it did surprise me that it promoted everything to 16bit when both operands were supplied as 8bits (well, I don't know how #define's are handled honestly).
I can change the #define to in 4.121 to:
#define TEST_CHAR '\xB5'
and it does the BYTE check against char1, though surprisingly not against char2. I would have expected both to be the same:
Code: |
.................... if(TEST_CHAR == char1){
002FE: MOV 802,W4
00300: XOR.B #B5,W4L
00302: BRA NZ,322
|
Code: |
.................... if(TEST_CHAR == char2){
0033C: MOV 804,W4
0033E: XOR #B5,W4
00340: BRA NZ,360
|
It just seems odd that:
1. It promotes to 16 bit even if the value is only supplied as 8bit, but I guess it could just be easier to assume a default.
2. Even with a character supplied as the #define, it treats one of them as an 8bit comparison and the other as a 16bit comparison.
EDIT: Fixed the type for char2. See following post for updated LST entries.
Last edited by jeremiah on Tue Jun 21, 2011 9:09 am; edited 1 time in total |
|
|
RF_Developer
Joined: 07 Feb 2011 Posts: 839
|
|
Posted: Tue Jun 21, 2011 8:41 am |
|
|
Maybe a pertinent question is why it even compiles?!
char2 has no type, only the type modifier unsigned. That should throw up a syntax error, but doesn't. So, the compiler must be assuming a type, as it does indeed compile. What then is that assumed type? Int, as its the most basic of all the types? No, that doesn't seem likely given what its doing... int16? Who knows? Unsigned yes, hence no sign extension, but what precisely?
The fact is that this code shouldn't actually compile at all. That it generates inconsistent code when it does (and my PCWH 4.107 generated different code from both examples, using subtraction for the conparison, though it did treat both types the same) perhaps shouldn't come as that much of a surprise.
I had issues with byte variables declared via a typedef being handled differently to otherwise similar varaible declared directly. That too involved some sort of confusion between two bytes and one. |
|
|
jeremiah
Joined: 20 Jul 2010 Posts: 1362
|
|
Posted: Tue Jun 21, 2011 9:07 am |
|
|
That should have been "unsigned char", but it compiles because I believe unsigned is a type, it's the unsigned version of the default type for the compiler. That explains why the 2nd got promoted to 16 bits even after setting the #define to '\xB5'. I goofed on the type. However, there is still an oddity.
I fixed it to be unsigned char char2 and made the #define be 0xB5 (the original case) and noticed the following:
Code: |
.................... if(TEST_CHAR == char1){
002FE: MOV.B 802,W0L
00300: SE W0,W0
00302: MOV #B5,W4
00304: CP W4,W0
00306: BRA NZ,326
|
Code: |
.................... if(TEST_CHAR == char2){
00340: MOV 802,W4
00342: LSR W4,#8,W4
00344: XOR.B #B5,W4L
00346: BRA NZ,366
|
So with the #define at 0xB5, the "char" is treated as a 16bit value and the "unsigned char" is treated as a byte.
I then switched the define to '\xB5' and got:
Code: |
.................... if(TEST_CHAR == char1){
002FE: MOV 802,W4
00300: XOR.B #B5,W4L
00302: BRA NZ,322
|
Code: |
.................... if(TEST_CHAR == char2){
0033C: MOV 802,W4
0033E: LSR W4,#8,W4
00340: XOR.B #B5,W4L
00342: BRA NZ,362
|
So with the define at '\xB5' it treats both as bytes.
I wonder why with the define set at 0xB5 it treats the "unsigned char" as a byte but the "char" as a word...odd. |
|
|
RF_Developer
Joined: 07 Feb 2011 Posts: 839
|
Odd indeed |
Posted: Thu Jun 23, 2011 3:21 am |
|
|
Yes, odd indeed... however the CCS compilers are known to do this sort of thing occasionally :-(( As I wrote, I found another related anomily which in that case has since been fixed. Code generation is also dependant on the processor, which is your case is a 24F while I am using 18Fs. Is the size of int 8 or 16 bits in the 24F version of CCS C?
On the technical aspects, which I feel do have some significance here, there is no "default type" in C. "unsigned" is not a type, its a type *modifier* and cannot syntactially be used without a type, such as char or int etc, to modify. I repeat, the compiler should be flagging up the use of unsigned without a type as a syntax error... but doesn't in this case. Yes, it does appear to be assuming a "default" type, and it appears in your case to be something 16 bits, but what that type is is unknown, and we cannot rely on it. It shouldn't be doing this at all. In any case the meaning of char has changed over the evolution of C. K & R defined it as the smallest addressable unit of the machine. That generally meant 8 bits, and that characters were ASCII coded as PDP-11s were to commonest hosts for C. C was developed as part of the migration of Unix to the PDP-11 - 16 bit, byte addressed - from the PDP-7 - an 18 bitter, probably with 6 bit characters.
Its perhaps a somewhat arcane comment, but in terms of characters, what is the meaning and relevance of "signed" and "unsigned"? Its only as an abstracted unit of memory, and one on which arithmetic computations can be performed, does signedness come into play. That abstraction allows code such as the familar int digit = Number_Ch - '0'; to work, but what, exactly, is a 'negative character'? Ints are another matter, and were defined as the most natural memory unit if the machine. CCS C follows the K & R definition and makes int 8 bits... it then makes it *unsigned* which is odd in any sense. Odd as in not K & R nor ANSI. I know I'll offend someone by saying it but ANSI C attempts to force the x86 model on to C. Or at least makes it the default and the expected. Many people think an ANSI int is always, and must be 32 bit signed, but they needn't be and sometimes are not. How would ANSI C map on to the PDP-7, or any 18 or 36 bitter?
All that however is only peripheral to what's going on here. What's going on is that CCS C sometime generates inconsistent code in some circumstances. Yes, it does unfortuately. What we need to work out is how to avoid falling foul of that inconsistent behaviour. I suggest keeping types simple and direct. Such as avoiding using unsigned with char. If you want to deal with 8 bit numbers, then use int8 or signed int8, if you want to deal with characters then use char and forget about signing. Confine arithmetic to the cases it makes sense and be careful when mixing types. Use casts to force type conversions, even when not strictly syntactically and sematically necessary. Debug carefully, single stepping to make sure the generated code is really doing what you expect. I'm pretty sure you already do all this, as that most likely how you found this compiler bug in the first place.
Another possibly relevant technical point to bear in mind is that in C character literals, including '\xB5', are not defined as type char, they are ints. In CCS 18F C this should have no effect as int and char are both 8 bits. Also bear in mind that CCS C is NOT ANSI compliant, nor pretends to be. Nor even is it fully compliant even if we tell it to be, its just rather more so. Anyway, this technicality may have some revelance when it comes to 0xB5, a hex literal, and therefore *untyped* representation of a integer, and '\xB5' which is an int character literal. We might well expect both to fit nicely in a char, as I'm sure you and I do, but its not all that straightforward. Should we expect 0x00B5 to fit in a char? Well yes, we do, but SHOULD we? These are just some if the issues that langauge definers and compiler writer face. I'm reminded of C# where chars are unicode and definitely NOT 8 bits. Indeed explicit conversion is required to deal with ASCII coded byte streams, such as sent and received by our PICs. It came as a surprise to me that conversion was required, an d not assumed, but actually it makes sense when I consider the implications of how C# strings are defined and implemented. So we must be careful about assuming a simple one to one correspondence between 8 bits bytes and chars and between any particular form of int.
Happy coding! |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19605
|
|
Posted: Thu Jun 23, 2011 9:28 am |
|
|
Yes.
There are some comments though. If you read the #type manual. you find the comment:
"The #TYPE directive allows the keywords UNSIGNED and SIGNED to set the default data type."
Depending on how you read this....., it appears that this may also adjust the default 'size' allocated to a signed or unsigned variable.
If you try:
unsigned fred;
The compiler does not complain.
However add the #type declaration:
#type signed
and try the 'unsigned' declaration again, and now the compiler complains.
So it appears first as if CCS, is defaulting to a particular size for 'unsigned', unless you change this yourself....
Now, the comments about data sizes, are why the 'int8, int16' etc., types were launched, so that there _was_ a fixed size associated with a type, and why really these are the much safer forms to use.
A character literal _in single inverted commas_, _is_ defined as a character. Quote from K&R:
"A character constant is a sequence of one or more characters enclosed in single quotes. The value of a character constant with only one character is the numeric value of the character in the machine's character set at execution time. The value of a multi-character constant is implementation defined" Then it goes on to describe the multi-character constants like \n, and the octal and hex escapes.
So by enclosing the hex value in single inverted commas, the statement is effectively being converted to:
#define TEST_CHAR (char)(0xB5)
Now, it would be interesting to try using 'BYTE', rather than 'char'. BYTE, is #defined as an unsiged int8, giving a forced size. Char is defaulted to being the same type as an int, on CCS, but this charges on the PIC24's, where int increases to an int16, and char remains as int8. It is appearing as if in the change to make the signed directive also affect char, is also making it default to int16.....
You could also use a forced type statement, to make the sizes stay the way you want.
This sort of problem, is why it is posted again and again here 'treat new CCS releases as beta releases at best', and why the download page also offers 4.099. Basically unless you 'need' the latest, because support is added for a processor you want to use, or there is a specific bugfix for a fault you know you have, it is much easier to stick with slightly older releases.....
I'd report the change, and see if it is fixed in a few releases time.
Best Wishes |
|
|
jeremiah
Joined: 20 Jul 2010 Posts: 1362
|
|
Posted: Mon Jun 27, 2011 6:51 am |
|
|
@RF_Developer
default int type for mine is 16 bits atm, but characters are treated as 8bits. The promotion to 16 bits wouldn't really surprise me if it did it for both an unsigned char and a signed char types, which have the same size, just different representations (well other than it didn't use to promote and now it does, but that is fine now that I know).
@Ttelmah
Thanks for the response! I really appreaciate it.
Oddly enough, I myself tend to use only int8, int16, etc. One of our other programmers wrote the code that was affected. I just put it into a test program to supply to CCS and here.
I definitely reported the issue to CCS. Mostly concerning the promotion to 16 bits when using a signed char type versus not promoting to 16 bits when using an unsigned char type, which seems inconsistent. I would have figured promotion in both cases or no promotion in both cases since a char is 8 bits whether it is signed or not.
I agree and understand with the idea of only using a stable release. I was more posting here because I wanted to make sure it sounded like an actual problem rather than me misunderstanding how it is supposed to work. You guys/gals have worked with this compiler a lot longer than I have. We actually keep with a stable release on everyone elses copy. I am trying to figure out if the new releases will work for us and also reporting any bugs I may or may not find in the processs. Honestly, 4.121 (haven't tried 4.122 yet) has fixed a lot of bugs we have seen (mostly in how the compiler handles bit manipulation in structures with bit fields).
I kind of wish they kept an active list of known bugs so we could kind of work around that. |
|
|
miketwo
Joined: 04 Aug 2010 Posts: 24
|
|
Posted: Wed Aug 10, 2011 2:01 pm |
|
|
I'm not much of an assembly guy, but I think I've run across the same problem. When I try to compare a char to the hex literal 0xff, it comes back false when it should be true. Ditto for anything with a 1 in the top bit. (for example: 0xfe)
A signed int8 variable will fail the same way. An unsigned int8 works. This is all on a PIC24F, compiler version 4.122.
The workaround is to replace all my chars with unsigned int8's...I guess... but that makes me nervous. We have several thousand lines of code.
Any comments? Suggestions? Can you guys duplicate the problem? Does anyone know if CCS is working on this? Is it an error on my end (to assume the char is unsigned when, it seems, it isn't)?
Here is my test code:
Code: |
//==============================================
// Device Inits
//==============================================
#include <24FJ256GA110.h>
#device PASS_STRINGS=IN_RAM
#fuses NOPROTECT // Code not protected from reading
#fuses NOWDT // No automatic WDT -- it must be enabled by software.
#fuses XT // Primary Clock Select
#fuses FRC_PLL // Internal Fast RC Oscillator with Phase Lock Loop gives 32 MHz
#fuses WPOSTS12 // Watchdog Postscaler.
// At current processor settings: WPOSTS12 = ~10 seconds
// WPOSTS13 = ~20 seconds
// default = ~120 seconds
#fuses IESO // Internal-External Switchover
#fuses IOL1WAY
#fuses WRT
#fuses CKSFSM // Clock fail-safe monitor
#pragma case // Makes all code case-sensitive
#use delay(clock=32MHZ,internal=8M) // Tells compiler what the clock speed is
// Define UART
#pin_select U1TX = PIN_F2
#pin_select U1RX = PIN_F4
#use rs232(baud=115200, UART1, BITS=8, STREAM=COM_A, ERRORS)
//==============================================
// Dependencies
//==============================================
#include "string.h" // Standard string lib
#include "stdlib.h" // Standard lib
//==============================================
// The Test
//==============================================
// Comparison with Hex test
void Test12()
{
char a; // According to manual, should be equivalent to "unsigned int8"
signed int8 b;
unsigned int8 c;
signed d; // <--- why is there no compile error here?
unsigned e; // <--- why is there no compile error here?
// Assign all to 0xff
a=0xff;
b=0xff;
c=0xff;
d=0xff;
e=0xff;
fprintf(COM_A,"\r\n Comparison with Hex Test");
fprintf(COM_A,"\r\n");
fprintf(COM_A,"\r\n Testing %02x==%02x? ",a,0xff);
if(a!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS."); // FAILS (char)
fprintf(COM_A,"\r\n Testing %02x==%02x? ",b,0xff);
if(b!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS."); // FAILS (signed int8)
fprintf(COM_A,"\r\n Testing %02x==%02x? ",c,0xff);
if(c!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS."); // PASSES (unsigned int8)
fprintf(COM_A,"\r\n Testing %02x==%02x? ",d,0xff);
if(d!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS."); // PASSES (signed ...?)
fprintf(COM_A,"\r\n Testing %02x==%02x? ",e,0xff);
if(e!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS."); // PASSES (unsigned ...?)
}
//==============================================
// MAIN
//==============================================
void main()
{
// Setup Port
setup_uart(115200, COM_A);
// Run test
Test12();
}
|
Thanks for any help guys. |
|
|
FvM
Joined: 27 Aug 2008 Posts: 2337 Location: Germany
|
|
Posted: Thu Aug 11, 2011 3:33 am |
|
|
With recent PCD versions, your code compiles as shown below:
Code: | CCS PCD C Compiler, Version 4.121, 52480 11-Aug-11 10:24
.................... if(a!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS."); // FAILS (char)
00336: MOV.B 808,W0L
00338: SE W0,W0
0033A: MOV #FF,W4
0033C: CP W4,W0
0033E: BRA Z,35E
.................... if(b!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS."); // FAILS (signed int8)
003CC: MOV.B 809,W0L
003CE: SE W0,W0
003D0: MOV #FF,W4
003D2: CP W4,W0
003D4: BRA Z,3F4
.................... if(c!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS."); // PASSES (unsigned int8)
00462: MOV 80A,W4
00464: XOR.B #FF,W4L
00466: BRA Z,486 |
Does this imply a PCD bug? I don't think so. The behaviour is different form the documentation, but basically correct, I think. There are two problems involved.
1. CCS has apperently changed char type definition from unsigned to signed int8, which is not reflected in the help file. According to the C standard, the default char type is implementation dependent. As far as I'm aware of, signed int8 is the mostly used variant. Personally, I'm writing unsigned char or BYTE when I mean unsigned.
2. 0xFF is a positive constant. To get a constant within the range of signed char, write (signed char)0xff or -1.
The bad thing, as in many other cases, is the silent modification of CCS C behaviour, which is hardly acceptable for a professional tool. If you have started your project e.g. with PCD V4.098, you'll notice, that 0xff previously worked for signed int8 compare. In PCD V4.112, char has been still unsigned int8. |
|
|
RF_Developer
Joined: 07 Feb 2011 Posts: 839
|
A footnote |
Posted: Thu Aug 11, 2011 6:57 am |
|
|
Technically in C all integer literals are positive. Decimal, hex, octal and binary are simply different ways of expressing positive literals. Putting '-' in front of any literals adds the unary negation operator rather than chaning the literal itself. So -1 is -(1) and -(0xFF) is -(255) rather than +(1). What C compilers are meant to do (and clear CCS in its various forms isn't doing it...) is to barf when presented with:
signed int8 a = 0xFF;
as 0xFF is 255 and won't fit into a signed int8. The largest hex constant we ought to be able to fit in is 0x7F, after that we have to go negative. -1 in hex would be -0x01. Looks odd, but that's technically correct.
Now then, in comparisons such as:
if (a == 0xFF)...
what should happen is that as 0xFF is outside the range of a (its a signed int8) then both should be promoted to the next size up that they will fit before the comparison is made. That would be signed int16. a would still be -1 but the literal would now be 0x00FF because 0xFF is NOT signed, literals are always positive. The comparison therefore should fail. If chars are signed then this will happen with all literals greater than 0x7F, as indeed someone's posted. If chars are unsigned int8s then no promotion is required on either side.
I've tried this with my 4.123 PCWH and for what its worth all three cases, char, signed int8 an int8 generate the same code.
So CCS is compiling code that it shouldn't (is syntactically incorrect), is generatinging technically incorrect but intuitively "right" code is some versions and technically correct but non-intuitive promotions in others. What each compiler does is unpredictable...
So, everything perfectly normal...
RF_Developer. |
|
|
FvM
Joined: 27 Aug 2008 Posts: 2337 Location: Germany
|
|
Posted: Thu Aug 11, 2011 10:42 am |
|
|
Quote: | So CCS is compiling code that it shouldn't (is syntactically incorrect), is generatinging technically incorrect but intuitively "right" code is some versions and technically correct but non-intuitive promotions in others. What each compiler does is unpredictable...
So, everything perfectly normal... |
Yes. I didn't want to promote the assumption, that everything is O.K. with present CCS C behaviour. I particularly agree to the unpredicatble point. You can also ask, why it's O.K. to assign 0xFF to signed char but can't retrieve it on compare. A good standard provided by other compilers is to give a warning, if a constant exceeds the numeric range of a variable on assignment or compare. |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|