5.092: how strings are stored in flash (const, etc.)

 
allenhuffman



Joined: 17 Jun 2019
Posts: 562
Location: Des Moines, Iowa, USA

Posted: Thu Feb 20, 2020 9:45 am

Yesterday we reported a bug to CCS where sprintf() was mixing up strings and using the wrong one. Moving the location of strings around made it either work or fail. Basically:

Code:

char someString[] = "Hello world";

char buffer[80]; // some buffer
int value = 42;

sprintf (buffer, "Value: %d", value);


This works, but when other strings were added, the sprintf() was actually using the "Hello " part of the other string rather than the "Value: " part of the sprintf, producing output of:

"Hello 42"
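For reference, this is what the call should produce on a standard hosted C compiler. It is a minimal sketch and not CCS-specific: `format_value` is a hypothetical wrapper name, and `snprintf` is used instead of `sprintf` only to bound the write. The unrelated string is declared alongside it to mirror the situation described above.

```c
#include <stdio.h>

/* An unrelated string that lives next to the code under test. */
char someString[] = "Hello world";

/* Formats "Value: <n>" into out. Returns the number of characters
   written (not counting the terminating NUL), or a negative value
   on error. Hypothetical helper, for illustration only. */
int format_value(char *out, size_t out_size, int value)
{
    return snprintf(out, out_size, "Value: %d", value);
}
```

Calling `format_value(buffer, sizeof buffer, 42)` with an 80-byte buffer should leave "Value: 42" in the buffer regardless of what other strings exist in the program.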

I did a quick check today to see if there was some difference in doing "char *string" versus "char string[]" and they all seem to do the same thing in CCS and GCC, but I noticed this:

Code:
const char *string1 = "This is a test";

const char string2[] = "This is a test";

char *string3 = "This is a test";

char string4[] = "This is a test";


When I build that, I get:

Code:
>>> Warning 202 "...snip...\main.c" Line 3(13,20): Variable never used:   string1
>>> Warning 202 "...snip...\main.c" Line 7(7,14): Variable never used:   string3
>>> Warning 202 "...snip...\main.c" Line 9(6,13): Variable never used:   string4


string2 is also not being used, but there is no warning.
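As a side illustration of how the pointer and array forms differ in standard C (independent of where CCS decides to place them), here is a small sketch. The helper names `array_bytes` and `pointer_bytes` are made up for this example.

```c
#include <stddef.h>

const char *string1  = "This is a test";  /* a pointer to the literal */
const char string2[] = "This is a test";  /* an array holding the characters */

/* Size of the array object: 14 characters plus the terminating NUL. */
size_t array_bytes(void)
{
    return sizeof string2;
}

/* Size of the pointer object itself, not of the text it points to. */
size_t pointer_bytes(void)
{
    return sizeof string1;
}
```

`array_bytes()` is 15 on any conforming compiler, while `pointer_bytes()` depends on the target's pointer size. Since the two declarations create different kinds of objects, it would not be surprising for a compiler to treat them differently for warnings and placement.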

When I added code to use each string:

Code:
   // Prevent strings from being optimized out.
   printf ("%s", string1);
   printf ("%s", string2);
   printf ("%s", string3);
   printf ("%s", string4);


...same issue, same code.

Just a heads up in case anyone else has seen weirdness. We normally only use printf for debugging, but this project has an LCD and we sprintf formatted strings into buffers so they can then be turned into graphical fonts and sent to the screen. We noticed wrong messages appearing and ended up here.


SIDE NOTE: the way a string is generated is different from what I am used to. Rather than storing the series of bytes in code space and using a loop routine to copy them into RAM, it generates a bunch of MOV commands to move them 16 bits at a time.

const char *string1 = "11111111111";

...turns into this (#31 is hex 31, the ASCII code for "1"):

Code:
00268:  MOV     #3131,W4
0026A:  MOV     W4,800
0026C:  MOV     #3131,W4
0026E:  MOV     W4,802
00270:  MOV     #3131,W4
00272:  MOV     W4,804
00274:  MOV     #3131,W4
00276:  MOV     W4,806
00278:  MOV     #3131,W4
0027A:  MOV     W4,808
0027C:  MOV     #31,W4
0027E:  MOV     W4,80A


Neat. This explains why adding 100 bytes of string grows the program much more than 100 bytes. :-)
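To make the packing in that listing concrete, here is a rough C sketch of how the characters pair up into the 16-bit immediates, low byte first. `pack_words` is a hypothetical helper written for this illustration; it is not how the compiler is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packs a NUL-terminated string, terminator included, into 16-bit
   words with the first character of each pair in the low byte.
   Returns the number of words written, or 0 if out is too small. */
size_t pack_words(const char *s, uint16_t *out, size_t out_len)
{
    size_t bytes = strlen(s) + 1;      /* include the NUL */
    size_t words = (bytes + 1) / 2;    /* round up to whole words */
    size_t i;

    if (words > out_len) {
        return 0;
    }
    for (i = 0; i < words; i++) {
        uint16_t lo = (unsigned char)s[2 * i];
        uint16_t hi = (2 * i + 1 < bytes) ? (unsigned char)s[2 * i + 1] : 0;
        out[i] = (uint16_t)(lo | (hi << 8));
    }
    return words;
}
```

For "11111111111" (11 characters plus NUL) this gives six words: five of 0x3131 and a final 0x0031, matching the immediates in the listing. Each word takes two MOV instructions there, so those 12 bytes of string cost 12 instruction words.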
_________________
Allen C. Huffman, Sub-Etha Software (est. 1990) http://www.subethasoftware.com
Embedded C, Arduino, MSP430, ESP8266/32, BASIC Stamp and PIC24 programmer.
http://www.whywouldyouwanttodothat.com ?


Last edited by allenhuffman on Fri Feb 21, 2020 2:03 pm; edited 1 time in total
Ttelmah



Joined: 11 Mar 2010
Posts: 19544

Posted: Thu Feb 20, 2020 10:09 am

You have to remember that on the PIC24, there are only three bytes per
word. Somehow you have to either handle the maths to pack stuff
this way, or only store 2 bytes per word. Using the MOV instruction is
actually quite an efficient (and very quick) way of doing this. However it
does mean you use a word of storage for every 2 bytes.
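A quick sketch of that storage trade-off, assuming the only two layouts are three bytes per program word (packed) and two bytes per program word (upper byte unused). The function names are made up for this example.

```c
#include <stddef.h>

/* Program words needed to hold n bytes at 3 bytes per 24-bit word. */
size_t words_packed(size_t n)
{
    return (n + 2) / 3;
}

/* Program words needed at 2 bytes per word (upper byte left unused). */
size_t words_unpacked(size_t n)
{
    return (n + 1) / 2;
}
```

For an 80-byte string (79 characters plus the terminator), `words_packed(80)` is 27 and `words_unpacked(80)` is 40. For 12 bytes, `words_unpacked(12)` is 6, which is consistent with the six-line DATA listings for the 11-character strings later in this thread.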

I must admit I'm puzzled by your storage oddity. I use hundreds of strings and
have never seen even one byte of a problem like the one you are having. That
suggests something odd is happening with the particular chip.
allenhuffman



Posted: Thu Feb 20, 2020 10:35 am

PIC24 in this case...

Update... I think I read a note in the manual about using const and a size to place data in ROM, like a version string, so that is probably intentional.

But I ran a few tests, and got different results. For instance, with long strings, I get them as DATA statements! Well, except for one, which I'm not sure where it's going -- I don't find it in the source.

Code:

const char *string1    = "1111111111111111111111111111111111111111111111111111111111111111111111111111111"; // 79


I can't find this anywhere as data or MOV statements, so I think the compiler is being clever and generating a series of 1's somewhere in a loop. It runs and displays the string.

Code:

const char string2[]   = "2222222222222222222222222222222222222222222222222222222222222222222222222222222"; // 79

0021C:  DATA    32,32,32
0021E:  DATA    32,32,32
00220:  DATA    32,32,32
00222:  DATA    32,32,32
00224:  DATA    32,32,32
00226:  DATA    32,32,32
00228:  DATA    32,32,32
0022A:  DATA    32,32,32
0022C:  DATA    32,32,32
0022E:  DATA    32,32,32
00230:  DATA    32,32,32
00232:  DATA    32,32,32
00234:  DATA    32,32,32
00236:  DATA    32,32,32
00238:  DATA    32,32,32
0023A:  DATA    32,32,32
0023C:  DATA    32,32,32
0023E:  DATA    32,32,32
00240:  DATA    32,32,32
00242:  DATA    32,32,32
00244:  DATA    32,32,32
00246:  DATA    32,32,32
00248:  DATA    32,32,32
0024A:  DATA    32,32,32
0024C:  DATA    32,32,32
0024E:  DATA    32,32,00
00250:  DATA    32,32,00


Earlier, this generated code. Maybe it does for smaller strings, and after a set size it changes to data. To be tested.

Code:

const char string3[80] = "3333333333333333333333333333333333333333333333333333333333333333333333333333333"; // 79

0026E:  DATA    33,33,33
00270:  DATA    33,33,33
00272:  DATA    33,33,33
00274:  DATA    33,33,33
00276:  DATA    33,33,33
00278:  DATA    33,33,33
0027A:  DATA    33,33,33
0027C:  DATA    33,33,33
0027E:  DATA    33,33,33
00280:  DATA    33,33,33
00282:  DATA    33,33,33
00284:  DATA    33,33,33
00286:  DATA    33,33,33
00288:  DATA    33,33,33
0028A:  DATA    33,33,33
0028C:  DATA    33,33,33
0028E:  DATA    33,33,33
00290:  DATA    33,33,33
00292:  DATA    33,33,33
00294:  DATA    33,33,33
00296:  DATA    33,33,33
00298:  DATA    33,33,33
0029A:  DATA    33,33,33
0029C:  DATA    33,33,33
0029E:  DATA    33,33,33
002A0:  DATA    33,33,00
002A2:  DATA    33,33,00


Again, this looks good as data. So I shorten the strings to 11 characters and it still makes DATA statements!

Odd. The behavior changed right in front of me :)

Weird, huh?
allenhuffman



Posted: Thu Feb 20, 2020 10:40 am

I give up :) I just did 11-byte examples, but having them a few lines higher in the file (global, outside of main) makes it suddenly create it with MOVs instead of a loop or whatever it was doing:

Code:
const char *string1     = "11111111111";

00280:  MOV     W4,800
00282:  MOV     #3131,W4
00284:  MOV     W4,802
00286:  MOV     #3131,W4
00288:  MOV     W4,804
0028A:  MOV     #3131,W4
0028C:  MOV     W4,806
0028E:  MOV     #3131,W4
00290:  MOV     W4,808
00292:  MOV     #31,W4
00294:  MOV     W4,80A

const char string2[]    = "22222222222";

0020C:  DATA    32,32,00
0020E:  DATA    32,32,00
00210:  DATA    32,32,00
00212:  DATA    32,32,00
00214:  DATA    32,32,00
00216:  DATA    32,00,00

const char string3[12]  = "33333333333";

00224:  DATA    33,33,00
00226:  DATA    33,33,00
00228:  DATA    33,33,00
0022A:  DATA    33,33,00
0022C:  DATA    33,33,00
0022E:  DATA    33,00,00


Yeah, just something weird going on. I guess I don't quite understand the rules for generating MOV versus DATA since I've seen it do it both ways with short strings.

My coworker "solved" his bug by just moving some things around, so that's probably what's going on here.
allenhuffman



Posted: Thu Feb 20, 2020 10:46 am

Ah, 78 characters is the magic value...

78:
Code:

const char *string1     =
"11111111111111111111111111111111111111111111111111111111111111111111111111111";

0027A:  CLR     84E
0027C:  SETM    32C
0027E:  MOV     #3131,W4
00280:  MOV     W4,800
00282:  MOV     #3131,W4
00284:  MOV     W4,802
00286:  MOV     #3131,W4
00288:  MOV     W4,804
0028A:  MOV     #3131,W4
0028C:  MOV     W4,806
0028E:  MOV     #3131,W4
00290:  MOV     W4,808
00292:  MOV     #3131,W4
00294:  MOV     W4,80A
00296:  MOV     #3131,W4
00298:  MOV     W4,80C
0029A:  MOV     #3131,W4
0029C:  MOV     W4,80E
0029E:  MOV     #3131,W4
002A0:  MOV     W4,810
002A2:  MOV     #3131,W4
002A4:  MOV     W4,812
002A6:  MOV     #3131,W4
002A8:  MOV     W4,814
002AA:  MOV     #3131,W4
002AC:  MOV     W4,816
002AE:  MOV     #3131,W4
002B0:  MOV     W4,818
002B2:  MOV     #3131,W4
002B4:  MOV     W4,81A
002B6:  MOV     #3131,W4
002B8:  MOV     W4,81C
002BA:  MOV     #3131,W4
002BC:  MOV     W4,81E
002BE:  MOV     #3131,W4
002C0:  MOV     W4,820
002C2:  MOV     #3131,W4
002C4:  MOV     W4,822
002C6:  MOV     #3131,W4
002C8:  MOV     W4,824
002CA:  MOV     #3131,W4
002CC:  MOV     W4,826
002CE:  MOV     #3131,W4
002D0:  MOV     W4,828
002D2:  MOV     #3131,W4
002D4:  MOV     W4,82A
002D6:  MOV     #3131,W4
002D8:  MOV     W4,82C
002DA:  MOV     #3131,W4
002DC:  MOV     W4,82E
002DE:  MOV     #3131,W4
002E0:  MOV     W4,830
002E2:  MOV     #3131,W4
002E4:  MOV     W4,832
002E6:  MOV     #3131,W4
002E8:  MOV     W4,834
002EA:  MOV     #3131,W4
002EC:  MOV     W4,836
002EE:  MOV     #3131,W4
002F0:  MOV     W4,838
002F2:  MOV     #3131,W4
002F4:  MOV     W4,83A
002F6:  MOV     #3131,W4
002F8:  MOV     W4,83C
002FA:  MOV     #3131,W4
002FC:  MOV     W4,83E
002FE:  MOV     #3131,W4
00300:  MOV     W4,840
00302:  MOV     #3131,W4
00304:  MOV     W4,842
00306:  MOV     #3131,W4
00308:  MOV     W4,844
0030A:  MOV     #3131,W4
0030C:  MOV     W4,846
0030E:  MOV     #3131,W4
00310:  MOV     W4,848
00312:  MOV     #3131,W4
00314:  MOV     W4,84A
00316:  MOV     #31,W4
00318:  MOV     W4,84C


79: (adding '*' -- 42 -- to the end)
Code:
const char *string1     =
"11111111111111111111111111111111111111111111111111111111111111111111111111111*";

00280:  DATA    C0,4D,08
00282:  DATA    00,31,00 <<<< 31 == '1'
00284:  DATA    02,2A,00 <<<< 0x2a == 42 == '*'
00286:  DATA    00,00,00


At 78 characters, it generates different code, and it's smart code at that.

One other compiler I worked with (Renesas or IAR) would error out if you tried to make a string longer than 80 bytes, so... yeah, okay.
allenhuffman



Posted: Thu Feb 20, 2020 10:54 am

Okay, one more... No variables at all. It's smart for long repeating strings. Very impressive!

BUT, check what it did for a short string ("333...") at the end:

Code:
....................    printf ("1111111111111111111111111111111111111111111111111111111111111111111111111111111");
0027C:  MOV.B   #4F,W5L
0027E:  MOV     #31,W0
00280:  MOV.B   W0L,804
00282:  CALL    23E
*
00286:  DEC.B   000A
00288:  BTSS.B  42.1
0028A:  BRA     27E
....................    printf ("*******************************************************************************");
0028C:  MOV.B   #4F,W5L
0028E:  MOV     #2A,W0
00290:  MOV.B   W0L,804
00292:  CALL    23E
*
00296:  DEC.B   000A
00298:  BTSS.B  42.1
0029A:  BRA     28E
....................    printf ("22222222222222222222222222222222222222222222222222222222222222222222222222222");
0029C:  MOV.B   #4D,W5L
0029E:  MOV     #32,W0
002A0:  MOV.B   W0L,804
002A2:  CALL    23E
*
002A6:  DEC.B   000A
002A8:  BTSS.B  42.1
002AA:  BRA     29E
....................    printf ("33333333333333333333333333333333");
002AC:  MOV     #0,W1
002AE:  MOV     W1,W0
002B0:  CLR.B   1
002B2:  CALL    200


Very nice!

But, for the shorter string of 3s, it still generated a data table:

Code:
0020C:  DATA    33,33,00
0020E:  DATA    33,33,00
00210:  DATA    33,33,00
00212:  DATA    33,33,00
00214:  DATA    33,33,00
00216:  DATA    33,33,00
00218:  DATA    33,33,00
0021A:  DATA    33,33,00
0021C:  DATA    33,33,00
0021E:  DATA    33,33,00
00220:  DATA    33,33,00
00222:  DATA    33,33,00
00224:  DATA    33,33,00
00226:  DATA    33,33,00
00228:  DATA    33,33,00
0022A:  DATA    33,33,00
0022C:  DATA    00,00,00


So if you print long repeating strings, you can save room by making them longer ;-)
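As I read the listings for the long repeated strings, each one is a small counted loop: load a count (0x4F is 79, 0x4D is 77), output the character, decrement, and branch back until the count reaches zero. A rough C equivalent, with a buffer write standing in for the CALL to the output routine (`fill_repeated` is a hypothetical name):

```c
#include <stddef.h>

/* Writes `count` copies of c into out, then a NUL terminator.
   Mirrors the do/decrement/branch shape of the listing, so a count
   of 0 would wrap and run 256 times; callers must pass 1..255 and
   provide room for count + 1 bytes. Returns the characters written. */
size_t fill_repeated(char *out, char c, unsigned char count)
{
    size_t n = 0;

    do {
        out[n++] = c;           /* stands in for the output call */
    } while (--count != 0);     /* DEC.B, then branch while not zero */

    out[n] = '\0';
    return n;
}
```

`fill_repeated(buf, '1', 0x4F)` writes 79 characters, the same count the first printf listing loads into W5L.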
allenhuffman



Posted: Thu Feb 20, 2020 10:58 am

Long non-repeating strings and short non-repeating strings both ended up as DATA at the top:

Code:
....................    printf ("12345678901234567890123456789012345678901234567890123456789012345678901234567890");
002C0:  MOV     #0,W1
002C2:  MOV     W1,W0
002C4:  CLR.B   1
002C6:  CALL    200
*
002CA:  INC     W1,W1
002CC:  MOV     W1,[W15++]
002CE:  MOV.B   W0L,802
002D0:  CALL    284
*
002D4:  MOV     [--W15],W1
002D6:  MOV     #4F,W0
002D8:  CPSGT   W1,W0
002DA:  BRA     2C2
....................    printf ("12345678901234567890");
002DC:  MOV     #0,W1
002DE:  MOV     W1,W0
002E0:  CLR.B   1
002E2:  CALL    252
*
002E6:  INC     W1,W1
002E8:  MOV     W1,[W15++]
002EA:  MOV.B   W0L,802
002EC:  CALL    284
*
002F0:  MOV     [--W15],W1
002F2:  MOV     #13,W0
002F4:  CPSGT   W1,W0
002F6:  BRA     2DE
....................


... that code is pulling out the data from the top:

Code:
0021C:  DATA    31,32,35
0021E:  DATA    33,34,36
00220:  DATA    35,36,37
00222:  DATA    37,38,38
00224:  DATA    39,30,39
00226:  DATA    31,32,30
00228:  DATA    33,34,31
0022A:  DATA    35,36,32
0022C:  DATA    37,38,33
0022E:  DATA    39,30,34
00230:  DATA    31,32,35
00232:  DATA    33,34,36
00234:  DATA    35,36,37
00236:  DATA    37,38,38
00238:  DATA    39,30,39
0023A:  DATA    31,32,30
0023C:  DATA    33,34,31
0023E:  DATA    35,36,32
00240:  DATA    37,38,33
00242:  DATA    39,30,34
00244:  DATA    31,32,35
00246:  DATA    33,34,36
00248:  DATA    35,36,37
0024A:  DATA    37,38,38
0024C:  DATA    39,30,39
0024E:  DATA    31,32,30
00250:  DATA    33,34,00
00252:  CLR     32
00254:  MOV     #25E,W3
00256:  ADD     W3,W0,W0
00258:  TBLRDL.B[W0],W0L
0025A:  CLR.B   1
0025C:  RETURN 
0025E:  DATA    31,32,00
00260:  DATA    33,34,00
00262:  DATA    35,36,00
00264:  DATA    37,38,00
00266:  DATA    39,30,00
00268:  DATA    31,32,00
0026A:  DATA    33,34,00
0026C:  DATA    35,36,00
0026E:  DATA    37,38,00
00270:  DATA    39,30,00
00272:  DATA    00,00,00
....................


Well, that's at least a better understanding of the rules.
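In C terms, my reading of the non-repeating case is an index loop around a tiny table-read routine: pass an index in, get one byte back, output it, and stop once the index has passed the last position (0x4F for the 80-character string, 0x13 for the 20-character one). A hedged sketch, with `read_table` and `copy_from_table` as made-up names and a buffer write standing in for the output call:

```c
#include <stddef.h>

/* Stand-in for the DATA table of the 20-character string. */
static const char table[] = "12345678901234567890";

/* Stand-in for the small routine that takes an index and returns
   one byte from the table (the TBLRDL.B in the listing). */
char read_table(size_t index)
{
    return table[index];
}

/* Copies table[0..last_index] into out and NUL-terminates it.
   out must have room for last_index + 2 bytes.
   Returns the number of characters copied. */
size_t copy_from_table(char *out, size_t last_index)
{
    size_t i = 0;

    do {
        out[i] = read_table(i);   /* read one byte, then "output" it */
        i++;
    } while (i <= last_index);    /* leave once i > last_index */

    out[i] = '\0';
    return i;
}
```

With `last_index` set to 0x13, the loop copies all 20 characters, matching the `MOV #13,W0` compare in the second listing.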

I'm just not clear on what triggers MOV versus DATA. It's not length (though length changes the "compression" code being used).
Ttelmah



Posted: Thu Feb 20, 2020 12:08 pm

Remember with the data statements, there are several lines
of code to actually handle the extraction from these. So the MOV will
result in smaller code for short strings. Sounds as if it has a switch
based on length, and something about the original placement is
confusing it!...
The extraction code is this for one of the strings you show:
Code:

00252:  CLR     32
00254:  MOV     #25E,W3
00256:  ADD     W3,W0,W0
00258:  TBLRDL.B[W0],W0L
0025A:  CLR.B   1
0025C:  RETURN 

So six instruction words. So it'd become more efficient to use the DATA
statement above about 30 characters. The MOV version uses chars/2
instruction words; the DATA version uses 6+chars/3.
allenhuffman



Posted: Fri Feb 21, 2020 10:25 am

Ttelmah wrote:
Remember with the data statements, there are several lines
of code to actually handle the extraction from these. So the MOV will
result in smaller code for short strings.


Ah, yes. So there's probably a break-even where a bunch of MOVs versus code-and-DATA makes more sense. I guess 78 bytes is it, though I haven't counted the instructions to see.

And, since code generation (accessing data) changes based on data size (using different instructions to access data further away), I bet that could be tied to the issue where code generation changes based on moving things around.

Heck, it's reminding me of 6809 assembly where I'd always have to move things around to get access with short instructions (branch versus long branch). Same source code would change based on size if it moved routines further away and needed to go to long branches.
temtronic



Joined: 01 Jul 2010
Posts: 9244
Location: Greensville,Ontario

Posted: Fri Feb 21, 2020 12:44 pm

I was thinking the magical '78' was a holdover from ASR33 days, then remembered they only printed 72 characters per line.
You'd think 'trip points' would be nice 'binary' numbers like 64, 128, etc.
Ttelmah



Posted: Fri Feb 21, 2020 1:10 pm

The formulae I posted gave about 30. However I realise there is the
call instruction to actually call this, and the data has to be transferred
from the temporary variable on the return, while the direct mov can
put it directly into the required variable. So the formulae are:

chars/2
9+chars/3

Finding where these 'cross' (chars/2 = 9+chars/3) gives 54 characters as
the crossing point. At 78, the MOV uses 39 instruction words, while the
DATA uses 35.
So it looks as if CCS leaves the changeover some way beyond the
optimum these estimates suggest.
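The two estimates above can be checked with a few lines of C. This is only a sketch of the formulas as posted (chars/2 for MOV, overhead + chars/3 for DATA); the function names are invented here, and costs are scaled by 6 so everything stays in integers.

```c
/* MOV estimate: chars/2 instruction words, times 6. */
unsigned mov_cost_x6(unsigned chars)
{
    return 3u * chars;
}

/* DATA estimate: overhead + chars/3 instruction words, times 6. */
unsigned data_cost_x6(unsigned chars, unsigned overhead)
{
    return 6u * overhead + 2u * chars;
}

/* Length at which the two estimates are equal:
   chars/2 = overhead + chars/3  =>  chars = 6 * overhead. */
unsigned break_even_chars(unsigned overhead)
{
    return 6u * overhead;
}
```

With an overhead of 9, `break_even_chars(9)` is 54, and at 78 characters the scaled costs are 234 and 210, i.e. 39 and 35 instruction words. These are estimates only: the MOV listings in this thread show two instructions per 16-bit word, so measured sizes may not match the chars/2 figure.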
allenhuffman



Posted: Fri Feb 21, 2020 1:13 pm

If this were Facebook, I'd click Thumbs Up on that. Cool.

(I wonder if the value was chosen based on a different PIC variation that generated a different number of bytes.)
All times are GMT - 6 Hours