allenhuffman
Joined: 17 Jun 2019 Posts: 554 Location: Des Moines, Iowa, USA
|
5.092: how strings are stored in flash (const, etc.) |
Posted: Thu Feb 20, 2020 9:45 am |
|
|
Yesterday we reported a bug to CCS where sprintf() was mixing up strings and using the wrong one. Moving the location of strings around made it either work or fail. Basically:
Code: |
char someString[] = "Hello world";
char buffer[80]; // some buffer
int value = 42;
sprintf (buffer, "Value: %d", value);
|
This works, but when other strings were added, sprintf() actually used the "Hello " part of the other string rather than the "Value: " part of its own format string, producing output of:
"Hello 42"
I did a quick check today to see if there was some difference in doing "char *string" versus "char string[]" and they all seem to do the same thing in CCS and GCC, but I noticed this:
Code: | const char *string1 = "This is a test";
const char string2[] = "This is a test";
char *string3 = "This is a test";
char string4[] = "This is a test";
|
When I build that, I get:
Code: | >>> Warning 202 "...snip...\main.c" Line 3(13,20): Variable never used: string1
>>> Warning 202 "...snip...\main.c" Line 7(7,14): Variable never used: string3
>>> Warning 202 "...snip...\main.c" Line 9(6,13): Variable never used: string4
|
string2 is also not being used, but there is no warning.
When I added code to use each string:
Code: | // Prevent strings from being optimized out.
printf ("%s", string1);
printf ("%s", string2);
printf ("%s", string3);
printf ("%s", string4); |
...same issue, same code.
Just a heads up in case anyone else has seen weirdness. We normally only use printf for debugging, but this project has an LCD and we sprintf formatted strings into buffers so they can then be turned into graphical fonts and sent to the screen. We noticed wrong messages appearing and ended up here.
SIDE NOTE: the way a string is generated is different from what I am used to. Rather than storing the series of bytes in code space and using a loop routine to copy them into RAM, it generates a bunch of MOV instructions to move them 16 bits at a time.
const char *string1 = "11111111111";
...turns into this (0x31 is ASCII for "1"):
Code: | 00268: MOV #3131,W4
0026A: MOV W4,800
0026C: MOV #3131,W4
0026E: MOV W4,802
00270: MOV #3131,W4
00272: MOV W4,804
00274: MOV #3131,W4
00276: MOV W4,806
00278: MOV #3131,W4
0027A: MOV W4,808
0027C: MOV #31,W4
0027E: MOV W4,80A |
Neat. This explains why adding 100 bytes of string grows the program much more than 100 bytes. :-) _________________ Allen C. Huffman, Sub-Etha Software (est. 1990) http://www.subethasoftware.com
Embedded C, Arduino, MSP430, ESP8266/32, BASIC Stamp and PIC24 programmer.
http://www.whywouldyouwanttodothat.com ?
Last edited by allenhuffman on Fri Feb 21, 2020 2:03 pm; edited 1 time in total |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19539
|
|
Posted: Thu Feb 20, 2020 10:09 am |
|
|
You have to remember that on the PIC24, there are only three bytes per
word. Somehow you have to either handle the maths to pack stuff
this way, or only store 2 bytes per word. Using the MOV instruction is
actually quite an efficient (and very quick) way of doing this. However it
does mean you use a word of storage for every 2 bytes.
Must admit puzzled by your storage oddity. I use hundreds of strings and
have never seen even one byte of problem like you are having. Suggests
something odd is happening with the particular chip. |
|
|
allenhuffman
Joined: 17 Jun 2019 Posts: 554 Location: Des Moines, Iowa, USA
|
|
Posted: Thu Feb 20, 2020 10:35 am |
|
|
PIC24 in this case...
Update... I think I read a note in the manual about using const and a size to place data in ROM, like a version string, so that is probably intentional.
But I ran a few tests, and got different results. For instance, with long strings, I get them as DATA statements! Well, except for one, which I'm not sure where it's going -- I don't find it in the source.
Code: |
const char *string1 = "1111111111111111111111111111111111111111111111111111111111111111111111111111111"; // 79 |
I can't find this anywhere as data or MOV statements, so I think the compiler is being clever and generating a series of 1's somewhere in a loop. It runs and displays the string.
Code: |
const char string2[] = "2222222222222222222222222222222222222222222222222222222222222222222222222222222"; // 79
0021C: DATA 32,32,32
0021E: DATA 32,32,32
00220: DATA 32,32,32
00222: DATA 32,32,32
00224: DATA 32,32,32
00226: DATA 32,32,32
00228: DATA 32,32,32
0022A: DATA 32,32,32
0022C: DATA 32,32,32
0022E: DATA 32,32,32
00230: DATA 32,32,32
00232: DATA 32,32,32
00234: DATA 32,32,32
00236: DATA 32,32,32
00238: DATA 32,32,32
0023A: DATA 32,32,32
0023C: DATA 32,32,32
0023E: DATA 32,32,32
00240: DATA 32,32,32
00242: DATA 32,32,32
00244: DATA 32,32,32
00246: DATA 32,32,32
00248: DATA 32,32,32
0024A: DATA 32,32,32
0024C: DATA 32,32,32
0024E: DATA 32,32,00
00250: DATA 32,32,00 |
Earlier, this generated code. Maybe it does for smaller strings, and after a set size it changes to data. To be tested.
Code: |
const char string3[80] = "3333333333333333333333333333333333333333333333333333333333333333333333333333333"; // 79
0026E: DATA 33,33,33
00270: DATA 33,33,33
00272: DATA 33,33,33
00274: DATA 33,33,33
00276: DATA 33,33,33
00278: DATA 33,33,33
0027A: DATA 33,33,33
0027C: DATA 33,33,33
0027E: DATA 33,33,33
00280: DATA 33,33,33
00282: DATA 33,33,33
00284: DATA 33,33,33
00286: DATA 33,33,33
00288: DATA 33,33,33
0028A: DATA 33,33,33
0028C: DATA 33,33,33
0028E: DATA 33,33,33
00290: DATA 33,33,33
00292: DATA 33,33,33
00294: DATA 33,33,33
00296: DATA 33,33,33
00298: DATA 33,33,33
0029A: DATA 33,33,33
0029C: DATA 33,33,33
0029E: DATA 33,33,33
002A0: DATA 33,33,00
002A2: DATA 33,33,00
|
Again, this looks good as data. So I shorten the strings to 11 characters and it still makes DATA statements!
Odd. The behavior changed right in front of me.
Weird, huh? |
|
|
allenhuffman
Joined: 17 Jun 2019 Posts: 554 Location: Des Moines, Iowa, USA
|
|
Posted: Thu Feb 20, 2020 10:40 am |
|
|
I give up. I just did 11-byte examples, but having them a few lines higher in the file (global, outside of main) makes it suddenly create them with MOVs instead of a loop or whatever it was doing:
Code: | const char *string1 = "11111111111";
00280: MOV W4,800
00282: MOV #3131,W4
00284: MOV W4,802
00286: MOV #3131,W4
00288: MOV W4,804
0028A: MOV #3131,W4
0028C: MOV W4,806
0028E: MOV #3131,W4
00290: MOV W4,808
00292: MOV #31,W4
00294: MOV W4,80A
const char string2[] = "22222222222";
0020C: DATA 32,32,00
0020E: DATA 32,32,00
00210: DATA 32,32,00
00212: DATA 32,32,00
00214: DATA 32,32,00
00216: DATA 32,00,00
const char string3[12] = "33333333333";
00224: DATA 33,33,00
00226: DATA 33,33,00
00228: DATA 33,33,00
0022A: DATA 33,33,00
0022C: DATA 33,33,00
0022E: DATA 33,00,00
|
Yeah, just something weird going on. I guess I don't quite understand the rules for generating MOV versus DATA since I've seen it do it both ways with short strings.
My coworker "solved" his bug by just moving some things around, so that's probably what's going on here. |
|
|
allenhuffman
Joined: 17 Jun 2019 Posts: 554 Location: Des Moines, Iowa, USA
|
|
Posted: Thu Feb 20, 2020 10:46 am |
|
|
Ah, 78 characters is the magic value...
78:
Code: |
const char *string1 =
"11111111111111111111111111111111111111111111111111111111111111111111111111111";
0027A: CLR 84E
0027C: SETM 32C
0027E: MOV #3131,W4
00280: MOV W4,800
00282: MOV #3131,W4
00284: MOV W4,802
00286: MOV #3131,W4
00288: MOV W4,804
0028A: MOV #3131,W4
0028C: MOV W4,806
0028E: MOV #3131,W4
00290: MOV W4,808
00292: MOV #3131,W4
00294: MOV W4,80A
00296: MOV #3131,W4
00298: MOV W4,80C
0029A: MOV #3131,W4
0029C: MOV W4,80E
0029E: MOV #3131,W4
002A0: MOV W4,810
002A2: MOV #3131,W4
002A4: MOV W4,812
002A6: MOV #3131,W4
002A8: MOV W4,814
002AA: MOV #3131,W4
002AC: MOV W4,816
002AE: MOV #3131,W4
002B0: MOV W4,818
002B2: MOV #3131,W4
002B4: MOV W4,81A
002B6: MOV #3131,W4
002B8: MOV W4,81C
002BA: MOV #3131,W4
002BC: MOV W4,81E
002BE: MOV #3131,W4
002C0: MOV W4,820
002C2: MOV #3131,W4
002C4: MOV W4,822
002C6: MOV #3131,W4
002C8: MOV W4,824
002CA: MOV #3131,W4
002CC: MOV W4,826
002CE: MOV #3131,W4
002D0: MOV W4,828
002D2: MOV #3131,W4
002D4: MOV W4,82A
002D6: MOV #3131,W4
002D8: MOV W4,82C
002DA: MOV #3131,W4
002DC: MOV W4,82E
002DE: MOV #3131,W4
002E0: MOV W4,830
002E2: MOV #3131,W4
002E4: MOV W4,832
002E6: MOV #3131,W4
002E8: MOV W4,834
002EA: MOV #3131,W4
002EC: MOV W4,836
002EE: MOV #3131,W4
002F0: MOV W4,838
002F2: MOV #3131,W4
002F4: MOV W4,83A
002F6: MOV #3131,W4
002F8: MOV W4,83C
002FA: MOV #3131,W4
002FC: MOV W4,83E
002FE: MOV #3131,W4
00300: MOV W4,840
00302: MOV #3131,W4
00304: MOV W4,842
00306: MOV #3131,W4
00308: MOV W4,844
0030A: MOV #3131,W4
0030C: MOV W4,846
0030E: MOV #3131,W4
00310: MOV W4,848
00312: MOV #3131,W4
00314: MOV W4,84A
00316: MOV #31,W4
00318: MOV W4,84C |
79: (adding '*' -- 42 -- to the end)
Code: | const char *string1 =
"11111111111111111111111111111111111111111111111111111111111111111111111111111*";
00280: DATA C0,4D,08
00282: DATA 00,31,00 <<<< 31 == '1'
00284: DATA 02,2A,00 <<<< 0x2a == 42 == '*'
00286: DATA 00,00,00 |
Past 78 characters, it generates different code, and it's smart code at that.
One other compiler I worked with (Renesas or IAR) would error out if you tried to make a string longer than 80 bytes, so... yeah, okay. |
|
|
allenhuffman
Joined: 17 Jun 2019 Posts: 554 Location: Des Moines, Iowa, USA
|
|
Posted: Thu Feb 20, 2020 10:54 am |
|
|
Okay, one more... No variables at all. It's smart for long repeating strings. Very impressive!
BUT, check what it did for a short string ("333...") at the end:
Code: | .................... printf ("1111111111111111111111111111111111111111111111111111111111111111111111111111111");
0027C: MOV.B #4F,W5L
0027E: MOV #31,W0
00280: MOV.B W0L,804
00282: CALL 23E
*
00286: DEC.B 000A
00288: BTSS.B 42.1
0028A: BRA 27E
.................... printf ("*******************************************************************************");
0028C: MOV.B #4F,W5L
0028E: MOV #2A,W0
00290: MOV.B W0L,804
00292: CALL 23E
*
00296: DEC.B 000A
00298: BTSS.B 42.1
0029A: BRA 28E
.................... printf ("22222222222222222222222222222222222222222222222222222222222222222222222222222");
0029C: MOV.B #4D,W5L
0029E: MOV #32,W0
002A0: MOV.B W0L,804
002A2: CALL 23E
*
002A6: DEC.B 000A
002A8: BTSS.B 42.1
002AA: BRA 29E
.................... printf ("33333333333333333333333333333333");
002AC: MOV #0,W1
002AE: MOV W1,W0
002B0: CLR.B 1
002B2: CALL 200 |
Very nice!
But, for the shorter string of 3s, it still generated a data table:
Code: | 0020C: DATA 33,33,00
0020E: DATA 33,33,00
00210: DATA 33,33,00
00212: DATA 33,33,00
00214: DATA 33,33,00
00216: DATA 33,33,00
00218: DATA 33,33,00
0021A: DATA 33,33,00
0021C: DATA 33,33,00
0021E: DATA 33,33,00
00220: DATA 33,33,00
00222: DATA 33,33,00
00224: DATA 33,33,00
00226: DATA 33,33,00
00228: DATA 33,33,00
0022A: DATA 33,33,00
0022C: DATA 00,00,00 |
So if you print long repeating strings, you can save room by making them longer ;-) |
|
|
allenhuffman
Joined: 17 Jun 2019 Posts: 554 Location: Des Moines, Iowa, USA
|
|
Posted: Thu Feb 20, 2020 10:58 am |
|
|
Long non-repeating strings and short non-repeating strings both ended up as DATA at the top:
Code: | .................... printf ("12345678901234567890123456789012345678901234567890123456789012345678901234567890");
002C0: MOV #0,W1
002C2: MOV W1,W0
002C4: CLR.B 1
002C6: CALL 200
*
002CA: INC W1,W1
002CC: MOV W1,[W15++]
002CE: MOV.B W0L,802
002D0: CALL 284
*
002D4: MOV [--W15],W1
002D6: MOV #4F,W0
002D8: CPSGT W1,W0
002DA: BRA 2C2
.................... printf ("12345678901234567890");
002DC: MOV #0,W1
002DE: MOV W1,W0
002E0: CLR.B 1
002E2: CALL 252
*
002E6: INC W1,W1
002E8: MOV W1,[W15++]
002EA: MOV.B W0L,802
002EC: CALL 284
*
002F0: MOV [--W15],W1
002F2: MOV #13,W0
002F4: CPSGT W1,W0
002F6: BRA 2DE
.................... |
... that code is pulling out the data from the top:
Code: | 0021C: DATA 31,32,35
0021E: DATA 33,34,36
00220: DATA 35,36,37
00222: DATA 37,38,38
00224: DATA 39,30,39
00226: DATA 31,32,30
00228: DATA 33,34,31
0022A: DATA 35,36,32
0022C: DATA 37,38,33
0022E: DATA 39,30,34
00230: DATA 31,32,35
00232: DATA 33,34,36
00234: DATA 35,36,37
00236: DATA 37,38,38
00238: DATA 39,30,39
0023A: DATA 31,32,30
0023C: DATA 33,34,31
0023E: DATA 35,36,32
00240: DATA 37,38,33
00242: DATA 39,30,34
00244: DATA 31,32,35
00246: DATA 33,34,36
00248: DATA 35,36,37
0024A: DATA 37,38,38
0024C: DATA 39,30,39
0024E: DATA 31,32,30
00250: DATA 33,34,00
00252: CLR 32
00254: MOV #25E,W3
00256: ADD W3,W0,W0
00258: TBLRDL.B[W0],W0L
0025A: CLR.B 1
0025C: RETURN
0025E: DATA 31,32,00
00260: DATA 33,34,00
00262: DATA 35,36,00
00264: DATA 37,38,00
00266: DATA 39,30,00
00268: DATA 31,32,00
0026A: DATA 33,34,00
0026C: DATA 35,36,00
0026E: DATA 37,38,00
00270: DATA 39,30,00
00272: DATA 00,00,00
.................... |
Well, that's at least a better understanding of the rules.
I'm just not clear on what triggers MOV versus DATA. It's not length (though length changes the "compression" code being used). |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19539
|
|
Posted: Thu Feb 20, 2020 12:08 pm |
|
|
Remember with the data statements, there are several lines
of code to actually handle the extraction from these. So the MOV will
result in smaller code for short strings. Sounds as if it has a switch
based on length, and something about the original placement is
confusing it!...
The extraction code is this for one of the strings you show:
Code: |
00252: CLR 32
00254: MOV #25E,W3
00256: ADD W3,W0,W0
00258: TBLRDL.B[W0],W0L
0025A: CLR.B 1
0025C: RETURN
|
So six instruction words. So it'd become more efficient to use the DATA
statement above about 30 characters. The MOV version uses chars/2
instruction words, the DATA version 6+chars/3. |
|
|
allenhuffman
Joined: 17 Jun 2019 Posts: 554 Location: Des Moines, Iowa, USA
|
|
Posted: Fri Feb 21, 2020 10:25 am |
|
|
Ttelmah wrote: | Remember with the data statements, there are several lines
of code to actually handle the extraction from these. So the MOV will
result in smaller code for short strings. |
Ah, yes. So there's probably a break-even where a bunch of MOVs versus code-and-DATA makes more sense. I guess 78 bytes is it, though I haven't counted the instructions to see.
And, since code generation (accessing data) changes based on data size (using different instructions to access data further away), I bet that could be tied to the issue where code generation changes based on moving things around.
Heck, it's reminding me of 6809 assembly where I'd always have to move things around to get access with short instructions (branch versus long branch). Same source code would change based on size if it moved routines further away and needed to go to long branches. |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9243 Location: Greensville,Ontario
|
|
Posted: Fri Feb 21, 2020 12:44 pm |
|
|
I was thinking the magical '78' was a holdover from ASR33 days, then remembered they only printed 72 characters per line.
You'd think 'trip points' would be nice 'binary' numbers like 64, 128, etc. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19539
|
|
Posted: Fri Feb 21, 2020 1:10 pm |
|
|
The formulae I posted gave about 30. However I realise there is the
call instruction to actually call this, and the data has to be transferred
from the temporary variable on the return, while the direct mov can
put it directly into the required variable. So the formulae are:
chars/2
9+chars/3
Finding where these 'cross' (chars/2 = 9+chars/3) gives 54 characters as
the crossing point. At 78, the MOV uses 39 instruction words, while the
DATA uses 35.
So it looks as if CCS leaves the changeover somewhat beyond the optimum
these formulae predict. |
|
|
allenhuffman
Joined: 17 Jun 2019 Posts: 554 Location: Des Moines, Iowa, USA
|
|
Posted: Fri Feb 21, 2020 1:13 pm |
|
|
If this were Facebook, I'd click Thumbs Up on that. Cool.
(I wonder if the value was chosen based on a different PIC variation that generated a different amount of bytes.) |
|
|