unlooped draw
Pyrofer



Joined: 13 Sep 2006
Posts: 16

Posted: Wed Sep 13, 2006 7:28 am

I need to improve the speed of a loop, and it's been suggested that I take it from a C loop to an unlooped goto in asm.

Code:

for (x=0; x<width; x++) mysend(data);


This can be done quicker in asm with
Code:

1 jump (10-width) lines forward
2 mysend(data)
3 mysend(data)
4 mysend(data)
5 mysend(data)
6 mysend(data)
etc


This is faster because you're not messing around with checking x against width after every call of the mysend function.
Yes, speed is this critical for my routine, and yes, there is enough of a difference to make it worthwhile.

I know nothing of assembly. Please can somebody help with how I could achieve this as inline asm?
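
For reference, the "jump into a list of calls" idea above can be sketched in portable C with a switch whose cases fall through. This is only an illustration under assumptions: the list is 10 calls long (as in the pseudo-code), mysend() is replaced by a hypothetical stand-in that just counts calls, and nothing here says how the CCS compiler would translate it or whether it would beat the plain loop.

```c
#include <stdint.h>

static unsigned sent_count;   /* counts calls so the sketch can be checked */

/* Hypothetical stand-in for the poster's mysend(); it only counts calls. */
static void mysend(uint8_t data)
{
    (void)data;
    sent_count++;
}

/* Calls mysend(data) exactly `width` times for width 1..10.
 * The switch jumps straight to the matching case and then falls
 * through the rest, so no loop counter is tested between calls.
 * Any other width (0 or above 10) sends nothing in this sketch. */
static void send_unrolled(uint8_t width, uint8_t data)
{
    switch (width) {
    case 10: mysend(data); /* fall through */
    case 9:  mysend(data); /* fall through */
    case 8:  mysend(data); /* fall through */
    case 7:  mysend(data); /* fall through */
    case 6:  mysend(data); /* fall through */
    case 5:  mysend(data); /* fall through */
    case 4:  mysend(data); /* fall through */
    case 3:  mysend(data); /* fall through */
    case 2:  mysend(data); /* fall through */
    case 1:  mysend(data); break;
    default: break;
    }
}
```

Calling send_unrolled(4, data) enters at case 4 and makes four calls. The cost is program space: every possible width needs its own call in the list.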
asmallri



Joined: 12 Aug 2004
Posts: 1638
Location: Perth, Australia

Posted: Wed Sep 13, 2006 7:39 am

If x is declared as an unsigned int or unsigned char, then the for loop is as optimized as you are going to get; however, mysend could be optimized. Is it small enough that you can add it inline? If so, you will save on unneeded calls and returns.
_________________
Regards, Andrew

http://www.brushelectronics.com/software
Home of Ethernet, SD card and Encrypted Serial Bootloaders for PICs!!
Pyrofer



Joined: 13 Sep 2006
Posts: 16

Posted: Wed Sep 13, 2006 7:49 am

Are you sure?

The guy who suggested this was pretty clear that the for loop would slow things down.

I doubt that the compiler puts 128 calls to mysend in a line and jumps into that list, as it's a waste of program space, but for me that's better than the time spent checking the for loop each time.

As for having mysend inline, see my other post on 9-bit SPI.
asmallri



Joined: 12 Aug 2004
Posts: 1638
Location: Perth, Australia

Posted: Wed Sep 13, 2006 7:57 am

Yes, I am sure. The little bit of overhead (and it is little) that the loop introduces is far outweighed by the loss in efficiency of making function calls. Also, with the loop there would be a single inline instance of your called routine.
_________________
Regards, Andrew

Pyrofer



Joined: 13 Sep 2006
Posts: 16

Posted: Wed Sep 13, 2006 8:00 am

Thanks for your help.
I'll put the existing mysend routine inline, but I still need to optimise that into asm, as I'm sure it could be done better than how I've got it in C.
Ttelmah
Guest







Posted: Wed Sep 13, 2006 8:06 am

The jump-forward approach can be made to work, but you have to calculate the offset and adjust it for the size of the calls, and the total saving will be tiny (it may actually be non-existent, since this approach will force a call for each of the subroutines). The 'for' loop will be fractionally quicker with:

for(x=width;x;--x)

The advantage is that you only have to access one variable, not two in the loop. If you combine this with declaring 'mysend' as inline, there may be a slight saving.
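
The two loop forms can be put side by side in plain C. This is a minimal sketch, with mysend() replaced by a hypothetical stand-in that only counts calls. On a PIC, a count-down-to-zero test can often map onto a decrement-and-skip instruction such as DECFSZ, but whether the compiler actually generates that here is an assumption to check in the listing file.

```c
#include <stdint.h>

static unsigned sent_count;   /* counts calls so both forms can be compared */

/* Hypothetical stand-in for the poster's mysend(). */
static void mysend(uint8_t data)
{
    (void)data;
    sent_count++;
}

/* Original form: the loop test reads both x and width on every pass. */
static void send_count_up(uint8_t width, uint8_t data)
{
    uint8_t x;
    for (x = 0; x < width; x++)
        mysend(data);
}

/* Count-down form: width is read once, then x is only tested against zero. */
static void send_count_down(uint8_t width, uint8_t data)
{
    uint8_t x;
    for (x = width; x; --x)
        mysend(data);
}
```

Both functions call mysend() the same number of times; the count-down form only works as a drop-in replacement because the loop body never uses the value of x.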
For the 'jump' approach, the problem is that each call will need to be set up with the 'data', so the total program space needed for each call will be a significant size, making the jump calculation more complex. However, if no data is needed for the call, then something like:
Code:

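int8 jump;
// Skip the first (10-width) of the 10 calls below.
// The <<1 assumes each call occupies two address units.
jump=(10-width)<<1;
// Load the offset into W, then add it to the program counter.
#asm
movf jump,W
addwf PC,F
#endasm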
mysend();
mysend();
mysend();
.....

With a suitable declaration of the PC storage register (which depends on whether this is a 16 or 18 chip), 'mysend' declared as separate, and provided the routines all sit in one bank of memory, this should be close.

Best Wishes
Pyrofer



Joined: 13 Sep 2006
Posts: 16

Posted: Wed Sep 13, 2006 5:31 pm

I've made the suggested improvements: changed the format of the for loop and put mysend inline. It's faster, but not dramatically so.

I will still try the inline asm, I think. I will have to benchmark them and see what ends up being faster.

Thanks for all the help guys, on both my topics!

I've made lots of progress because of your answers. Much appreciated.
Check out
www.pyrofersprojects.com/3dcube.php

to see what it's all gone towards.
Ttelmah
Guest







Posted: Thu Sep 14, 2006 9:18 am

As a further comment, anything you can do to improve 'mysend' will have as big an effect. The actual overhead of the loop is a few instructions, and just one instruction wasted in mysend will have just as big an effect.

Best Wishes
Pyrofer



Joined: 13 Sep 2006
Posts: 16

Posted: Thu Sep 14, 2006 11:59 am

mysend has now been optimised. It's basically a 9-bit SPI routine, so there is only so much that can be done.

Would having the data byte as a global, so it doesn't need to get passed to mysend, be any quicker?
Ttelmah
Guest







Posted: Thu Sep 14, 2006 2:52 pm

Yes.
There is probably as much overhead from passing a variable as is involved in the entire loop!

Best Wishes
Pyrofer



Joined: 13 Sep 2006
Posts: 16

Posted: Fri Sep 15, 2006 2:43 am

Thanks for that!

I was always taught when programming in C to avoid globals like the plague. I don't know why; my tutor came up with some excuses but I never really believed them. I guess I just tried to avoid them because I'd been taught it was good programming practice. They never mentioned it slowed down performance!

I'll basically convert all my variables into globals now. I have enough RAM, and if there is a speed saving each time then I should be able to take the whole program up a notch.
ckielstra



Joined: 18 Mar 2004
Posts: 3680
Location: The Netherlands

Posted: Fri Sep 15, 2006 5:41 am

It is good programming practice to keep local variables local as it helps to save RAM and makes your program easier to maintain (the variable declaration is close to where it is used and you don't run into accidentally using the same variable twice).

That said, global variables can help to speed up your program in some very specific situations. An example is where the same data is used by multiple functions (it saves passing of function parameters).

So in my programs I use some global variables, but only when I can point out for each variable that it has significant advantages over using a local variable. Don't make all variables global because someone told you it is faster; you are the one who has to _know_ whether it makes a difference or not.

As a general speed optimization rule: The critical parts are often in less than 5% of the total program code. Identify this small part and then look for improvements.

As a possible optimization: You said the SPI routine is now 9-bits and I assume this is your own bit toggling routine? Why not use the inbuilt SPI hardware which is always faster than any routine you can create? I know the inbuilt hardware only accepts 8-bits, but there are several ways to cheat on this. (8-bits by hardware + 1 bit-bang bit, or concatenate multiple 9-bit words, or...)
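
The "concatenate multiple 9-bit words" idea can be sketched as a packing routine: eight 9-bit words are exactly 72 bits, so they fit in nine bytes that the 8-bit SPI hardware could send back to back. This is only a sketch under assumptions: pack9() is a hypothetical helper name, bits are sent MSB first, and a final partial byte is padded with zero bits, which the receiving device may or may not tolerate.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs `count` 9-bit words (MSB first) into `out` and returns the number
 * of bytes written. `out` must have room for (count * 9 + 7) / 8 bytes.
 * Any leftover bits in the last byte are padded with zeros. */
static size_t pack9(const uint16_t *words, size_t count, uint8_t *out)
{
    uint32_t acc = 0;     /* bit accumulator, newest bits at the bottom */
    unsigned nbits = 0;   /* number of valid bits currently in acc */
    size_t n = 0;         /* bytes written so far */
    size_t i;

    for (i = 0; i < count; i++) {
        acc = (acc << 9) | (words[i] & 0x1FFu);
        nbits += 9;
        while (nbits >= 8) {              /* emit every complete byte */
            nbits -= 8;
            out[n++] = (uint8_t)(acc >> nbits);
        }
        acc &= (1u << nbits) - 1u;        /* keep only the leftover bits */
    }
    if (nbits > 0)                        /* flush a final partial byte */
        out[n++] = (uint8_t)(acc << (8 - nbits));
    return n;
}
```

Sending in multiples of eight words avoids the padding entirely, since the packed data then ends on a byte boundary.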
libor



Joined: 14 Dec 2004
Posts: 288
Location: Hungary

Posted: Sat Sep 16, 2006 9:38 am

In a similar situation (a bit-toggling routine that sends out the bits of variables one by one, with fixed intrabit timing and no allowable intrabyte overhead), I use the intrabit 'idle' timeslots (I have 7 of these per byte) to prepare data needed by the loop's next iteration, which saves time at the loop's header. I can split this task across up to seven timeslots.

Look at my pseudo-code:

Code:

for (i=0; i<length; ) {    //i.e. no increment at the iteration level, to save time

  bit_toggle_7th_bit
  i++;                     //split task: increment the loop variable now, while
                           //I have idle time between bits

  wait_for_timer_flag      //I use a timer as the bps timebase, no interrupts,
                           //just test the flag
  reset_timer_flag

  bit_toggle_6th_bit
  nextdata=buffer[i];      //split task done here, while I have idle time

  wait_for_timer_flag
  reset_timer_flag

  etc.                     //overall I have 7 intrabit idle timeslots to move
                           //as much code out of the loop header as possible

}
Pyrofer



Joined: 13 Sep 2006
Posts: 16

Posted: Sat Sep 16, 2006 4:37 pm

OK, here is the routine that sends the data to the LCD:

Code:

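     SSPEN=0;                 // disable the SSP module so the pins can be driven directly
     output_low(LCDCLOCK);
     output_high(LCDDATA);    // extra (9th) bit high: send data
     output_high(LCDCLOCK);   // clock the extra bit out
     output_low(LCDCLOCK);
     SSPEN=1;                 // re-enable the SSP module
     spi_write(color);        // the remaining 8 bits go out in hardware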


I think that's as good as it's ever going to get.
libor



Joined: 14 Dec 2004
Posts: 288
Location: Hungary

Posted: Sun Sep 17, 2006 2:33 am

spi_write(color);

This instruction puts 'color' into SSPBUF and then waits, doing nothing, until all the bits have left the port, looping and testing until the SSPSTAT.BF flag gets set (this is to avoid SSP buffer overwrites in consecutive spi_write instructions). Your code continues only after SSPBUF has been completely sent by the hardware.

You can use this idle time to do more useful things by splitting up spi_write() using assembly, e.g. wait for the BF flag before the bit-toggling part of your code, and then you'll have plenty of time for useful code execution at the end of the routine while the PIC sends out the 8 bits in hardware.

Just by putting the wait before sending (bit-toggling the 9th bit), all the code in the loop can carry on executing up to the beginning of the next iteration, so no time will be wasted.
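
The effect of moving the wait can be illustrated with a toy timing model in plain C. This is not PIC code and touches no real registers: the tick counts (8 ticks for the hardware to shift a byte, plus caller-chosen costs for the bit-toggling and for the rest of the loop) are made-up assumptions, and all function names are hypothetical.

```c
#define SHIFT_TICKS 8u   /* assumed: one tick per bit shifted by the hardware */

static unsigned now;     /* elapsed ticks in the model */
static unsigned busy;    /* ticks left in the current hardware transfer */

/* Advance time by n ticks; the modelled hardware shifts in parallel. */
static void tick(unsigned n)
{
    while (n--) {
        now++;
        if (busy)
            busy--;
    }
}

static void wait_idle(void)      { while (busy) tick(1); }
static void start_transfer(void) { busy = SHIFT_TICKS; }

/* spi_write()-style: start the byte, then block until it has gone. */
static unsigned run_wait_after(unsigned pixels, unsigned bitbang, unsigned other)
{
    now = 0;
    busy = 0;
    while (pixels--) {
        tick(bitbang);       /* toggle the 9th bit by hand */
        start_transfer();
        wait_idle();         /* CPU idles for the whole transfer */
        tick(other);         /* loop overhead, fetching the next data */
    }
    return now;
}

/* Wait first: the previous byte shifts out while the loop does other work. */
static unsigned run_wait_before(unsigned pixels, unsigned bitbang, unsigned other)
{
    now = 0;
    busy = 0;
    while (pixels--) {
        wait_idle();         /* usually short: the transfer is mostly done */
        tick(bitbang);
        start_transfer();
        tick(other);         /* overlaps with the hardware transfer */
    }
    wait_idle();             /* let the last byte finish */
    return now;
}
```

With 4 pixels, 2 ticks of bit-toggling and 5 ticks of other work, the model gives 60 ticks for the wait-after form and 40 for the wait-before form. The saving on real hardware depends on how much other work there is to overlap.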

BTW, do you really need that much speed optimization?
All times are GMT - 6 Hours
Page 1 of 2

 