Author |
Message |
henriquesv
Joined: 20 Feb 2011 Posts: 8
|
Fast "SPI like" using assembly inside C |
Posted: Tue Aug 23, 2011 10:07 am |
|
|
Hello everyone.
I am working with some multiplexing hardware and I am sending serial bits to a shift register.
The thing is: shifting this data out takes up too much of my time window:
Code: |
for(d=0; d<8; d++)
{
y = dsp_x & BIT_0;
output_bit(SDA, y);
output_bit(CLK,1);
output_bit(CLK,0);
dsp_x = dsp_x >> 1;
}
|
Yes, it is as simple as that.
Here is what the compiler is generating (I think there is a better way of doing this):
Code: |
.................... for(d=0; d<8; d++)
0020: CLRF 27
0021: MOVF 27,W
0022: SUBLW 07
0023: BTFSS 03.0
0024: GOTO 03D
.................... {
.................... y = dsp_x & BIT_0;
0025: MOVF 28,W
0026: ANDLW 01
0027: MOVWF 29
.................... output_bit(SDA, y);
0028: MOVF 29,F
0029: BTFSS 03.2
002A: GOTO 02D
002B: BCF 05.6
002C: GOTO 02E
002D: BSF 05.6
002E: BSF 03.5
002F: BCF 05.6
.................... output_bit(CLK,1);
0030: BCF 03.5
0031: BSF 05.7
0032: BSF 03.5
0033: BCF 05.7
.................... output_bit(CLK,0);
0034: BCF 03.5
0035: BCF 05.7
0036: BSF 03.5
0037: BCF 05.7
.................... dsp_x = dsp_x >> 1;
0038: BCF 03.0
0039: BCF 03.5
003A: RRF 28,F
.................... }
003B: INCF 27,F
003C: GOTO 021
|
I plan to use a function such as this:
Code: |
int spi(int data){
int count;
#asm
PORTA equ 0x05
MOVLW 0x08
MOVWF count;
loop:
;XOR.B data,W0
;RRC data,W0
DECF count,1 ; Decrement f in file register f
;BRA NZ, loop
;MOV #0x01,W0
;ADD count,F
;MOV count, W0
;MOV W0, _RETURN_
#endasm
}
|
Could anyone help me out? I bet it is going to be useful to many others.
I have already worked with other CISC processors, which made my work easier.
Thanks!
Best Regards.
Henrique
Last edited by henriquesv on Wed Aug 24, 2011 10:15 am; edited 1 time in total |
|
|
SherpaDoug
Joined: 07 Sep 2003 Posts: 1640 Location: Cape Cod Mass USA
|
|
Posted: Tue Aug 23, 2011 10:29 am |
|
|
You could save 6 instructions per loop if you used fast_io. I would prefer that to embedded assembly. _________________ The search for better is endless. Instead simply find very good and get the job done. |
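For readers following along, here is a rough sketch of what the fast_io suggestion might look like for the loop in the first post. In the listing above, each output_bit() is followed by a `BSF 03.5` / `BCF 05.x` pair that rewrites the TRIS register; with `#use fast_io` the compiler leaves those out, so the program has to set the pin directions once itself. This is CCS-specific code, not standard C. The pin assignment (SDA on RA6, CLK on RA7) is only inferred from the `05.6` and `05.7` operands in the listing, and the function names and the TRIS value `0x3F` are assumptions made for illustration.

```c
// Assumption: pin mapping inferred from the .LST output (port 05, bits 6 and 7).
#define SDA PIN_A6
#define CLK PIN_A7

#use fast_io(A)   // compiler no longer touches TRISA on each pin access

// Hypothetical helper: call once before shifting data out.
void shift_out_init(void)
{
   set_tris_a(0x3F);   // RA6 and RA7 as outputs, the rest left as inputs (assumed)
   output_low(SDA);
   output_low(CLK);
}

// Same loop as in the first post, LSB first, now without the TRIS writes.
void shift_out_byte(int8 value)
{
   int8 i;

   for (i = 0; i < 8; i++)
   {
      output_bit(SDA, value & 1);   // present the current low bit
      output_high(CLK);             // clock it into the shift register
      output_low(CLK);
      value >>= 1;
   }
}
```

Comparing the .LST file before and after the change should show whether the TRIS instructions really disappear for a given compiler version.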
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Tue Aug 23, 2011 1:15 pm |
|
|
In addition to the fast i/o, you could do the usual tactic of unrolling the
loop. You trade using more ROM, for more speed. Compile this and
look at the code. Within the bit code, I put in a delay_cycles(1) statement
to try to make the time be equal for either bit case (high or low data bit).
I also noted that you're sending the data LSB first. I don't know if you're
trying to emulate SPI, but SPI is usually sent MSB first. But anyway, I
kept it with LSB first in the example below.
Code: |
#include <16F877.H>
#fuses XT, NOWDT, NOPROTECT, BROWNOUT, PUT, NOLVP
#use delay(clock=4000000)
#use rs232(baud=9600, xmit=PIN_C6, rcv=PIN_C7, ERRORS)
#define SDA PIN_B4
#define CLK PIN_B3
void spi_write_sw(int8 data)
{
// Set TRIS to output and set SDA and CLK low.
output_low(SDA);
output_low(CLK);
#use fast_io(B) // Temporarily use fast i/o for speed
//---------------------
// Send bit 0
if(bit_test(data, 0))
{
output_high(SDA);
}
else
{
output_low(SDA);
delay_cycles(1);
}
output_high(CLK);
output_low(CLK);
//---------------------
// Send bit 1
if(bit_test(data, 1))
{
output_high(SDA);
}
else
{
delay_cycles(1);
output_low(SDA);
}
output_high(CLK);
output_low(CLK);
//-----------------
// And continue with sections for bits 2, 3, 4, 5, 6, 7
#use standard_io(B) // Return to standard i/o
}
//==========================================
void main()
{
spi_write_sw(0x55);
while(1);
} |
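The "continue with sections for bits 2, 3, 4, 5, 6, 7" step can be sketched with a small macro so the eight bit cells stay identical. This is not PCM programmer's code: it is a host-side illustration in portable C in which `set_sda()`, `set_clk()` and the `bits_seen` buffer are hypothetical stand-ins for the real pins, so the bit order can be checked on a PC. On the PIC the two helpers would be replaced by output_high()/output_low() on SDA and CLK under fast_io, and the delay_cycles(1) balancing from the example above is left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two pins, for testing on a PC. */
uint8_t sda_level;       /* current level of the data line            */
uint8_t clk_level;       /* current level of the clock line           */
uint8_t bits_seen[16];   /* data line sampled at each rising clk edge */
uint8_t bits_count;      /* number of bits clocked so far             */

void set_sda(uint8_t level) { sda_level = level; }

void set_clk(uint8_t level)
{
    /* Model a shift register that samples data on the rising edge. */
    if (level && !clk_level && bits_count < sizeof bits_seen)
        bits_seen[bits_count++] = sda_level;
    clk_level = level;
}

/* One unrolled bit cell: present bit n, then pulse the clock. */
#define SEND_BIT(data, n)                       \
    do {                                        \
        if ((data) & (1u << (n))) set_sda(1);   \
        else                      set_sda(0);   \
        set_clk(1);                             \
        set_clk(0);                             \
    } while (0)

/* All eight bit cells written out, LSB first, with no loop counter. */
void shift_out_unrolled(uint8_t data)
{
    SEND_BIT(data, 0);
    SEND_BIT(data, 1);
    SEND_BIT(data, 2);
    SEND_BIT(data, 3);
    SEND_BIT(data, 4);
    SEND_BIT(data, 5);
    SEND_BIT(data, 6);
    SEND_BIT(data, 7);
}
```

Because each bit index is a constant, a compiler can turn every test into a single bit-test instruction, which is where the speed gain over the loop comes from. A 16-bit word could presumably be sent by calling the function twice, low byte first, to keep the LSB-first order.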
|
|
|
henriquesv
Joined: 20 Feb 2011 Posts: 8
|
|
Posted: Tue Aug 23, 2011 3:26 pm |
|
|
I want to thank both of you. Great strategies indeed.
I'll make a few tests, compare the results and let you know.
Best regards. |
|
|
henriquesv
Joined: 20 Feb 2011 Posts: 8
|
|
Posted: Wed Aug 24, 2011 6:14 am |
|
|
Ok, here's the whole scenario:
At first I showed you only a piece of the whole problem, thinking that solving the small part would carry over to the bigger one.
The thing is: I don't really have to follow "SPI protocols". I am sending 16 bits to a shift register and I want it to happen as fast as possible.
I am using a PIC16F628, and here is where I got to after your help:
Code: |
extern char channel;
extern char chunks[6];
union
{
int16 full_data;
struct
{
unsigned char control_x:8;
unsigned char data_x:8;
}parts;
} spi_data;
char d = 0;
spi_data.full_data = 0x0080;
#pragma use fast_io(A)
spi_data.parts.control_x = spi_data.parts.control_x >> channel;
spi_data.parts.data_x = decode_table[chunks[channel]];
// DSP stream
for(d=0; d<16; d++)
{
output_bit(SDA, (spi_data.full_data & 0x0001));
output_bit(CLK,1);
output_bit(CLK,0);
spi_data.full_data = spi_data.full_data >> 1;
}
channel++;
if (channel>5)
{
channel=0;
}
#pragma use standard_io(A)
|
This piece of code takes 49 cycles on average.
I tried your loop-unrolling trick, but it takes about 10 cycles per bit.
Do you guys see any other way to optimize this process?
Thank you!
Best Regards. |
|
|
SherpaDoug
Joined: 07 Sep 2003 Posts: 1640 Location: Cape Cod Mass USA
|
|
Posted: Wed Aug 24, 2011 9:21 am |
|
|
Something is wrong if the unrolled code is actually slower. It may be very bulky but it should be very fast. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19535
|
|
Posted: Wed Aug 24, 2011 9:58 am |
|
|
I think the poster must be confusing codespace instructions, with operation cycles.
The code might well use 49 instructions, but when run, the centre twenty or so are repeated 16 times.
You need to run the code in something like MPLAB SIM, with the stopwatch function, and see how many cycles it uses. I'd guess about 350.
The unrolled version will be larger, but probably a good 50 to 100 instructions faster...
Have you also just tried using the CCS SPI software functions to do this?
Best Wishes |
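The difference between instructions in code space and cycles at run time can be put into a rough formula: instructions outside the loop run once, instructions inside it run once per iteration. The sketch below uses the figures from the posts above (49 instructions in total, about 20 of them in the loop, 16 iterations). The function name is made up for illustration, and it treats every instruction as one cycle, which undercounts slightly on a PIC16 because taken branches such as GOTO cost two.

```c
/* Rough run-time cost of a block containing one loop.
   Assumes one cycle per instruction (branches actually take two). */
unsigned estimate_cycles(unsigned static_instr,  /* instructions in code space */
                         unsigned loop_instr,    /* of those, inside the loop  */
                         unsigned iterations)    /* times the loop body runs   */
{
    unsigned outside = static_instr - loop_instr;   /* executed once */
    return outside + loop_instr * iterations;
}
```

With the thread's numbers, estimate_cycles(49, 20, 16) gives 29 + 20 * 16 = 349, which is in line with the guess of about 350. A simulator stopwatch remains the reliable way to get the real figure.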
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9241 Location: Greensville,Ontario
|
|
Posted: Wed Aug 24, 2011 10:07 am |
|
|
just a comment.
If you're NOT using SPI protocols you might consider renaming your variables, etc. to something other than 'SPI'. Having SPI mentioned puts some of us older guys into the mindset that you're really using SPI which is not the case here. It'll take us down the wrong paths trying to figure out 'modes', data lengths, timings...that are SPI based and not relevant to your shift register I/O. |
|
|
henriquesv
Joined: 20 Feb 2011 Posts: 8
|
|
Posted: Wed Aug 24, 2011 10:20 am |
|
|
Hi temtronic,
You're right. I even changed the topic's name.
Ttelmah,
What I actually did was go through every instruction in a code block and add up the number of cycles each one takes (according to the PIC datasheet).
But I am willing to run the code in something like MPLAB SIM. I have never used it, though. Any tips on where I should start looking?
I once tried the CCS software SPI functions, but they seemed to be slower...
Thank you guys.
Best Regards. |
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Wed Aug 24, 2011 11:06 am |
|
|
The overhead of your for() loop is, by itself, about equal to the per bit
time of the example. There is no way your code is faster. Your
output_bit() line is, by itself, substantially longer than the per bit time
for my code. Compare the .LST files for each program. |
|
|
henriquesv
Joined: 20 Feb 2011 Posts: 8
|
|
Posted: Wed Aug 24, 2011 4:28 pm |
|
|
Bottom line:
I want to thank you all for the tips and tricks.
Using Stopwatch I got:
391 us - using the for loop.
263 us - using the CCS spi_xfer() API.
And as PCM programmer told us, unrolling the loop got the best result:
127 us (at the expense of some extra ROM, of course)!
Problem solved.
My best wishes! |
|
|
|