unlooped draw

Pyrofer · Joined: 13 Sep 2006 Posts: 16

I need to improve the speed of a loop, and it's been suggested that I convert it from a C loop to an unlooped (unrolled) goto in asm.

asmallri · Joined: 12 Aug 2004 Posts: 1634 Location: Perth, Australia

If x is declared as an unsigned int or unsigned char, then the for loop is as optimized as you are going to get; however, mysend could be optimized. Is it small enough that you can make it inline? If so, you will save the unneeded calls and returns.
_________________
Regards, Andrew

http://www.brushelectronics.com/software
Home of Ethernet, SD card and Encrypted Serial Bootloaders for PICs!!

Pyrofer · Joined: 13 Sep 2006 Posts: 16

Are you sure?

The guy who suggested this was pretty clear that the for loop would slow things down.

I doubt that the compiler puts 128 calls to mysend in a line and jumps into that list, as it's a waste of program space, but for me that's better than spending time checking the for loop condition on every pass.

As for having mysend inline, see my other post on 9-bit SPI.

asmallri · Joined: 12 Aug 2004 Posts: 1634 Location: Perth, Australia

Yes, I am sure. The little bit of overhead (and it is little) that the loop introduces is far outweighed by the loss in efficiency from making function calls. Also, with the loop there would be a single inline instance of your called routine.

Pyrofer · Joined: 13 Sep 2006 Posts: 16

Thanks for your help.
I'll put the existing mysend routine inline, but I still need to optimise it in asm, as I'm sure it could be done better than how I've got it in C.

Ttelmah · Guest

The jump-forward approach can be made to work, but you have to calculate the offset and adjust it for the size of the calls, and the total saving will be tiny (it may actually be non-existent, since this approach will force a call for each of the subroutines). The 'for' loop will be fractionally quicker with:

for(x=width;x;--x)

The advantage is that you only have to access one variable in the loop, not two. If you combine this with declaring 'mysend' as inline, there may be a slight saving.
For the 'jump' approach, the problem is that each call will need to be set up with the 'data', so the program space needed for each call will be significant, making the jump calculation more complex. However, if no data is needed for the call, then something like:
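As a rough sketch of the count-down loop with the send routine inlined: the `mysend` below is only a hypothetical stand-in that records bytes in a buffer so the example can be run on a PC, and `draw_row`, `sent` and `sent_count` are names invented for illustration. Standard C `static inline` is used here; CCS C has its own `#inline` directive for the same purpose.

```c
#include <stdint.h>

/* Hypothetical stand-in for the poster's mysend(): it only records the
   bytes so the sketch can be run on a PC instead of driving an LCD. */
static uint8_t  sent[256];
static uint16_t sent_count = 0;

static inline void mysend(uint8_t data)
{
    if (sent_count < sizeof sent) {
        sent[sent_count++] = data;
    }
}

/* Count down to zero: the loop test only has to look at x,
   instead of comparing x against width on every pass. */
void draw_row(uint8_t width, uint8_t color)
{
    uint8_t x;

    for (x = width; x; --x) {
        mysend(color);
    }
}
```

Calling `draw_row(128, color)` sends 128 bytes, and a width of 0 sends nothing, since the loop test fails straight away.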

Pyrofer · Joined: 13 Sep 2006 Posts: 16

I've made the suggested improvements: changed the format of the for loop and put mysend inline. It's faster, but not dramatically so.

I think I will still try the inline asm. I will have to benchmark them and see which ends up being faster.

Thanks for all the help guys, on both my topics!

I've made lots of progress because of your answers. Much appreciated.
Check out
www.pyrofersprojects.com/3dcube.php

to see what it's all gone towards.

Ttelmah · Guest

As a further comment, anything you can do to improve 'mysend' will have just as big an effect. The actual overhead of the loop is only a few instructions, so a single wasted instruction in mysend costs about as much as a comparable share of the loop overhead.

Best Wishes

Pyrofer · Joined: 13 Sep 2006 Posts: 16

mysend has now been optimised. It's basically a 9-bit SPI routine, so there is only so much that can be done.

Would having the data byte as a global, so it doesn't need to be passed to mysend, be any quicker?

Ttelmah · Guest

Yes.
There is probably as much overhead from passing a variable as there is in the entire loop!
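A minimal sketch of the two variants being compared, written as plain C that runs on a PC. All names here (`mysend_param`, `mysend_global`, `tx_data`, `out`, `out_len`) are hypothetical, and the byte log only stands in for the real 9-bit output. Whether the global version is actually faster depends on the compiler and target, so it is worth checking the generated listing or timing both.

```c
#include <stdint.h>

/* Byte log standing in for the real output hardware. */
static uint8_t out[32];
static uint8_t out_len = 0;

static void emit(uint8_t b)
{
    if (out_len < sizeof out) {
        out[out_len++] = b;
    }
}

/* Variant A: the caller passes the byte, so it has to be copied into
   the function's parameter before every call. */
void mysend_param(uint8_t data)
{
    emit(data);
}

/* Variant B: the byte lives in a global that the caller writes once
   and the routine reads directly, so nothing is passed. */
uint8_t tx_data;

void mysend_global(void)
{
    emit(tx_data);
}
```

With variant B, a loop that sends the same value many times only has to set `tx_data` once before the loop.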

Best Wishes

Pyrofer · Joined: 13 Sep 2006 Posts: 16

Thanks for that!

I was always taught when programming in C to avoid globals like the plague. I don't know why; my tutor came up with some excuses but I never really believed them. I guess I just tried to avoid them because I'd been taught it was good programming practice. They never mentioned that it slowed down performance!

I'll basically convert all my variables into globals now. I have enough RAM, and if there is a speed saving each time then I should be able to take the whole program up a notch.

ckielstra · Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands

It is good programming practice to keep local variables local as it helps to save RAM and makes your program easier to maintain (the variable declaration is close to where it is used and you don't run into accidentally using the same variable twice).

That said, global variables can help to speed up your program in some very specific situations. An example is where the same data is used by multiple functions (it saves passing of function parameters).

So in my programs I use some global variables, but only when I can point out, for each variable, that it has significant advantages over using a local variable. Don't make all variables global because someone told you it is faster; you are the one who has to _know_ whether it makes a difference or not.

As a general speed optimization rule: The critical parts are often in less than 5% of the total program code. Identify this small part and then look for improvements.

As a possible optimization: You said the SPI routine is now 9 bits, and I assume this is your own bit-toggling routine? Why not use the built-in SPI hardware, which is always faster than any routine you can create? I know the built-in hardware only accepts 8 bits, but there are several ways to cheat on this (8 bits by hardware + 1 bit-banged bit, or concatenating multiple 9-bit words, or...).
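To illustrate the "8 bits by hardware + 1 bit-banged bit" idea, here is a PC-runnable sketch. Everything in it is an assumption made for illustration: `set_sdo`, `pulse_sck` and `hw_spi_write` are hypothetical stand-ins for the pin and SSP access, the extra bit is assumed to go out first, and `rx_shift` models what a receiver would clock in. On a real PIC the SPI module would typically have to release the pins while the extra bit is toggled by hand.

```c
#include <stdint.h>

/* Simulated wire: the receiver samples the data line on every clock pulse. */
static uint16_t rx_shift = 0;
static uint8_t  sdo = 0;

static void set_sdo(uint8_t level) { sdo = level & 1u; }

static void pulse_sck(void)
{
    rx_shift = (uint16_t)((rx_shift << 1) | sdo);
}

/* Stand-in for the hardware SPI module: shifts out 8 bits, MSB first. */
static void hw_spi_write(uint8_t b)
{
    uint8_t i;

    for (i = 0; i < 8; ++i) {
        set_sdo((uint8_t)(b >> 7));
        pulse_sck();
        b = (uint8_t)(b << 1);
    }
}

/* 9-bit transfer: one bit-banged bit followed by 8 "hardware" bits.
   Sending the extra bit first is an assumption, not taken from the thread. */
void send9(uint8_t ninth_bit, uint8_t value)
{
    set_sdo(ninth_bit);
    pulse_sck();
    hw_spi_write(value);
}
```

The point of the split is that only one clock pulse is generated in software; the other eight are left to the hardware shifter.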

libor · Joined: 14 Dec 2004 Posts: 288 Location: Hungary

In a similar situation (in a bit-toggling routine to send out bits of variables one-by-one with a fixed intrabit timing with no allowable intrabyte overhead) I use the intrabit 'idle' timeslots (thus I have 7 occasions of these) to prepare data needed by the loop's next iteration to save time at the loop's header. I can split this task into up to seven timeslots.

look at my pseudo-code:

Pyrofer · Joined: 13 Sep 2006 Posts: 16

Ok, here is the routine that sends the data to the lcd

libor · Joined: 14 Dec 2004 Posts: 288 Location: Hungary

spi_write(color);

this instruction puts 'color' into the SSPBUF and then waits doing nothing till all the bits have left the port, looping and testing until SSPSTAT.BF flag gets set (this is to avoid SSP buffer overwrites in consecutive spi_write instructions.) Your code continues only after SSPBUF has been completely sent by the hardware.

You can use this idle time to do more useful things by splitting up the spi_write() using assembly, e.g. wait for the BF flag before the bit-toggling part of your code, and then you'll have plenty of time for useful code execution at the end of the routine while the PIC sends out the 8 bits in hardware.

Just by putting the wait before sending (bit-toggling the 9th bit), all the code in the loop can carry on executing up to the beginning of the next iteration, so no time will be wasted.

BTW, do you really need that much speed optimization?
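The effect of moving the wait can be shown with a toy cycle-count model that runs on a PC. The figures are arbitrary assumptions (8 cycles per hardware transfer, 6 cycles of other loop work), and `start_tx`, `wait_idle` and `do_work` are hypothetical stand-ins for loading SSPBUF, polling the BF flag and the rest of the loop body.

```c
#include <stdint.h>

#define TX_CYCLES   8u   /* assumed time for the hardware to shift a byte */
#define WORK_CYCLES 6u   /* assumed time for the rest of the loop body    */

static uint32_t busy;    /* cycles left until the transfer completes */
static uint32_t wasted;  /* cycles spent doing nothing but polling   */

static void start_tx(void)  { busy = TX_CYCLES; }
static void wait_idle(void) { wasted += busy; busy = 0; }
static void do_work(void)   { busy = (busy > WORK_CYCLES) ? busy - WORK_CYCLES : 0; }

/* spi_write()-style ordering: send, then block until the byte is out. */
uint32_t wait_after_send(uint32_t n)
{
    busy = 0;
    wasted = 0;
    while (n--) {
        start_tx();
        wait_idle();
        do_work();
    }
    return wasted;
}

/* Reordered: check the flag first, send, then work while the hardware shifts. */
uint32_t wait_before_send(uint32_t n)
{
    busy = 0;
    wasted = 0;
    while (n--) {
        wait_idle();
        start_tx();
        do_work();
    }
    return wasted;
}
```

With these made-up numbers, ten iterations waste 80 cycles in the first ordering and 18 in the second, because the loop work overlaps with most of each transfer.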