|
|
View previous topic :: View next topic |
Author |
Message |
jecottrell
Joined: 16 Jan 2005 Posts: 559 Location: Tucson, AZ
|
Code Efficiency/Speed |
Posted: Sat Nov 25, 2006 1:27 am |
|
|
3.236
I've got a multiconditional statement in my main loop:
Code: |
if(x && y && z)
{
do this
}
|
Is it more efficient to use the single, multi-conditional 'if' statement or use a nested if, that has the simplest or quickest condition in the outermost if?
Code: |
if(x) //this is the least likely to be true
{
if(y && z)
{
do this
}
}
|
I would think it has to do with the product of the compilation and whether there is a 'guarantee' that the conditions in the 'multiconditional' if statement are checked in the order they are written?
As an additional question that may help with future problems like this, can anyone recommend a good source (book?) for learning PIC assembly, so that I may be able to decipher LST files?
Thanks,
John |
|
|
kender
Joined: 09 Aug 2004 Posts: 768 Location: Silicon Valley
|
|
Posted: Sat Nov 25, 2006 4:23 am |
|
|
If I'm not mistaken, the ANSI C spec requires that:
- These operators guarantee left-to-right evaluation.
- If the compiler can make an evaluation by examining only the left operand, the right operand is not evaluated. |
|
|
Ttelmah Guest
|
|
Posted: Sat Nov 25, 2006 9:00 am |
|
|
What does ANSI have to do with it?.
CCS, is not ANSI C....
The difference wll be almost immesurable for simple simple 8bit values and tests, since adding an extra test/jump, will take as long as just anding in the extra byte...
Best wishes. |
|
|
jecottrell
Joined: 16 Jan 2005 Posts: 559 Location: Tucson, AZ
|
|
Posted: Sat Nov 25, 2006 10:08 am |
|
|
Kender,
Thanks for the info. I had wondered about the ANSI-ishness of CCS as noted in Ttelmah's response.
Ttelmah,
I think I understand your response, but I'm not positive. To further explain the situation, the 'if' statement is used in my main loop and obviously is tested every time through which adds overhead to the loop. Now, I have one particular case that will evaluate to FALSE 92% of the time. If I test that case first I will save any unnecessary branching... So, is there any way to maximize the efficiency of the multiple case 'if' statement?
I think in your previous reply you are saying that to test all the cases it is here nor there whether it is done either way. But my brain tells me that there must be some overhead in fetching and comparing extra conditions and if you can eliminate any unnecessary branches you'll save processing time? So, the sooner you can get out of the conditional branch the better? |
|
|
Mark
Joined: 07 Sep 2003 Posts: 2838 Location: Atlanta, GA
|
|
Posted: Sat Nov 25, 2006 10:32 am |
|
|
The best way to know is to try them both and then look at the asm generated in the LST file. |
|
|
jecottrell
Joined: 16 Jan 2005 Posts: 559 Location: Tucson, AZ
|
|
Posted: Sat Nov 25, 2006 11:35 am |
|
|
Mark,
I had anticipated that response in my original post, that's why I asked the question about a good reference for PIC ASM. I found an OK online tutorial, but I always like to have a hard copy to dogear pages and markup. Do you know of any such animal?
BTW, I will try it and see if I can decipher the LST file.
Thanks,
John |
|
|
Ttelmah Guest
|
|
Posted: Sat Nov 25, 2006 4:17 pm |
|
|
Ok.
The point about my answer, relates to how things work on the PIC. On many processors (most), there is a significant timing difference between the true, and false branches on a test. On the PIC, there isn't. The 'straight through' path, using a normal test, still fetches the extra instruction for the branch, and takes an extra cycle.
Now if you take three numbers, and need to logically combine them, and branch,the basic sequence would be:
load number 'A' into W.
combine with 'B'.
combine with 'C'
test and branch.
This can actually be done in five machine cycles (plus any overhead for bank switching).
If instead you 'double test', you get:
load number 'A' into W.
combine with 'B'.
test and branch.
combine with 'C'
test and branch.
The 'best' case, drops to 4 machine cycles, but for everyting that needs the second test, the time now rises to 7 cycles.
For your 90% probability, it would then be worthwhile to do the shorter test, but in many case it wouldn't (anything with less than 66% probability for the first test, it is not worthwile).
However this makes a 'key' assumption in both cases, that the registers being tested, are in the same bank as is being used by the operational registers in the rest of the particular routine. If they are not, another four machine cycles need to be added (potentially for each switch). The problem here, is that because the registers need to be restored before the branch, or seperately afterwards, this more than doubles the cycles involved. Also, the 'combine' operation, is assumed to be a single machine operation. This is not the case for more complex logic, which can result in the 'all in one' operation taking the lead.
Whether shortcutting gains anything, depends particularly then on the logic operation involved. In fact in your case, the logic evaluates as:
if ((a!=0) && (b!=0) && (c!=0))
On the PIC, loading a value from a register, sets the 'Z' flag if the value is zero, so the compiler takes advantage of this, and does not combine the values at all, but instead simply reads each one in turn, and jumps past the code to be called if this value is non zero. Also on the latter 18 chips, there is a single instruction 'short' jump for the zero condition, and this can be used to make this really efficient, provided the code inside the test is short. This is by default done, and identical code will be generated for the combined, or the separate operation.
However it also becomes vital to use the .sym file, and verify that all registers used in the core 'fast' loop, reside in the same bank (change the order they are declared if not), and even better, make sure they are in the first bank (where scratch variables etc., are stored), since otherwise it is the bank switching which will cost the most in time. Also keep the total length of the code inside the routine called if the test is true short (if there is a subroutine, make sure it is not coded as 'inline'), or the longer (slower) jump will need t be used.
PCM programmer has published notes in the pst about how to optimise register layouts, and while these my seem not to apply to the 18 chips (where there are single instructions to fetch values from any part of the memory), they are still just as important, since these larger instructions, take two machine cycles...
Best Wishes |
|
|
jecottrell
Joined: 16 Jan 2005 Posts: 559 Location: Tucson, AZ
|
|
Posted: Sat Nov 25, 2006 5:09 pm |
|
|
Thank you very much for the in-depth explanation. I saw some of what you explained as a picked through the LST and SYM files. I even saw that one of my conditions was a function with a return value and the increased overhead that it cost.
I also saw the 'straight through' concept you explained. Now it seems rather obvious. On a processor of limited size, when testing a variety of conditions, A && !B && C, etc., that each will be done individually then branch. I DID NOT realize that similar conditions would be combined prior to testing. That is interesting, but makes complete sense.
What I've really learned is, is that I probably don't need to worry about this yet. But it is very interesting to start to understand the underlying ASM generated from the C code.
Thanks again,
John |
|
|
Humberto
Joined: 08 Sep 2003 Posts: 1215 Location: Buenos Aires, La Reina del Plata
|
|
Posted: Sun Nov 26, 2006 8:31 am |
|
|
Nice and interesting discussion. Let�s say that the CCS compiler handle very efficiently
all regarding bit/test and the generated code is as short and efficient like in ASM.
It doesn�t matter the coding style you use, it�ll give you the SAME generated code:
Code: |
// 1) Coding in this way:
if(x && y && z)
{ a = 1; }
// 2) Coding in this way:
if(x)
if(y)
if(z)
{ a = 1; }
// 3) Coding in this way:
if ((x!=0) && (y!=0) && (z!=0))
{ a = 1; }
|
The CCS Compiler generated the SAME and SHORTEST ASM code.
Code: |
.................... if(x && y && z)
0019: MOVF x,F
001A: BTFSC STATUS.2
001B: GOTO 024
001C: MOVF y,F
001D: BTFSC STATUS.2
001E: GOTO 024
001F: MOVF z,F
0020: BTFSC STATUS.2
0021: GOTO 024
.................... {
.................... a = 1;
0022: MOVLW 01
0023: MOVWF a
.................... }
................... if(x)
0024: MOVF x,F
0025: BTFSC STATUS.2
0026: GOTO 02F
.................... if(y)
0027: MOVF y,F
0028: BTFSC STATUS.2
0029: GOTO 02F
.................... if(z)
002A: MOVF z,F
002B: BTFSC STATUS.2
002C: GOTO 02F
.................... {
.................... a = 1;
002D: MOVLW 01
002E: MOVWF a
.................... }
....................
....................
.................... if ((x!=0) && (y!=0) && (z!=0))
002F: MOVF x,F
0030: BTFSC STATUS.2
0031: GOTO 03A
0032: MOVF y,F
0033: BTFSC STATUS.2
0034: GOTO 03A
0035: MOVF z,F
0036: BTFSC STATUS.2
0037: GOTO 03A
.................... {
.................... a = 1;
0038: MOVLW 01
0039: MOVWF a
.................... }
|
There is no way to do it in a shortest way, even in ASM.
Humberto |
|
|
jecottrell
Joined: 16 Jan 2005 Posts: 559 Location: Tucson, AZ
|
|
Posted: Sun Nov 26, 2006 8:55 am |
|
|
Humberto,
Thanks for the test. What part is that targeted for? I noticed in my experimentation for the 18F2525, that it uses a BRA (branch?) instruction where yours uses a GOTO. I'm wondering if the behavior that Ttelmah describes, combine and then branch is unique to 18F parts?
Even then, I still have decided that there is a slight gain to be realized depending on the test values. My example in my first post should have been more accurate and have looked more like this:
Code: |
if((!kbhit) && (!function1()) && (x == 5) && (etc.....))
{
do this
}
|
With a variety of conditions there is some minor optimization that can achieved manually in the code that is fed to the compiler.
Thanks again,
John |
|
|
Humberto
Joined: 08 Sep 2003 Posts: 1215 Location: Buenos Aires, La Reina del Plata
|
|
Posted: Sun Nov 26, 2006 10:31 am |
|
|
Quote: |
Thanks for the test. What part is that targeted for?
|
Hi John,
The posted code is valid for any mid-range PIC ucontroller using PCM Compiler V3.236
Regarding your preposition style it is a matter of looking in the resulting .lst file
generated and you will see the difference if so.
Humberto |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|