Author |
Message |
phuongnt
Joined: 25 Apr 2013 Posts: 3
|
Fast Inverse Square Root in CCS |
Posted: Thu Apr 25, 2013 9:58 pm |
|
|
Hi all,
I am using the following code to compute the fast inverse square root, but it doesn't seem to work: the result is always 0. Please help me.
Code: |
float imu_invSqrt(float x) {
   unsigned int32 i;
   float tmp, y;
   i = 0x5F1F1412 - (*(unsigned int32*)&x >> 1);
   tmp = *(float*)&i;
   y = tmp * (1.69000231f - 0.714158168f * x * tmp * tmp);
   return y;
} |
|
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19496
|
|
Posted: Fri Apr 26, 2013 12:51 am |
|
|
I'm guessing you are trying to use the Silicon Graphics function used in Quake? If so, it'll need to be re-written with a different constant, because as shown it is written to use IEEE floats.
Remember that the CCS format is not IEEE, so the constant will need to be recalculated. Note also that the original constant has since been improved on, with later versions giving better results.
Look for Chris Lomont's paper on how to derive the constant.
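To see why the float format matters, here is a minimal sketch for a PC compiler, not for CCS. It assumes the host stores float as IEEE 754 single precision (1 sign bit, 8 exponent bits biased by 127, 23 mantissa bits). The names float_bits, split_ieee and approx_log2 are made up for illustration. It splits a float into its fields and shows that the raw bits, read as an integer, give a rough log2 of the value, which is what the constant trick relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the host float is IEEE 754 binary32.
   This is NOT true of the native CCS PIC float format. */

typedef struct {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30..23, biased by 127 */
    uint32_t mantissa;  /* bits 22..0 */
} ieee_fields;          /* hypothetical helper type */

/* Copy the raw bits of a float into an integer without pointer casts. */
static uint32_t float_bits(float x)
{
    uint32_t u;
    assert(sizeof u == sizeof x);
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Split the raw bits into sign, exponent and mantissa fields. */
static ieee_fields split_ieee(float x)
{
    uint32_t u = float_bits(x);
    ieee_fields f;
    f.sign = u >> 31;
    f.exponent = (u >> 23) & 0xFFu;
    f.mantissa = u & 0x7FFFFFu;
    return f;
}

/* For positive x: bits / 2^23 - 127 is a piecewise-linear
   approximation of log2(x), exact at powers of two. */
static double approx_log2(float x)
{
    return (double)float_bits(x) / 8388608.0 - 127.0;
}
```

Halving and negating that integer therefore approximates -0.5 * log2(x), which is the log of 1/sqrt(x). With the fields stored in a different order, the same integer arithmetic no longer means anything.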
Best Wishes |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19496
|
|
Posted: Fri Apr 26, 2013 3:20 am |
|
|
Looking at it, the trick will only work using IEEE values, since it relies on the low bit of the exponent shifting down into the top bit of the mantissa.
So:
Code: |
//processor defines here
#include <ieeefloat.c> //need this

typedef union {
   float fp;
   signed int32 i32;
} combiner;

#define APPROX 0x5F375A86 //The first approximation constant
//using the later improved constant

float invroot(float val)
{
   combiner i;
   float x2;
   float threehalfs=1.5;
   x2 = val * 0.5F;
   i.i32 = f_PICtoIEEE(val); //for the trick to work we need IEEE bit order
   i.fp = f_IEEEtoPIC(APPROX - ( i.i32 / 2 ));
   i.fp = i.fp * ( threehalfs - ( x2 * i.fp * i.fp ) );
   return i.fp;
}

void main()
{
   float result;
   int16 val;
   setup_timer_3(T3_DISABLED | T3_DIV_BY_1);
   while(TRUE)
   {
      for (val=100;val>0;val--)
      {
         result=invroot(val);
         printf("%ld,%2.2f\n",val,result);
      }
   }
}
|
Re-organising the bytes into IEEE layout takes only a few instructions, and I save far more than that by using a union for the type conversions instead of pointer casts, which avoids relative addressing (very slow on the PIC).
Works surprisingly well. Takes just over 1900 instructions in total, nearly four times as fast as 1/sqrt(val), and gives the same result to two decimal places.
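For anyone wanting to check the accuracy off the chip, here is a rough desktop sketch of the same calculation. It assumes the host float is already IEEE 754, so the f_PICtoIEEE / f_IEEEtoPIC steps drop out. The names invroot_host and worst_residual are made up for this example and are not part of any CCS library. The check uses v * y * y - 1, which is zero for an exact inverse square root and about twice the relative error of y otherwise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define APPROX 0x5F375A86u  /* same constant as in the PIC code above */

/* Host-side version of invroot(); assumes IEEE 754 binary32 floats. */
static float invroot_host(float val)
{
    uint32_t bits;
    float y;
    assert(sizeof bits == sizeof val);
    memcpy(&bits, &val, sizeof bits);   /* read the raw bits */
    bits = APPROX - (bits >> 1);        /* first approximation */
    memcpy(&y, &bits, sizeof y);        /* back to float */
    return y * (1.5f - (0.5f * val * y * y));  /* one Newton step */
}

/* Largest |v*y*y - 1| over the integers 1..n. */
static double worst_residual(int n)
{
    double worst = 0.0;
    int v;
    for (v = 1; v <= n; v++) {
        double y = invroot_host((float)v);
        double r = (double)v * y * y - 1.0;
        if (r < 0.0)
            r = -r;
        if (r > worst)
            worst = r;
    }
    return worst;
}
```

Calling worst_residual(100) covers the same range of inputs as the test loop above. With a single Newton step the relative error of the result should stay below roughly 0.2%, which fits with the two decimal place agreement.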
Best Wishes |
|
|
phuongnt
Joined: 25 Apr 2013 Posts: 3
|
|
Posted: Thu May 02, 2013 1:24 am |
|
|
I greatly appreciate your help. |
|
|
|