View previous topic :: View next topic |
Author |
Message |
future
Joined: 14 May 2004 Posts: 330
|
How to handle multiplication overflows? |
Posted: Sun Jun 20, 2004 3:11 pm |
|
|
In my application I need to multiply 4 integers and divide for 1000000..
If in the middle of process there's an overflow I get wrong results.
What I need is if some goes wrong I can set a predefined value.
The multiplication is like:
var1 * var2 * var3 * var4 = ??
----- ------ ------ ------
100 100 100 100
Is there a better approach for this? |
|
|
Neutone
Joined: 08 Sep 2003 Posts: 839 Location: Houston
|
|
Posted: Sun Jun 20, 2004 7:40 pm |
|
|
Use an Int32 to keep from overflowing. It will not overflow when multiplying 4 bytes. I assume you are talking about byte but the term integer is ambigous when refering to code for an 8 bit processor. |
|
|
future
Joined: 14 May 2004 Posts: 330
|
|
Posted: Mon Jun 21, 2004 4:58 am |
|
|
Yes I did cast everything to int32.
The speed of main loop drops by a factor of 4 when using these 32bit math
But the problem is that if the final result is greater than 255, the result which I show in a gauge meter goes crazy.
So I need a way to check is the result is overflowed or not.
I though about testing each division to not be greater than some number, but it has to be a better way |
|
|
SherpaDoug
Joined: 07 Sep 2003 Posts: 1640 Location: Cape Cod Mass USA
|
|
Posted: Mon Jun 21, 2004 7:10 am |
|
|
An int32 can not possibly overflow from the multiplication of four int8s. So if you do your multiplications, then divide, you have to get the right result. Your result could be as large as 4295 so you have to compare to 255 before casting back to an int8. _________________ The search for better is endless. Instead simply find very good and get the job done. |
|
|
future
Joined: 14 May 2004 Posts: 330
|
|
Posted: Mon Jun 21, 2004 10:39 am |
|
|
Ok, so I will do the multiplication, then the division and test if the result is greater than 255, if it is, I will set the result equal 255.
The only thing I was wondering if it is possible to do something to not have to DO the whole multiplication/division to verify if it will overflow or not.
The operation takes a lot of processing time. Before, the main loop was at 17k loop/s, with 2 calculations it is at 2k and there is a lot of code to add. |
|
|
Neutone
Joined: 08 Sep 2003 Posts: 839 Location: Houston
|
|
Posted: Mon Jun 21, 2004 11:51 am |
|
|
Try this, it should cut the time almost in half.
int8 x1,x2,x3,x4;
int16 y1,y2;
int32 z;
y1=x1;
y1*=x2;
y2=x3;
y2*=x4;
z=y1;
z*=y2;
z/=1000000;
The compiler should have a good method to perform 8x8=16 or 16x16=32 math but does not. The 8x8=16 can be done in assembly in just a few instructions. |
|
|
future
Joined: 14 May 2004 Posts: 330
|
|
Posted: Mon Jun 21, 2004 4:56 pm |
|
|
Neutone: Thank you!
I put your code into a function...
Code: | int8 big(int8 x1,int8 x2,int8 x3,int8 x4) {
/*******************************
* result = x1 * x2 * x3 * x4 *
* ---- ---- ---- ---- *
* 100 100 100 100 *
*******************************/
int32 z;
int16 y1,y2;
y1=x1;
y2=x3;
y1*=x2;
y2*=x4;
z=y1;
z*=y2;
z/=1000000;
if(z>255) { return(255); } else { return(z); }
} |
Size shrank about 200 bytes and speed went up from 890 to 1070hz
The function is called 2 times. |
|
|
SteveS
Joined: 27 Oct 2003 Posts: 126
|
|
Posted: Tue Jun 22, 2004 7:23 am |
|
|
If you really need to speed things up you could split the /1e6 into shift right 6 and /15625. This makes the divide routine a 32 by 16 divide, which you will need to write since I don't think it's available directly in CCS. I would think that be tighter code.
Along those lines, try to take advantage of anything that is known or fixed about your calculation. Say, if the input data is restricted in range, you might be able to streamline the calculations somehow. Or, if you can somehow scale your input data so you can divide by a power of 2 (like 2^20= 1048576).
Just a couple of ideas...
- SteveS |
|
|
future
Joined: 14 May 2004 Posts: 330
|
|
Posted: Tue Jun 22, 2004 4:35 pm |
|
|
It is a good idea but...
//z/=1000000;
z>>=6;
z/=15625;
It adds 50bytes and it is a bit slower. Results are the same. |
|
|
SteveS
Joined: 27 Oct 2003 Posts: 126
|
|
Posted: Wed Jun 23, 2004 6:40 am |
|
|
oops, sorry 'bout that. Did you use a 32 by 16 divide routine? If you use the 32 by 32 it would be slower and bigger.
- SteveS |
|
|
Guest
|
|
Posted: Wed Jun 23, 2004 7:04 am |
|
|
I let the compiler use its default routines as I am not good with assembly.
It does not need optimizations for now, the speed is sufficient.
Just one thing that is cool, you write some routine and there is always a way to make it faster.
I will be adding more code and making things slower... |
|
|
SteveS
Joined: 27 Oct 2003 Posts: 126
|
|
Posted: Wed Jun 23, 2004 3:19 pm |
|
|
You can always find something to optimize the code - it just takes time. I remember spending a month or so on about 50 lines of code (a filter) on the first TI DSP (oops I'm dating myself). But each line I took out or optimized made my filter run faster and better. There it was worth it. If you decide you need to tweak your code, post again with as much info as you can - we can probably come up with some ideas.
- Steve |
|
|
SteveS
Joined: 27 Oct 2003 Posts: 126
|
faster divide by 1e6 |
Posted: Thu Jun 24, 2004 1:57 pm |
|
|
Ok, here is a way to speed things up I think.
Basically division is slow (except for powers of 2). So, instead of dividing by 1e6, multiply by 4295/2^32 which is close to /1e6
But, since your four 8-bit numbers, when multiplied together, can fill up 32 bits, you don't have room (in a 32 bit value) to further multiply by 4295. So after getting the 4 bytes multiplied together, divide by 2^16 which means just take the upper two bytes of the result. Now you can multiply that by 4295. Again you need another divide by 2^16 (to get a total 2^32 divide), so take the upper two bytes again. I think this will preserve your accuracy and a quick test showed it to be twice as fast as /1e6.
- SteveS |
|
|
future
Joined: 14 May 2004 Posts: 330
|
|
Posted: Fri Jun 25, 2004 4:48 am |
|
|
In your test, did you use an union to get the upper bytes? |
|
|
SteveS
Joined: 27 Oct 2003 Posts: 126
|
|
Posted: Fri Jun 25, 2004 7:18 am |
|
|
Yes, I did it like this:
Code: |
int8 m0;
int8 m1;
int8 m2;
int8 m3;
int32 quot;
int32 quot2;
int16 proda16;
int16 prodb16;
union
{
int16 w[2];
int32 d;
} val;
union
{
int16 w[2];
int32 d;
} val2;
// set m0, m1, m2 , m3 to whatever
proda16 = (int16)m0 * (int16)m1;
prodb16 = (int16)m2 * (int16)m3;
val.d = (int32)proda16 * (int32)prodb16;
#ifdef STRAIGHT_FORWARD_WAY
quot = val.d/1000000;
// 16 bit answer is val.w[0]
#else // faster way
quot2 = val.w[1];
val2.d = quot2 * 4295;
// 16 bit answer is val.w[1]
#endif
|
BTW, I tested it on an 18F part which has hardware multiply, YMMV on a 16F part.
- SteveS |
|
|
|