Advice on multiplying - use a PIC with hardware multiplier?

 
lindsay.wilson.88



Joined: 11 Sep 2024
Posts: 40

Posted: Sat Sep 14, 2024 11:43 am

Apologies in advance for the long story, but I'd like to (hopefully) make it clear what I'm trying to do.

Short version: The PIC should adjust a stepper motor's position to match an incoming 16-bit value.

Long version: I have a source of 16-bit data, which the PIC is reading in from two of its ports as an unsigned long (0-65535). This is called galvo_position in the code example below. (The source of the data is a laser galvo scanner controller, hence the name.) The PIC also drives a stepper motor via step and direction signals.

The PIC is running a timer loop whose frequency corresponds to the maximum step rate of the motor - for example 3000 times per second. Every loop, it has to read the 16-bit value and multiply it by a scaling factor (for example, 3.510905) to convert it into a desired position in terms of motor steps. This is called desired_step_position and is a long long. It then compares the desired position with the current position. Depending on whether it's bigger, smaller, or the same, it then sets the direction and outputs a single pulse to the stepper motor driver. The PIC obviously has to do all this within one timer period, preferably in half that time to give it a bit of leeway.

The net effect of all this is that the stepper motor position should track/follow the 16-bit input value. Yes, I'm aware the galvo controller will be able to change its data way faster than the stepper motor can hope to keep up, but it's not going to be doing that, trust me 😁
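For illustration, the per-tick decision described above could be sketched in plain C roughly as follows. The function name and the direction codes are hypothetical, and the actual pin handling (setting the direction line, pulsing the step line) is left out; this only shows the compare-and-step logic.

```c
#include <stdint.h>

/* Hypothetical direction codes; on the PIC these would drive the DIR pin. */
#define DIR_NONE     0
#define DIR_FORWARD  1
#define DIR_REVERSE -1

/* One timer tick: compare the desired position with the current one,
   move the current position by at most one step, and report which
   direction (if any) a step pulse should be issued in. */
int8_t track_one_step(int32_t *current_step, int32_t desired_step)
{
    if (desired_step > *current_step) {
        (*current_step)++;          /* one pulse forward */
        return DIR_FORWARD;
    }
    if (desired_step < *current_step) {
        (*current_step)--;          /* one pulse backward */
        return DIR_REVERSE;
    }
    return DIR_NONE;                /* already in position, no pulse */
}
```

Because at most one step is taken per tick, the motor never exceeds the step rate set by the timer, however far the input jumps.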

I'm using a 16F887 running at 18.432MHz.

My main gripe at the moment is trying to do the multiplication as fast as possible. I'm measuring the time each step takes by toggling a pin before and after and looking at it on a scope.

Let's try the "naive" approach of just doing things as floats:

Code:
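long galvo_position;              // CCS "long": 16-bit, read from two ports
float steps_per_bit_float;        // scaling factor, galvo bits -> motor steps
long long desired_step_position;  // CCS "long long": 32-bit

galvo_position=40000;             // fixed values for demonstration only
steps_per_bit_float=3.510905;

// Float multiply, then conversion back to a 32-bit integer
desired_step_position=(float)galvo_position*steps_per_bit_float;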


Note, although the example shows both galvo_position and steps_per_bit_float as fixed values, that's just for demonstration. In reality, it would constantly be reading galvo_position from the two ports, and the user is able to adjust the steps_per_bit scaling factor over the serial port.

The last line, where the multiplication happens, takes around 210-270us. If I'm running the loop at say 3000 times a second, I've only got 330us available to fit this in, so it's really pushing it.

Now, let's try the trick of multiplying by a larger int first, then dividing by shifting right:

Code:
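long galvo_position;              // 16-bit input
long steps_per_bit_long;          // scaling factor pre-multiplied by 2^14
long long desired_step_position;  // 32-bit result

galvo_position=40000;
steps_per_bit_long=57522;         // 3.510905 * 16384, fraction dropped

// Cast first so the multiply is done in 32 bits, then divide by 16384
desired_step_position=(long long)galvo_position*steps_per_bit_long;
desired_step_position>>=14;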


The 57522 comes from multiplying the original scaling factor (3.510905) by 16384, which is 2^14, and dropping the fractional part. galvo_position is multiplied by this integer, and the result is then divided by 16384 by shifting it 14 bits to the right.
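The same scaled-integer step, written as a small standalone function in portable C so the arithmetic can be checked on a PC. The fixed-width types and the names here are assumptions for the sketch; on the PIC the CCS types from the listing above would be used instead.

```c
#include <stdint.h>

#define SCALE_SHIFT        14
#define STEPS_PER_BIT_Q14  57522u   /* 3.510905 * 2^14, fraction dropped */

/* Convert a 16-bit galvo position into motor steps using one
   16x16 -> 32-bit multiply followed by a right shift. */
uint32_t galvo_to_steps(uint16_t galvo_position)
{
    /* Largest possible product is 65535 * 57522, which fits in 32 bits. */
    uint32_t product = (uint32_t)galvo_position * STEPS_PER_BIT_Q14;
    return product >> SCALE_SHIFT;
}
```

For an input of 40000 this returns 140434, while 40000 * 3.510905 is about 140436.2, so dropping the fraction of the scaled factor costs roughly two steps at this position.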

The long long multiplication takes 120-160us and the bit shift takes 10us, so we're down to 130-170us overall.

Added complication: it would actually make my life easier if I could have a signed output, which slows things down a little, but it's still under 190us.
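One possible way to get a signed result while keeping the multiply itself unsigned is to split off the sign first and put it back afterwards. This is only a sketch: it assumes the signed value is a 16-bit two's-complement offset, which the post does not specify, and the names are made up for the example.

```c
#include <stdint.h>

#define SCALE_SHIFT        14
#define STEPS_PER_BIT_Q14  57522u   /* 3.510905 * 2^14, fraction dropped */

/* Scale a signed 16-bit offset to signed steps using an unsigned multiply. */
int32_t scale_signed(int16_t offset)
{
    uint8_t  negative  = (offset < 0);
    /* Widen before negating so that -32768 is handled correctly. */
    uint16_t magnitude = negative ? (uint16_t)(-(int32_t)offset)
                                  : (uint16_t)offset;
    uint32_t steps = ((uint32_t)magnitude * STEPS_PER_BIT_Q14) >> SCALE_SHIFT;

    return negative ? -(int32_t)steps : (int32_t)steps;
}
```

Working on the magnitude also means the result rounds towards zero symmetrically for positive and negative inputs, which a plain arithmetic right shift of a negative product would not do.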

All this, I can just about live with for now. It leaves 100us or so of leeway within each loop. However, I'd like room for improvement in future - for example, I might want to increase the stepping speed, and I'd then run out of free time with the current arrangement.

Now, I happened to read that better PICs have inbuilt "hardware multipliers". Looking around for a pin-compatible replacement for the 16F887, I found the 18F4520. Besides being able to work up to 40MHz, it has an 8x8 hardware multiplier.

If I used this instead, would it speed up the sort of calculations I'm trying to do? Would it even speed up the float calculation sufficiently that I could just use it directly, rather than messing around with doing integer multiplication then a bit shift?

Does the CCS compiler know to use the hardware multiplier if that particular chip is selected?

Any advice appreciated as always!
PrinceNai



Joined: 31 Oct 2016
Posts: 478
Location: Montenegro

Posted: Sat Sep 14, 2024 12:57 pm

Hi,

By all means, switch to a faster PIC. There must be a way of doing it with integers. Could you show an example with real world numbers? What you read and where you want the stepper to go?
temtronic



Joined: 01 Jul 2010
Posts: 9221
Location: Greensville,Ontario

Posted: Sat Sep 14, 2024 3:55 pm

You should rethink the math and use 'scaled integers'.
You'll find them 5-10-20x FASTER and more accurate.
Search this forum for past postings about it, and try 'googling' as well.
PrinceNai



Joined: 31 Oct 2016
Posts: 478
Location: Montenegro

Posted: Sat Sep 14, 2024 4:28 pm

I hope I understood your problem correctly: compare two numbers, two series of ones and zeroes, then take an action based on that. Float is just an interpretation; the actual registers don't hold floats. I also don't understand non-integer values in almost any context, yet that is what you are dealing with. No amount of scaling will change the actual contents of the input data. You just need to find the correlation between the input and the desired output. Your 3.5003217 scaling input has six or seven digits after the decimal point. That's 0.00001%. It will be rounded. Only integers allowed. When should the motor move? What are the tolerances?
PrinceNai



Joined: 31 Oct 2016
Posts: 478
Location: Montenegro

Posted: Sat Sep 14, 2024 5:20 pm

At the risk of sounding ignorant: you have a very fast-changing input telling you where the stepper should move, but no one knows compared to what. There is a rather unusual scaling factor coming in over an even slower serial link, and no one knows what it means in terms of stepper motor movement. Does 3.5 mean a full step? A half step? What does 3.7 mean? How fast is the change? Also, an obsolete PIC running slowly. It seemed like a challenge. Forgive me, it is a joke.
PrinceNai



Joined: 31 Oct 2016
Posts: 478
Location: Montenegro

Posted: Sat Sep 14, 2024 5:32 pm

I'm afraid AI is attacking. Call me old, call me paranoid. But there is a movie called Strange Days, where they say: it is not a question of whether you are paranoid, the question is whether you are paranoid enough.

Why I wrote this? The questions and answers from the OP don't add up.
lindsay.wilson.88



Joined: 11 Sep 2024
Posts: 40

Posted: Sat Sep 14, 2024 7:35 pm

Thank you for the replies. Poking around with the compiler more, I've determined it will make use of hardware multiplication, so I've ordered a faster PIC which has this feature on it. I also found that the _mul() function is about 2x faster, even on my existing PIC, so I'll try and make use of that.
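To show why an 8x8 hardware multiplier helps with wider multiplies, here is a sketch of a 16x16 -> 32-bit multiply built from four 8x8 -> 16-bit partial products. It is written in portable C for checking on a PC and is not necessarily what the CCS compiler or _mul() actually generates; mul8x8 is a hypothetical stand-in for one hardware multiply.

```c
#include <stdint.h>

/* Stand-in for one hardware 8x8 -> 16 multiply. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}

/* 16x16 -> 32 multiply assembled from four 8x8 partial products:
   a*b = (a_hi*b_hi << 16) + ((a_hi*b_lo + a_lo*b_hi) << 8) + a_lo*b_lo */
uint32_t mul16x16(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFFu), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFFu), b_hi = (uint8_t)(b >> 8);

    uint32_t result = mul8x8(a_lo, b_lo);
    result += (uint32_t)mul8x8(a_lo, b_hi) << 8;
    result += (uint32_t)mul8x8(a_hi, b_lo) << 8;
    result += (uint32_t)mul8x8(a_hi, b_hi) << 16;
    return result;
}
```

Without a hardware multiplier each of those partial products has to be done by a shift-and-add loop, which is where most of the time goes on the 16F887.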

@temtronic - thanks, this is what I was attempting to do by multiplying my 3.5x factor by 16384 first, doing the multiplication with that, then bit-shifting to divide back again. It's unfortunately not as precise, because I'm limited by the need to keep things within a long long size.

@PrinceNai - frankly, I don't understand what you're talking about. I have explained the core of my solution as best as I can in the original post, without going into a lot of unnecessary background detail.

You want to know where I got the 3.5 scaling factor from? Fine. The laser scanner's galvo has a range of 65535 bits. With the particular lens that's on it, it covers 125mm at the focal plane, giving 522 bits/mm. I have a stepper motor with 200 steps per revolution, driven at 8x microstepping. It is connected to a rotary table with 72x gear reduction. Therefore there are 200*8*72=115200 steps per table revolution. The part diameter is 20mm, so it has a circumference of 63mm. Divide 115200 steps by 63mm and you get 1828 steps per mm. Finally, divide 1828 by 522 and you get the factor of about 3.5 to convert from galvo bits to stepper steps. Satisfied? As you can hopefully see, all of that is completely irrelevant to the problem at hand, which is why I did not go into this level of detail. It boils down to needing to multiply one number by a factor of about 3.5.
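The derivation above can be written out as a short calculation. This is a PC-side sketch using doubles, with hypothetical function and parameter names; it would not be run on the PIC itself.

```c
#include <stdint.h>

#define PI 3.14159265358979

/* Steps per table revolution: motor full steps * microstepping * gear ratio. */
uint32_t steps_per_revolution(uint16_t full_steps, uint8_t microsteps,
                              uint8_t gear_ratio)
{
    return (uint32_t)full_steps * microsteps * gear_ratio;
}

/* Scaling factor that converts galvo bits into motor steps. */
double steps_per_bit(double galvo_range_bits, double field_mm,
                     double steps_per_rev, double part_diameter_mm)
{
    double bits_per_mm  = galvo_range_bits / field_mm;
    double steps_per_mm = steps_per_rev / (PI * part_diameter_mm);
    return steps_per_mm / bits_per_mm;
}
```

With the unrounded intermediate values (65535 bits over 125mm, 115200 steps, 20mm diameter) this comes out just under 3.5, in line with the rounded figures quoted in the post.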

"But no one knows compared to what". Huh? Please re-read my original post, where I said "It then compares the desired position with the current position. Depending on whether it's bigger, smaller, or the same, it then ...."

My problem is NOT comparing two numbers. My problem is MULTIPLYING numbers together in a fast, efficient way.

"An example with real world numbers". Again, re-read what I posted. It gives an example.
Ttelmah



Joined: 11 Mar 2010
Posts: 19495

Posted: Sun Sep 15, 2024 1:35 am

Even on a faster PIC, use the scaled integers. In the help for the compiler,
under 'Common questions and answers', see 'How much time do maths operations
take'. You will find a table of the times needed for each different maths
operation at different processor speeds, and for different processor
types. Remember also that floats are less accurate than a scaled integer,
and that you have to add the time to convert back to an integer
to get a value you can actually use. A PIC18, at the same clock rate as your
current processor, will do int32 maths about 3x faster than your current
chip. Suddenly you have a margin!
As you have found, there are optimisations like _mul. There are also
'sneaky' ones commonly used to optimise something like this. For
example, you can have a union containing an int32 and a structure with
an int8, int16, int8. Then do the multiplication on the int32, and read the
int16. This gives a division by 256, costing basically no time at all (you just
read the int16 result out of the right bytes).
You may actually find it quicker to do the calculation like this 'unsigned',
but store your own sign bit, and just sign the result.
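A minimal sketch of the union trick for dividing by 256. On the PIC the int8/int16/int8 structure works directly because members are laid out byte by byte; a PC compiler would normally pad such a structure, so this host-testable version overlays a byte array instead and assumes a little-endian layout. The names are made up for the example.

```c
#include <stdint.h>

/* Assumes little-endian storage: byte[0] is the least significant byte. */
union product_bytes {
    uint32_t whole;
    uint8_t  byte[4];
};

/* Returns bits 8..23 of the product, i.e. product / 256 truncated to
   16 bits, by picking out the middle two bytes instead of shifting. */
uint16_t divide_by_256(uint32_t product)
{
    union product_bytes u;
    u.whole = product;
    return (uint16_t)(u.byte[1] | ((uint16_t)u.byte[2] << 8));
}
```

Note that the top byte is simply discarded, so this form is only exact while the product stays below 2^24.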
temtronic



Joined: 01 Jul 2010
Posts: 9221
Location: Greensville,Ontario

Posted: Sun Sep 15, 2024 5:07 am

re: the mechanical hardware.....
Curious about the stepper to rotary table design. Is it done with two gears, or gears and timing belts? In either case you can (and will) get 'slop' in the drive train that will affect positioning. It'll really show up when you reverse direction.
Also the rotary table itself won't be 100% accurate.
Your posts indicate you need really, really accurate positioning, so just want you to be aware that no matter how good the software is, any 'slop'
in the hardware HAS to be dealt with first.
lindsay.wilson.88



Joined: 11 Sep 2024
Posts: 40

Posted: Sun Sep 15, 2024 8:46 am

@Ttelmah - Whaaaaat! My brain hurts. Never knew about unions in C, that's really powerful. I'll definitely need to have a look at them - is there a term for this sort of technique? Or is it just plain old sneaky ;-)

I actually found that doing it as a float resulted in better precision than what I could achieve with integers. Bit of an example.

Suppose my scaling factor is 3.5109. The input value is a long, so max. 65535. The output value is a long long, so max 4294967295.

The most I can multiply my scaling factor by is 16384, so that the product with a full-scale input still fits within the limit of a long long. So the "integer" scaling factor would be 3.5109 * 16384 = 57522. Then after I do the multiplication I can >> 14 to get back to the final value.

If my scaling factor happened to be larger (e.g. it increases for smaller part diameters, or for higher microstepping ratios), then it would be worse - I might only be able to multiply by 8192, and so on.

The achievable precision is limited by how much I can initially multiply the scaling factor by. Hope that makes sense.
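The limit described above can be put into numbers with a small PC-side sketch. Since the input can be as large as 65535, the product only fits in 32 bits if the scaled factor itself stays below 65536, so the helper looks for the largest power of two that keeps it there. Both function names are hypothetical, and the factor is assumed to be positive and smaller than 65536.

```c
#include <stdint.h>

/* Largest shift n for which factor * 2^n still fits in 16 bits. */
uint8_t max_scale_shift(double factor)
{
    uint8_t shift = 0;
    while (shift < 16 && factor * (double)(1UL << (shift + 1)) < 65536.0) {
        shift++;
    }
    return shift;
}

/* Relative error caused by dropping the fraction of factor * 2^shift. */
double quantisation_error(double factor, uint8_t shift)
{
    double   scale  = (double)(1UL << shift);
    uint32_t scaled = (uint32_t)(factor * scale);   /* fraction dropped */
    return (factor - (double)scaled / scale) / factor;
}
```

For 3.5109 this gives a shift of 14 (the 16384 above) with a truncation error of roughly 0.001%, against roughly 0.09% for a shift of 8. A factor of 7.5 would only allow a shift of 13, i.e. 8192.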

Whereas, if I do stick with the float, I'm getting the 6-7 digits precision or whatever it can handle.

If it was possible to do all this with still larger integers, e.g. 48 or 64 bit, it would definitely help, but I don't know how you'd do that. Someone (maybe it was yourself) mentioned writing routines to do arithmetic on larger integers but I can't find it again.
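One way to get the effect of a wider integer without a 64-bit type is to split a 32-bit scaled factor into two 16-bit halves and do two 16x16 -> 32 multiplies. The sketch below is an illustration under that assumption, not an existing library routine; the function name is made up, and extra_shift is assumed to be smaller than 32.

```c
#include <stdint.h>

/* Computes (x * factor) >> (16 + extra_shift) using only 32-bit maths.
   factor is the scaling factor pre-multiplied by 2^(16 + extra_shift). */
uint32_t scale_16x32(uint16_t x, uint32_t factor, uint8_t extra_shift)
{
    uint16_t f_hi = (uint16_t)(factor >> 16);
    uint16_t f_lo = (uint16_t)(factor & 0xFFFFu);

    /* x * factor = (x * f_hi) * 2^16 + (x * f_lo).  Dropping the low
       16 bits of the second term leaves the sum exact after the shift. */
    uint32_t high_part = (uint32_t)x * f_hi;
    uint32_t low_part  = ((uint32_t)x * f_lo) >> 16;

    /* The sum cannot overflow: 65535*65535 + 65534 is below 2^32. */
    return (high_part + low_part) >> extra_shift;
}
```

For example, with the factor scaled by 2^24 (3.510905 * 2^24 is approximately 58903212) and extra_shift set to 8, an input of 40000 gives 140436, matching the float result to the nearest step below.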
Ttelmah



Joined: 11 Mar 2010
Posts: 19495

Posted: Sun Sep 15, 2024 9:29 am

Seriously, just use *256. 0.3%. Far better than you'll ever need.
I think you are overthinking this.
Larger maths libraries are built in once you go to the dsPICs.

It's well worth understanding that micro stepping is not accurate.
Unless you use special steppers designed for microstepping, they tend
to 'cog' and you can get up to a couple of degrees error in the position.
Even motors specially designed for this will still not give better than about
2% position accuracy. You don't get better accuracy from microstepping
than from the basic motor. Microstepping does give smoother operation,
but does not give increased resolution.
lindsay.wilson.88



Joined: 11 Sep 2024
Posts: 40

Posted: Wed Sep 18, 2024 9:55 am

Hi - unfortunately, it's a wee bit more complicated yet ;-) Scaling by 256 would give an acceptable error for a standalone design, but I'm designing the system to do full wraps, where the laser engraver goes completely around a part (e.g. [spam] ring blank) and it must align accurately after a full rotation.

At the moment, I'm doing 16384, which (once you work through steps/bits/part diameter etc) finally gives within about 0.01mm after a full rotation, so that's going to work nicely.

I got a new PIC (18F4520) and jacked the speed up to 39MHz. It's now doing the integer calculation stuff in only 20-30us which is fantastic. Even being lazy and just doing float, it's still under 90us or so, still easily usable.

Yeah, microstepping caught me out a few years back when I first got into this! Like most people with these laser engravers, I bought a directly-driven rotary axis, which relied on microstepping to supposedly give the resolution. Turns out they're fine for engraving drinks flasks, but are useless for precision engraving. Like you say, they're really no better than full-step resolution, and the inertia of the huge 3-jaw chuck caused it to bounce after every step. I ditched it and used my Sherline CNC rotary table instead - it has a 72x worm reduction on the stepper motor.

Incidentally, I wonder why they bother going up to microstepping ratios as high as 256x on some drives. For reducing noise/smooth running, I find that just 4x or 8x is quite enough.
Ttelmah



Joined: 11 Mar 2010
Posts: 19495

Posted: Wed Sep 18, 2024 11:44 am

On your >>14, you can save quite a bit of time by using a union again to
give a 16 bit rotation with no overhead, then just using <<2.
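One possible reading of this suggestion, as a host-testable sketch: a shift right by 14 is the same as taking the high 16-bit word (free via a union) shifted left by 2, plus the top two bits of the low word. The union layout assumes little-endian storage and the names are hypothetical.

```c
#include <stdint.h>

/* Assumes little-endian storage: half[0] is the low word, half[1] the high. */
union product_halves {
    uint32_t whole;
    uint16_t half[2];
};

/* Equivalent to product >> 14, but built from a word move and two
   short shifts instead of a 14-place shift of a 32-bit value. */
uint32_t shift_right_14(uint32_t product)
{
    union product_halves u;
    u.whole = product;
    /* product >> 14 == (high word << 2) | (top two bits of low word) */
    return ((uint32_t)u.half[1] << 2) | (uint32_t)(u.half[0] >> 14);
}
```

Picking up the two bits from the low word keeps the result identical to the plain shift; leaving them out would lose up to three steps.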
lindsay.wilson.88



Joined: 11 Sep 2024
Posts: 40

Posted: Mon Sep 23, 2024 7:13 pm

Sorry for the slow reply. I've been having a play with unions and structs - what an incredibly neat concept! I knew about structs, but had never come across unions before. I initially tried the int8-int16-int8 struct like you used, but this limits the size of the input number to 24 bits. I eventually settled on this:

Code:
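// Relies on the struct members being packed with no padding (8-bit PIC)
struct s1{
   int32 a;   // bytes 0-3: write the 32-bit product here
   int8 b;    // byte 4: must be 0 for s2.b to read correctly
};

struct s2{
   int8 a;    // byte 0: the discarded low byte
   int32 b;   // bytes 1-4: the product divided by 256
};

union u1{
   struct s1 s1;
   struct s2 s2;
} myunion1;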


Put the number into myunion1.s1.a and read the divided-by-256 value from myunion1.s2.b.
Ttelmah



Joined: 11 Mar 2010
Posts: 19495

Posted: Mon Sep 23, 2024 11:06 pm

Well done. It is a really powerful C trick. :)
All times are GMT - 6 Hours