[Fpga-synth] Terminology Question
Scott Gravenhorst
music.maker at gte.net
Thu Jun 7 18:53:28 CEST 2007
"The making of synthesizers in FPGAs." wrote:
>Scott Gravenhorst wrote:
>>
>> 1) Why is this called "pipelining"? I'm interested in a historical
>> perspective if there is one.
>>
>> 2) I see WebPACK telling me that the "performance of multiplier XYZ can be
>> improved by using pipelining". However, I don't understand what is meant
>> by "improved performance" when if the multipliers are strung together, the
>> result can actually be available sooner than if the multipliers are all
>> cascaded through pipelining.
>
>Pipelining is used on combinatorial paths with large delay to cut up the
>delay into a few (or many) shorter delays. This allows the overall
>through-put to increase.
Yeah, I get this - all up to the "overall throughput increase". If the logic were purely
combinatorial, a single result would actually finish sooner. So this really depends on the
rest of the system, what else needs clocks, and how fast those clocks are. But I see
that one can pretty easily scale the performance up to the point where at least one stage
takes too long. Where I don't see the throughput increase is in the fact that each pipeline
stage "wastes" a bit of its clock tick because its result is valid somewhat before the clock
edge. The way I look at this is that the sum of those "waiting for nothing"
picoseconds would be a significant delay compared with a pure combinatorial chain of the
same logic. Again, I do understand that this example is extremely simple and disregards
other things that may be happening in the same design. I think pipelining becomes
paramount when you need to share resources and in systems that use multiple clocks.
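To put rough numbers on that "waiting for nothing" time, here is a minimal Python sketch. The stage delays and the register overhead are made-up values, not measurements from GateMan or any real design, and the function names are hypothetical.

```python
# Hypothetical per-stage combinatorial delays in ns (made-up, not measured).
STAGE_DELAYS_NS = [250, 180, 240, 210]
# Assumed register cost per stage (clock-to-out plus setup), also made-up.
REG_OVERHEAD_NS = 2


def combinatorial_latency_ns(delays):
    """Latency of the unregistered chain: the stage delays simply add up."""
    return sum(delays)


def pipelined_clock_period_ns(delays, reg_overhead):
    """The clock can be no faster than the slowest stage plus register overhead."""
    return max(delays) + reg_overhead


def pipelined_latency_ns(delays, reg_overhead):
    """Every stage costs one full clock period, however early its logic settles."""
    return len(delays) * pipelined_clock_period_ns(delays, reg_overhead)


def wasted_ns(delays, reg_overhead):
    """Total extra latency the pipeline adds over the plain combinatorial chain."""
    return pipelined_latency_ns(delays, reg_overhead) - combinatorial_latency_ns(delays)


if __name__ == "__main__":
    print("combinatorial latency:", combinatorial_latency_ns(STAGE_DELAYS_NS), "ns")
    print("pipelined clock period:", pipelined_clock_period_ns(STAGE_DELAYS_NS, REG_OVERHEAD_NS), "ns")
    print("pipelined latency:", pipelined_latency_ns(STAGE_DELAYS_NS, REG_OVERHEAD_NS), "ns")
    print("wasted:", wasted_ns(STAGE_DELAYS_NS, REG_OVERHEAD_NS), "ns")
```

With these assumed numbers a single result takes 880 ns through the plain chain and 1008 ns through the pipeline, so 128 ns really is lost per result. The pipelined version, though, can accept a new input every 252 ns instead of every 880 ns, which is where the throughput gain comes from.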
In fact, I had such an issue with the GateMan project wherein I got timing constraint
violations because of a chain of combinatorial multipliers - the project worked because a
state machine was dividing the system clock and the state machine was responsible for
clocking the output of this chain. So while I had the timing figured correctly in my
head, I wasn't aware that ISE didn't really understand what I did so it flagged the chain
as "too long" even though it really wasn't. I pipelined to fix the timing constraint
errors rather than declare the path multicycle (because that was looking like a real PITA
to me, a noob). I'm not sure which would have been better, since there will never be a
requirement or desire to increase the system clock. Most of the project was really
bound by the system clock qualified by the enable signal from the DAC module. Pipelines
began to make more sense to me as I began sharing resources.
>Take for example a combinatorial logic function that takes 1 us to
>complete - with the clock at 1 MHz you get 1 result every microsecond.
>
>Now divide the logic function into four stages (this isn't always
>easy!), each of which has a delay of ~250 ns. Now the clock can run at
>4 MHz and you get 4 results per microsecond.
>
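The arithmetic in the quoted example can be checked with a small Python sketch. It is an idealized model: it ignores register setup and clock-to-out time, assumes a new input is accepted on every clock tick starting at time zero, and the function name is hypothetical.

```python
def results_after(elapsed_ns, n_stages, period_ns):
    """Count the results available after elapsed_ns in an idealized pipeline.

    The first input enters at t = 0 and its result is registered after
    n_stages clock periods; after that, one result appears per tick.
    """
    ticks = elapsed_ns // period_ns
    return max(0, ticks - n_stages + 1)


if __name__ == "__main__":
    for t_ns in (750, 1000, 10000):
        unpipelined = results_after(t_ns, n_stages=1, period_ns=1000)  # 1 MHz clock
        pipelined = results_after(t_ns, n_stages=4, period_ns=250)     # 4 MHz clock
        print(t_ns, "ns:", unpipelined, "unpipelined,", pipelined, "pipelined")
```

In this model both versions deliver their first result at 1000 ns, so the latency is unchanged, but after 10000 ns the single-stage version has produced 10 results and the four-stage version 37.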
>My first exposure to this was in the design of microprogrammed CPUs.
>The path from address fetch, to instruction decode, to ALU operation can
>all run in a single tick but it will be slow. Instead you can put a
>register after each stage and make it run much faster.
>
>I don't know the origin of the term but would guess it probably started
>in CPU design at one of the big iron mainframe manufacturers like Univac
>or IBM. Engineers often aren't the best at giving things names so I'm
>not surprised the term is a little ambiguous, but basically the term is
>accurate. Consider you're not putting water into the pipe, but
>discrete sized things like ping-pong balls. At each tick you push a ball
>into the pipe, and one comes out the other end. The length of the pipe
>determines the number of balls or equivalently the number of pipe-stages.
>
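The ping-pong ball picture maps onto a fixed-length queue. The following Python sketch is only an illustration of the analogy; the class and function names are hypothetical, and None stands for a slot that has not been filled yet.

```python
from collections import deque


class Pipe:
    """A pipe that holds a fixed number of balls, one per pipe stage."""

    def __init__(self, n_stages):
        # Start with an empty pipe: every slot holds None.
        self.slots = deque([None] * n_stages, maxlen=n_stages)

    def tick(self, ball):
        """Push one ball in at the near end; return the one leaving the far end."""
        out = self.slots[-1]
        # With maxlen set, appendleft discards the item at the right-hand end.
        self.slots.appendleft(ball)
        return out


def push_balls(n_stages, balls):
    """Feed the balls in one per tick and record what comes out on each tick."""
    pipe = Pipe(n_stages)
    return [pipe.tick(ball) for ball in balls]


if __name__ == "__main__":
    print(push_balls(3, [1, 2, 3, 4, 5]))
```

For a three-stage pipe fed balls 1 to 5, nothing useful comes out for the first three ticks, then ball 1 and ball 2 appear on ticks four and five: one ball in and one ball out per tick, with the length of the pipe setting the delay.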
>With regard to the Xilinx multipliers, if your design already registers
>the data going into the multiplier, and then registers the result as it
>pops out, then the design is effectively already pipelined - you
>minimized the amount of combinatorial logic between register stages. If
>on the other hand you had an adder that fed the input of the multiplier
>and the output of the multiplier fed another adder before the result
>was registered, then you could make the design faster by putting
>pipeline registers like this: reg->adder->reg->mult->reg->adder->reg.
>The best performance will be to use the dedicated registers that are
>built into the input and output of the multipliers (if they're present).
So I have read in their documentation and I do try to do this - I also hope that this is
automagically inferred by ISE.
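The reg->adder->reg->mult->reg->adder->reg arrangement quoted above can be modeled clock by clock. This Python sketch is a behavioral illustration only, not HDL and not how ISE infers anything; it assumes the function being computed is (a + b) * c + d, and all names are hypothetical.

```python
def clock(regs, sample):
    """Advance the pipeline by one clock edge.

    regs is (r0, r1, r2, r3): the input register, the register after the
    first adder, the register after the multiplier, and the output register.
    All next values are computed from the current ones, as on a real edge.
    """
    r0, r1, r2, _ = regs
    # First adder: a + b; c and d are carried along for the later stages.
    new_r1 = (r0[0] + r0[1], r0[2], r0[3]) if r0 is not None else None
    # Multiplier: (a + b) * c; d is carried along.
    new_r2 = (r1[0] * r1[1], r1[2]) if r1 is not None else None
    # Second adder: (a + b) * c + d.
    new_r3 = r2[0] + r2[1] if r2 is not None else None
    return (sample, new_r1, new_r2, new_r3)


def run_pipeline(samples, flush=3):
    """Clock in each (a, b, c, d) sample, then flush; return the output per clock."""
    regs = (None, None, None, None)
    outputs = []
    for sample in list(samples) + [None] * flush:
        regs = clock(regs, sample)
        outputs.append(regs[3])
    return outputs


if __name__ == "__main__":
    print(run_pipeline([(1, 2, 3, 4), (5, 6, 7, 8)]))
```

Note that c and d have to travel through the registers alongside the partial results so that they arrive at the multiplier and second adder on the right clock; with the two samples above, the outputs 13 and 85 appear on the fourth and fifth clocks.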
-- ScottG
-------------------------------------------------------------
-- Scott Gravenhorst
-- GateMan I - Xilinx Spartan-3E Based MIDI Synthesizer
-- FatMan: home1.gte.net/res0658s/fatman/
-- NonFatMan: home1.gte.net/res0658s/electronics/
-- When the going gets tough, the tough use the command line.