[Fpga-synth] Terminology Question
Dave Manley
dlmanley at sonic.net
Thu Jun 7 18:07:34 CEST 2007
Scott Gravenhorst wrote:
>
> 1) Why is this called "pipelining"? I'm interested in a historical
> perspective if there is one.
>
> 2) I see WebPACK telling me that the "performance of multiplier XYZ can be
> improved by using pipelining". However, I don't understand what is meant
> by "improved performance" when if the multipliers are strung together, the
> result can actually be available sooner than if the multipliers are all
> cascaded through pipelining.
Pipelining is used on combinatorial paths with large delay to cut up the
delay into a few (or many) shorter delays. Each shorter delay allows a
faster clock, which increases the overall throughput.
Take for example a combinatorial logic function that takes 1 us to
complete - with the clock at 1 MHz you get 1 result every microsecond.
Now divide the logic function into four stages (this isn't always
easy!), each of which has a delay of ~250 ns. Now the clock can run at
4 MHz and you get 4 results per microsecond. Each individual result
still takes about 1 us to travel through all four stages, so the
latency is no better, but the throughput is four times higher.
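The arithmetic in the example above can be sketched in a few lines of Python. This is only an idealized model: it assumes the logic splits into equal stages and ignores register setup and clock-to-output overhead, and the function name pipeline_timing is made up for illustration.

```python
def pipeline_timing(total_delay_ns, stages):
    """Return (clock_mhz, results_per_us, latency_ns) for an ideal pipeline.

    Assumes the logic divides into equal stages and that the pipeline
    registers themselves add no delay (real registers do add some).
    """
    stage_delay_ns = total_delay_ns / stages   # delay of the slowest (here: every) stage
    clock_mhz = 1000.0 / stage_delay_ns        # period in ns -> frequency in MHz
    results_per_us = clock_mhz                 # one result per clock once the pipe is full
    latency_ns = stage_delay_ns * stages       # time for one input to reach the output
    return clock_mhz, results_per_us, latency_ns


if __name__ == "__main__":
    for stages in (1, 4):
        mhz, per_us, latency = pipeline_timing(1000, stages)
        print(f"{stages} stage(s): {mhz} MHz, {per_us} results/us, {latency} ns latency")
```

Under these assumptions the latency stays at 1000 ns in both cases; only the rate at which results appear changes.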
My first exposure to this was in the design of microprogrammed CPUs.
The path from address fetch, to instruction decode, to ALU operation can
all run in a single tick but it will be slow. Instead you can put a
register after each stage and make it run much faster.
I don't know the origin of the term but would guess it probably started
in CPU design at one of the big iron mainframe manufacturers like Univac
or IBM. Engineers often aren't the best at giving things names so I'm
not surprised the term is a little ambiguous, but basically the term is
accurate. Consider that you're not putting water into the pipe, but
discrete things like ping-pong balls. At each tick you push a ball
into the pipe, and one comes out the other end. The length of the pipe
determines the number of balls or equivalently the number of pipe-stages.
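The ping-pong ball picture can be played out with a small simulation. This is a rough sketch, not a model of any particular hardware: run_pipe is a hypothetical helper, and None simply stands for an empty slot in the pipe.

```python
from collections import deque


def run_pipe(balls, stages):
    """Push one ball per tick into a pipe with `stages` slots.

    Returns the list of what falls out of the far end on each tick.
    None marks an empty slot (nothing came out on that tick).
    """
    pipe = deque([None] * stages, maxlen=stages)  # the pipe starts empty
    out = []
    # Feed the real balls, then feed empty slots to flush the pipe.
    for ball in list(balls) + [None] * stages:
        out.append(pipe[-1])   # whatever sits at the far end comes out
        pipe.appendleft(ball)  # new ball goes in; everything shifts one slot
    return out


if __name__ == "__main__":
    print(run_pipe("abc", 4))
```

With a 4-slot pipe, nothing useful comes out for the first four ticks, and after that one ball comes out per tick in the order the balls went in.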
With regard to the Xilinx multipliers, if your design already registers
the data going into the multiplier, and then registers the result as it
pops out, then the design is effectively already pipelined - you
minimized the amount of combinatorial logic between register stages. If
on the other hand you had an adder that fed the input of the multiplier
and the output of the multiplier fed another adder before the result
was registered, then you could make the design faster by putting
pipeline registers like this: reg->adder->reg->mult->reg->adder->reg.
The best performance will come from using the dedicated registers that
are built into the input and output of the multipliers (if they're present).
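To see why the extra registers help in the adder/multiplier case, here is a small sketch that compares the maximum clock rate with and without them. The delay figures (3 ns per adder, 6 ns for the multiplier) are invented purely for illustration and are not taken from any Xilinx data sheet; register overhead is again ignored.

```python
def max_clock_mhz(stage_delays_ns):
    """The clock period must cover the slowest register-to-register path."""
    return 1000.0 / max(stage_delays_ns)


# Hypothetical combinatorial delays, for illustration only.
ADDER_NS = 3.0
MULT_NS = 6.0

# reg->adder->mult->adder->reg: one long path between registers.
unpipelined = [ADDER_NS + MULT_NS + ADDER_NS]

# reg->adder->reg->mult->reg->adder->reg: three short paths.
pipelined = [ADDER_NS, MULT_NS, ADDER_NS]

if __name__ == "__main__":
    print(f"unpipelined: {max_clock_mhz(unpipelined):.1f} MHz")
    print(f"pipelined:   {max_clock_mhz(pipelined):.1f} MHz")
```

In this made-up case the multiplier becomes the limiting stage once the adders are registered off, which is why any further gain would have to come from registers inside or right at the multiplier.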
HTH,
Dave