Optimizing Floating-Point Multiplication in DSP/Math Processors: An Algorithmic Approach

Categories: Technology

 Abstract

Most widely used operation in DSP/Math processors is Floating point multiplication. Main aim of this multiplier is to implement it effectively to use less combinational delay to achieve high speed because of its vast areas of application. Floating point operations hard to implement on reconfigurable hardware’s because of their complexity of their algorithms. On the other hand, many scientific problems require floating point arithmetic with high level of accuracy in their calculations. In this paper I am going to discuss different types of algorithms which are used for implementation of floating-point multiplier.

Introduction

Although computer arithmetic is sometimes viewed as a specialized part of CPU design, still discrete component designing is also a very important aspect. A tremendous variety of algorithm’s have been proposed for use in floating-point systems. Actual implementations are usually based on refinements and variations of the few basic algorithms presented here. In addition to choosing algorithms for addition, subtraction, multiplication and division, the computer architect must make other choices.

Get quality help now
writer-Charlotte
writer-Charlotte
checked Verified writer

Proficient in: Technology

star star star star 4.7 (348)

“ Amazing as always, gave her a week to finish a big assignment and came through way ahead of time. ”

avatar avatar avatar
+84 relevant experts are online
Hire writer

Our discussion of floating point will focus almost exclusively on the IEEE floating-point standard because of its rapidly increasing acceptance. Our discussion of floating point will take more detailed look at efficient algorithms and architectures. Floating point numbers are one possible way of representing real numbers in binary format; the IEEE 754 standard presents two different floating-point formats, Binary interchange format and Decimal interchange format. The range of floating-point number is from -∞ to +∞ including zero.

Floating-Point Multiplication Algorithm

Multiplying floating point numbers is a critical requirement for DSP applications involving large dynamic range.

Get to Know The Price Estimate For Your Paper
Topic
Number of pages
Email Invalid email

By clicking “Check Writers’ Offers”, you agree to our terms of service and privacy policy. We’ll occasionally send you promo and account related email

"You must agree to out terms of services and privacy policy"
Write my paper

You won’t be charged yet!

This paper focuses only on single precision normalized binary interchange format.Fig.1 shows the IEEE 754 single precision binary format representation, it consists of a one bit sign (S),an eight bit exponent(E),and a twenty three bit fraction.

Z=(-1S)*2(E-Bias)*(1M) (1)

M=m222-1+m212-2+m202-3+…….+m12-22+m02-23;

Bias=127

The sign bit (S) is the Most Significant Bit (MSB)

where 1 indicates negative number and 0 indicates positive number.

The next 8 bits are exponent (E).The following 23 bits are fraction (F) values. Multiplying two numbers in floating point format is done by:

  1. adding the exponent of the two numbers then subtracting the bias from their result.
  2. multiplying the significant of the two numbers.
  3. calculating the sign by XORing the sigh of the two numbers.

In order to represent the multiplication result as a normalized number there should be 1 in MSB of the result. Floating point implementation on FPGAs has been the interest of many researchers. In an IEEE 754 single precision pipelined floating point multiplier was implemented on multiple FPGAs.

The general format of the normalized number is:

(-1) ^S X 1.F X 2^ (E-127). Where, E range is from -126 to

+127.

Denormalized Form

(-1) ^S × 0.F × 2^ (-126) is the general format of the denormalized form. Where E=0. It is used to represent very small negative and positive values including zero.

Special Values

If E=0, it denotes either ±INFINITE or Not a Number

(NaN). Proposed design supports only for normalized form floating point numbers.

To multiply two floating point numbers following is done

Z=(-1S) *2(E-Bias)*(1M).

  1. Multiplying the significand; i.e. (1.M1*1.M2).
  2. Placing the decimal point in the result.
  3. Adding the exponents; i.e. E1+E2-Bias.
  4. Obtaining the sign; i.e. s1 xor s2.
  5. Normalizing the result; i.e. obtaining 1 at MSB of the results “significand”.
  6. Rounding the result to fit in the available bits.
  7. Checking for underflow/overflow occurrence.

Hardware of Floating-Point Array Multiplier

Multiplying two numbers results in a negative sign number if one of the multiplied numbers is of a negative value. By the aid of a truth table we find that this can be obtained by XORing the sign of two inputs.

Unsigned Adder (For Exponent Addition)

This unsigned adder is responsible for adding the exponent of the first input to the exponent of the second input and subtracting the Bias (127) from the addition result. The result of this stage is called the intermediate exponent. The add operation is done on 8 bits, and there is no need for a quick result because most of the calculation time is spent in the significand multiplication process(multiplying 24 bits by 24 bits); thus we need a moderate exponent adder and a fast significand multiplier.

An 8-bit ripple carry adder is used to add input exponents. As shown in fig. 3 a ripple carry adder is a chain of cascaded full adders and one half adders, each full adder has three inputs(A, B, Ci) and two outputs(S, Co).The carry out(Co) of each adder is fed to the next full adder. The addition process produces an 8 bit sum (S7 to S0) and a carry bit (Co,7). These bits are concatenated to form a 9 bit addition result (S8 to S0) from which the Bias is subtracted. The Bias is subtracted using an array of ripple borrow subtracted.

Half adders can be used to add two one-bit binary numbers. It is also possible to create a logical circuit using multiple full adders to add N-bit binary numbers. Each full adder input a Cin, which is the Cout of the previous adder. This kind of adder is a Ripple Carry Adder, since each carry bit “ripples” to the next full adder. The first (and only the first) full adder may be replaced by a half adder. The block diagram of 4-bit Ripple Carry Adder is shown here below figure.

Unsigned Subtractor

A normal subtractor has three inputs (minuend (S), subtrahend (T), Borrow in (Bi)) and two outputs (Difference (R), Borrow out (Bo)). The subtractor logic can be optimized if one of its inputs is a constant value.

One subtractor:

Binary numbers have only two values “1” and “0”. Therefore, the difference between two binary numbers can only be 1 or 0. Table1 illustrates he binary arithmetic difference values for all combinations of X-Y. It will be recognized that the correspondence shown in the Table1 is identical to a state table for an XOR gate having the values X and Y applied to its input terminals.

The remainder of the circuitry generates the borrow out values. The Borrow output must be a one whenever either the Bout1 or Bout2 values in Table1 are one. A little reflection will convince the reader that a Borrow Output “1” value should occur whenever the Borrow input is greater than X, the subtrahend, Y, is greater than minuend, X, or Y is equal to X and there is a Borrow Input. These conditions are all satisfied by the combination of Bout1 and Bout2 values.

Zero subtractor:

Binary subtractors are typically aliased with binary adder circuits. The minuend is applied to one input port of the binary adder. The subtrahend is complimented and applied to the second input port of the binary adder. The sum output of the adder is the difference between the minuend and subtrahend.

Ripple Borrow subtractor:

The design of a binary subtractor circuit requires considerations like those for binary adder circuits. The present invention is a dedicated single bit subtractor stage for subtracting a binary number Y from a binary number X with provision for applying a Borrow, Bin from a lesser significant subtractor stage against the minuend X and generating a Borrow Out signal Bout. The subtractor stage includes combinations logic for exclusive ORing the input numbers X, Y and Bin, the result of which corresponds to the difference of X minus Y including a borrow input.

Unsigned Multiplier (For Significant Multiplication)

This unit is responsible for multiplying the unsigned significant and placing the decimal point in the multiplication product. The result of significant multiplication will be called the intermediate product (IP). The unsigned significant multiplication is done on 24 bits. A 24x24 bit carry save multiplier architecture is used as it has a moderate speed with a simple architecture. In the carry save multiplier the carry bits are passed diagonally downwards. Partial products are made by ANDing the inputs together and passing them to the appropriate adder.

Carry Save Multiplier has three main stages:

  1. The first stage is an array of Half Adders
  2. The middle stages are arrays of full adders. The number of middle stages is equal to the significand size minus “2”.
  3. The last stage is an array of Ripple Carry Adders. This stage is called the vector merging stage.

Normalizer

The result of the significand multiplication (intermediate product) must be normalized to have a leading “1” just to the left of the decimal point (i.e., in the bit 46 in the intermediate product). Since the inputs are normalized numbers then the intermediate product has the leading one at bit 46 or 47. If the leading one is it 46 then the intermediate product is already a normalized number and no shift is needed. If the leading one is it 47 then the intermediate product is shifted to the right and the exponent is incremented by the product must be normalized such that it has a leading “1” just to the left of decimal point.

Vedic Multiplier

Vedic mathematics is mainly based on 16 sutras.VM is based on the 14th sutra which is Urdhva Tiryakbhyam (Vertically and Crosswise). Here 3x3 block is used as the Basic block. For a 3-bit multiplication the partial products and their summation of multiplication are obtained parallel in Urdhva Tiryakbhyam.

Results and Discussion

Implementing floating-point multipliers using the outlined algorithms and hardware components results in improved computational efficiency and accuracy. The Vedic multiplier, in particular, demonstrates significant advantages in speed and resource utilization over traditional array multipliers, making it a viable option for complex multiplications where system complexity is a concern.

Table 1: Comparison of Multiplier Algorithms

Algorithm Speed Area Complexity
Array Low High Simple
Vedic High Low Complex

Conclusion

This study explores various algorithms for floating-point multiplication, emphasizing the need for high-speed, accurate implementations in DSP and mathematical processors. By leveraging advanced algorithms and optimizing hardware components, it is possible to achieve significant improvements in computational efficiency and precision. The Vedic multiplier, in particular, offers promising prospects for future implementations, given its advantages in speed and resource efficiency. Further research could explore the integration of these algorithms into broader computational architectures to enhance overall system performance.

Updated: Feb 23, 2024
Cite this page

Optimizing Floating-Point Multiplication in DSP/Math Processors: An Algorithmic Approach. (2024, Feb 23). Retrieved from https://studymoose.com/document/optimizing-floating-point-multiplication-in-dsp-math-processors-an-algorithmic-approach

Live chat  with support 24/7

👋 Hi! I’m your smart assistant Amy!

Don’t know where to start? Type your requirements and I’ll connect you to an academic expert within 3 minutes.

get help with your assignment