Fault Tolerance Design Techniques For Parallel Matrix Vector Multiplications

Categories: MathScience

Abstract

Matrix vector multiplication is the process of multiplying a matrix with a vector. Multiplying matrices in parallel with a common input vector is Parallel Matrix Processing, and is most often performed in Modern digital signal Processing and Digital Communication Systems. In this paper Fault Tolerant Design techniques are proposed for Parallel Matrix Vector Multiplications. The Techniques combines ideas from error correction codes with the self checking capability of Matrix Vector Multiplication.

Hamming code is used for single error correction and Bose ChaudhariHocquenghem code (BCH) is used for Double error correction.

Synthesis results show that the proposed techniques can significantly reduce the overheads. Hence the proposed technique reduces the cost of providing fault Tolerance. The proposed design is synthesized in Xilinx 14.5

Introduction

Matrix vector multiplication is the process of multiplying a matrix with vector and it is one of the typical computation performed in domain transformation, projection or feature extraction. Parallel computing is done to improve the performance. Parallel Matrix Vector Processing is multiplying matrices in parallel with a common input vector, and its application is for precoding in multiuser multi in multi out systems and low density parity check decoders.

Parallel processing may run on Ᾱ large number of processing units e.g.: gross processing units or distributed systems and experiments show that soft errors can effect the outputs[10,11].

Get quality help now
Dr. Karlyna PhD
Dr. Karlyna PhD
checked Verified writer

Proficient in: Math

star star star star 4.7 (235)

“ Amazing writer! I am really satisfied with her work. An excellent price as well. ”

avatar avatar avatar
+84 relevant experts are online
Hire writer

The number of Look Up Tables required for any design is interesting and had been calculated in [3]. Fault tolerance for the matrix processing has widely been studied[11-17], algorithm based fault tolerance (ABFT) techniques for matrix – matrix multiplication is focused in [11-14].

In ABFT technique the two matrices considered are converted in to row checksum and column check sum matrices, then fault detection and correction is done by constructing Ᾱ full check sum matrix.

Get to Know The Price Estimate For Your Paper
Topic
Number of pages
Email Invalid email

By clicking “Check Writers’ Offers”, you agree to our terms of service and privacy policy. We’ll occasionally send you promo and account related email

"You must agree to out terms of services and privacy policy"
Write my paper

You won’t be charged yet!

A generalized method for Algorithm Based Fault Tolerance Techniques. Algorithm Based Fault Tolerant

Technique detects and corrects single error. Since the input vector can’t be converted in to row check sum algorithm based fault tolerant technique can’t detect and correct errors in matrix – vector multiplication is proposed in [15]. Algorithm based fault tolerant technique proposed is used to locate the fault elements in matrix vector multiplication[16-17].Error correction is achieved with partial recomputation

Hence the scheme introduces some delay which can’t be tolerated in real time applications, such as high throughput communication receivers where inputs are taken from Analog to Digital converters in real time. A general model to protect linear dynamic systems. Part of the model can be extracted as a part of matrix vector multiplication in [18-19]. A method to protect parallel digital filters[20]. The protection is achieved by considering each filter as a bit in a error correction code (ECC) hence parity check filters are added which help to detect and correct errors.

Redundant rows are added are added to the processing matrix by combination of the original rows of the matrix and the relations among the outputs are used to detect and correct errors[18-20]. This method is limited to protection of single matrix vector multiplication. Parallel processing is more attractive these days as it computes in very less time. protection of parallel fast Fourier transform operations on various input vectors is considered[21]. In this a self checking property of the fast Fourier transform is combined with the error checking capability. Hence the overhead is reduced.

parallel Matrix Vector Multiplications is done by combining the self checking property (checksum) of matrix vector multiplication with the error checking capability[22-23] .In the model a Detection matrix (D) and Sum matrix (Ᾱ) are calculated as redundant matrices for error correction and detection.Detection matrix is generated according to the self checking property of matrix vector multiplication. Sum matrix is just the sum of the entire matrix. As hamming code is capable of detecting and correcting single error, double error detection is not possible. This paper is an extension of [23] where hamming code ,Linear block code [6,3] and Linear Block code [8,4] are used for single error correction and detection. For double error correction BCH code is used.

In this paper section 2 explains the single error protection of matrix vector multiplication and protection of parallel matrix vector multiplications. Section 3 explains double error correction using BCH code. Section 4 gives comparison table of number of Look up Tables (LUT) and Registers considered. Section 5 concludes the paper.

Separate Protection and Parallel MVM’s

In this Section Single error Protection of Matrix Vector Multiplication and Parallel MVM’s is explained through Hamming code.

Single error protection of Matrix Vector Multiplication.

Consider Ᾱ matrix Ᾱ with n rows a̅̅1, a̅̅2, a̅̅3,….. a̅̅n. The size of the matrix is [NXM] matrix .let the vector matrix be represented by u̅ , of size [MX1] with u̅1 , u̅2, u̅3,….., u̅n. let z̅ be the output vector obtained by multiplying input vector u̅ and matrix Ᾱ, and is of size [NX1].s̅1, s̅2, s̅3,…. s̅Rs are the redundant rows generated according to the hamming code by considering each row of the matrix Ᾱ as one bit of the hamming code. These redundant rows are multiplied with the vector and the results obtained are s1, s2, s3,….., sRs respectively. The constraint between the original outputs and the redundant outputs helps for error detection and error correction.

Ap X u̅ = z̅p (1)

In which

a11 a12 … a1N u1

a21 a22 … a2N u2

A = a31 a32 … a3N u̅ = :

…. …. …. … :

aM1 aM2 … aMN un

z1

z2

z = : (2)

zp

For single error correction the relation between N and Rs would be given by

N = f(Rs) = 2Rs-Rs-1. (3)

If the matrix considered has 4 rows then three redundant rows are generated according to (7,4) hamming code[20]. Consider the four rows of input matrix be a̅ 1,a̅2 ,a̅3, a̅4. Then the three redundant rows s̅1, s̅2, s̅3 are generated as

s̅ 1 = a̅ 1+ a̅2+ a̅3

s̅2 = a̅1 + a̅ 2+ a̅4 (4)

s̅3 = a̅1 + a̅ 3+ a̅4

the hamming code is applied here by replacing XOR operation by addition. Hence the error detection can be done by the following equations.

S 1= z 1+ z 2+ z3

S 2= z 1+ z 2+ z4 (5)

S 3= z1 + z 3+ z4

The two sides of (5) are sr = s̅rTu̅ and zn = a̅rTu̅ respectively .

the error between the two sides of (5) is denoted by (p1,p2,p3) According to table 1 where 0 denotes no error and 1 denotes an error. Correction is done by reconstructing the failing output from correct output.

For example , an error on z2 is corrected by making

y2 =s2 – z 1- z4.

P1 P2 P3 ERROR POSITION
000 - - NO ERROR
111 - - Z1
110 - - Z2
101 - - Z3
011 - - Z4
- 010 - S1
- - 001 S2
- 001 010 S3

Fault Tolerance For Parallel Matrix Vector Multiplication:

The basic structure of parallel fault tolerance design for parallel matrix vector multiplications is shown in fig 2, where A1, A2, A3,…., Ap are original matrices considered. The Detection matrix and sum matrix are the redundant modules generated through basic operations on the original matrices. Detection matrix is used for error detection and sum matrix is used for error correction. From figure 2 z̅1, z̅2, z̅3,…., z̅p are the output vectors obtained by multiplying matrices

A1, A2,A3,…, Ap with input vector u̅. S̅ is the redundant output vector obtained by multiplying detection matrix with vector matrix.Z̅ is the redundant output vector obtained by multiplying sum matrix with vector matrix. By comparing p outputs with the redundant output vector S̅ the faulty vector can be detected. Correction is done by subtracting the correct vectors from z̅..

Implementation of the matrix vector multiplication:

The implementation structure of matrix vector multiplication is shown in fig 3 . For each cycle one column of matrix is read in to registers and is multiplied with corresponding element of vector matrix . before the counter reaches M port 1 of each counter is selected so that result is accumulated, once the counter reaches M an [NX1] vector is generated. The input vector size considered is [20X30] and vector matrix size is [30X1] the elements of both the vector matrix and input matrix are of 8 bit size . the elements of the output vector are of 21 bit integers. For an matrix vector multiplication between an [MXN] matrix and [NX1] vector N adders and N multipliers are needed.

Generation of the detection matrix and Sum Matrix:

The number of rows of detection matrix D(Rp) depend on the number of parallel matrices considered (p) , the relation between p and Rp is given by

P = f(Rp) = 2Rp - Rp – 1

the sum matrix is just the direct sum of the original matrix given by [22].

The detection matrix is generated according to the hamming code. First checksum row for each matrix is calculated . in checksum matrix each element is calculated by the sum of the elements of the entire column. the rows of the detection matrix are calculated according to hamming code where each original matrix is considered as a single bit.

Let us assume p = 4 then the checksum row for each matrix is generated as follows[22].

From the checksum matrix the rows of the detection matrix are generated according to the following equations (7,4) hamming code

d̅1 = c̅ 1+ c̅ 2+ c̅3

d̅2 = c̅1 + c̅ 2+ c̅4 (9)

d̅3 = c̅ 1+ c̅ 3+ c̅4

then the error detection matrix can be defined as

D = [d̅1d̅2d̅3]T € Z3XM (10)

Error Correction and Detection

The results of detection matrix is given by

s̅ = Du̅ = [d̅1 , d̅1 , d̅1]T (11)

According to (8-11) if there are no errors then

s1 = sum(z̅1 + z̅2+ z̅3)

s2 = sum(z̅1 + z̅2 + z̅4) (12)

s3 = sum(z̅1 + z̅3 + z̅4)

sum(x) performs addition of all items with in vector x and

z̅ = z̅1 + z̅ 2+ z̅ 3+ z̅4 (13)

if any of the equation fails then there will be an error .the error can be detected according to table 1 and the values of (p1,p2,p3) are given by

p1 s1 – sum(z̅̅ 1+ z̅̅ 2 + z̅̅3 )

p2 = s2– sum(z̅̅1 + z̅̅ 2 + z̅̅4 ) (14)

p3 s3– sum(z̅̅ 1+ z̅̅ 3 + z̅̅ 4 )

once the faulty output vector is detected then it can be recovered by subtracting the correct output vectors from z̅, where z̅ is given by

for example if z̅1 is corrupted then it can be recovered by

y̅1 = z̅ - (z̅2 + z̅3 + z̅4)

in a similar way z̅2 ,z̅3, z̅4 can be recovered by

y̅2 = z̅ - (z̅1+ z̅3 + z̅4)

y̅3 = z̅ - (z̅1 + z̅2 + z̅4)

y̅4 = z̅ - (z̅1 + z̅2 + z̅3)

since the error detection is done on the sum of output vectors and as the whole word can be reconstructed hence multiple errors in Ᾱ single output vector can be detected.

Implementation of the Separate protection

For [20X30] matrix protection, according to [20,25] hamming code 5 redundant rows are generated and then matrix vector multiplication between [25X30] matrix and [30X1] vector is implemented according to figure 3.

The five redundant rows [s̅0, s̅1, s̅2, s̅3, s̅4] are generated as

s̅ 0= a̅0 +a̅ 5+ a̅9 + a̅12 + a̅ 14+a̅15 + a̅ 18+ a̅19

s̅ 1= a̅ 1+a̅ 5+ a̅6 + a̅10 + a̅13 +a̅15 + a̅16 + a̅18

s̅ 2= a̅ 2+a̅ 6+ a̅ 7+ a̅9 + a̅11 +a̅15 + a̅16 + a̅17 (15)

s̅ 3= a̅ 3+a̅ 7+ a̅8 + a̅10 + a̅12 +a̅16 + a̅17 + a̅19

s̅4 = a̅ 4+a̅ 8+ a̅ 11+ a̅ 13+ a̅14 +a̅ 17+ a̅ 18+ a̅19

the elements in each vector are of 11 bit integers.

The fault detection is achieved by the following equations. z0 - z19 are 21 bit integers and s0 – s14 are 24 bit integers.

For example if the fourth and fifth equations do not hold and the remaining three equations hold then we know that z8 is wrong.

s 0= z 0+ z 5+ z 9+ z 12+ z 14+ z 15+ z 18+ z19

s 1= z 1+ z 5+ z 6+ z 10+ z 13+ z 15+ z 16+ z18

s 2= z 2+ z 6+ z 7+ z 9+ z 11+ z15 + z16 + z17 (16)

s 3= z 3+ z7 + z 8+ z 10+ z12 + z 16+ z 17+ z19

s 4= z 4+ z 8+ z 11+ z 13+ z14+ z 17+ z 18+ z19

The error correction is done by subtracting all other items from s3 according to the fourth equation in (16). Five such structures are needed to correct any of the 20 items in the result vector.

The correction procedure is implemented using the structure shown in fig 5 . port 1 is selected for z8 port 0 is selected for all others for the recovery of z8.

Single Error Correction Using [6,3] and [8,4] Linear Block Code:

In this Section Single Error Protection of MVM’s is done in a similar way as done using hamming code, where the relations between the matrices are different in both the cases, and comparision is done for the number of LUT’s required using hamming code ,[6,3] linear block code and [8,4] Linear Block Code.

Fault Correction and Detection of an MVM using [6,3] Linear Block Code:

Error Correction and Detection in this method is done by considering three matrices ,and three rows of detection matrix are generated .consider the three matrices taken are a1,a2,a3. The three redundant rows generated are r1,r2,r3. The redundant rows are given by

r1 = a2 + a3

r2 = a1 + a3

r3 = a1 + a2

the outputs Obtained by multiplying the matrices with vector matrix are y1,y2,y3 and the outputs obtained after multiplying the redundant rows with vector matrix are p1,p2,p3. Then error detection is done by checking the following equations

p1 = y2 + y3

p2 = y1 + y3

p3 = y1 + y2

If the First two equations fail then there will be an error in y3,and if last two equations fail then y1 fails and first and last equation fails then y2 fails. After Detection Error Correction is done.

Fault Correction and Detection of an MVM using [8,4] Linear Block Code:

Error Correction and Detection in this method is done by considering Four matrices ,and Four rows of detection matrix are generated .consider the three matrices taken are a1,a2,a3,a4. The four redundant rows generated are r1,r2,r3, r4. Redundant rows are given by

r1 = a1 + a2 + a4

r2 = a1 + a2 + a3

r3 = a1 + a3 + a4

r4 = a2 + a3 + a4

the outputs Obtained by multiplying the matrices with vector matrix are y1,y2,y3,y4 and the outputs obtained after multiplying the redundant rows with vector matrix are p1,p2,p3,p4. Then error detection is done by checking the following equations

p1 = y1 + y2 + y4

p2 = y1 + y2 + y3

p3 = y1 + y3 + y4

p4 = y2 + y3 + y4

If the First three equations fail then there will be an error in y1,and if last three equations fail then y3 fails and first two equations and last equation fails then y2 fails. If last two equations and first equation fails then y4 fails.After Detection Error Correction is done.

Fault tolerance using BCH code:

BCH codes are very powerful random error correcting cyclic codes.

Discovered by hocquenghem in 1959 and independently by bose and chaudhari in 1960.

When Ᾱ (n,k) BCH code is considered it has the following parameters

Block length n = 2m – 1

Number of check bits : n – k < mt

Minimum distance dmin 2t + 1

t < (2m – 1) / 2 random errors detected and corrected.

Hence it is also called as t error correcting BCH code.

Encoding of BCH code : Matrix vector multiplication between Ᾱ [20X 30] matrix and Ᾱ vector [30X1] is done according to the figure 3 where each element in the matrix is of 2 bit and each element in vector matrix is of one bit in size .

The elements in the resultant vector is of 7 bit in size . hence these 7 bit vectors are converted to code words of 15 bit size by multiplying with the generator polynomial of 9 bit size here considered galois field (24) and (15 , 7) Bch code which is capable of correcting two errors. The generator polynomial is given by (111010001).

The code word is generated by multiplying information bits with generator polynomial and final code word is generated by xor operation of partial products.

Decoding of BCH codes: Once the code word is received it is considered as s1( ) and then s3() is calculated.

Calculation of S3() : For example consider S1() as Then S3( is given as After calculation of S3( , both S1( and S3(are simplified according to addition of galois fields (24) given in table.

If both S1( and S3(are zero then the received code word is error free, else error detection is done by substituting the values of S1( and S3( in the following equation.Division is equivalent to inverse of multiplication.

The multiplication relation between elements of galois field (24) is used. the two roots obtained by solving the polynomial gives the location of error positions in the received codeword.

Hence correction is done by just negating the detected bits in the received codeword.

Conclusion

Ᾱ fault tolerant design for parallel mvms is poposed single error correction is done using hamming code ,linear block code [6,3] and [8,4] where xor operation is replaced by addition .Double error correction is performed by BCH code.The decoding process is complex in BCH codes, hence any other code which can detect and correct two errors with less decoding complexity and reduces the overhead is an interesting topic for future work.

References

  1.  I.S.Haque and V.S.Pande, “Hard data soft errors: Ᾱ large –scale assessment of real –world error rates in GPGPU,” in proc.10th IEEE/ACM INT.Conf.Cluster,Cloud Grid Comput.(CCGrid),May 2010,pp.691-696.
  2. P.Rech,C.Aguiar,C.Frost,and L.Carro,”An efficient and experimentally tuned software- based hardening strategy for matrix multiplication on GPUs,” IEEE Trans. Nucl.Sci.,vol60.no.4,pp.2979-2804,Aug.2013.
  3. K.Rama Devi ,MD.Zaheer,”Design and Functional Verification of Vlsi Based Router 1X 5 Architecture”IJSETR,vol.04,Issue 53,Dec 2015.
Updated: Feb 23, 2024
Cite this page

Fault Tolerance Design Techniques For Parallel Matrix Vector Multiplications. (2024, Feb 13). Retrieved from https://studymoose.com/document/fault-tolerance-design-techniques-for-parallel-matrix-vector-multiplications

Live chat  with support 24/7

👋 Hi! I’m your smart assistant Amy!

Don’t know where to start? Type your requirements and I’ll connect you to an academic expert within 3 minutes.

get help with your assignment