Graphics Processing Units Computer Science Essay


A GPU (Graphics Processing Unit) is used for 3D applications. It is a single chip that creates lighting effects and transforms objects every time a 3D scene is redrawn. The GPU lifts this load from the CPU and frees up cycles that can be used for other processes. In a personal computer, the GPU can be located on a video card or on the motherboard.

A GPU consists of several microchips that implement special mathematical operations used in graphics rendering. A GPU implements a number of graphics primitive operations in hardware, which makes running them much faster than drawing directly to the screen with the host CPU.

GPUs are used to play 3D games.

A GPU can handle heavy graphics workloads that a CPU cannot support.

The GPU was first introduced with the Nvidia GeForce 256 in 1999. At that time it was capable of processing a minimum of 10 million polygons per second and 480 million pixels per second. Its intensive transform and lighting processing enhances photorealism and creates a realistic image on the screen.

CUDA is the programming platform and language extension for Nvidia GPUs that enables parallel computing.


A Graphics Processing Unit (GPU), also known as a Visual Processing Unit, is designed to share and speed up the operations, commands and applications executed by the CPU. For highly parallel tasks, a GPU can process the work much faster and more efficiently than a general-purpose central microprocessor. Supercomputers, game consoles, personal computers, workstations and mobile phones contain GPUs for producing high-quality graphics and fast manipulation of data.

GPUs are not only used for 3D graphics; they are also used for implementing algorithms and solving mathematical problems.

Tianhe-1A is the fastest supercomputer in China, containing 14,336 Xeon X5670 processors and 7,168 Nvidia Tesla M2050 GPUs. Tianhe-1A uses the Linux operating system. The total memory size of the system is 262 TB.


( FIG 1.1 ) Tianhe-1A Supercomputer

GPUs are fast because they have hundreds of cores built in, which help to process high-quality 3D graphics. Groups of two or more GPUs form clusters, which are used to build supercomputers. A GPU is designed for parallel programming, in which instructions are sent to different processors at the same time for fast execution. It is also called a multithreaded processing unit. The term 'multithreaded' means that more than one instruction can be processed and executed at the same instant through parallel programming. CPUs, on the other hand, have a few cores optimized for sequential execution, whereas GPUs consist of hundreds of parallel cores. For workloads that can be split up in this way, the parallel processing of GPUs makes them much faster than CPUs. Modern GPUs run tens of thousands of threads concurrently.


Inner View of CPU:

A Central Processing Unit (CPU) is the brain of every computer. It executes, solves, manages and controls all calculations, problems, instructions and input/output devices of the computer. The main components of a CPU are the control unit, the arithmetic and logic unit (ALU) and cache memory, and it works together with read-only memory (ROM) and random access memory (RAM). Cache memory is not as visible as the other components, but it still plays a very important role in the performance of the CPU. Cache memory sits logically between the processor and the main memory of the system. The processor first sends each request to store or retrieve data to the cache. If the data is not found in the cache, this is called a cache miss; if the data is found, it is called a cache hit. On a cache miss, the processor forwards the request to main memory. In general, the larger the cache, the faster the computer runs. In the simplified view used here, a CPU has a single cache that serves all of the computer's data and instructions, which limits the performance of the CPU.
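
The hit and miss behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not how any real CPU is built: the class name `TinyCache`, the least-recently-used eviction policy and the dictionary standing in for main memory are all assumptions made for the example.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache in front of a slower 'main memory' (hypothetical model)."""

    def __init__(self, capacity, main_memory):
        self.capacity = capacity          # number of entries the cache can hold
        self.main_memory = main_memory    # dict standing in for main memory
        self.lines = OrderedDict()        # cached address -> value, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Cache hit: serve the value without touching main memory.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Cache miss: forward the request to main memory, then keep a copy.
        self.misses += 1
        value = self.main_memory[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value


memory = {addr: addr * 10 for addr in range(8)}
cache = TinyCache(capacity=2, main_memory=memory)
values = [cache.read(a) for a in [0, 1, 0, 2, 1]]
```

With a capacity of 2, only the third read is a hit. Raising `capacity` to 3 turns the final read into a hit as well, which is the sense in which a larger cache helps.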

Inner View of GPU:

A Graphics Processing Unit (GPU) boosts the speed and performance of the computer. It computes instructions in parallel with the CPU. In the simplified view used here, a CPU has only one control unit and one cache with four ALUs, whereas a GPU has several cores, each equipped with one control unit, one cache memory and several ALUs. A single calculation is solved by breaking it into small instructions, which are then assigned to the respective cores of the GPU. The results from these cores are then combined to produce the solution to the problem at very high speed. This type of processing is called parallel processing. (Fig 1.2) shows the skeleton view of a CPU and a GPU.
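
The split, compute and combine pattern in this paragraph can be sketched as follows. This is a minimal illustration, assuming a sum of squares as the calculation and Python worker threads as stand-ins for GPU cores; the function names are hypothetical, and Python threads only show the structure of the idea rather than real GPU speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_workers):
    """Divide data into at most num_workers contiguous chunks."""
    size = max(1, (len(data) + num_workers - 1) // num_workers)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """The small piece of work assigned to one worker."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_workers=4):
    chunks = split_into_chunks(data, num_workers)
    # Each worker stands in for one core and handles one chunk independently.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    # Combine the partial results into the final answer.
    return sum(partial_results)


total = parallel_sum_of_squares(list(range(1, 11)))
```

The important property is that each chunk can be processed without knowing anything about the others, so only the final combining step needs all the results.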

A GPU devotes more of its transistors to computation than a CPU does. Programmers use a C-based language known as CUDA for GPU programming. CUDA is the main language used for programming Nvidia GPUs. GPUs can be used to build heterogeneous systems. Heterogeneous systems are those in which a GPU is combined with a CPU. This type of system serves as both a programmable graphics processor and a scalable parallel computing platform.

Two or more GPUs can be combined to solve many large calculations in seconds. This provides real-time visual interaction with computed objects via graphics, images and videos.

Modern computers and laptops contain built-in GPUs, but these are much slower than those built on video cards. The GPU fetches graphics tasks from the CPU, processes them efficiently and sends the results back to the CPU. A GPU performs 3D operations, which enable it to produce high-quality graphics for 3D games and 3D rendering.

The first GPU was introduced by NVIDIA in 1999 and was built as a single-chip processor. It offered a notably large leap in 3D gaming performance. The GeForce 256 had the capability of supporting 3D graphics at that time and provided an improvement of 50% or more in frame rate. Later, in 2002, ATI Technologies introduced the Radeon 9700 and changed the term graphics processing unit (GPU) to visual processing unit (VPU).

GPU Architecture/Pipeline:

Modern GPUs are driven initially by programs written in C/C++ syntax. 3D images are compiled and produced by passing through a particular pipeline. The following is the architectural design of a GPU:

( FIG 1.3 ) GPU pipeline, in order of data flow:

3D Application / Program -> API commands -> CPU -> GPU data/commands -> GPU Front End -> pre-transformed vertices -> Programmable Vertex Processor -> transformed vertices -> Primitive Assembly -> assembled polygons, lines and points -> Rasterization and Interpolation -> pre-transformed fragments -> Programmable Fragment Processor -> transformed fragments -> Raster Operations -> pixel updates -> Frame Buffer

Other streams labeled in the figure: vertex index stream, pixel location stream.


In this stage of the GPU pipeline, the user supplies applications or programs to the CPU. The application or program is written in C/C++ syntax. The CPU compiles and executes the user's application or program. If any error or bug is detected during the compilation and execution phase, the application halts and is closed by the CPU. Otherwise, the application is compiled and executed successfully, and the Application Program Interface (API) commands are passed to the driver on the CPU.

The syntax of the application is given below:


…

glBegin(GL_TRIANGLES);

glTexCoord2f(2, 1); glVertex3f(1, 2, 1);

glTexCoord2f(3, 6); glVertex3f(-3, -3, 1);

glTexCoord2f(2, 2); glVertex3f(4, -2, 3);

glEnd();



The driver on the CPU converts the Application Program Interface (API) commands into GPU data and commands. The input buffer sends the user's program or application to the driver, which converts the application commands into GPU program commands. These GPU program commands are valid for further execution and processing. The GPU data and commands are then transferred to the front end of the GPU.


01001001100…

( FIG 1.4 )

GPU Front End:

The GPU front end handles the API commands. It organizes the application's data into a sequence so that the vertices of different images can be produced and processed easily. It receives the data and commands from the driver on the CPU. The data is then sent to the Programmable Vertex Processor to convert pre-transformed vertices into transformed vertices. PCI (Peripheral Component Interconnect) Express is also used at this stage.


01001001100…

( FIG 1.5 )

Programmable Vertex Processor:

The programmable vertex processor is programmed to create and organize the vertices of the application. Every shape in the application program is composed of many different vertices. A specific program constructs and designs each vertex, which is used to make the required shapes of the application. These vertices are then sent to the vertex processor, where texture and shader programs generate the graphics attributes of each vertex. The transformed vertices are then sent to the Primitive Assembly stage of the GPU.
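
As a rough illustration of what turning pre-transformed vertices into transformed vertices can mean, the sketch below applies a 4x4 matrix to each vertex in homogeneous coordinates. The vertices are the ones passed to glVertex3f in the example earlier in the essay; the translation matrix and the function names are assumptions chosen for illustration, not the transform used by any particular application.

```python
def transform_vertex(matrix, vertex):
    """Multiply a 4x4 row-major matrix by an (x, y, z) vertex, using w = 1."""
    x, y, z = vertex
    column = (x, y, z, 1.0)
    out = [sum(matrix[r][c] * column[c] for c in range(4)) for r in range(4)]
    w = out[3]
    # Divide by w to return to ordinary 3D coordinates.
    return (out[0] / w, out[1] / w, out[2] / w)


def translation(tx, ty, tz):
    """Build a 4x4 matrix that moves a vertex by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


# The three vertices from the glVertex3f calls earlier in the essay.
pre_transformed = [(1, 2, 1), (-3, -3, 1), (4, -2, 3)]
move = translation(2.0, 0.0, -1.0)  # assumed transform, chosen for illustration
transformed = [transform_vertex(move, v) for v in pre_transformed]
```

Every vertex goes through the same arithmetic independently of the others, which is why this stage maps well onto many parallel processors.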

( FIG 1.6 )

Primitive Assembly:

Primitive Assembly arranges and compiles the generated vertices into points, lines and polygons. This phase of the GPU builds different types of geometrical shapes from points, lines and polygons for the application. These shapes are only the abstract design of the given application. This phase simply combines and assembles the individual vertices to produce skeleton geometric shapes, triangles or other primitives. It links one element to another to make the required images for the application. The resulting material is then sent to the Rasterizer and Interpolation stage.
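
For the triangle case, the assembly step can be sketched as grouping a flat stream of vertices three at a time, which is how independent triangles are formed from the vertices issued between glBegin(GL_TRIANGLES) and glEnd(). The function names and the stray fourth vertex are hypothetical, added only to show that an incomplete group is dropped.

```python
def assemble_triangles(vertices):
    """Group a flat vertex stream into independent triangles, three at a time.

    Leftover vertices that do not complete a triangle are dropped.
    """
    usable = len(vertices) - len(vertices) % 3
    return [tuple(vertices[i:i + 3]) for i in range(0, usable, 3)]


def triangle_edges(triangle):
    """Return the three edges (vertex pairs) that link a triangle's corners."""
    a, b, c = triangle
    return [(a, b), (b, c), (c, a)]


stream = [(1, 2, 1), (-3, -3, 1), (4, -2, 3), (0, 0, 0)]  # last vertex is a stray
triangles = assemble_triangles(stream)
edges = triangle_edges(triangles[0])
```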

( FIG 1.7 )

Rasterizer and Interpolation:

Rasterizer and Interpolation receives the assembled lines, points and triangles from primitive assembly. This phase determines the specific area covered by the assembled primitives with the help of barycentric coordinates. Barycentric coordinates are coordinates in which the position of a point is defined as the centre of mass of weights assigned at the vertices of the shape. This coordinate system also resembles the homogeneous coordinate system. Interpolation approximates functions by using the values between the points of the pre-transformed fragments.
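
A small 2D sketch may make barycentric coordinates more concrete. It computes the three weights of a point relative to a triangle, uses their signs to decide whether the point is covered, and uses the weights to interpolate a per-vertex value. The triangle, the sample points and the function names are assumptions for illustration; real rasterizers work in screen space with many additional rules.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric(a, b, c, p):
    """Barycentric weights (wa, wb, wc) of point p with respect to triangle abc."""
    area = edge(a, b, c)
    if area == 0:
        raise ValueError("degenerate triangle")
    wa = edge(b, c, p) / area
    wb = edge(c, a, p) / area
    wc = edge(a, b, p) / area
    return wa, wb, wc


def covers(a, b, c, p):
    """A point is inside the triangle when none of its weights is negative."""
    return all(w >= 0 for w in barycentric(a, b, c, p))


def interpolate(a, b, c, p, va, vb, vc):
    """Blend the per-vertex values va, vb, vc using the weights of p."""
    wa, wb, wc = barycentric(a, b, c, p)
    return wa * va + wb * vb + wc * vc


A, B, C = (0, 0), (4, 0), (0, 4)
weights = barycentric(A, B, C, (1, 1))
value = interpolate(A, B, C, (1, 1), 0.0, 8.0, 16.0)
```

The weights always sum to 1, and a point sitting exactly on a vertex gets weight 1 for that vertex and 0 for the other two.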

( FIG 1.8 )

Programmable Fragment Processor:

The Programmable Fragment Processor obtains the shapes and functions from the Rasterizer and Interpolation stage. This part of the GPU programs the fragments of the images by using the triangles and functions made in the previous stage. To make the whole image of the application, many small fragments of that image are formed first. The program constructs and designs the fragments. These fragments are then shaded and textured. Each fragment is filled with the designated colour. Fully transformed fragments are sent to the Raster Operations stage.

( FIG 1.9 )

Raster Operations:

Raster operations put together the fragments to produce the final shape of the application. This stage of the GPU also checks the frame buffer. If the completed final image has a higher pixel count and more texture detail than the frame buffer can hold, raster operations reduce the pixels and texture so that the frame buffer can easily support the image.

( FIG 2.0 )

Frame Buffer:

The frame buffer gives the final touches to the image and sends it to the I/O devices to display the generated 3D images of the application.

( FIG 2.1 )



Developers first tried to do parallel computing with the help of the GPU. In the first phase, developers were limited to using the functions of some hardware features such as buffering and rasterization, but when shaders appeared, they were able to accelerate matrix calculations. This effort was named GPGPU (General-Purpose Computing on GPU).


CUDA (Compute Unified Device Architecture) is a programming platform and language for GPUs.

CUDA was first introduced in November 2006. CUDA is a software and hardware computing architecture; through CUDA we can access the GPU instruction set and control video memory for parallel computing. CUDA is used specifically for parallel computing with Nvidia GPUs. CUDA is based on the simple C programming language. Through CUDA we can solve problems and perform operations by using libraries such as FFT and BLAS. Through the CUDA language we can optimize the data transfer rate between GPUs and CPUs.

CUDA runs on both 32-bit and 64-bit operating systems, and it is supported on Windows, Linux and even Mac OS X.

CUDA also requires tools for implementing the given instructions. The NVCC compiler is used to compile CUDA code.

( FIG 2.2 )

This is a view of the processing stages in a graphics pipeline. Triangles are first generated by the geometry unit and move to the next phase, where pixels are generated by the raster unit and displayed on the screen.

In this example, the two vectors X and Y are added and the result is shown on the screen. The pixel shader calculates the colour of each pixel and the figure is rasterized. The input data is first read by the program, which then performs the calculation on it. The result is written to the output buffer.

In this example the vectors are added, and the pixel shader can only express these formulas with C-like syntax. The CUDA application interface is based on the standard C language. CUDA gives us access to 16 KB of fast shared memory, which reduces the need for slower transfers to and from video memory.
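
The vector addition can also be sketched in the CUDA style, where each thread adds one pair of elements. The version below is a plain Python simulation, not real CUDA code: the loops stand in for threads that a GPU would run in parallel, and the function names and the block size of 4 are assumptions. The index calculation mirrors the `blockIdx.x * blockDim.x + threadIdx.x` expression commonly used in CUDA kernels.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, x, y, out):
    """Work done by one simulated thread: add a single pair of elements."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):                        # guard threads past the end
        out[i] = x[i] + y[i]


def launch(x, y, block_dim=4):
    """Run the kernel once per (block, thread) pair, sequentially."""
    n = len(x)
    out = [0.0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            vector_add_kernel(block_idx, thread_idx, block_dim, x, y, out)
    return out


X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [10.0, 20.0, 30.0, 40.0, 50.0]
result = launch(X, Y)
```

Five elements with a block size of 4 need two blocks, so three of the eight simulated threads fall past the end of the data and are skipped by the bounds check.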


CUDA is the present language used for GPUs. The old GPGPU method did not use vertex shader units in the older non-unified architectures. The input data was stored in textures and the output was written to the screen buffer. The hardware features were not completely accessible through the GPGPU method. The new way of GPU computing does not use graphics APIs. CUDA gives access to 16 KB of memory that can be accessed by thread blocks. CUDA is used for linear algebra and image processing filters, and it allows the most frequently used data to be cached for higher performance. CUDA provides an optimized data exchange rate between CPU and GPU. CUDA also offers an assembler for access from different programming languages.

CUDA consists of two APIs:

1). High-level API

2). Low-level API

( FIG 2.3 )

The high-level API (CUDA runtime API) sits on top of the low-level API (CUDA driver API). The two cannot be used at the same time, which means that only one API can run at a time; they cannot run in parallel. Instructions are translated into simpler instructions and processed by the CUDA driver API.


( FIG 2.4 )

A GPU consists of several clusters. Each cluster has a large block of texture fetch units and 2-3 streaming multiprocessors, each of which has a total of 8 computing units and two super functional units. The SIMD (Single Instruction, Multiple Data) principle is used for the execution of instructions. A group of 32 threads works at the same time. The processors are arranged in parallel with one another. The best feature of the GPU is that it is power efficient. It has a higher transfer and processing rate than the CPU. The processors work according to the tasks given to the GPU. If a game is running at a low resolution of 800×600, the load on the processors will be lower. If the resolution is 1680×1050, the GPU has more than three times as many pixels to process, and its processors will work harder to show the required graphics on the display.
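
As a rough check on how the load scales with resolution, the pixels per frame at each resolution can be counted directly. This sketch assumes the work per frame is proportional to the number of pixels, which ignores everything else a game does; the function name is hypothetical.

```python
def pixel_count(width, height):
    """Number of pixels in one frame at the given resolution."""
    return width * height


low = pixel_count(800, 600)      # 480,000 pixels
high = pixel_count(1680, 1050)   # 1,764,000 pixels
ratio = high / low
```

The ratio works out to about 3.7, so under this assumption the per-frame load at 1680×1050 grows by considerably more than a factor of two compared with 800×600.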

The execution method is called Single Instruction, Multiple Thread (SIMT). A total of 16 KB of shared memory is available to each multiprocessor. Shared memory is used for the exchange of data between the threads of a single block. Multiprocessors can access video memory, but this involves high latency and lower throughput. A multiprocessor is not equivalent to a core of a multi-core processor; it is designed for operations supporting up to 32 warps. On each cycle, the hardware selects a warp and executes it. A CPU core can execute only one program at a time, but a GPU can execute more than one program at a time. A GPU can process thousands of threads concurrently. Present GPUs are capable of running 3D graphics at full resolution. GPUs are high-latency, high-throughput processors.
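
The figures quoted here (groups of 32 threads, 16 KB of shared memory per multiprocessor, up to 32 warps) lend themselves to some simple arithmetic. The sketch below is only a back-of-the-envelope aid using those numbers from the text; the function names are hypothetical and the 4-byte float size is an assumption.

```python
WARP_SIZE = 32                   # threads that execute together, per the text
SHARED_MEMORY_BYTES = 16 * 1024  # shared memory per multiprocessor, per the text


def warps_needed(num_threads):
    """Warps required to hold num_threads threads (rounded up)."""
    return (num_threads + WARP_SIZE - 1) // WARP_SIZE


def floats_in_shared_memory(bytes_per_float=4):
    """How many single-precision values fit in shared memory."""
    return SHARED_MEMORY_BYTES // bytes_per_float


def max_resident_threads(max_warps=32):
    """Threads a multiprocessor can hold if it supports max_warps warps."""
    return max_warps * WARP_SIZE


warps_for_block = warps_needed(256)
partial_warp_case = warps_needed(100)
shared_floats = floats_in_shared_memory()
resident_threads = max_resident_threads()
```

A block of 100 threads still occupies 4 warps, because the last warp is only partly filled; this is one reason block sizes are usually chosen as multiples of 32.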

High throughput means that the processors must process millions of pixels in a single frame. GPUs use stream processing to achieve high throughput. They are designed to solve problems that tolerate high latency, so less cache is required. GPUs do not need a large cache; they dedicate more of their transistor area to computational horsepower.

( FIG 2.4 )

The above diagram shows the compiler stages of a CUDA application.


CUDA programming makes GPUs more useful than they already are. It provides the features of high-performance computing, makes it much simpler, and adds many new features, including shared memory, thread synchronization, double precision and integer operations. CUDA is an effective method for increasing the performance of Nvidia GPUs.

Every programmer can use CUDA for parallel computing. If the algorithm is a good match and parallelizes well, the results can be striking. Parallel computing raises the performance of the GPU. CPUs cannot compete with GPUs on parallel workloads, and GPUs cannot work like CPUs. Developers are working day and night to make GPUs more powerful. CPUs are not as fast as GPUs on parallel work, and GPUs are not as capable as CPUs at fast sequential processing. GPUs are now moving towards CPUs, and the same is true of CPUs: they are also trying to become more parallel.

Cite this page

Graphics Processing Units Computer Science Essay. (2020, Jun 02). Retrieved from
