Due to the huge boom in Big Data, parallel computing technologies have become an essential architecture for its analysis. Parallel computing platforms make it possible to discover useful content with different data-mining techniques and to analyse that data at scale. This term paper gives only an outline of the recent and promising topic of parallel computing technologies for Big Data. For anyone planning to learn more about big data, or intending to work as a data analyst, this term paper should prove helpful.
The topic of parallel computing is very large, so the information in this paper cannot cover it completely. The paper starts with the historical background of big data, then covers the basics of parallel computing technologies and some related terminology, and finally their future development. The report discusses parallel computer memory architectures, followed by parallel programming models, and closes with a brief overview of a number of issues and aspects of parallel computing technologies.
The futuristic applications and prospects of parallel computing are also discussed.
The term “Big Data” has been gaining popularity for some time now, yet the idea of what it actually means, along with its properties and aspects, remains somewhat unclear. The idea of very large data storage dates back to the early 1980s, and there has been extensive work in this field ever since.
Of all the data we have produced to date, approximately 90% has been generated in the past 5 to 7 years. Today we create far more data every year than was created in the previous 50 to 60 years combined, and the amount users create continues to grow at an accelerating pace: within the next few decades the volume of digital data is expected to reach at least 50-60 zettabytes.
The term Big Data refers to chunks of data so large that they exceed the capacity of our conventional skills and present-day technologies. Digitally, Big Data may be defined as any store of important and vital information that exhibits the three basic V’s of Big Data (defined later). Big Data analytics is the procedure of analysing such large data sets with suitable software to reveal useful information. The data is classified mainly into three types:
Structured data may be defined as information that is arranged and sorted in a fixed, well-defined order. Structured data is easily processed by a machine but can be difficult for an ordinary person to read directly. It is easily computed, used, or analysed by implementing some of the most basic algorithms. Typical examples of structured data are binary codes (0 or 1), spreadsheets, and tables queried with Structured Query Language (SQL).
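As a small illustration of how structured data can be queried with a basic SQL algorithm, here is a minimal Python sketch using the standard sqlite3 module; the table and column names are invented for this example:

```python
import sqlite3

# In-memory database; the "customers" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "MH"), (2, "Ravi", "KA"), (3, "Meera", "MH")],
)

# Because the data is structured, it can be filtered and aggregated
# with one basic query.
for state, count in conn.execute(
    "SELECT state, COUNT(*) FROM customers GROUP BY state"
):
    print(state, count)
```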
In technical language, unstructured data is data that cannot be processed easily by a machine; it usually relies more on human understanding. It is not based on an RDBMS and can only be used or analysed with deep, complex algorithms. Examples include Word documents, text messages, and data generated from social media such as GIFs and videos.
Semi-structured data is data that is not structured and does not operate in a relational database management system like structured data, but contains tags or other elements that define a hierarchy to some extent. It is also called self-describing data. Some examples of semi-structured data are CSV, JSON, and XML. To handle and analyse this huge data we need analytic tools, and these tools are known as “Big Data analytic tools”. The analytic tools use different algorithms to analyse and use data.
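For instance, a short Python sketch (the record fields below are made up for illustration) shows how self-describing JSON carries its own tags:

```python
import json

# A semi-structured record: the tags ("name", "orders") describe the data,
# but different records may carry different or nested fields.
record = '{"name": "Asha", "orders": [{"item": "book", "qty": 2}]}'

data = json.loads(record)
print(data["name"])                          # tags give a partial hierarchy
print(sum(o["qty"] for o in data["orders"]))  # nested values are reachable
```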
The market research firm Gartner has categorized big data analytics into four main types: descriptive, diagnostic, predictive, and prescriptive analytics.
Volume, velocity, variety: big data is commonly characterized by the volume of data produced, the velocity at which it arrives, and the variety of forms it takes. There is also some focus on developing further V’s, such as big data’s “veracity” and “value.”
In data parallelism there are a number of parallel tasks with minimal dependency among them. Data parallelism requires knowing the details of the data under consideration (data mapping), which is an extremely important step, and the partitions are coordinated with message passing. A classic example of data parallelism: build one database for each state of India simultaneously, when customer data is distributed by customer code. A minimal sketch of this pattern follows below.
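Here is that sketch in Python, using the standard multiprocessing module; the per-state partitions are a toy stand-in for the customer-code example above:

```python
from multiprocessing import Pool

# Toy customer records already partitioned by state code (data mapping).
partitions = {
    "MH": [120, 340, 560],
    "KA": [210, 430],
    "TN": [150, 250, 350],
}

def build_state_summary(item):
    # The SAME operation runs independently on each partition of the data.
    state, amounts = item
    return state, sum(amounts)

if __name__ == "__main__":
    with Pool() as pool:
        # Each state's data is processed in parallel, with no
        # dependency between the tasks.
        for state, total in pool.map(build_state_summary, partitions.items()):
            print(state, total)
```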
Since many of today’s programs are developed to deal with day-to-day problems such as hospitality data, banking, and payroll processing, these software systems become ever larger. The solution to this problem is a programming methodology in which the development process of any software is analysed and controlled; modular programming is one example.
Task parallelism is defined as the processing of a task on multiple cores of a machine, where the cores perform varying functions. In task parallelism the data may be stored in either the same or different databases. The problem is divided into sub-tasks, all of which are then processed simultaneously. Balancing the load in task parallelism is considerably more complicated and difficult. Example of task parallelism: build one database in parallel for the whole of India, even though customer data is partitioned by state code. A sketch follows below.
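A small task-parallel sketch in Python, using the standard concurrent.futures module; the three functions are invented sub-tasks that perform different operations on the same data:

```python
from concurrent.futures import ThreadPoolExecutor

orders = [120, 340, 560, 210]

# Different sub-tasks with VARYING functions, run over the same data.
def total(data):
    return sum(data)

def largest(data):
    return max(data)

def average(data):
    return sum(data) / len(data)

with ThreadPoolExecutor() as ex:
    futures = {name: ex.submit(fn, orders)
               for name, fn in [("total", total),
                                ("largest", largest),
                                ("average", average)]}
    for name, fut in futures.items():
        print(name, fut.result())
```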
In distributed memory, every single processor has its own local memory, and all parallel processors are connected within the main computer. In a distributed memory system, a processor’s memory remains private: its domain for memory allocation, editing, deletion, or manipulation is restricted to itself only. Any synchronization can be done only by passing explicit messages between processors.
Examples: Cray T3E, IBM SP2
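As a sketch of explicit message passing between processors with private memories, assuming the third-party mpi4py library (not mentioned in the paper) and a launch such as `mpirun -n 2 python script.py`:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each process owns its local data; no other process can read it directly.
local_data = [rank * 10 + i for i in range(3)]

if rank == 0:
    # Synchronization happens only through explicit messages.
    comm.send(sum(local_data), dest=1, tag=0)
elif rank == 1:
    remote_sum = comm.recv(source=0, tag=0)
    print("rank 1 received:", remote_sum + sum(local_data))
```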
In shared memory, all processors are given a single common address space. Every process in the parallel computer can access this shared memory hub, so accesses to it must be synchronized to avoid conflicts.
Some examples of shared memory machines are: SGI Origin, Sun E10000
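A minimal sketch of the shared address-space model using Python threads; because every thread sees the same memory, a lock is needed to synchronize updates:

```python
import threading

counter = 0                 # lives in one address space shared by all threads
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        # All threads access the same memory, so updates must be
        # synchronized to avoid race conditions.
        with lock:
            counter += 1

threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```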
A connection topology is a schematic way in which different peripheral devices are connected over a network. A computer with each processor linked to every other processor would be the ideal case, but because that is expensive and complex to handle, processors are instead interlinked in network variations such as a torus or a hypercube. The main issues in most network designs are the bandwidth of the network, the communication involved, and the network latency. Bandwidth is defined as the total capacity (in bits) that a specific medium can carry from one point to another; network latency is the delay that occurs in data communication over a network.
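As a back-of-the-envelope illustration of these two quantities, the time to move a message is often modeled as latency plus message size divided by bandwidth; the numbers below are invented for the example:

```python
# Simple cost model: transfer_time = latency + message_size / bandwidth
latency_s = 2e-6        # 2 microseconds of network latency (assumed)
bandwidth_bps = 10e9    # 10 gigabits-per-second link (assumed)

def transfer_time(message_bits):
    return latency_s + message_bits / bandwidth_bps

# Small messages are dominated by latency, large ones by bandwidth.
for bits in (8e3, 8e6, 8e9):
    print(f"{bits:.0e} bits -> {transfer_time(bits):.6f} s")
```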
These are also known as direct or dedicated interconnects. Nodes are connected directly using static point-to-point links. Such networks include fully connected networks, rings, meshes, hypercubes, etc.
In dynamic interconnects, switches are employed to establish dynamic links (virtual circuits) between different nodes. This is entirely different from point-to-point communication and far more advanced. Each node is connected to a specific subset of switches, and connections are established by configuring the switches according to communication demands.
Hadoop was produced mainly to manage data generated in bulk, i.e. Big Data: emails, documents, PDFs, media files, etc. Handling such a huge amount of data on a single machine would be extremely difficult or impossible. Hadoop is a prime example of parallelism (parallel computation), since it requires that different types of machines with entirely different processing systems and data work together as a single machine to obtain the desired result or output. Whenever we search anything on the internet, hundreds of thousands of pages are retrieved and ranked in very little time, and the great majority of the data generated in recent decades is a result of such search and analysis. In other words, we may be heading towards a data explosion. Recall that this data falls mainly into the three types described above.
It is not difficult to imagine the amount of data created in the world every day. To deal with this efficiently, many search engines and social networking sites have deployed their own distributed file systems in the style of HDFS (the Hadoop Distributed File System). For example, Google uses its own Google File System together with MapReduce, while Facebook runs one of the world’s largest Hadoop clusters and generates at least 0.5 petabytes of data per day.
HDFS stands for Hadoop Distributed File System. Hadoop itself is software intended to spread out and manage data storage for really big data sets. A further advantage of HDFS is that it is scalable and very fault-tolerant.
Hadoop basically has two most important components: HDFS for distributed storage, and MapReduce for distributed processing.
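To give the flavor of the MapReduce half, here is a toy single-machine word count in plain Python; real Hadoop distributes these phases across a cluster, and this sketch only mimics the map, shuffle, and reduce steps:

```python
from collections import defaultdict

documents = ["big data needs parallel computing",
             "parallel computing processes big data"]

# Map phase: emit (word, 1) pairs from each document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: combine the values collected for each key.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)
```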
How can we write programs that run faster on a multicore PC?
How can we write programs that do not crash on a multicore PC?
The answer is the right programming model!
Explicit programming gives the programmer an essential engineering control point, balancing modularization and segregation in (at least) two cases.
Multithreading – a message-passing program comprises various tasks, each of which possesses its own thread of control and may execute different code.
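A minimal sketch of this model with Python’s standard multiprocessing module: two processes, each with its own thread of control, exchanging explicit messages over a pipe:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # This process runs its own code and owns its own state.
    task = conn.recv()           # blocks until a message arrives
    conn.send(task.upper())      # reply with an explicit message
    conn.close()

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send("process this text")
    print(parent.recv())         # -> PROCESS THIS TEXT
    p.join()
```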
Big Data analytics in the form of parallel computation has immense applications in daily life, such as:
Wireless sensor platforms and Big Data will enable predictive analysis through remote location monitoring in nearly every industry. Wireless sensors combined with data analytic tools capable of processing the data generated by the different sensors could act as a catalyst for artificial intelligence. With the gradual collection of data from battlefields, parallel computing techniques can be used to analyse strategies, plans, and patterns, and to detect chemical, biological, radioactive, or enemy forces across war zones as well as vast areas. Applying parallel technologies and harnessing their potential will allow weather forecasting authorities to tackle the unpredictable nature of the environment and will be a boon for renewable energy.
In today’s world, electronics have entered every aspect of life, and superconductors would be a game changer; since superconductivity requires many aspects to be considered simultaneously, a parallel computer is the best choice for the production and future development of superconductors.
Further work and development of big data technologies and analytics will bring a sudden boom and many new kinds of industries. Just as the industrial revolution transformed countries and accelerated technology many fold, if the world can harness the basic ideas behind big data it will be revolutionized on an even greater scale, creating a digital economy able to compete with all other forms of economy combined. There are still many deficiencies in present-day technology for safeguarding, analysing, storing, and processing the data generated from different sources and methods. Because big data involves dealing with data, many developers and users wrongly regard it as purely back-end work; owing to this misconception and the lack of expertise in big data analysis, the workforce in the big data industry remains a mere handful. As the importance and applications of big data grow, the methods and techniques to examine and analyse it will also have huge implications and will grow exponentially.