A Data Placement Algorithm for Data Intensive Applications in Cloud

Abstract

In the literature, data placement is one of the critical problems: it aims to reduce data movements among data centers and thereby improve the efficiency of the whole cloud system. In this research paper, the authors present a data placement algorithm for data-intensive applications. Datasets with fixed storage locations are taken into account, and both an offline and an online procedure for data placement are given. Because the proposed framework targets the overall amount of data transmission and also takes the differing sizes of the datasets into account, the cost of data transmission can be estimated more realistically.

The results show that, compared with two classical strategies, the proposed algorithm can reduce the amount of data transmission more effectively.

Research Problem

With the arrival of Big Data, many scientific research fields have accumulated enormous amounts of scientific data. Cloud computing provides vast storage and computing resources. However, because the cloud is architected as a distributed environment, a large number of data movements across data centers can be very time-consuming.

The main research problem is the time-consuming movement of data. Several sub-problems are also discussed in the paper. Complex dependencies frequently exist among the tasks and datasets in the cloud: some datasets are required by multiple tasks, while some tasks need several datasets. In fact, data placement is an NP-hard problem, since it can be reduced to the Knapsack Packing Problem.

How to distribute these data items sensibly is therefore one of the key issues.

Another strategy in common use is a hashing algorithm, but it only addresses load-balance conditions. Such a Hadoop-style strategy is not suitable for workflow applications, because the sizes of the data items differ, and there is no obvious relation between a task's computing load and the data it requires.
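To make that limitation concrete, here is a minimal sketch (in Python, with illustrative names not taken from the paper) of consistent-hashing placement: the target data center is derived from the dataset's key alone, so dataset sizes and task-dataset dependencies never enter the decision.

```python
import hashlib
import bisect

class ConsistentHashRing:
    def __init__(self, servers, replicas=100):
        self.ring = []  # sorted (hash, server) points on the ring
        for server in servers:
            for i in range(replicas):
                point = (self._hash(f"{server}#{i}"), server)
                bisect.insort(self.ring, point)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def place(self, dataset_id):
        # Placement depends only on the dataset's key, never on its size
        # or on which tasks also need it -- the weakness noted above.
        h = self._hash(dataset_id)
        idx = bisect.bisect_left(self.ring, (h, ""))
        return self.ring[idx % len(self.ring)][1]

ring = ConsistentHashRing(["dc1", "dc2", "dc3"])
print(ring.place("dataset_42"))  # e.g. 'dc2'
```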

The Proposed Solution

A. Data Correlation Calculation

To reduce the total data movement, modelling the dependencies between datasets is usually the first step. In a workflow system, applications are described by tasks that consume datasets and by datasets that are shared among tasks, so data dependencies arise naturally. Placing datasets that are closely related on the same data center, as far as possible, can therefore effectively reduce data transfers in the system. To this end, a data correlation matrix is built that records the correlation of every pair of data items; according to these dependencies, the datasets are then partitioned recursively into groups until every group can be stored on one of the data centers.
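The paper does not give code, but the idea of the correlation matrix can be sketched as follows. In this sketch the correlation of two datasets is simply the number of tasks that need both of them (the paper may weight this differently, for example by dataset size), and all names are illustrative.

```python
from itertools import combinations

def build_correlation_matrix(tasks, datasets):
    """tasks: dict task_id -> set of dataset ids that task reads.
       datasets: ordered list of dataset ids."""
    index = {d: i for i, d in enumerate(datasets)}
    n = len(datasets)
    cm = [[0] * n for _ in range(n)]
    for required in tasks.values():
        for a, b in combinations(sorted(required), 2):
            i, j = index[a], index[b]
            cm[i][j] += 1            # one more task needs both a and b
            cm[j][i] += 1
    return cm

tasks = {"t1": {"d1", "d2"}, "t2": {"d1", "d2", "d3"}, "t3": {"d3"}}
cm = build_correlation_matrix(tasks, ["d1", "d2", "d3"])
# cm[0][1] == 2: d1 and d2 are needed together by two tasks.
```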

Two cases are considered: in one, no dataset has a fixed position; in the other, some datasets are assigned fixed positions.

B. Offline Data placement algorithm / Partition algorithm

In this step the Bond Energy Algorithm (BEA) is used to transform the correlation matrix CM; after it is applied, items with similar values in the matrix are clustered together. Once the BEA step is done, the recursive partition is performed. The partitioning algorithm performs better than the existing ones. Its advantage is that it has a reasonable tendency towards equal division, neither too strong nor too weak, since an excessive tendency towards equal division may cause data items with high dependency to be separated prematurely. A sketch of the BEA reordering is given below.
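The following is a simplified greedy variant of BEA written for illustration (column insertion maximizing the bond with its neighbours), not the authors' implementation; it assumes `cm` is the square correlation matrix built above.

```python
def bond_energy_order(cm):
    """Return dataset indices in an order where highly correlated columns
    of the correlation matrix `cm` end up adjacent to each other."""
    n = len(cm)

    def bond(i, j):
        # contribution of keeping columns i and j next to each other
        return sum(cm[k][i] * cm[k][j] for k in range(n))

    order = [0]
    for col in range(1, n):
        best_pos, best_gain = 0, float("-inf")
        for pos in range(len(order) + 1):        # try every insertion slot
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            gain = 0
            if left is not None:
                gain += bond(left, col)
            if right is not None:
                gain += bond(col, right)
            if left is not None and right is not None:
                gain -= bond(left, right)        # broken neighbour bond
            if gain > best_gain:
                best_gain, best_pos = gain, pos
        order.insert(best_pos, col)
    return order
```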

The same partition is applied on each of the sub-trees until the two constraints given in the research paper are satisfied (a sketch of the recursive partition follows the list):

  1. Constraint 1 - A leaf node may contain at most one fixed-position data group. The fixed-position datasets have been grouped in advance according to their placement requirements, so every leaf node should include no more than one fixed-position group; otherwise the storage location conditions cannot be met.
  2. Constraint 2 - There must be a server that can accommodate all of the datasets in the leaf node. After each split, the newly generated sub-groups are tentatively placed on one of the servers in the cloud. If there is a fixed-position data group in the node, placement on its predetermined server is attempted first. The allocation succeeds if the remaining available space of the server is larger than the total size of the datasets in the leaf node. If no server can accommodate the node, it is partitioned again. The partition operations are thus performed recursively until all data groups have been allocated according to their storage requirements.
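The sketch below illustrates the recursive partition under these two constraints. It is a deliberate simplification: the split point is a naive midpoint of the BEA-ordered sequence, whereas the paper derives the cut from the correlation structure, and the data structures (`sizes`, `fixed_group`, `fixed_server`, `servers`) are invented for illustration.

```python
def partition(ordered, sizes, fixed_group, servers, fixed_server, placement):
    """Recursively split `ordered` (dataset ids in BEA order) until each
    piece can be stored on one server; results are collected in `placement`."""
    groups = {fixed_group[d] for d in ordered if fixed_group[d] is not None}
    total = sum(sizes[d] for d in ordered)

    if len(groups) <= 1:                                   # Constraint 1
        if groups:
            # a fixed-position group dictates its predetermined server
            candidates = [fixed_server[next(iter(groups))]]
        else:
            candidates = sorted(servers, key=servers.get, reverse=True)
        for s in candidates:
            if servers[s] >= total:                        # Constraint 2
                servers[s] -= total
                placement.setdefault(s, []).extend(ordered)
                return placement

    if len(ordered) == 1:
        raise RuntimeError(f"no server can hold dataset {ordered[0]!r}")

    # Naive midpoint split for illustration; the paper derives the cut
    # point from the correlation structure of the BEA-ordered matrix.
    mid = len(ordered) // 2
    partition(ordered[:mid], sizes, fixed_group, servers, fixed_server, placement)
    partition(ordered[mid:], sizes, fixed_group, servers, fixed_server, placement)
    return placement

servers = {"dc1": 100, "dc2": 60}
result = partition(["d1", "d2", "d3"],
                   {"d1": 40, "d2": 50, "d3": 30},
                   {"d1": None, "d2": None, "d3": "g1"},
                   servers, {"g1": "dc2"}, {})
# e.g. {'dc1': ['d1', 'd2'], 'dc2': ['d3']}
```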

C. Online Data Placement Algorithm

During execution on the cloud, the intermediate data generated by tasks are often needed by one or more subsequent tasks, and in certain circumstances these newly produced data items can be very large. The correlations between them and the pre-existing data are therefore valuable, since distributing the newly generated data sensibly can further reduce runtime data transmission.

A newly generated dataset is therefore saved to the data center that minimizes the subsequent data movements associated with it. The amount of data movement associated with this intermediate data consists of two parts:

  1. A later task requires the intermediate data as input, but not all of that task's input datasets are stored on the same node, hence some data movements are unavoidable.
  2. On the other hand, redistributing the intermediate data itself adds a certain amount of data movement.

The second part is a significant addition to the first, since the intermediate data can itself be very large, and redistributing a new intermediate dataset may therefore cause substantial data movement. Accordingly, the correlation between an intermediate dataset and a server should be adjusted to account for this.

The data center that has the maximum dependency with the dataset and also has sufficient storage capacity is then chosen to store it.
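A hedged sketch of this online rule follows. The cost model used here (remote task inputs plus the size of the intermediate dataset whenever it must move) is one reading of the two parts above, not the paper's exact formula, and all data structures are assumed.

```python
def place_intermediate(new_size, later_tasks, location, sizes, servers):
    """new_size: size of the newly generated dataset.
       later_tasks: for each future task that consumes it, the set of its
       other input dataset ids. location: dataset -> server.
       sizes: dataset -> size. servers: server -> remaining capacity."""
    best_server, best_cost = None, float("inf")
    for server, free in servers.items():
        if free < new_size:
            continue                              # not enough capacity
        cost = 0
        for other_inputs in later_tasks:
            # part 1: task inputs that would live on other servers
            cost += sum(sizes[d] for d in other_inputs if location[d] != server)
            # part 2: if some inputs are remote anyway, the intermediate
            # dataset itself may also have to move; charge its own size
            if any(location[d] != server for d in other_inputs):
                cost += new_size
        if cost < best_cost:
            best_server, best_cost = server, cost
    if best_server is not None:
        servers[best_server] -= new_size          # reserve the space
    return best_server
```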

Critique

Result Analysis

Experiments were carried out by applying the offline (initial-stage) algorithm to the initial datasets and the online (runtime-stage) algorithm to the newly generated datasets; simulation software then computed the total amount of data movement for each placement. Depending on whether fixed-position datasets were considered, the experiments fall into two groups: no fixed-position data and 10% fixed-position data.

In experiment 1, no fixed-position data were considered. Compared with the consistent hashing algorithm and Dong Yuan's algorithm, the proposed approach without the PSO allocation reduces the amount of data movement by 29.8% and 8.8% on average, and the best performance occurs at 45 datasets, with reductions of 44.0% and 15.1% respectively.

With the introduction of the PSO allocation, the amount of data movement is reduced by 34.3% and 13.0% compared with the consistent hashing algorithm and Dong Yuan's method.

In experiment 2, 10% of the datasets were given fixed positions to check whether the technique also works in the presence of fixed-position datasets. In this case, compared with the consistent hashing algorithm and Dong Yuan's algorithm, the proposed approach reduces the amount of data movement by 26.1% and 8.4% on average. By applying the PSO allocation strategy, these reduction rates increase to 29.1% and 12.1% respectively on average. A similar result is observed overall: compared with the consistent hashing algorithm and Dong Yuan's algorithm, the amount of data movement declines by 34.9% and 13.5% respectively.

Table 1: Experiment Results Summary

Experiment | Scenario                      | Data Movement Reduction (%)
1          | Without fixed-position data   | 29.8 - 44.0
2          | With 10% fixed-position data  | 26.1 - 29.1

We can therefore conclude that, even with 10% fixed-location datasets added to the system, the algorithms still work well.

Conclusion

A data placement algorithm based on hierarchical clustering of data correlations has been proposed, and both the offline data placement method and the online strategy have been given. With the proportion of fixed-location datasets set to 10%, the proposed algorithms can still clearly reduce data movements. Simulation results show that, compared with the two classical techniques, namely the consistent hashing algorithm and Dong Yuan's method, the proposed approach can reduce the amount of data movement more effectively.

The results indicate that the data placement strategy can effectively reduce the amount of data moved and the time consumed during execution. In future work, heterogeneous network structure will be the focus of research, since transfer time depends not only on the amount of data moved but also on the network bandwidth between the data centers. Making the frequent data movements happen over high-speed channels may therefore be a good strategy.
