Project Voldemort: Features, Comparison with other Databases and Application

Abstract

Technology evolves constantly, and the volume of data grows with it. Relational databases cannot meet every resulting requirement: they impose a fixed structure and cannot easily grow or shrink with storage demand. To overcome this limitation, a database should accommodate huge datasets, and not only well-structured data but also semi-structured and unstructured data. This is where databases such as Project Voldemort become significant.

In this review paper, we discuss Project Voldemort, a distributed, non-relational database built on key-value pairs. Our prime focus is on the evolution of Voldemort and its current state.

Keywords: key value storage system, Comparison to relational database, QuickStart, Pros and Cons, Queries

Introduction

Project Voldemort is an open-source database that follows Amazon's key-value approach. There are many distributed databases, but not all of them offer data replication.


Data replication is the core feature of Project Voldemort and is discussed in detail in the sections that follow. Project Voldemort also excels at pluggable serialization, data partitioning, failure detection and handling, and data versioning. It is, however, restricted in several respects, which are discussed later in this article.

Origination of Project Voldemort:

Project Voldemort was developed by LinkedIn. Its initial release was in 2009, and it was released as a stable database product by 2017.


The source code is available under the Apache 2.0 license, so anyone can fix reported bugs and contribute updates. Project Voldemort is written in Java and is available in English worldwide. The name Voldemort comes from the fictional character in the Harry Potter series. LinkedIn uses Project Voldemort for high scalability.

Prime Features:

The prime features of project Voldemort are as follows:

Data Replication:

Data is automatically replicated across multiple servers. This saves both time and application code, since replication does not have to be implemented by hand.

Data Partitioning:

Data is automatically partitioned so that each server holds only a subset of the entire database. Spreading the data this way also balances the load across servers.
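The replication and partitioning ideas above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names, the node names, and the "hash the key, then take the next nodes in order" placement rule are hypothetical and are not Voldemort's actual routing strategy.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def replicas_for(key: str, nodes: list, replication_factor: int) -> list:
    """Pick the node that owns the key plus the next nodes as replicas."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    start = partition_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


# Hypothetical four-node cluster; each key lives on two of the four nodes.
nodes = ["node-0", "node-1", "node-2", "node-3"]
owners = replicas_for("member:42", nodes, replication_factor=2)
```

Each node ends up holding only the keys that hash to it or to its predecessor, which is the "subset of the database per server" property described above.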

Handling Failures:

Each data item is versioned, an optimistic approach that preserves data integrity when failures occur. Each node is independent of the others, so there is no central point of failure. When a server fails, its load is distributed equally over the remaining servers in the cluster.

Pluggable serialization

Pluggable serialization allows rich keys and values, including tuples with named fields, and checks data against an expected schema, which prevents severe errors. Voldemort also supports pluggable data placement strategies, which help distribute data across data centres that are geographically far apart.
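A schema-checking serializer of the kind described above might look like the following sketch. The class name `JsonSerializer`, its methods, and the "field name to Python type" schema format are assumptions made for illustration; they are not Voldemort's serializer interface.

```python
import json


class JsonSerializer:
    """Hypothetical serializer that validates values against a simple schema."""

    def __init__(self, schema: dict):
        # schema maps each field name to the Python type expected for it
        self.schema = schema

    def _check(self, value: dict) -> None:
        if set(value) != set(self.schema):
            raise ValueError("fields do not match the expected schema")
        for field, expected_type in self.schema.items():
            if not isinstance(value[field], expected_type):
                raise TypeError(f"field {field!r} has the wrong type")

    def to_bytes(self, value: dict) -> bytes:
        self._check(value)
        return json.dumps(value, sort_keys=True).encode("utf-8")

    def from_bytes(self, data: bytes) -> dict:
        value = json.loads(data.decode("utf-8"))
        self._check(value)
        return value


serializer = JsonSerializer({"name": str, "age": int})
encoded = serializer.to_bytes({"name": "Ada", "age": 36})
decoded = serializer.from_bytes(encoded)
```

Because the store only sees bytes, a different serializer with the same two methods could be swapped in without touching the storage code, which is the sense in which serialization is pluggable.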

Data Versioning

Versioning is a simple form of optimistic locking. We store a unique counter or “clock” value with each piece of data and only allow an update when the update specifies the correct clock value. This is efficient and works well in a centralized system, but it is sometimes a poor fit for a distributed one, where data is replicated and stored redundantly.
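The counter-based optimistic locking described above can be sketched as follows. The class `VersionedStore` and the exception `StaleVersionError` are hypothetical names, and the single integer counter is a simplification chosen to keep the example short.

```python
class StaleVersionError(Exception):
    """Raised when an update carries an out-of-date version number."""


class VersionedStore:
    """Hypothetical in-memory store that keeps a counter next to each value."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value), or None if the key is absent."""
        return self._data.get(key)

    def put(self, key, value, expected_version: int) -> int:
        """Write only if the caller saw the latest version; return the new one."""
        current = self._data.get(key)
        current_version = current[0] if current else 0
        if expected_version != current_version:
            raise StaleVersionError(
                f"expected version {expected_version}, store has {current_version}"
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version


store = VersionedStore()
v1 = store.put("hello", "world", expected_version=0)     # first write
v2 = store.put("hello", "there", expected_version=v1)    # update with fresh version
```

A writer that still holds version 0 after these two calls would be rejected with `StaleVersionError` and would have to re-read before retrying.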

Differentiation with Relational Databases

Voldemort is neither a relational database nor an object-oriented database. It does not satisfy the ACID properties, it does not attempt to map object reference graphs, and it does not introduce an abstraction such as document orientation.

Compared with a relational database, Project Voldemort is essentially a huge, persistent, distributed, fault-tolerant hash table. It supports horizontal scaling and offers much higher availability, but this comes at a great loss of convenience.

Project Voldemort needs no separate caching tier, because it combines in-memory caching with the storage system.

Project Voldemort scales both reads and writes horizontally, which sets it apart from relational databases.

Another major difference from relational databases is data partitioning: the cluster can be expanded or shrunk without rebalancing all the data.

Voldemort is also easy to unit test, since the storage layer is mockable.
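To illustrate what a mockable storage layer buys in a unit test, here is a small sketch. `FakeStore` and `record_login` are hypothetical names; the fake only mimics the get/put shape of a key-value store and records the calls it receives.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for the storage layer; records calls."""

    def __init__(self):
        self.data = {}
        self.calls = []

    def get(self, key):
        self.calls.append(("get", key))
        return self.data.get(key)

    def put(self, key, value):
        self.calls.append(("put", key))
        self.data[key] = value


def record_login(store, user_id: str) -> int:
    """Business logic under test: increment a per-user login counter."""
    count = (store.get(user_id) or 0) + 1
    store.put(user_id, count)
    return count


fake = FakeStore()
first = record_login(fake, "u1")
second = record_login(fake, "u1")
```

The business logic runs against the fake with no server or disk involved, and the recorded calls let a test check exactly which storage operations were issued.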

Design

To enable high performance in a distributed setting, Voldemort allows only very simple key-value data access.

An important point of the design is that both keys and values can be complex objects, including maps and lists.

Only the following queries are supported:

value = store.get(key)

store.put(key,value)

store.delete(key)
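The three operations can be mimicked with a small in-memory class to show how complex keys and values fit the same interface. `KeyValueStore` is a hypothetical stand-in written for this article, not a Voldemort client.

```python
class KeyValueStore:
    """Hypothetical in-memory store exposing only get, put and delete."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        """Return the stored value, or None if the key is absent."""
        return self._items.get(key)

    def put(self, key, value):
        self._items[key] = value

    def delete(self, key) -> bool:
        """Remove the key; report whether anything was deleted."""
        if key in self._items:
            del self._items[key]
            return True
        return False


store = KeyValueStore()
key = ("member", 42)                                   # a composite key
store.put(key, {"name": "Ada", "skills": ["java", "scala"]})  # a map holding a list
value = store.get(key)
deleted = store.delete(key)
after_delete = store.get(key)
```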

System Architecture

Voldemort follows a layered architecture. Each layer implements the same simple operations (get, put, and so on) and is responsible for one particular function, much like the layers of the TCP/IP stack.

For instance, the routing layer is responsible for taking an operation such as a PUT and delegating it to the storage replicas, whereas the conflict resolution layer resolves the conflicts that arise when the same data is updated concurrently.

The architecture consists of the following layers: Client API, Conflict Resolution, Serialization/Composition, Routing Layer, Consistency Layer and Storage Engines.

These layers are divided into client and server components, which communicate over a network.

This layered design gives good flexibility.

Because layers can be placed on either the client or the server, the number of network hops can be kept low, which makes high-performance configurations possible.
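The layering idea can be sketched by giving every layer the same `get`/`put` methods and letting each one wrap the next. The three classes below are hypothetical and heavily simplified: the routing rule (CRC32 of the key) and the JSON encoding are assumptions chosen only to keep the example short.

```python
import json
import zlib


class MemoryEngine:
    """Bottom layer: stores raw bytes by raw-bytes key."""

    def __init__(self):
        self.blobs = {}

    def get(self, key: bytes):
        return self.blobs.get(key)

    def put(self, key: bytes, value: bytes):
        self.blobs[key] = value


class RoutingLayer:
    """Middle layer: forwards each operation to one engine chosen by key hash."""

    def __init__(self, engines):
        self.engines = engines

    def _pick(self, key: bytes):
        return self.engines[zlib.crc32(key) % len(self.engines)]

    def get(self, key: bytes):
        return self._pick(key).get(key)

    def put(self, key: bytes, value: bytes):
        self._pick(key).put(key, value)


class SerializationLayer:
    """Top layer: converts objects to bytes on the way down, back on the way up."""

    def __init__(self, inner):
        self.inner = inner

    def get(self, key):
        raw = self.inner.get(json.dumps(key).encode("utf-8"))
        return None if raw is None else json.loads(raw.decode("utf-8"))

    def put(self, key, value):
        self.inner.put(json.dumps(key).encode("utf-8"),
                       json.dumps(value).encode("utf-8"))


engines = [MemoryEngine(), MemoryEngine()]
store = SerializationLayer(RoutingLayer(engines))
store.put("hello", {"greeting": "world"})
result = store.get("hello")
```

Since every layer exposes the same methods, a layer can be inserted, removed, or moved between client and server without the layers around it changing.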

Pros of Project Voldemort

  1. It has predictable performance, unlike many other databases.
  2. Storage is cleanly separated from logic, which keeps storage concerns from being mixed into business logic.
  3. Datasets are easy to distribute over the cluster, and no node needs to store the entire database.
  4. Relational databases often force the use of a caching layer, which effectively turns the data into key-value pairs anyway. Project Voldemort does not need one, since caching is combined with storage.
  5. It avoids the object-relational mismatch as far as possible, compared with relational databases.

Cons of Project Voldemort

  1. It does not support complex query filters.
  2. Joins are not provided by the database; they work only if they are written in application code.
  3. Project Voldemort has no foreign key constraints.
  4. Event triggers are not supported.
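As an illustration of the second point, a join written in application code might look like the sketch below. Plain Python dictionaries stand in for two key-value stores, and the function name and record fields are hypothetical.

```python
def members_with_company(member_store, company_store, member_ids):
    """Join members to their companies with one extra lookup per member."""
    rows = []
    for member_id in member_ids:
        member = member_store.get(member_id)
        if member is None:
            continue  # skip ids with no stored member
        company = company_store.get(member["company_id"])
        rows.append({
            "name": member["name"],
            "company": company["name"] if company else None,
        })
    return rows


# Two dictionaries standing in for two separate key-value stores.
members = {
    1: {"name": "Ada", "company_id": "c1"},
    2: {"name": "Linus", "company_id": "c9"},
}
companies = {"c1": {"name": "Example Corp"}}

joined = members_with_company(members, companies, [1, 2, 3])
```

With no foreign key constraints, nothing stops a member from pointing at a company that does not exist, so the code has to handle the missing case itself.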

Performance

This database is most often used simply to store the basic records of a dataset.

It shows predictable performance.

Network is the second biggest bottleneck after disk.

LinkedIn is the prime user of Project Voldemort.

The throughput figures recorded were:

Reads: 19,384 req/sec

Writes: 16,559 req/sec

Note: These figures are for a single-node cluster, so the replication factor is 1.

Queries

Project Voldemort is based on hash table semantics: a single value can be modified at a time, and a value is retrieved by its primary key. This makes it easy to distribute data across servers, since every data item can be located by its primary key.

Although Project Voldemort does not support one-to-many relations directly, it supports lists as values, which fills that gap.
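A one-to-many relation stored as a list value can be sketched like this. A plain dictionary stands in for the store, and `add_order` and the key format are hypothetical.

```python
def add_order(store: dict, customer_id: str, order_id: str) -> list:
    """Append an order id to the list stored under the customer's key."""
    orders = list(store.get(customer_id) or [])  # read the current list, if any
    orders.append(order_id)
    store[customer_id] = orders                  # write the whole list back
    return orders


store = {}
add_order(store, "customer:7", "order:100")
add_order(store, "customer:7", "order:101")
orders_for_customer = store["customer:7"]
```

The whole list is read and written as a single value, so this pattern suits relations where the "many" side stays reasonably small.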

The simplicity of the queries can be an advantage: since each has very predictable performance, it is easy to break down the performance of a service into the number of storage operations it performs and quickly estimate the load. In contrast, SQL execution plans can be data-dependent and queries are often opaque, so it is difficult to estimate whether a given query will perform well with realistic data under load.
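A back-of-the-envelope estimate of this kind can be written down directly, using the single-node read and write rates quoted in the Performance section. The function name and the example service (three reads and one write per request) are hypothetical, and the calculation assumes read and write capacity are independent, so it gives only a rough upper bound.

```python
READS_PER_SEC = 19_384   # single-node read rate quoted in the Performance section
WRITES_PER_SEC = 16_559  # single-node write rate quoted in the Performance section


def max_requests_per_sec(reads_per_request: int, writes_per_request: int) -> float:
    """Rough ceiling on service requests per second for one node."""
    limits = []
    if reads_per_request:
        limits.append(READS_PER_SEC / reads_per_request)
    if writes_per_request:
        limits.append(WRITES_PER_SEC / writes_per_request)
    return min(limits) if limits else float("inf")


# Hypothetical service: every request does 3 reads and 1 write.
estimate = max_requests_per_sec(reads_per_request=3, writes_per_request=1)
```

For this hypothetical service the reads are the limiting factor, at roughly 6,461 requests per second.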

QuickStart

Download and build the code:

git clone

cd voldemort

./gradlew build -x test

The commands above download the latest version of Project Voldemort from GitHub and build it.

Alternatively, you can get it from an archive of a release.

Start the single-node cluster:

> bin/voldemort-server.sh config/single_node_cluster > /tmp/voldemort.log &

Start the command-line shell and try a few commands:

> bin/voldemort-shell.sh test tcp://localhost:6666

Established connection to test via tcp://localhost:6666

> put "hello" "world"

> get "hello"

version(0:1): "world"

> delete "hello"

> get "hello"

null

> help

> exit

k k thx bye.

Conclusion and Future scope

In this review paper, we have discussed Project Voldemort, why databases of this kind are needed, and its main features. Finally, we compared Project Voldemort with relational databases. From this review, we conclude that Voldemort has a few advantages over relational databases, chiefly its ability to store huge volumes of data, which is a key concern for modern technology and the reason it is currently adopted by LinkedIn. However, Voldemort needs a lot of updating if it is to survive, because its query processing lags behind that of other NoSQL databases. If efficiency and throughput are the main considerations, Project Voldemort should not be considered. Depending on the situation, these databases can also be combined in one application for better performance. The future scope of Project Voldemort seems very limited, considering its present stable release.


Cite this page

Project Voldemort: Features, Comparison with other Databases and Application. (2019, Nov 26). Retrieved from https://studymoose.com/project-voldemort-features-comparison-with-other-databases-and-application-essay
