Technology evolves rapidly, and applications gain new features all the time. Relational databases cannot meet all of these demands, because they impose a fixed structure and cannot easily grow or shrink as storage needs change. To overcome this limitation, a database must be able to hold huge datasets that are not only well structured but also semi-structured or unstructured. This is where databases such as Project Voldemort become significant.
In this review paper, we discuss Project Voldemort, a non-relational distributed database based on key-value pairs. Our prime focus is the evolution of Voldemort and its current state.
Keywords: key-value storage system, comparison to relational databases, quick start, pros and cons, queries
Project Voldemort is an open-source database that follows the key-value design of Amazon's Dynamo. There are many distributed databases, but not every one of them replicates data.
Data replication is the core feature of Project Voldemort and is described in more detail below. Beyond replication, Voldemort excels in several other areas: pluggable serialization, data partitioning, failure detection and handling, and data versioning. However, Voldemort is also restricted in several respects, which are discussed later in this article.
Project Voldemort was developed by LinkedIn. Its initial release was in 2009, and it reached a stable release by 2017.
The source code is available under the Apache 2.0 license, so anyone can report and fix bugs and contribute updates. Project Voldemort is written in Java and is available in English. The name Voldemort comes from the fictional villain of the Harry Potter series. LinkedIn uses Project Voldemort for highly scalable storage.
The prime features of Project Voldemort are as follows:
Data is automatically replicated across multiple servers, which saves time as well as application code.
Data is automatically partitioned, so each server holds only a subset of the entire database, and a request is sent only to the server that holds the needed part. Partitioning is also used for load balancing.
Each data item is versioned, that is, it is protected by optimistic locking, to preserve data integrity in failure cases. Each node is independent of the others, so there is no central point of failure. When a server fails, its load is spread across the remaining servers in the cluster.
Pluggable serialization supports rich keys and values, including lists and tuples with named fields, and checks data against an expected schema to catch errors early. Voldemort also supports pluggable data placement strategies, for example to distribute data across data centres that are geographically far apart.
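Pluggable serialization with a schema check can be sketched as follows. This is a minimal Java illustration, not Voldemort's serializer API: the ValueSerializer interface, the SchemaCheckedSerializer class and its toy name=value encoding are hypothetical, chosen only to show a record with named fields being validated against an expected schema before it is stored.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class SchemaCheckDemo {
    public static void main(String[] args) {
        // Expected schema: field name -> required Java type.
        Map<String, Class<?>> schema = new LinkedHashMap<>();
        schema.put("name", String.class);
        schema.put("age", Integer.class);
        ValueSerializer<Map<String, Object>> ser = new SchemaCheckedSerializer(schema);

        Map<String, Object> ok = new LinkedHashMap<>();
        ok.put("name", "alice");
        ok.put("age", 30);
        System.out.println(ser.toBytes(ok).length + " bytes"); // 18 bytes

        Map<String, Object> bad = new LinkedHashMap<>(ok);
        bad.put("age", "thirty"); // wrong type for field "age"
        try {
            ser.toBytes(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

/** Hypothetical pluggable serializer interface: one implementation per data format. */
interface ValueSerializer<T> {
    byte[] toBytes(T value);
}

/** Hypothetical serializer that validates a named-field record against a schema first. */
final class SchemaCheckedSerializer implements ValueSerializer<Map<String, Object>> {
    private final Map<String, Class<?>> schema;

    SchemaCheckedSerializer(Map<String, Class<?>> schema) { this.schema = schema; }

    public byte[] toBytes(Map<String, Object> record) {
        // The record must have exactly the fields named in the schema.
        if (!record.keySet().equals(schema.keySet())) {
            throw new IllegalArgumentException(
                "fields " + record.keySet() + " do not match schema " + schema.keySet());
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
            Object v = record.get(field.getKey());
            // Each field value must be an instance of the type the schema expects.
            if (!field.getValue().isInstance(v)) {
                throw new IllegalArgumentException(
                    "field '" + field.getKey() + "' must be of type " + field.getValue().getSimpleName());
            }
            sb.append(field.getKey()).append('=').append(v).append(';');
        }
        // Toy encoding for illustration only (no escaping, no schema versioning).
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Swapping in a different ValueSerializer implementation changes the storage format without touching the code that calls it, which is the point of making serialization pluggable.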
Versioning is essentially a simple form of optimistic locking: a unique counter, or “clock” value, is stored with each piece of data, and an update is allowed only when it specifies the correct clock value. This is efficient and works well in a centralized database, but on its own it is not sufficient in a distributed system, where data is replicated and redundant copies of the same value can be updated independently.
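The clock-based optimistic locking described above can be sketched for a single server. The ClockedStore and ClockedValue classes are hypothetical, and the sketch deliberately covers only the centralized case in which this scheme works well; it is not Voldemort's distributed versioning.

```java
import java.util.HashMap;
import java.util.Map;

class OptimisticLockingDemo {
    public static void main(String[] args) {
        ClockedStore store = new ClockedStore();
        store.put("hello", "world", 0);                  // new key: clock 0 -> 1
        // Two clients both read "hello" at clock 1 and try to update it.
        boolean first = store.put("hello", "there", 1);  // accepted: clock 1 -> 2
        boolean second = store.put("hello", "again", 1); // rejected: clock is now 2
        System.out.println(first + " " + second + " " + store.get("hello").value);
        // prints: true false there
    }
}

/** Hypothetical value-plus-clock pair. */
final class ClockedValue {
    final String value;
    final long clock;

    ClockedValue(String value, long clock) { this.value = value; this.clock = clock; }
}

/** Hypothetical single-server store with one clock (counter) per key. */
final class ClockedStore {
    private final Map<String, ClockedValue> data = new HashMap<>();

    synchronized ClockedValue get(String key) { return data.get(key); }

    /**
     * Writes the value only if expectedClock equals the stored clock
     * (0 for a key that does not exist yet); returns whether the write happened.
     */
    synchronized boolean put(String key, String value, long expectedClock) {
        ClockedValue current = data.get(key);
        long currentClock = (current == null) ? 0 : current.clock;
        if (currentClock != expectedClock) {
            return false; // stale clock: another update happened in between
        }
        data.put(key, new ClockedValue(value, currentClock + 1));
        return true;
    }
}
```

No lock is held between a client's read and its write; a conflicting update is simply detected at write time and rejected, and the client can re-read and retry.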
Voldemort is neither a relational database nor an object-oriented database. It does not satisfy ACID properties, it does not attempt to map object reference graphs, and it does not introduce a new abstraction such as document orientation.
Compared with a relational database, Project Voldemort is essentially a huge, persistent, distributed, fault-tolerant hash table. It supports horizontal scaling and offers much higher availability, but at the cost of a great loss of convenience.
Project Voldemort needs no separate caching tier, because it combines in-memory caching with the storage system, and the storage system itself is fast enough.
Project Voldemort scales both read and write operations horizontally, which distinguishes it from relational databases.
Another major difference from relational databases is data partitioning: it is transparent and allows the cluster to expand or shrink without rebalancing all of the data.
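Expanding a cluster without rebalancing all data can be illustrated with a fixed set of partitions. In this simplified sketch (an assumption in the spirit of Voldemort's partitioned design, not its actual code), keys hash to one of a fixed number of partitions, and adding a node only reassigns a few whole partitions; the partition count, round-robin initial assignment and donor-selection rule are illustrative choices. For comparison, it also counts how many keys a naive "hash mod number of nodes" scheme would move.

```java
import java.util.ArrayList;
import java.util.List;

class PartitioningDemo {
    // Illustrative value: the number of partitions is fixed when the cluster is created.
    static final int NUM_PARTITIONS = 12;

    /** Key -> partition never changes, because NUM_PARTITIONS never changes. */
    static int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), NUM_PARTITIONS);
    }

    /** Initial round-robin ownership: partition p belongs to node p % numNodes. */
    static int[] initialOwners(int numNodes) {
        int[] owner = new int[NUM_PARTITIONS];
        for (int p = 0; p < NUM_PARTITIONS; p++) owner[p] = p % numNodes;
        return owner;
    }

    /** Adds node number existingNodes, moving whole partitions to it from the busiest nodes. */
    static int[] addNode(int[] owner, int existingNodes) {
        int[] next = owner.clone();
        int newNode = existingNodes;
        int fairShare = NUM_PARTITIONS / (existingNodes + 1);
        for (int moved = 0; moved < fairShare; moved++) {
            int donor = mostLoaded(next, existingNodes);
            for (int p = 0; p < NUM_PARTITIONS; p++) {
                if (next[p] == donor) { next[p] = newNode; break; }
            }
        }
        return next;
    }

    /** The existing node (0..existingNodes-1) that currently owns the most partitions. */
    static int mostLoaded(int[] owner, int existingNodes) {
        int[] counts = new int[existingNodes + 1]; // extra slot counts the new node
        for (int o : owner) counts[o]++;
        int best = 0;
        for (int n = 1; n < existingNodes; n++) if (counts[n] > counts[best]) best = n;
        return best;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) keys.add("member:" + i);

        int[] before = initialOwners(3);
        int[] after = addNode(before, 3);

        int movedPartitioned = 0, movedModulo = 0;
        for (String key : keys) {
            int p = partitionOf(key);
            if (before[p] != after[p]) movedPartitioned++;
            // Naive alternative: node = hash mod number of nodes (3 before, 4 after).
            int h = key.hashCode();
            if (Math.floorMod(h, 3) != Math.floorMod(h, 4)) movedModulo++;
        }
        System.out.println("keys moved with fixed partitions: " + movedPartitioned);
        System.out.println("keys moved with hash mod nodes:   " + movedModulo);
    }
}
```

With 12 partitions and 3 nodes, the new fourth node takes exactly 3 partitions, so only the keys in those partitions (roughly a quarter) move, whereas the modulo scheme relocates most keys.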
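Voldemort also supports unit testing well, because the storage layer is mockable: development and testing can be done against a throw-away in-memory store instead of a real cluster.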
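Because the storage layer is mockable, application code can be written against a small storage interface and tested with an in-memory implementation. The KeyValueStore interface, InMemoryStore and ProfileService below are hypothetical names used for illustration, not Voldemort's client API; they mirror the get, put and delete operations the store supports.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class MockStoreDemo {
    public static void main(String[] args) {
        // In a unit test, the in-memory store stands in for a real cluster.
        KeyValueStore<String, String> store = new InMemoryStore<>();
        ProfileService service = new ProfileService(store);
        service.updateHeadline("alice", "Engineer");
        System.out.println(service.headline("alice")); // prints: Engineer
    }
}

/** Hypothetical minimal storage interface: only get, put and delete. */
interface KeyValueStore<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
    void delete(K key);
}

/** Throw-away in-memory implementation, suitable for tests. */
final class InMemoryStore<K, V> implements KeyValueStore<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();

    public Optional<V> get(K key) { return Optional.ofNullable(map.get(key)); }
    public void put(K key, V value) { map.put(key, value); }
    public void delete(K key) { map.remove(key); }
}

/** Hypothetical application code that depends only on the interface. */
final class ProfileService {
    private final KeyValueStore<String, String> store;

    ProfileService(KeyValueStore<String, String> store) { this.store = store; }

    void updateHeadline(String member, String headline) {
        store.put("headline:" + member, headline);
    }

    String headline(String member) {
        return store.get("headline:" + member).orElse("(none)");
    }
}
```

In production the same ProfileService would receive an implementation backed by a real cluster; only the object passed to the constructor changes.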
To enable high performance, this distributed database allows only very simple key-value data access.
An important part of the design is that both keys and values can be complex objects, including maps and lists.
The only queries that are efficiently supported are:
value = store.get(key)
store.put(key, value)
store.delete(key)
Voldemort follows a layered architecture. Each layer implements the same simple storage interface (get, put, and delete) and is responsible for one particular task, much like the layers of the TCP/IP network stack.
For instance, the routing layer is responsible for taking an operation such as PUT and delegating it to all of the storage replicas in parallel while handling any failures, whereas the conflict resolution layer reconciles conflicting versions of a value that were written concurrently.
The Voldemort architecture has the following layers: Client API, Conflict Resolution, Serialization/Composition, Routing Layer, Consistency Layer, and Storage Engines.
These layers are divided into client and server components, which communicate over the network.
This layered design gives Voldemort good flexibility.
This flexibility makes high-performance configurations possible: because layers can be placed on either the client or the server, unnecessary network hops can be eliminated.
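The layered design, in which every layer implements the same interface and wraps the layer below, can be sketched as a stack of decorators. The Layer interface and the three classes are hypothetical simplifications: only a serialization layer, a routing layer and in-memory storage engines are modeled, and this routing layer writes to replicas one after another rather than in parallel, with no failure handling or conflict resolution.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LayeredStoreDemo {
    public static void main(String[] args) {
        // Storage engines: three in-memory replicas.
        List<Layer<String, byte[]>> replicas = new ArrayList<>();
        for (int i = 0; i < 3; i++) replicas.add(new MemoryEngine());
        // Stack the layers: serialization -> routing -> storage engines.
        Layer<String, String> client = new SerializingLayer(new RoutingLayer(replicas));
        client.put("hello", "world");
        System.out.println(client.get("hello")); // prints: world
    }
}

/** Hypothetical common interface that every layer implements. */
interface Layer<K, V> {
    V get(K key);
    void put(K key, V value);
}

/** Bottom layer: a storage engine holding raw bytes in memory. */
final class MemoryEngine implements Layer<String, byte[]> {
    private final Map<String, byte[]> data = new HashMap<>();

    public byte[] get(String key) { return data.get(key); }
    public void put(String key, byte[] value) { data.put(key, value); }
}

/** Routing layer: sends each PUT to every replica; reads from the first replica that has the key. */
final class RoutingLayer implements Layer<String, byte[]> {
    private final List<Layer<String, byte[]>> replicas;

    RoutingLayer(List<Layer<String, byte[]>> replicas) { this.replicas = replicas; }

    public byte[] get(String key) {
        for (Layer<String, byte[]> r : replicas) {
            byte[] v = r.get(key);
            if (v != null) return v;
        }
        return null;
    }

    public void put(String key, byte[] value) {
        for (Layer<String, byte[]> r : replicas) r.put(key, value); // sequential, for simplicity
    }
}

/** Serialization layer: converts String values to UTF-8 bytes for the layer below. */
final class SerializingLayer implements Layer<String, String> {
    private final Layer<String, byte[]> inner;

    SerializingLayer(Layer<String, byte[]> inner) { this.inner = inner; }

    public String get(String key) {
        byte[] raw = inner.get(key);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    public void put(String key, String value) {
        inner.put(key, value.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because every layer has the same interface, layers can be reordered, removed, or moved across the network boundary (for example, doing routing in the client) without changing the layers around them.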
In practice, this database is most often used simply to store the basic details of a dataset.
It shows predictable performance.
Network hops are the second biggest performance bottleneck after disk access.
LinkedIn is the prime user of Project Voldemort.
The measured throughput was:
Reads: 19,384 req/sec
Writes: 16,559 req/sec
Note: These figures are for a single-node cluster, so the replication factor is 1.
Project Voldemort has hash table semantics: a single value is modified at a time, and values are retrieved by primary key. This makes it easy to distribute data across servers, since everything can be split by primary key.
Although Project Voldemort does not support one-to-many relations, it supports lists as values, which fills the gap left by the absence of one-to-many relations.
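A one-to-many relation, such as a member and their connections, can be stored as a list value under the member's key. In this minimal sketch a java.util.HashMap stands in for a Voldemort store (an assumption for illustration), and addConnection is a hypothetical helper that reads the whole list by primary key, appends to it, and writes the whole list back as a single value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OneToManyDemo {
    public static void main(String[] args) {
        // Stand-in store whose values are lists: member id -> ids of their connections.
        Map<String, List<String>> connections = new HashMap<>();
        addConnection(connections, "alice", "bob");
        addConnection(connections, "alice", "carol");
        System.out.println(connections.get("alice")); // prints: [bob, carol]
    }

    /**
     * Read-modify-write of a list value: fetch the list by primary key,
     * append to a copy, and write the whole list back as one value.
     */
    static void addConnection(Map<String, List<String>> store, String member, String other) {
        List<String> current = store.getOrDefault(member, new ArrayList<>());
        List<String> updated = new ArrayList<>(current);
        if (!updated.contains(other)) updated.add(other); // avoid duplicate links
        store.put(member, updated);
    }
}
```

Because the whole list is rewritten on every update, this approach suits a reasonable number of associated values rather than unbounded lists.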
The simplicity of the queries can be an advantage: since each query has very predictable performance, it is easy to break down the performance of a service into the number of storage operations it performs and quickly estimate the load. By contrast, SQL execution plans can be data dependent and queries are often opaque, so it is difficult to estimate whether a given query will perform well with realistic data under load.
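Because each storage operation has predictable cost, load can be estimated with simple arithmetic. The sketch below assumes a hypothetical service handling 2,000 requests per second, each performing 3 gets and 1 put, and compares the result with the single-node throughput figures quoted earlier (19,384 reads/sec and 16,559 writes/sec at replication factor 1). Treating read and write costs as additive is a simplifying assumption.

```java
import java.util.Locale;

class LoadEstimate {
    // Single-node throughput figures quoted in the text (replication factor 1).
    static final double READ_CAPACITY = 19_384;  // reads/sec
    static final double WRITE_CAPACITY = 16_559; // writes/sec

    /** Storage operations per second generated by a service. */
    static double opsPerSec(double requestsPerSec, int opsPerRequest) {
        return requestsPerSec * opsPerRequest;
    }

    /**
     * Rough fraction of one node's capacity used, assuming (as a simplification)
     * that read and write costs add linearly.
     */
    static double nodeUtilization(double readsPerSec, double writesPerSec) {
        return readsPerSec / READ_CAPACITY + writesPerSec / WRITE_CAPACITY;
    }

    public static void main(String[] args) {
        // Hypothetical service: 2,000 requests/sec, each doing 3 gets and 1 put.
        double reads = opsPerSec(2_000, 3);  // 6,000 reads/sec
        double writes = opsPerSec(2_000, 1); // 2,000 writes/sec
        double u = nodeUtilization(reads, writes);
        System.out.println(String.format(Locale.ROOT, "utilization of one node: %.2f", u));
        // prints: utilization of one node: 0.43
    }
}
```

With replication, each logical write would be performed on several nodes, so a real estimate would multiply the write load by the replication factor.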
./gradlew build -x test
Run from a source checkout downloaded from GitHub, the above command builds the latest version of Project Voldemort; the -x test option skips the test suite.
Alternatively, you can download a packaged build from the release archive.
To try it out, start a single-node server in the background and connect to the test store with the Voldemort shell:
> bin/voldemort-server.sh config/single_node_cluster > /tmp/voldemort.log &
> bin/voldemort-shell.sh test tcp://localhost:6666
Established connection to test via tcp://localhost:6666
> put "hello" "world"
> get "hello"
> delete "hello"
> get "hello"
> exit
k k thx bye.
In this review paper, we have discussed Project Voldemort: what it is, why we need it, and its main features, and we have compared it with relational databases. This review shows that Voldemort has a few advantages over relational databases, chief among them its ability to store huge amounts of data, a key concern in technology today, which is why it has been adopted by LinkedIn. However, Voldemort needs considerable updating if it is to survive, because its query processing is weaker than that of other NoSQL databases. If efficiency and throughput are the main considerations, Project Voldemort should not be chosen. Depending on the situation, Voldemort and relational databases can also be combined in one application for better performance. Considering the present stable release, the future scope of Project Voldemort seems very limited.