Parallel databases have become more common in data management because they offer advantages over their precursor, the mainframe computer. They benefit both the programmer and the owner of the system through lower cost and higher performance than traditional systems. However, maintaining such a system requires the technologist to solve several problems that relate to both hardware and software.
In hardware, the main problem lies in the architecture of the parallel database, chiefly the microprocessors and disk-storage devices. In software, the issues include operating system support and parallel data processing languages. Although these problems can be partially solved by newer database systems with higher functionality, those systems raise further questions and difficulties for parallel databases.
Problems Related to Parallel Databases and Counter-measures
In this Information Age, science and technology have pushed the limits of what people can do, particularly through the development of computers and databases. Automated processing became the benchmark for both companies and households soon after the public realized the importance of information. However, as information technology grew, faster and more powerful machines were needed to store information, handle queries, and generate responses to those queries. From the database viewpoint, there is a need for database servers that can handle different types of queries against very large online databases efficiently, that is, with both speed and accuracy.
Technologists placed their hope in large-scale parallelism, in which the raw power of individual components is magnified when they are integrated into a complete system and coupled with appropriate parallel database software. Such a parallel database system exploits modern multiprocessor architecture to the full extent using software-oriented solutions. It promises high performance, high availability, and extensibility at a much lower price-performance ratio than mainframe databases. In addition, parallel databases are the only viable solution for handling terabyte-scale databases within a single system.
Scope of the Issues
Although some SQL-based products have found success in the market, many problems still limit the full exploitation of the capabilities of parallel databases. These include the choice of architectural design, data placement, parallel database languages, and parallel query processing.
Problems and Solutions
The first issue is the architectural configuration of the parallel database: shared-memory, shared-disk, or shared-nothing. The choice depends on the functionality the database must carry. Shared-memory and shared-disk architectures are best suited to small-to-medium systems, because a relatively small number of processors can communicate at high speed.
Shared-nothing, on the other hand, is best suited to high-end systems, since it can work on very large (terabyte) databases. Beyond these three choices, another option to consider is a hybrid architecture that combines two or more of the basic parallel designs. However, hybridization adds complexity and can easily create more problems in the system.
Another problem in using parallel databases is data placement. Data placement is essential because it is the key to balancing the load the server receives. The question is which data placement techniques will allow queries and query operations to work on different partitions. Declustering, that is, spreading the data across several partitions, supports both forms of parallelism. In interquery parallelism, it allows different queries to work on different partitions at the same time. In intraquery parallelism, it allows the operations of a single query to be assigned to different partitions.
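The declustering idea can be illustrated with a minimal sketch of three common placement strategies: round-robin, hash, and range partitioning. The essay does not name specific strategies, so the function names, the dictionary row format, and the use of a CRC32 checksum as the hash are all assumptions made for illustration, not a description of any particular product.

```python
import bisect
import zlib


def round_robin(rows, n_partitions):
    """Deal rows out to partitions in turn, giving near-equal sizes."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts


def hash_partition(rows, n_partitions, key):
    """Place each row by a stable hash of its key (CRC32 is an assumed choice).

    Rows with equal keys always land in the same partition.
    """
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        digest = zlib.crc32(repr(row[key]).encode("utf-8"))
        parts[digest % n_partitions].append(row)
    return parts


def range_partition(rows, boundaries, key):
    """Place each row by key range.

    `boundaries` must be sorted; keys below boundaries[0] go to partition 0,
    keys in [boundaries[0], boundaries[1]) go to partition 1, and so on.
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[key])].append(row)
    return parts
```

Round-robin balances the load regardless of the data, while hash and range placement let a query on a specific key or key range be sent only to the partitions that can hold matching rows. Range placement can become unbalanced if the key values are skewed.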
The choice of a parallel database processing language is another open issue. However, some general ideas can be used to extend parallel database processing through an appropriate language. One of these is divide-and-conquer computation. In the divide step, the problem is distributed across a series of nodes (fan-out); in the conquer step, the results are gathered into a set (through independent parallelism), a stream (through pipeline parallelism), or an aggregate (through fan-in parallelism).
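The divide-and-conquer pattern described above might be sketched as follows, using a thread pool from Python's standard library to stand in for the nodes. The helper names (`fan_out`, `as_stream`, `fan_in`, `partial_sum`) and the choice of summation as the per-partition work are hypothetical. The generator only models the streaming hand-off of results to a consumer; it is not itself concurrent.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Per-node work: here, simply the sum of one partition (assumed example)."""
    return sum(partition)


def fan_out(partitions, work, max_workers=4):
    """Divide: run `work` on every partition independently.

    Returns the per-partition results as a collection, in partition order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, partitions))


def as_stream(partitions, work):
    """Yield each result as soon as it is computed, so a downstream
    consumer can start before all partitions are finished."""
    for partition in partitions:
        yield work(partition)


def fan_in(partials):
    """Conquer: combine the partial results into a single aggregate."""
    return sum(partials)
```

For example, `fan_in(fan_out(partitions, partial_sum))` computes a global sum from per-partition sums, which works because addition can be applied to partial results in any grouping.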
Another concern is parallel query processing. It refers to the stepwise, automatic translation of a query, written against a centralized execution model, into an efficient execution plan, followed by the parallel execution of that plan. Parallel databases can exploit declustering to parallelize queries and query operations. However, query processing also needs cost-based optimization and automatic parallelization so that mixed workloads and complex queries can be handled well.
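To make the idea of cost-based optimization concrete, the sketch below picks a degree of parallelism for a scan by comparing estimated costs. The cost formula is a toy assumption, not taken from the essay or from any real optimizer: each extra worker adds a fixed startup cost, while the per-row work is divided evenly among the workers. The parameter values are likewise invented for illustration.

```python
def estimated_cost(rows, degree, per_row=1.0, startup=50.0):
    """Toy cost model (assumed): startup overhead grows with the number of
    workers, while the scan work is split evenly among them."""
    return startup * degree + per_row * rows / degree


def choose_degree(rows, max_degree):
    """Enumerate candidate degrees of parallelism and keep the cheapest."""
    return min(
        range(1, max_degree + 1),
        key=lambda degree: estimated_cost(rows, degree),
    )
```

Under this model a small query is best run on a single worker, because the startup overhead outweighs the saving, while a large scan is spread over many workers. This is one way a system could treat mixed workloads differently without the user specifying the parallelism by hand.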
Finally, problems also arise from the introduction of higher functionality into parallel systems, such as knowledge-based or object-oriented capabilities. This requires extensive revision of data placement and parallel query processing to meet the demands of the added functionality.
Knowledge-based management systems (KBMSs) can be used to shift from data management to a more general form of knowledge management that allows application programs to be used in queries. Object-oriented database management systems (OODBMSs) are another forward-looking direction, offering advanced functionality such as the incorporation of object-oriented programming into database technology. Although added functionality can help exploit the multiprocessor's potential, it can also add more problems to parallel databases.
Parallel database systems have proved their worth in this Information Age. The increasing demand for larger database systems drives the need to build faster and more powerful databases. In parallel systems, this is accomplished by exploiting modern multiprocessor architectures and coupling them with the most appropriate database software. However, open issues result from the wide use of such systems, ranging from parallel processing to distributed database management. Solving these issues calls for broader research centered on maximizing the use of parallel systems. Introducing higher functionality can partially solve these issues but can also add more if not studied carefully.