24/7 writing help on your phone
Save to my list
Remove from my list
There has been keen interest in skyline queries on relational databases. Skyline queries also known as Pareto queries search for best tuples compared to relational queries. In this work, we explore the implementation of a preference evaluation system for a relational database system. The skyline or Pareto, operator selects those tuples that are not dominated by others.
E cient algorithms are needed for skyline. In this paper we propose skyline algorithms like SFS (Sort Filter Skyline Algorithm) and the optimized version of SFS known as LESS (Linear Elimination Sort for Skyline).
The evaluation of preferences according to [GSG05] is a generalization of the skyline evaluation and maximal vector problem. The maximal vector problem has been rediscovered in the database context with the introduction of skyline queries. This is to nd the subset of vectors such that each is not dominated by any other vectors from the set.
The ob jective of this paper is to analyze the runtime performance of SFS and LESS algorithms for maximal vector computation.
We show that the scan based algorithm LESS achieves better performance than SFS .PostgreSQL is chosen for the integration of the preference system. It is an open source platform which runs in all operating system.
This paper focuses on the following research questions.
This work is divided as follows: Chapter 2 focuses on the existing algorithms runtime performances and then focus on the newer external skyline algorithms.
Chapter 3 presents the details of implementation. The rst section is about preference queries and its syntax. The Second section deals with the implementation of preference queries in PostgreSQL; the overall structure and working of Preference queries in PostgreSQL is explained over here.
Chapter4 describes the evaluation of the skyline algorithms, experimental setup and dataset used.
Chapter 5 summarizes the work and gives an idea on future implementations.
The following chapter is intended to provide a basis for the understanding of this work and the issues it deals with. The algorithms and tools used to integrate preference queries are also explained in this section.
BNL (Block Nested Loop) a multi pass algorithm for skyline computation keeps a window in the bu er pool for collecting candidate tuples.A page of tuples (A) is compared with each tuple in the window (B) [Cho+03].If B dominates A then A is discarded. Likewise it compares with all tuples in the window and output as skyline tuple. Furthermore BNL is not guaranteed to work in a single pass when the window is large enough to hold skyline tuples. So a more e cient algorithm was introduced known as Sort Filter Skyline algorithm. In SFS the table is initially sorted. In the rst pass a tuple is added to the window which is skyline. No tuple following it can dominate according to the below theorem.
Theorem Any total order of the tuples of R with respect to any monotone scoring function (ordered from highest to lowest score) is a topological sort with respect to the skyline dominance partial relation [Cho+03].
Runtime Performance SFS is guaranteed to perform in the optimal number of passes since any tuple added to the window is skyline. It does not su er from CPU boundedness [Cho+03].
The new external skyline algorithm LESS (Linear Elimination Sort for Skyline) is an optimized version of SFS [KT17].It has two optimizations.
Therst optimization in the rst pass makes use of a bu er called Elimination lter(EF) which keeps the best entropy scores. The input points are compared with EF.If the point is dominated by EF it is discarded and if its incomparable or dominates other points in EF it is inserted in the EF.
The second optimization combines the nal pass of the external sorting process with the rst pass of the skyline lter which eliminates the remaining dominated tuples inorder to get the skyline tuple.If SF becomes full then an over owle is created.
Runtime Performance EFlter reduces the size of the input points.The combination of the nal pass of the EF process with the rst pass of SF process saves one pass while computing.
The implementation phase explains about the preference evaluation system in PostgreSQL.The syntax used for a preference query with di erent examples are also explained in this section.
The lifecycle begins by converting string to a Parse tree using the syntax described with Bison.The tra c cope decides for each parse tree whether it is a utility command or not.Rewrite query step apply some rules and split the query.Generate path will generate all possible pro-
cessing paths that lead to the result of a query.This path contains an estimate of probable computing time and memory thoughput.The Generate Plan step selects the path tree with lowest costand convert to a plan tree.The execute plan uses a recursive method which calls the initialization functions and generates a state for corresponding state machine.
A slightly modi ed version of the data generator based on [Joh09] was used to create the test data.Bash scripts were used to generate test data in the form of les by calling the data generator several times.With the help of another script the test data can be inserted from the les to the PostgreSQL.Another scripts perform the experiments and a nal script transform the result into a CSV le.
Timeline will be decide once the scope of the thesis is nalized.
👋 Hi! I’m your smart assistant Amy!
Don’t know where to start? Type your requirements and I’ll connect you to an academic expert within 3 minutes.get help with your assignment