Query Evaluation Using Algorithms in PostgreSQL

Motivation

There has been keen interest in skyline queries on relational databases. Skyline queries also known as Pareto queries search for best tuples compared to relational queries. In this work, we explore the implementation of a preference evaluation system for a relational database system. The skyline or Pareto, operator selects those tuples that are not dominated by others.

E cient algorithms are needed for skyline. In this paper we propose skyline algorithms like SFS (Sort Filter Skyline Algorithm) and the optimized version of SFS known as LESS (Linear Elimination Sort for Skyline).

The evaluation of preferences according to [GSG05] is a generalization of the skyline evaluation and maximal vector problem.

The maximal vector problem has been rediscovered in the database context with the introduction of skyline queries. This is to nd the subset of vectors such that each is not dominated by any other vectors from the set.

Ob jective of the Master Thesis

The ob jective of this paper is to analyze the runtime performance of SFS and LESS algorithms for maximal vector computation.

Get quality help now
Writer Lyla
Writer Lyla
checked Verified writer

Proficient in: Computers

star star star star 5 (876)

“ Have been using her for a while and please believe when I tell you, she never fail. Thanks Writer Lyla you are indeed awesome ”

avatar avatar avatar
+84 relevant experts are online
Hire writer

We show that the scan based algorithm LESS achieves better performance than SFS .PostgreSQL is chosen for the integration of the preference system. It is an open source platform which runs in all operating system.

Research Questions

This paper focuses on the following research questions.

  • Which algorithms can be e ciently used for maximal vector computation in large data sets?
  • How can we avoid complete scanning of sorted dataset?
  • How can we reduce the maximal to maximal comparisons?

Preliminary Structure of the Master Thesis

This work is divided as follows: Chapter 2 focuses on the existing algorithms runtime performances and then focus on the newer external skyline algorithms.

Chapter 3 presents the details of implementation.

Get to Know The Price Estimate For Your Paper
Topic
Number of pages
Email Invalid email

By clicking “Check Writers’ Offers”, you agree to our terms of service and privacy policy. We’ll occasionally send you promo and account related email

"You must agree to out terms of services and privacy policy"
Write my paper

You won’t be charged yet!

The rst section is about preference queries and its syntax. The Second section deals with the implementation of preference queries in PostgreSQL; the overall structure and working of Preference queries in PostgreSQL is explained over here.

Chapter4 describes the evaluation of the skyline algorithms, experimental setup and dataset used.

Chapter 5 summarizes the work and gives an idea on future implementations.

Methods

The following chapter is intended to provide a basis for the understanding of this work and the issues it deals with. The algorithms and tools used to integrate preference queries are also explained in this section.

SFS Algorithm

BNL (Block Nested Loop) a multi pass algorithm for skyline computation keeps a window in the bu er pool for collecting candidate tuples.A page of tuples (A) is compared with each tuple in the window (B) [Cho+03].If B dominates A then A is discarded. Likewise it compares with all tuples in the window and output as skyline tuple. Furthermore BNL is not guaranteed to work in a single pass when the window is large enough to hold skyline tuples. So a more e cient algorithm was introduced known as Sort Filter Skyline algorithm. In SFS the table is initially sorted. In the rst pass a tuple is added to the window which is skyline. No tuple following it can dominate according to the below theorem.

Theorem Any total order of the tuples of R with respect to any monotone scoring function (ordered from highest to lowest score) is a topological sort with respect to the skyline dominance partial relation [Cho+03].

Runtime Performance SFS is guaranteed to perform in the optimal number of passes since any tuple added to the window is skyline. It does not su er from CPU boundedness [Cho+03].

LESS Algorithm

The new external skyline algorithm LESS (Linear Elimination Sort for Skyline) is an optimized version of SFS [KT17].It has two optimizations.

Therst optimization in the rst pass makes use of a bu er called Elimination lter(EF) which keeps the best entropy scores. The input points are compared with EF.If the point is dominated by EF it is discarded and if its incomparable or dominates other points in EF it is inserted in the EF.

The second optimization combines the nal pass of the external sorting process with the rst pass of the skyline lter which eliminates the remaining dominated tuples inorder to get the skyline tuple.If SF becomes full then an over owle is created.

Runtime Performance EFlter reduces the size of the input points.The combination of the nal pass of the EF process with the rst pass of SF process saves one pass while computing.

Implementation

The implementation phase explains about the preference evaluation system in PostgreSQL.The syntax used for a preference query with di erent examples are also explained in this section.

Lifecycle of a query in PostgreSQL Implementation

The lifecycle begins by converting string to a Parse tree using the syntax described with Bison.The tra c cope decides for each parse tree whether it is a utility command or not.Rewrite query step apply some rules and split the query.Generate path will generate all possible pro-

cessing paths that lead to the result of a query.This path contains an estimate of probable computing time and memory thoughput.The Generate Plan step selects the path tree with lowest costand convert to a plan tree.The execute plan uses a recursive method which calls the initialization functions and generates a state for corresponding state machine.

Evaluation

Experimental setup

A slightly modi ed version of the data generator based on [Joh09] was used to create the test data.Bash scripts were used to generate test data in the form of les by calling the data generator several times.With the help of another script the test data can be inserted from the les to the PostgreSQL.Another scripts perform the experiments and a nal script transform the result into a CSV le.

Time Schedule

Timeline will be decide once the scope of the thesis is nalized.

Bibliography

  1. [BKS01]S Borzsony, D Kossmann, and K Stocker. The Skyline Operator". In: 2001. url: [Cho+03]Jan Chomicki et al. Skyline with Presorting". In: 2003. url: : / / www.researchgate.net/publication/4053418_Skyline_with_presorting .
  2. [GSG05]Parke Godfrey, Ryan Shipley, and Jarek Gryz. Maximal Vector Computation in Large Data Sets". In: 2005. url: 0a3a063edb70bf5068e7dccc20db81bbba09.pdf (visited on 09/02/2005).
  3. [Joh09]Eder Johann. On extending postgreSQL with the skyline operator". In: 2009. url: : / / repositum . tuwien . ac . at / obvutwhs / content / titleinfo / 1595771 .
  4. [KT17]Christos Kalyvas and Theodoros Tzouramanis. A Survey of Skyline Query Pro- cessing". In: 2017. url: [Pos]PostgreSQL. Backend Flowchart". In: url: : / / www . postgresql . org / developer/backend/ .
Updated: Aug 12, 2021
Cite this page

Query Evaluation Using Algorithms in PostgreSQL. (2019, Dec 07). Retrieved from https://studymoose.com/query-evaluation-using-algorithms-in-postgresql-essay

Query Evaluation Using Algorithms in PostgreSQL essay
Live chat  with support 24/7

👋 Hi! I’m your smart assistant Amy!

Don’t know where to start? Type your requirements and I’ll connect you to an academic expert within 3 minutes.

get help with your assignment