Data over the world wide web

Abstract

XML has become the standard way of representing and transforming data over the World Wide Web. The problem with XML documents is that they have a very high ratio of redundancy, so they demand large storage capacity and high network bandwidth for transmission.

Because they are so widely used, XML documents may be retrieved according to vague queries from naive users with little background in writing a good query.

This report tries to cope with these two problems by designing a system with two stages.

The first stage is the design of a new compression technique called XQPoint. This technique separates the XML document into containers and compresses each container using a back-end compressor suited to the type of data in that container. The other part of the proposed system is the vague query processor, which separates each query into different sub-queries and retrieves relevant information from the compressed XML documents accordingly.

Only the most relevant information is decompressed and returned to the user. This research expects that XQPoint will achieve a better compression ratio and that the query processor will be the first processor to deal with compressed XML documents to retrieve information according to vague queries.

1. Introduction

The Extensible Markup Language (XML) is a W3C standard that has been adopted and sustained by both industry and the research community. In recent years, we have witnessed an increasing volume of XML digital information that is either created directly as an XML document or converted from another type of data representation.

The importance of XML comes from several factors: its ability to represent different data types in one document, its solution to the problem of long-term accessibility, and its role as a solution to the interoperability problem (Al-Hamadani et al., 2009).

Because the XML schema is replicated in each record, an XML document is considered to be a self-describing data file, which means that such files have a lot of data redundancy in both their tags and attributes (Ray, 2001). For all the above reasons, the need to compress XML documents is becoming increasingly pressing. Furthermore, an extensive need has evolved to retrieve information directly from the compressed documents and then decompress only the retrieved information (Ferragina et al., 2006).

Because XML documents are used so widely and by different kinds of users, it has become an important issue to deal with all kinds of queries. Some of these queries may have imprecise constraints that cannot be processed directly because of the grammar limitations of the query languages. However, these types of queries, which are known as vague queries, appear to be common when the users of the XML documents have little knowledge of the document structure, or lack the skills to write a precise and meaningful query.

The remedy for these two dilemmas is to design a compression technique that has the capability to retrieve information from the compressed version according to vague queries. Two types of XML compressors have been used. The first type is the non-queriable compressors, which are used to compress XML documents for archival purposes. The second type is the queriable compressors, which are used to query the compressed XML documents. None of the compressors belonging to the second type solves the problem of vague queries.

This report proposes a new XML compression technique called "XQPoint", which consists of two stages. In the first stage, it separates the data part of the XML document from the structure part, then compresses the data part using suitable compressors depending on the type of the data, while the structure part is compressed using the fixed-point dictionary-based compressor. The second stage processes the vague queries by decomposing them into multiple sub-queries, retrieving information from the compressed XML document according to each sub-query, combining the retrieved results into groups, and finally returning only the most relevant groups.

2. Theoretical position

This section describes the background of the research. It is composed of four parts: XML compression techniques, a brief definition of XPath and NEXI, query types, and answering vague queries.

The first sub-section describes the differences between XML compressors, gives a brief description of some of these compressors, and draws a comparison between them. Since our system will use the NEXI query language, a brief description will be given of the structure of this type of query. We will then discuss the types of queries and concentrate on the vague query type.

2.1 XML compression techniques

Recently, a large number of XML compression techniques have been proposed, each of which has different characteristics. This section discusses the differences between these compressors and their main features.

XML compressors can be classified into two classes: XML-blind and XML-conscious compressors. XML-blind, or general-purpose, compressors treat the XML document as a traditional text document, ignoring its structure, and use general-purpose text compression techniques to compress it. These techniques can be classified into two main classes: arithmetic compressors and dictionary compressors (Augeri et al., 2007; Augeri, 2008). Arithmetic compressors encode a whole string of characters as a single number, so the average number of bits spent per character can be fractional. PPM, CACM3, and PAQ are examples of this kind of compressor (Moffat, 1990; Cleary and Witten, 1984; Alistair et al., 1998). Dictionary compression techniques, on the other hand, substitute each string in the input by its reference in a dictionary maintained by the encoder. WinZip, GZIP, and BZIP2 are examples of this compression class (BZip2, 1996; GZip, 1992; WinZip, 1990).
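To make the redundancy argument concrete, the following minimal sketch applies two general-purpose (XML-blind) compressors from the Python standard library to a small synthetic document. The document, the record template and the variable names are assumptions made for illustration only; they are not data from this study.

```python
import bz2
import zlib

# Hypothetical sample: a small XML document whose tags repeat for every record.
record = "<book><title>T{0}</title><author>A{0}</author><year>2009</year></book>"
xml_text = "<library>" + "".join(record.format(i) for i in range(200)) + "</library>"
raw = xml_text.encode("utf-8")

# XML-blind compression: the document is treated as plain bytes,
# with no knowledge of its tag structure.
deflated = zlib.compress(raw, 9)   # DEFLATE, the algorithm used by gzip
bzipped = bz2.compress(raw, 9)     # the algorithm used by bzip2

# Original size followed by the two compressed sizes, in bytes.
print(len(raw), len(deflated), len(bzipped))
```

Because every record repeats the same tag names, both compressors shrink the input considerably, but neither output can be queried without full decompression.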

On the other hand, XML-conscious compressors try to exploit the structural behaviour of XML documents in order to achieve a better compression ratio and less time compared with the XML-blind type, and to produce usable compressed XML documents without the need to decompress them. XML-conscious compressors can be classified according to their ability to query the compressed documents into two main sub-classes: queriable and non-queriable compressors.

2.1.1. Non-queriable XML compressors:

This kind of compressor shows good compression performance, but the resulting document cannot be queried without decompressing it. The main purpose of these compressors is to achieve the highest compression ratio for archival purposes. Examples of this type are:

XMill (Liefke and Suciu, 2000): This technique compresses the structure (i.e. tags and attributes) of the XML document separately from its data by encoding the structure in a dictionary-based manner and then passing it to a back-end compressor. Every element and attribute name is assigned an integer that serves as its key in the dictionary. The data part is grouped into containers depending on the type of the data and its path from the root. Each container is compressed separately using a compression technique suited to the data type in that container.

Millau (Girardot and Sundaresan, 2000): In order to compress the structure of XML documents, Millau takes advantage of the document schema if this schema is available. It is an extension of the WBXML (Wireless Application Protocol Binary XML) format, which is designed to reduce the size of XML documents for transmission purposes.

XWRT (Skibinski et al., 2007): This technique shares its ideas with XMill, with a small difference. The tag and attribute names in an XML document, which usually have a high frequency within the same document, are encoded using a semi-dynamic dictionary. The XML document is first scanned to find the frequent tags and attributes and put them in the dictionary. A second scan of the document then replaces all occurrences of the words in the dictionary with their dictionary index.

RNGzip (League and Eng, 2007): This technique compresses the XML document on the basis of its Relax NG schema [13]. First, the schema must be agreed upon by the sender and the receiver; it acts as a key in the encoding and decoding process. Using this schema, RNGzip builds a tree automaton for the specific XML document. Only a little information needs to be transmitted, and the receiver then reconstructs the complete XML document.
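The separation of structure from data described for XMill above can be sketched roughly as follows. This is a simplified illustration and not XMill's actual format: containers are keyed by path only (not by data type), text that follows a child element is ignored, `zlib` stands in for the per-container back-end compressors, and the function names `split_structure_and_data` and `compress_containers` are hypothetical.

```python
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict

def split_structure_and_data(xml_text):
    """Return (tag_dictionary, structure_tokens, containers)."""
    tag_ids = {}                      # element/attribute name -> integer key
    structure = []                    # sequence of integer tokens
    containers = defaultdict(list)    # root-to-node path -> data values

    def name_id(name):
        return tag_ids.setdefault(name, len(tag_ids) + 1)

    def walk(node, path):
        path = path + "/" + node.tag
        structure.append(name_id(node.tag))
        for attr, value in node.attrib.items():
            structure.append(name_id("@" + attr))
            containers[path + "/@" + attr].append(value)
        if node.text and node.text.strip():
            containers[path].append(node.text.strip())
        for child in node:
            walk(child, path)
        structure.append(0)           # 0 marks the end of an element

    walk(ET.fromstring(xml_text), "")
    return tag_ids, structure, dict(containers)

def compress_containers(containers):
    # zlib is an assumed stand-in for a compressor chosen per data type.
    return {path: zlib.compress("\n".join(values).encode("utf-8"))
            for path, values in containers.items()}

doc = ("<lib><book id='1'><title>XML</title></book>"
       "<book id='2'><title>XPath</title></book></lib>")
tags, structure, containers = split_structure_and_data(doc)
packed = compress_containers(containers)
```

Grouping values that share a path tends to put similar data side by side, which is what allows a specialised back-end compressor to do better than a single pass over the whole document.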

2.1.2. Queriable XML compressors:

The main goal of this type of compressor is to allow the compressed version of the XML document to be queried without decompressing it. The compression ratio of these techniques is much lower than that of the XML-blind or non-queriable techniques. However, these techniques are important when dealing with resource-limited applications and with mobile devices. A brief description of some of these techniques is given below.

XGrind (Tolani and Haritsa, 2000): In 2000, Tolani and Haritsa introduced the first queriable XML compressor that can query the compressed file without fully decompressing it. It is considered to be a homomorphic compressor, in which the compressed XML document can be viewed like the original XML document except that its tag, element and attribute names are replaced with their corresponding encodings, which are dictionary-based. The data part of the document is encoded using Huffman encoding. To query the compressed document, XGrind's query processor finds each simple path and checks whether it satisfies the path in the given query. The main drawback of XGrind is that it can handle only exact-match and prefix-match queries on the compressed documents.

Xpress (Min et al., 2003): This technique uses the reverse arithmetic encoding method to encode the labels and paths of the XML document. Instead of representing each tag as a unique identifier, as XGrind does, Xpress encodes a label path as a distinct interval between 0.0 and 1.0. To encode the data part of the XML document, Xpress uses different compression techniques depending on the type of the data, without the need for human intervention.

XQZip (Cheng and Ng, 2004): Unlike XQueC, XQZip groups the XML data into blocks and then applies the gzip compressor to them. To process queries, it decompresses a specific block in order to retrieve its contents. This technique removes the redundant structures that occur in an XML document in order to improve query performance. Although XQZip processes different types of XPath queries, it is slower than other compressors because of its partial decompression.

XSeq (Lin et al., 2005): This technique adapts Sequitur, a grammar-based text compression algorithm, to compress the containers. Sequitur is a linear-time algorithm that builds a context-free grammar for the input string. XSeq uses this grammar to process the data values that match the given query and avoid scanning irrelevant data. Furthermore, the context-free grammar enables XSeq to process queries without even partial decompression.

XQueC (Arion et al., 2007): This technique relies on the separation between the data and the structure of XML documents. The data is stored in containers according to its path location within the document. Each container item is individually compressed. This benefits retrieval, since a complete container can be retrieved as a response to a query. With this idea, XQueC can process more types of queries on the compressed version without the partial decompression used in some earlier compressors.

QXT (Skibinski and Swacha, 2007): This is an extension of XWRT that adds query-friendly concepts in order to process queries by partial decompression. This technique scans the XML file twice. In the first pass, a dynamic dictionary is created with the frequencies of its items. This dictionary is stored within the compressed file. In the second pass, QXT encodes the data and places it into the containers. When the size of a specific container exceeds a given threshold, the container is compressed using a general-purpose compressor and written to disk. To process a query, the QXT query processor first searches the dictionary to determine which container should be decompressed. After decompressing a specific container, only the relevant information is decoded to XML format.
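The homomorphic idea described for XGrind above, where names are replaced by dictionary codes while the document keeps its shape, can be illustrated with a toy sketch. It only encodes element names and omits the Huffman coding of the data part; the code format `T0, T1, ...` and the helper names `encode_names` and `translate_path` are assumptions for illustration, not XGrind's real encoding.

```python
import xml.etree.ElementTree as ET

def encode_names(root):
    """Replace every element name by a short dictionary code, in place."""
    codes = {}
    for node in root.iter():
        node.tag = codes.setdefault(node.tag, "T%d" % len(codes))
    return codes

def translate_path(path, codes):
    """Rewrite a simple child-axis path such as 'book/title' using the codes."""
    return "/".join(codes[step] for step in path.split("/"))

root = ET.fromstring(
    "<lib><book><title>XML</title></book><book><title>XPath</title></book></lib>"
)
codes = encode_names(root)

# The query is translated once and then evaluated on the encoded document,
# without restoring the original tag names.
encoded_query = translate_path("book/title", codes)
titles = [node.text for node in root.findall(encoded_query)]
```

Since the encoded tree has the same shape as the original, an exact-match path query needs only its own steps translated, which is consistent with the exact-match and prefix-match limitation noted above.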

2.2 XPath and NEXI query languages

XPath is an expression language, not a programming or query language per se (Kay, 2008). Its main purpose is to return a node or several nodes from an XML document according to a specific expression. XPath's three data model categories and three operation categories are its main building blocks.

According to Kay (2004), a typical path expression in XPath consists of a sequence of steps, separated by the «/» operator. Each step works by following a relationship between nodes in the document (Holman, 2002; Andrew Watt, 2002). Furthermore, several symbols are used in XPath expressions, such as:

«*» indicates the selection of all elements and attributes.
«@» indicates an attribute.
«//» indicates all the nodes below the current node.

Path expressions thus provide a very powerful mechanism for selecting nodes within an XML document, and this power lies at the heart of the XPath language (Kay, 2004; Sigurbjornsson and Trotman, 2003).
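The symbols listed above can be tried out with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath. The sample document below is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library>"
    "<book id='b1'><title>Learning XML</title><year>2001</year></book>"
    "<book id='b2'><title>XPath 2.0</title><year>2004</year></book>"
    "</library>"
)

# '/' separates steps: children named book, then their title children.
titles = [t.text for t in doc.findall("book/title")]

# '*' matches every child element of the context node.
first_book_children = [child.tag for child in doc.find("book").findall("*")]

# '//' reaches all descendants, not only direct children.
years = [y.text for y in doc.findall(".//year")]

# '@' refers to an attribute, here inside a predicate.
b2_title = doc.find("book[@id='b2']/title").text
```

Each expression is evaluated relative to the `library` root element, so the paths are written without a leading slash.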

Narrowed Extended XPath I (NEXI) is an XML query language that follows the steps of XPath with some changes. First, the NEXI retrieval engine is designed to infer the semantics from the query, in contrast to XPath, which has predefined semantics. Furthermore, NEXI replaces the contains() function, which XPath uses to request an element that contains specific content, with the about() function, which requests an element that is about the content. This change allows NEXI to process fuzzy queries. The language has extensions for question answering, multimedia searching, and searching heterogeneous document collections (Trotman and Sigurbjörnsson, 2005).

A NEXI query thus requires that a certain context (i.e., path) be relevant to a specific content description (i.e., cont) (Trotman and Sigurbjörnsson, 2005).
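As a rough illustration of the path and content parts of an about() clause, the sketch below pulls them out of a query string with a regular expression. It handles only a simplified form, `//path[about(relative_path, content)]`, and is not a full NEXI parser; the example query, the pattern `NEXI_STEP` and the function `parse_nexi` are assumptions made for this sketch.

```python
import re

# Simplified pattern for steps of the form  //path[about(relative_path, content)]
NEXI_STEP = re.compile(r"(//[\w/*]+)\[about\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)\]")

def parse_nexi(query):
    """Return a list of (context_path, relative_path, content_terms) triples."""
    return [(path, rel, cont.split())
            for path, rel, cont in NEXI_STEP.findall(query)]

# Hypothetical query: articles about XML compression, returning sections
# whose titles are about vague queries.
query = "//article[about(., xml compression)]//sec[about(.//title, vague query)]"
parsed = parse_nexi(query)
```

Each triple pairs a structural context with the terms the element should be about, which is the information a retrieval engine needs in order to interpret the clause loosely rather than as an exact containment test.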

2.3 Types of queries

There are different types of queries, and most of them have been processed by earlier compression techniques in order to retrieve information from compressed XML documents (Lin et al., 2005). Table 2 shows the main types of queries with a brief description of each one.

2.4 Vague queries

Since vague queries are the central issue in our research, this section gives a brief description of such queries and how they appear in the information retrieval domain.

Much imprecise and uncertain data exists in the real world. Since it is important to answer any user's query with exact or approximate answers even if the query has vague conditions, the need to process vague queries is increasing rapidly (Zhao and Ma, 2009).

Vague logic is the generalisation of fuzzy logic (Kumar and Biswas, 2009). According to the vague set theory of Gau and Buehrer (1993), vague search is a combination of the following search techniques:

α-vague-equality search, and
Vague-proximity search

A vague set (VS) is a combination of two sets: (1) 'evidence for', or the truth membership $t_A(x)$ of the element $x$ in the vague set $A$, and (2) 'evidence against', or the false membership $f_A(x)$ of the element $x$ in the vague set $A$, such that:

$$t_A(x) + f_A(x) \le 1$$

Furthermore, each membership grade $\mu_A(u)$ in a vague set $A$ is bounded by the subinterval $[t_A(u), 1 - f_A(u)]$ of $[0, 1]$, i.e. $0 \le t_A(u) \le \mu_A(u) \le 1 - f_A(u) \le 1$ (Liu et al., 2008).
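The two memberships and the interval they induce can be captured in a few lines. The class name `VagueValue` and its methods are hypothetical; the sketch only encodes the constraint that the evidence for and against an element sums to at most 1 and the resulting bounds on the membership grade.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VagueValue:
    t: float  # evidence for (truth membership)
    f: float  # evidence against (false membership)

    def __post_init__(self):
        if not (0.0 <= self.t <= 1.0 and 0.0 <= self.f <= 1.0):
            raise ValueError("t and f must lie in [0, 1]")
        if self.t + self.f > 1.0:
            raise ValueError("t + f must not exceed 1")

    def interval(self):
        """Bounds [t, 1 - f] on the membership grade."""
        return (self.t, 1.0 - self.f)

    def admits(self, mu):
        """True if the grade mu lies inside the interval [t, 1 - f]."""
        lower, upper = self.interval()
        return lower <= mu <= upper

v = VagueValue(t=0.5, f=0.25)
```

With `t = 0.5` and `f = 0.25` the membership grade is only known to lie between 0.5 and 0.75; a fuzzy set is the special case where the interval collapses to a single point, that is, `t + f = 1`.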

Several kinds of queries are considered vague. Some of these are:

Variation in spelling (e.g. papers published at the International Conference on Internet Computing, which can be written as ICIC)
Low correlation between the query components (e.g. Leader of the University of Huddersfield)
Comparative words (e.g. cheaper, most beautiful)
Statistical concepts (e.g. average, mean).

3. Current state and problem identification

Although vague queries have been processed before using different approaches, all of these approaches dealt only with the original XML documents. Some of them depend on the tree pattern of the XML document (Sihem Amer-Yahia et al., 2002; P. Mark Pettovello and Fotouhi, 2006), while others depend on decomposing the vague query into two sub-queries and retrieving information depending on the nested tags that distinguish XML documents from other text documents (Vojkan Mihajlović et al., 2006; Pehcevski, 2006; Andrew Trotman and Mounia Lalmas, 2006).

Neither the tree structure nor the nested tags still exist in the compressed XML documents. This makes it impossible for the existing techniques to retrieve information from the compressed documents.

Many systems nowadays convert plain XML documents to compressed ones before they answer the user's query, but none of them handles vague queries. Table 3 shows some examples of well-known XML compressors and the query types they process. Figure 1 explains the experiment that has been done on all the queriable XML compressors.

For this reason, our research focuses on how to handle a vague query when retrieving information from a compressed XML document.

3.1 Aims and Objectives

According to the literature review, the main aim of this research is to develop a system that solves the problem of retrieving information from compressed files according to vague queries. The objectives drawn from this aim are:

Develop XQPoint as a new compression technique. The input to XQPoint is an XML document and the output is the compressed version of this document.
Develop the query processor, which processes vague queries and retrieves information accordingly from the compressed XML document.

4. Initial System Architecture

The proposed system consists of two main stages. The first is the design of a new XML compression technique named XQPoint, which converts normal XML documents to a compressed version. The second is the design of a retrieval technique that processes NEXI vague queries in order to retrieve the relevant information from the compressed document accordingly. The following two sub-sections describe the structure of these two stages.

4.1 Design of the XQPoint compression technique

The following section first describes how XQPoint treats each part of the XML document, then explains its architecture and the data set that will be used to test the compressor part of the system.

4.1.1 Architecture of XQPoint

The XQPoint compressor treats the structure part of an XML document differently from the data part. Figure 2 shows that the XML document is analysed first in order to separate its components into different containers. Each element or attribute name is associated with a unique pair of numbers [IDpre, IDpost]. These pairs are called structural identifiers and have been used in some querying techniques (Al-Khalifa et al., 2002; Halverson et al., 2003; Grust, 2002; Paparizos et al., 2003). IDpre represents the order of the node in the preorder traversal of the tree, while IDpost represents the order of the node in the postorder traversal. In this way the position of each node within the complete XML tree becomes recognizable. Figure 3 shows a sample document with the nodes' structural identifiers. The list of all pairs of identifiers is then encoded into a binary code with 2*log2(N) bits for each node, where N is the number of elements in the document. So, the total number of bits needed to store all the elements is N*2*log2(N) bits.
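The structural identifiers and the bit count above can be worked through on a small tree. The following is a minimal sketch, not the XQPoint implementation: it numbers elements only (attributes are left out), counts both orders from 1, and rounds log2(N) up to whole bits, all of which are assumptions. The function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

def structural_identifiers(root):
    """Map each element to its (IDpre, IDpost) pair, both counted from 1."""
    ids = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        counters["pre"] += 1
        pre = counters["pre"]          # assigned when the node is entered
        for child in node:
            visit(child)
        counters["post"] += 1          # assigned when the node is left
        ids[node] = (pre, counters["post"])

    visit(root)
    return ids

def is_ancestor(a, d):
    """a is an ancestor of d iff a starts before d and finishes after it."""
    return a[0] < d[0] and a[1] > d[1]

root = ET.fromstring("<a><b><c/><d/></b><e/></a>")
ids = structural_identifiers(root)
by_tag = {node.tag: pair for node, pair in ids.items()}

n = len(ids)
bits_per_node = 2 * math.ceil(math.log2(n))   # 2 * log2(N), rounded up
total_bits = n * bits_per_node
```

For this five-element tree each identifier needs 3 bits, so a pair costs 6 bits and the whole structure 30 bits. The pair also answers ancestor questions by comparison alone, which is why such identifiers are attractive for querying without rebuilding the tree.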

In order to compress the data part of the XML document, XQPoint separates the data of the document into different containers according to their path position (an encoded path) from the root and the type of the data. Each of these containers is compressed using a different encoding technique, as follows:

Integer data type: XQPoint uses variable-byte coding to encode integers. This approach stores integers of variable size as a byte sequence. The leftmost seven bits of each byte store a part of the integer value, while the rightmost bit of the byte indicates whether that byte is the last byte in the sequence (Stanford; M.Manikandan et al., 2006).

Floating-point type: Predictive Floating-Point Compression is used to compress this kind of data. This approach splits the number into sign, exponent, and mantissa parts and treats each part with a context-based arithmetic coder (Cheng and Ng, 2004).
Enumerated data and text data: the enumerated data in XML are the attribute values that occur frequently, such as sets of countries, departments in a university, or zip codes. XQPoint uses the Fixed Point Number Representation Technique (FPNRT), which has been used in spelling checkers as a compression technique, to convert the enumerated data words and the text data into numeric values. The numeric value is calculated using the following formula:

where n represents the length of a data word, ASC is the ASCII code of a letter in the word, and i is the letter's position in the word.
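Two of the container encodings described above can be sketched briefly. The first pair of functions follows the bit layout given in the text for variable-byte coding (payload in the upper seven bits, a last-byte flag in the lowest bit); the exact byte order, most significant chunk first, is an assumption. The third function only shows the sign, exponent and mantissa split of an IEEE 754 double; the prediction and context-based arithmetic coding steps are not reproduced. All function names are hypothetical, and the FPNRT step is omitted.

```python
import struct

def vb_encode(n):
    """Encode a non-negative integer; payload in the upper 7 bits of each
    byte, lowest bit set to 1 on the final byte of the sequence."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    chunks = []
    while True:
        chunks.append(n & 0x7F)            # take the low 7 bits
        n >>= 7
        if n == 0:
            break
    chunks.reverse()                       # most significant chunk first
    out = bytearray((c << 1) for c in chunks)
    out[-1] |= 1                           # flag the last byte
    return bytes(out)

def vb_decode(data):
    """Decode a concatenation of vb_encode outputs back into integers."""
    numbers, current = [], 0
    for byte in data:
        current = (current << 7) | (byte >> 1)
        if byte & 1:                       # last byte of this integer
            numbers.append(current)
            current = 0
    return numbers

def split_float(x):
    """Split an IEEE 754 double into (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

stream = vb_encode(5) + vb_encode(300) + vb_encode(0)
decoded = vb_decode(stream)
```

Small integers take a single byte and the flag bit makes the stream self-delimiting, so a container of integers can be decoded without storing lengths. Splitting a float into its three fields lets each field be modelled separately, since signs and exponents in a column of measurements usually vary far less than mantissas.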

4.1.2 Data set used to test the XQPoint compressor

To test the XQPoint compression technique, we should take a set of XML documents of different types. These documents should differ in size, number of tags, number of nodes, depth of the longest path, and data ratio (DR), which is:

$$DR_d = \frac{|D|}{S_i}$$

where $DR_d$ is the data ratio of the XML document $d$, $|D|$ is the size of the data it contains, and $S_i$ represents the size of the XML document.

According to their main characteristics, XML documents can be categorized into three types (Maneth et al., 2008; Sakr, 2009):

Textual documents (TD): The DRd of this type of document exceeds 70%. The structure of these documents is very simple. Books and articles are examples of this type.
Structural documents (SD): In this type of XML document, the DRd is less than 30%. Baseball box scores and line-item shipping records are two examples of this type.
Regular documents (RD): These documents have a DRd between 40 and 60 percent. Relational databases are examples of this type.
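The three categories above can be turned into a small classifier. The sketch assumes that the data ratio is the share of the document's bytes taken up by text and attribute values; that reading, and the function names `data_ratio` and `classify`, are assumptions. Ratios outside the three stated ranges are left unclassified because the text assigns them no category.

```python
import xml.etree.ElementTree as ET

def data_ratio(xml_text):
    """Share of the document's bytes taken by text and attribute values."""
    root = ET.fromstring(xml_text)
    data_bytes = 0
    for node in root.iter():
        for piece in (node.text, node.tail):
            if piece and piece.strip():
                data_bytes += len(piece.strip().encode("utf-8"))
        for value in node.attrib.values():
            data_bytes += len(value.encode("utf-8"))
    return data_bytes / len(xml_text.encode("utf-8"))

def classify(ratio):
    if ratio > 0.70:
        return "TD"      # textual document
    if ratio < 0.30:
        return "SD"      # structural document
    if 0.40 <= ratio <= 0.60:
        return "RD"      # regular document
    return None          # the text gives no class for the remaining ranges

textual = "<a>" + "x" * 100 + "</a>"
structural = "<a><b/><c/><d/></a>"
labels = [classify(data_ratio(textual)), classify(data_ratio(structural))]
```

A test collection for the compressor would ideally contain documents from all three classes, since a technique that compresses structure well may still perform poorly on text-heavy documents and vice versa.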

4.2 Design of the query processor

The second part of the proposed system is the vague query processor. As shown in Figure 2, the query is handled by the query processor part of the XQPoint architecture. The structure of this part is adopted from the query decomposition technique proposed in (Al-Hamadani et al., 2009), which decomposes the vague query into two parts: QCO, which refers to Content-Only retrieval, and QCAS, which refers to Content-And-Structure retrieval.

As shown in Figure 4, the query processor handles each query in several steps. The first step is query decomposition, which separates each query into different sub-queries. Figure 5 depicts an example of a NEXI vague query that passes through this step and is decomposed into four sub-queries.

The second step is sub-query relaxation, where each sub-query is handled separately according to a specific XML document. The relaxation can be made by changing the node sequence, adding more nodes, deleting some nodes, or changing some attribute values. A threshold is attached to each sub-query in order to determine the level of relaxation applied to it. A lower threshold means a low level of relaxation (i.e. fewer changes), while a higher threshold means a high level of relaxation.

The third step of the query processor is retrieving from the compressed XML document according to each relaxed sub-query. The retrieved documents are ranked according to the attached sub-query threshold. The final step is to group the retrieved documents depending on the main NEXI query. The top-K ranked documents are decompressed and returned to the user.
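The relaxation, ranking and top-K steps above can be illustrated on plain path tuples. This is a deliberately narrow sketch: only the "deleting some nodes" relaxation is implemented, the threshold is interpreted as the maximum number of deleted steps, ranking is by the fewest changes needed, and matching is done against uncompressed paths. The function names, the threshold interpretation and the sample paths are assumptions, not the proposed processor.

```python
from itertools import combinations

def relax_by_deletion(steps, max_changes):
    """Yield (relaxed_path, changes) for every way of deleting up to
    max_changes inner steps; the first and last step are always kept."""
    inner = range(1, len(steps) - 1)
    for changes in range(max_changes + 1):
        for dropped in combinations(inner, changes):
            yield tuple(s for i, s in enumerate(steps) if i not in dropped), changes

def top_k(query_steps, doc_paths, max_changes, k):
    """Rank the document paths matched by some relaxed query, best first."""
    available = set(doc_paths)
    best = {}
    # Relaxations are generated in order of increasing changes, so the first
    # time a path is matched it is matched with the fewest changes.
    for relaxed, changes in relax_by_deletion(query_steps, max_changes):
        if relaxed in available and relaxed not in best:
            best[relaxed] = changes
    return sorted(best.items(), key=lambda item: item[1])[:k]

query = ["article", "body", "sec", "title"]
paths = [
    ("article", "body", "sec", "title"),
    ("article", "sec", "title"),
    ("article", "title"),
    ("book", "title"),
]
results = top_k(query, paths, max_changes=2, k=2)
```

With a threshold of 0 only the exact path is returned; raising it admits paths that differ by more deleted steps, and these rank below closer matches. In the full system only the top-K results would then be decompressed.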

5. Study plan

6. Conclusions

With the increasing importance of XML in storing and transferring data via the World Wide Web, there is an increasing need to decrease the size of XML documents and to deal with these documents in their compressed form. And as XML documents spread, their users vary from experts with strong queries to naive users with vague queries. For these reasons, there is an increasing need to design a system that can both compress XML documents and retrieve the most relevant information according to vague queries. This report proposes such a system. It develops a new compression technique, XQPoint, which separates the data from the structure of XML documents and then compresses each part as applicable. Next, the vague query processor is used to decompose the vague query, process each sub-query separately, and then combine the retrieved results.

We expect that this technique will achieve a high compression ratio and efficiently retrieve information from the compressed version.

Reference list

GZip Compressor. http://www.gzip.org/
AL-HAMADANI, B. T., ALWAN, R. F., LU, J. & YIP, J. (2009) Vague Content and Structure (VCAS) Retrieval for XML Electronic Healthcare Records (EHR). Proceedings of the 2009 International Conference on Internet Computing. USA.
AL-KHALIFA, S., JAGADISH, H., PATEL, J., WU, Y., KOUDAS, N. & SRIVASTAVA, D. (2002) Structural Joins: A Primitive for Efficient XML Query Pattern Matching. IN IEEE (Ed.) 8th International Conference on Data Engineering. San Jose, CA, USA.
ALISTAIR, M., RADFORD, M. N. & IAN, H. W. (1998) Arithmetic coding revisited. ACM Trans. Inf. Syst., 16, 256-294.
ANDREW TROTMAN & MOUNIA LALMAS (2006) Strict and Vague Interpretation of XML-Retrieval Queries. SIGIR'06.
ANDREW WATT (2002) XPath Essentials, John Wiley & Sons Publishing.
ARION, A., BONIFATI, A., MANOLESCU, I. & PUGLIESE, A. (2007) XQueC: A query-conscious compressed XML database. ACM Trans. Internet Technol., 7, 10.
AUGERI, C. (2008) On Some Results in Unmanned Aerial Vehicle Swarms. Dept. of Electrical & Computer Engineering. San Diego, CA, USA, Air Force Institute of Technology.
AUGERI, C. J., BULUTOGLU, D. A., MULLINS, B. E., BALDWIN, R. O. & LEEMON C. BAIRD, I. (2007) An analysis of XML compression efficiency. Proceedings of the 2007 workshop on Experimental computer science. San Diego, California, ACM.
BZIP2 (1996) http://www.bzip.org/
CHENG, J. & NG, W. (2004) XQZip: Querying Compressed XML using Structural Indexing. International Conference on Extending Data Base Technology (EDBT).
CLEARY, J. & WITTEN, I. (1984) Data Compression Using Adaptive Coding and Partial String Matching. Communications, IEEE Transactions, 32, 396-402.
EUROPEAN ENVIRONMENT AGENCY http://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-database-1
FERRAGINA, P., LUCCIO, F., MANZINI, G. & MUTHUKRISHNAN, S. (2006) Compressing and searching XML data via two zips. Proceedings of the 15th international conference on World Wide Web. Edinburgh, Scotland, ACM.
GIRARDOT, M. & SUNDARESAN, N. (2000) Millau: an encoding format for efficient representation and exchange of XML over the Web. Computer Networks, 33, 747-765.
GRUST, T. (2002) Accelerating XPath location steps. IN ACM (Ed.) ACM SIGMOD International Conference on Management of Data. Madison, WI, USA.
GAU, W. L. & BUEHRER, D. J. (1993) Vague sets. IEEE Transactions on Systems, Man, and Cybernetics, 23, 610-614.
GZIP (1992) http://www.gzip.org/
HALVERSON, A., BURGER, J., GALANIS, L., KINI, A., KRISHNAMURTHY, R., RAO, A., TIAN, F., VIGLAS, S., WANG, Y., NAUGHTON, J. & DEWITT, D. (2003) Mixed Mode XML Query Processing. 29th International Conference on Very Large Data Bases. Berlin, Germany.
HOLMAN, G. K. (2002) XSLT and XPath, Prentice Hall PTR.
KAY, M. (2004) XPath 2.0 Programmers Reference, Wiley Publishing, Inc.
KAY, M. (2008) XSLT 2.0 and XPath 2.0 Programmer's Reference, Wiley Publishing, Inc.
KUMAR, A. & BISWAS, R. (2009) A Study of Vague Search to Answer Imprecise Query. IJCSNS International Journal of Computer Science and Network Security, VOL.9, 198-205.
LEAGUE, C. & ENG, K. (2007) Schema-Based Compression of XML Data with Relax NG. IEEE data compression conference (DCC). Utah.
LIEFKE, H. & SUCIU, D. (2000) XMill: an Efficient Compressor for XML Data. ACM.
LIN, Y., ZHANG, Y., LI, Q. & YANG, J. (2005) Supporting Efficient Query Processing on Compressed XML Files. SAC'05. USA, ACM.
LIU, Y., WANG, G. & FENG, L. (2008) A General Model for Transforming Vague Sets into Fuzzy Sets. Springer Berlin / Heidelberg, 5150, 133-144.
M.MANIKANDAN, BAGAN, K. B. & T.PRATHIBA (2006) Images and Integer Compression Using Bit Based Cascade Coding. GVIP Journal, Volume 6.
MANETH, S., MIHAYLOV, N. & SAKR, S. (2008) XML Tree Structure Compression. XANTEC'08, IEEE Computer Society.
MIN, J.-K., PARK, M.-J. & CHUNG, C.-W. (2003) XPRESS: a queriable compression for XML data. Proceedings of the 2003 ACM SIGMOD international conference on Management of data. San Diego, California, ACM.
MOFFAT, A. (1990) Implementing the PPM data compression scheme. IEEE Trans. on Comm., 38(11), 1917-1921.
P. MARK PETTOVELLO & FOTOUHI, F. (2006) MTree: An XML XPath Graph Index. SAC'06. Dijon, France, ACM.
PAPARIZOS, S., AL-KHALIFA, S., CHAPMAN, A., JAGADISH, H. V., LAKSHMANAN, L. V. S., NIERMAN, A., PATEL, J. M., SRIVASTAVA, D., WIWATWATTANA, N., WU, Y. & YU, C. (2003) TIMBER: A Native System for Querying XML. ACM SIGMOD International Conference on Management of Data. San Diego, CA, USA, ACM.
PEHCEVSKI, J. (2006) Evaluation of Effective XML Information Retrieval. School of Computer Science and Information Technology. Australia, RMIT University.
PLAYS, S. http://www.cafeconleche.org/examples/shakespeare/
RAY, E. T. (2001) Learning XML Guide to Creating Self-Describing Data, O'Reilly Media Inc.
SAKR, S. (2009) XML compression techniques: A survey and comparison. Journal of Computer and System Sciences, 75, 303-322.
SIGURBJORNSSON, B. & TROTMAN, A. (2003) Queries: INEX 2003 working group report. 2nd workshop of the initiative for the evaluation of XML retrieval (INEX).
SIHEM AMER-YAHIA, SUNGRAN CHO & SRIVASTAVA, D. (2002) Tree Pattern Relaxation. Extending Database Technology, 2287, 496-513.
SKIBINSKI, P., GRABOWSKI, S. & SWACHA, J. (2007) Effective Asymmetric XML Compression. CADSM.
SKIBINSKI, P. & SWACHA, J. (2007) Combining Efficient XML Compression with Query Processing. ADBIS. Springer-Verlag.
STANFORD http://nlp.stanford.edu/IRbook/html/htmledition/variable-byte-codes-1.html#fig:vbalgorithm
TOLANI, P. M. & HARITSA, J. R. (2000) XGRIND: A Query-friendly XML Compressor. IEEE 18th international conference on Data Engineering.
TROTMAN, A. & SIGURBJÖRNSSON, B. (2005) Narrowed Extended XPath I (NEXI). Advances in XML Information Retrieval. Berlin / Heidelberg, Springer.
VOJKAN MIHAJLOVIĆ, DJOERD HIEMSTRA & BLOK, H. E. (2006) Vague Element Selection and Query Rewriting for XML Retrieval. Sixth Information Retrieval workshop. Dutch Belgian.
WASHINGTON http://www.cs.washington.edu/research/xmldatasets/data/
WATERLOO http://softbase.uwaterloo.ca/~ddbms/projects/xbench/index.html
WIKIPEDIA http://download.wikipedia.org/enwikinews/
WINZIP (1990) http://www.winzip.com/
ZHAO, F. & MA, Z. M. (2009) Vague Query Based on Vague Relational Model. AISC Springer-Verlag Berlin Heidelberg.