On the performance of markup language compression

Kheirkhahzadeh, Antonio (2015) On the performance of markup language compression. Doctoral thesis, University of West London.

Antonio_Kheirkhahzadeh_PhD_Thesis_(April_2015).pdf - Accepted Version


Abstract

Data compression is used in everyday life to improve computer interaction or simply to save storage. Lossless data compression refers to techniques that compress a file in such a way that the decompressed output is an exact replica of the original. These techniques, unlike lossy data compression, are necessary and heavily used to reduce resource usage and to improve storage efficiency and transmission speed. Prior research has led to large improvements in compression performance and efficiency for general-purpose tools, which are mainly based on statistical and dictionary encoding techniques.
Extensible Markup Language (XML) carries a large amount of redundant data, which general-purpose compressors parse as ordinary text. Several tools for compressing XML data have been developed, improving compressed size and speed using different compression techniques. These tools are mostly based on algorithms that rely on variable-length encoding. XML Schema is a language used to define the structure and data types of an XML document. As a result, it provides XML compression tools with additional information that can be used to improve compression efficiency. XML Schema is also used for validating XML data. For document compression, the schema needs to be generated dynamically for each XML file. This solution can be applied to improve the efficiency of XML compressors.
This research investigates a dynamic approach to compressing XML data using a hybrid compression tool. The model compresses XML data using variable-length and fixed-length encoding techniques, applying each when its best use case is triggered. The aim of this research is to investigate the use of fixed-length encoding techniques to support general-purpose XML compressors. The results demonstrate that compressed size can be improved when a fixed-length encoder is used to compress most XML data types.
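
As a rough illustration of the hybrid idea described in the abstract, the sketch below encodes a list of integer values either with a fixed-length encoding or with a variable-length one, and keeps whichever output is smaller. It is not the thesis's tool: the function names, the one-byte tag, the 4-byte integer width, and the use of DEFLATE (via Python's zlib) as the variable-length encoder are all assumptions made for this example.

```python
import struct
import zlib

# Hypothetical sketch: values are assumed to be unsigned integers that fit
# in 32 bits (0 .. 2**32 - 1), e.g. the contents of integer-typed XML elements.

def encode_fixed(values):
    """Fixed-length encoding: every value occupies exactly 4 bytes."""
    return struct.pack(f">{len(values)}I", *values)

def decode_fixed(blob):
    """Inverse of encode_fixed."""
    count = len(blob) // 4
    return list(struct.unpack(f">{count}I", blob))

def encode_variable(values):
    """Variable-length encoding: DEFLATE over the textual form of the values."""
    text = " ".join(str(v) for v in values)
    return zlib.compress(text.encode("ascii"))

def decode_variable(blob):
    """Inverse of encode_variable."""
    return [int(token) for token in zlib.decompress(blob).decode("ascii").split()]

def encode_hybrid(values):
    """Try both encoders and keep the smaller output, prefixed with a 1-byte tag."""
    fixed = encode_fixed(values)
    variable = encode_variable(values)
    if len(fixed) <= len(variable):
        return b"F" + fixed
    return b"V" + variable

def decode_hybrid(blob):
    """Dispatch on the tag written by encode_hybrid."""
    tag, payload = blob[:1], blob[1:]
    if tag == b"F":
        return decode_fixed(payload)
    return decode_variable(payload)

if __name__ == "__main__":
    samples = {
        "few distinct large values": [4000000000, 1, 65535],
        "long repetitive run": [7] * 1000,
    }
    for label, values in samples.items():
        blob = encode_hybrid(values)
        assert decode_hybrid(blob) == values  # lossless round trip
        print(label, "->", blob[:1].decode("ascii"), len(blob), "bytes")
```

Selecting the encoder per input by comparing output sizes is only one possible trigger; a schema-aware tool could instead choose the encoder from the declared data type of each element.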

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: Markup Language Compression
Subjects: Computing
Depositing User: Marzena Dybkowska
Date Deposited: 15 Sep 2015 15:08
Last Modified: 16 May 2017 14:57
URI: http://repository.uwl.ac.uk/id/eprint/1266
