On the performance of markup language compression

Kheirkhahzadeh, Antonio (2015) On the performance of markup language compression. Doctoral thesis, University of West London.

Antonio_Kheirkhahzadeh_PhD_Thesis_(April_2015).pdf - Accepted Version

Abstract

Data compression is used in everyday computing to improve interaction or
simply to save storage. Lossless data compression refers to those techniques
that compress a file in such a way that the decompressed
output is an exact replica of the original. These techniques, which differ from
lossy data compression, are necessary and heavily used to reduce resource
usage and improve storage and transmission speeds. Prior research has led
to substantial improvements in compression performance and efficiency for
general-purpose tools, which are mainly based on statistical and dictionary
encoding techniques.
Extensible Markup Language (XML) contains redundant data, which general-purpose
compressors parse as normal text. Several tools for compressing
XML data have been developed, yielding improvements in compressed
size and speed using different compression techniques. These tools are mostly
based on algorithms that rely on variable length encoding. XML Schema is a
language used to define the structure and data types of an XML document. As
a result, it provides XML compression tools with additional information that
can be used to improve compression efficiency. XML Schema is
also used for validating XML data. For document compression there is a need
to generate the schema dynamically for each XML file. This solution can be
applied to improve the efficiency of XML compressors.
This research investigates a dynamic approach to compressing XML data using
a hybrid compression tool. This model allows XML data to be compressed
using variable and fixed length encoding techniques, each applied when its
best use case arises. The aim of this research is to investigate the use of
fixed length encoding techniques to support general-purpose XML compressors.
The results demonstrate that compressed size can be improved when a fixed
length encoder is used to compress most XML data types.
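
The hybrid idea described above can be illustrated with a minimal sketch. It is
not the tool developed in the thesis: all function names are hypothetical, zlib
(DEFLATE) stands in for a general-purpose variable length compressor, and the
fixed length encoder assumes the values are 32-bit signed integers, as a schema
might declare with xs:int. The sketch encodes a list of values both ways and
keeps whichever output is smaller, tagged with one byte.

```python
import struct
import zlib


def encode_fixed(values):
    """Pack each integer as a fixed-width 4-byte big-endian field.

    Assumption: every value fits a 32-bit signed integer.
    """
    return struct.pack(f">{len(values)}i", *values)


def decode_fixed(blob):
    count = len(blob) // 4
    return list(struct.unpack(f">{count}i", blob))


def encode_variable(values):
    """Compress the textual form of the values with zlib, a stand-in
    for a general-purpose variable length compressor."""
    text = " ".join(str(v) for v in values)
    return zlib.compress(text.encode("utf-8"))


def decode_variable(blob):
    return [int(token) for token in zlib.decompress(blob).decode("utf-8").split()]


def encode_hybrid(values):
    """Run both encoders and keep the smaller output.

    The first byte records which encoder was chosen: b"F" or b"V".
    """
    fixed = encode_fixed(values)
    variable = encode_variable(values)
    if len(fixed) <= len(variable):
        return b"F" + fixed
    return b"V" + variable


def decode_hybrid(blob):
    tag, payload = blob[:1], blob[1:]
    if tag == b"F":
        return decode_fixed(payload)
    return decode_variable(payload)


if __name__ == "__main__":
    sample = [2015, 1266, -7, 0, 123456789]
    packed = encode_hybrid(sample)
    print(packed[:1], len(packed), decode_hybrid(packed) == sample)
```

Which encoder wins depends on the data: a long run of identical values tends to
favour the variable length path, while short lists of unrelated numbers may
favour fixed-width packing. The sketch makes no claim about the results
reported in the thesis.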

Item Type: Thesis (Doctoral)
Subjects: Computer science, knowledge and information systems
Depositing User: Marzena Dybkowska
Date Deposited: 15 Sep 2015 15:08
Last Modified: 06 Jun 2016 15:09
URI: http://repository.uwl.ac.uk/id/eprint/1266
