Towards markup-aware text compression

Moore, John, Kheirkhahzadeh, Antonio and Bagale, Jiva Nath (2014) Towards markup-aware text compression. In: 2014 Data Compression Conference (DCC), 26-28 Mar 2014, Snowbird, USA.

Full text not available from this repository.


Although text compression can be applied successfully to markup languages, it operates without any semantic knowledge of the data types present within the markup. In this paper we illustrate how this added knowledge can be used to develop a hybrid tool that combines traditional text compression with markup-awareness to reduce compressed size relative to existing well-known text compression tools. Our results show that, for highly structured markup, it is possible to improve the level of compression by around 20% compared with the best-performing existing tool we studied. We describe the limitations of our approach and discuss potential implementation options, with the overall goal of producing a practical Unix-like tool.
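
The abstract does not describe the authors' algorithm, so the following is only a minimal sketch of one generic way to make a text compressor markup-aware: split the document into a markup stream and a character-data stream, then compress each stream separately with a conventional compressor (zlib here). All names (`split_streams`, `join_streams`, `compress`, `decompress`) are hypothetical and not taken from the paper. The sketch assumes the input contains no NUL characters, which XML 1.0 does not permit, and it makes no claim about the compression gain, which depends on the data.

```python
import re
import zlib

# Hypothetical illustration only; not the method used in the paper.
# A token is a complete tag, a run of character data, or a stray "<".
TOKEN = re.compile(r"<[^>]*>|[^<]+|<")
MARK = "\x00"  # placeholder; assumes the input never contains NUL


def split_streams(doc):
    """Separate a document into a markup stream and a text stream."""
    tags, texts = [], []
    for tok in TOKEN.findall(doc):
        if tok.startswith("<"):
            tags.append(tok)
        else:
            tags.append(MARK)  # remember where the text run occurred
            texts.append(tok)
    return "".join(tags), MARK.join(texts)


def join_streams(tag_stream, text_stream):
    """Rebuild the original document from the two streams."""
    parts = tag_stream.split(MARK)
    texts = text_stream.split(MARK) if len(parts) > 1 else []
    out = [parts[0]]
    for text, part in zip(texts, parts[1:]):
        out.append(text)
        out.append(part)
    return "".join(out)


def compress(doc):
    """Compress the markup and text streams independently."""
    tag_stream, text_stream = split_streams(doc)
    return (zlib.compress(tag_stream.encode("utf-8"), 9),
            zlib.compress(text_stream.encode("utf-8"), 9))


def decompress(blobs):
    """Invert compress(): decompress both streams and interleave them."""
    tag_blob, text_blob = blobs
    return join_streams(zlib.decompress(tag_blob).decode("utf-8"),
                        zlib.decompress(text_blob).decode("utf-8"))


if __name__ == "__main__":
    sample = "<log><e id='1'>start</e><e id='2'>stop</e></log>"
    assert decompress(compress(sample)) == sample
```

A tool that also exploited the data types inside the markup, as the abstract suggests, would go further than this, for example by choosing a different encoding for each kind of value; that step is not shown because the record gives no detail about it.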

Item Type: Conference or Workshop Item (Paper)
ISSN: 1068-0314
ISBN: 9781479938827
Identifier: 10.1109/DCC.2014.80
Page Range: p. 417
Keywords: XML; Data compression; Text analysis; Unix-like tool; XML data; XML markup; Markup-aware text compression; Markup-awareness; Educational institutions; Hybrid power systems; Protocols; Roads; Runtime; XML compression; Text compression
Subjects: Computing
Depositing User: John Moore
Date Deposited: 24 Nov 2014 12:38
Last Modified: 28 Aug 2021 07:17
