Convolution-deconvolution word embedding: an end-to-end multi-prototype fusion embedding method for natural language processing

Shuang, Kai, Zhang, Zhixuan, Loo, Jonathan ORCID: https://orcid.org/0000-0002-2197-8126 and Su, Sen (2019) Convolution-deconvolution word embedding: an end-to-end multi-prototype fusion embedding method for natural language processing. Information Fusion, 53. pp. 112-122. ISSN 1566-2535

Loo_etal_IF_2019_Convolution-deconvolution_word_embedding_an_end-to-end_multi-prototype_fusion_embedding_method_for_natural_language_processing.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Abstract

Existing unsupervised word embedding methods have proved effective at capturing latent semantic information across a range of Natural Language Processing (NLP) tasks. However, existing word representation methods cannot tackle the polysemous-unaware and task-unaware problems, which are common phenomena in NLP tasks. In this work, we present a novel Convolution-Deconvolution Word Embedding (CDWE), an end-to-end multi-prototype fusion embedding that fuses context-specific and task-specific information. To the best of our knowledge, we are the first to extend deconvolution (i.e., convolution transpose), which has been widely used in computer vision, to word embedding generation. We empirically demonstrate the efficiency and generalization ability of CDWE by applying it to two representative NLP tasks: text classification and machine translation. The CDWE models significantly outperform the baselines and achieve state-of-the-art results on both tasks. To further validate the efficiency of CDWE, we demonstrate how it solves the polysemous-unaware and task-unaware problems by analyzing the Text Deconvolution Saliency, an existing strategy for evaluating the outputs of deconvolution.
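
The full architecture is described in the paper itself; as a rough, non-authoritative illustration of the conv-deconv idea only, the sketch below embeds tokens, applies a 1-D convolution over the sequence to gather contextual features, and then a transposed convolution ("deconvolution") to map those features back to one context-specific vector per token, which is fused with the static embedding. The layer sizes, activation, and fusion-by-addition choice are assumptions, not the authors' published configuration.

```python
# Minimal sketch (not the authors' implementation) of a convolution-deconvolution
# word embedding layer. Hyperparameters and the additive fusion are assumed.
import torch
import torch.nn as nn


class ConvDeconvEmbedding(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Convolution over the token dimension captures local context.
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)
        # Transposed convolution projects the contextual features back to
        # one vector per token in the embedding space.
        self.deconv = nn.ConvTranspose1d(hidden, emb_dim, kernel,
                                         padding=kernel // 2)

    def forward(self, token_ids):                  # (batch, seq_len)
        e = self.embed(token_ids)                  # (batch, seq_len, emb_dim)
        x = e.transpose(1, 2)                      # (batch, emb_dim, seq_len)
        h = torch.relu(self.conv(x))               # contextual features
        d = self.deconv(h)                         # context-specific vectors
        # Fuse static and context-specific embeddings (assumed: addition).
        return e + d.transpose(1, 2)               # (batch, seq_len, emb_dim)


if __name__ == "__main__":
    model = ConvDeconvEmbedding(vocab_size=10000)
    ids = torch.randint(0, 10000, (2, 20))         # dummy batch of token ids
    print(model(ids).shape)                        # torch.Size([2, 20, 128])
```

In such a setup the resulting per-token vectors depend on the surrounding sentence and on the downstream training objective, which is the sense in which a conv-deconv embedding can address context- and task-awareness; refer to the paper for the actual model and evaluation.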

Item Type: Article
Identifier: 10.1016/j.inffus.2019.06.009
Additional Information: The authors would like to thank the anonymous reviewers for the constructive comments. This work was supported in part by the National Key Research and Development Program of China (No. 2017YFB1400603).
Keywords: word embedding, multi-prototype, neural network, natural language processing
Subjects: Computing > Intelligent systems
Related URLs:
Depositing User: Jonathan Loo
Date Deposited: 23 May 2019 09:45
Last Modified: 04 Nov 2024 11:53
URI: https://repository.uwl.ac.uk/id/eprint/6104
