Abstract—Calculating semantic similarity between sentences is a difficult task for computers due to the complex structure and syntax of sentences. To represent a sentence, numerous significant characteristics must be taken into account, for example, ambiguity, word order, and the context of the sentence. Various methods have been proposed to construct a language model for computing similarity, such as averaging word embeddings or sentence embedding based on an auto-encoder architecture. However, these methods usually focus on the sentence itself and ignore the influence of the preceding sentences. In this paper, we introduce a novel approach to transform sentences with context into embedding vectors based on an auto-encoder architecture. Experimental results show that the proposed method yields better estimates of sentence similarity in certain scenarios.
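The average-word-embedding baseline mentioned in the abstract can be sketched as follows. This is an illustrative example only: the word vectors are toy, made-up values (real systems use pretrained embeddings such as word2vec or GloVe), and the similarity is the cosine of the averaged vectors.

```python
import numpy as np

# Toy word vectors for illustration only (not from the paper).
EMBEDDINGS = {
    "the": np.array([0.1, 0.2, 0.1]),
    "cat": np.array([0.9, 0.1, 0.3]),
    "dog": np.array([0.8, 0.2, 0.4]),
    "sat": np.array([0.2, 0.7, 0.5]),
    "ran": np.array([0.3, 0.6, 0.6]),
}

def sentence_vector(sentence):
    """Represent a sentence as the mean of its known word embeddings."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    return np.mean(vectors, axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in (-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_similarity(sentence_vector("the cat sat"),
                        sentence_vector("the dog ran"))
```

Note that this baseline discards word order and surrounding context entirely, which is precisely the limitation the paper's context-based auto-encoder embedding is intended to address.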
Index Terms—Semantic similarity, word embedding, sentence embedding, language model, auto-encoder.
All authors are with the Shibaura Institute of Technology, Koto-ku, Tokyo, Japan (e-mail: nb17502@shibaura-it.ac.jp, nb18503@shibaura-it.ac.jp, masaomi@sic.shibaura-it.ac.jp).
Cite: Dinh-Minh Vu, Thanh-Trung Trinh, and Masaomi Kimura, "A New Context-Based Sentence Embedding and the Semantic Similarity of Sentences," International Journal of Knowledge Engineering vol. 6, no. 1, pp. 7-11, 2020.
Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).