Methods for numeracy-preserving word embeddings

Publication, Conference
Sundararaman, D; Si, S; Subramanian, V; Wang, G; Hazarika, D; Carin, L
Published in: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
January 1, 2020

Word embedding models typically capture the semantics of words via the distributional hypothesis, but fail to capture the numerical properties of numbers that appear in text. This causes problems in tasks that involve numerical reasoning, such as question answering. We propose a new methodology to assign and learn embeddings for numbers. Our approach creates Deterministic, Independent-of-Corpus Embeddings (referred to as DICE) for numbers, such that their cosine similarity reflects the actual distance on the number line. DICE outperforms a wide range of pre-trained word embedding models across multiple examples of two tasks: (i) capturing numeration and magnitude; and (ii) performing list maximum, decoding, and addition. We further explore the utility of these embeddings in downstream applications by initializing numbers with our approach for the task of magnitude prediction. We also introduce a regularization approach to learn model-based embeddings of numbers in a contextual setting.
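The property stated in the abstract, cosine similarity that tracks distance on the number line, can be illustrated with a toy two-dimensional construction. The sketch below is not taken from the paper and should not be read as the DICE algorithm itself: it assumes a bounded range [lo, hi] and a linear mapping from each number to an angle on a half-circle, and the names `angle_embedding` and `cosine_similarity` are hypothetical helpers introduced only for this illustration.

```python
import math


def angle_embedding(x, lo, hi):
    """Map x in [lo, hi] to a unit vector on a half-circle.

    Hypothetical helper: the linear angle mapping is an assumption made
    for illustration, not the construction described in the paper.
    """
    if not lo <= x <= hi:
        raise ValueError("x must lie in [lo, hi]")
    theta = math.pi * (x - lo) / (hi - lo)  # angle in [0, pi]
    return (math.cos(theta), math.sin(theta))


def cosine_similarity(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


lo, hi = 0, 100
e10 = angle_embedding(10, lo, hi)
e12 = angle_embedding(12, lo, hi)
e90 = angle_embedding(90, lo, hi)

# Numbers that are close on the number line get a higher cosine similarity.
sim_near = cosine_similarity(e10, e12)
sim_far = cosine_similarity(e10, e90)
print(sim_near > sim_far)
```

Under this assumed mapping the similarity of two numbers x and y equals cos(pi * |x - y| / (hi - lo)), so it decreases monotonically as the numbers move apart. The embedding is also deterministic and independent of any corpus, which matches the two properties the acronym names.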

Published In

EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

ISBN

9781952148606

Publication Date

January 1, 2020

Start / End Page

4742 / 4753
 

Citation

APA: Sundararaman, D., Si, S., Subramanian, V., Wang, G., Hazarika, D., & Carin, L. (2020). Methods for numeracy-preserving word embeddings. In EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 4742–4753).

Chicago: Sundararaman, D., S. Si, V. Subramanian, G. Wang, D. Hazarika, and L. Carin. “Methods for numeracy-preserving word embeddings.” In EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 4742–53, 2020.

ICMJE: Sundararaman D, Si S, Subramanian V, Wang G, Hazarika D, Carin L. Methods for numeracy-preserving word embeddings. In: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. 2020. p. 4742–53.

MLA: Sundararaman, D., et al. “Methods for numeracy-preserving word embeddings.” EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 2020, pp. 4742–53.

NLM: Sundararaman D, Si S, Subramanian V, Wang G, Hazarika D, Carin L. Methods for numeracy-preserving word embeddings. EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. 2020. p. 4742–4753.
