Supervised training of Named Entity Recognition (NER) models generally requires large amounts of annotations, which are rarely available for less widely used (low-resource) languages, e.g., Armenian and Dutch. It is therefore desirable to leverage knowledge extracted from a high-resource (source) language, e.g., English, so that NER models for low-resource (target) languages can be trained more efficiently and at lower annotation cost. In this paper, we study cross-lingual alignment for NER, an approach that transfers knowledge from high- to low-resource languages by aligning token embeddings across languages. Specifically, we propose to align by minimizing the Wasserstein distance between the contextualized token embeddings of the source and target languages. Experimental results show that our method improves on existing work on cross-lingual alignment for NER.
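To make the alignment objective concrete, the following is a minimal sketch of a Wasserstein-style distance between two sets of token embeddings. The abstract does not say how the distance is computed or minimized, so this is an illustration under stated assumptions and not the paper's method: it uses Sinkhorn iterations (entropy-regularized optimal transport) as an approximation, gives every token uniform weight, and uses a squared Euclidean cost. The names `sinkhorn_distance`, `reg`, and `n_iters` are hypothetical, and the random arrays stand in for contextualized embeddings from a real encoder.

```python
import numpy as np


def sinkhorn_distance(source, target, reg=0.1, n_iters=200):
    """Approximate the optimal-transport cost between two embedding sets.

    Assumption: entropy-regularized OT via Sinkhorn iterations, with uniform
    weights on tokens. This is an illustrative stand-in, not the paper's method.
    """
    n, m = source.shape[0], target.shape[0]

    # Pairwise squared Euclidean cost between every source and target token.
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)

    # Rescale costs to [0, 1] so that exp(-cost / reg) stays well conditioned.
    scale = cost.max()
    if scale == 0.0:
        return 0.0
    cost = cost / scale

    a = np.full(n, 1.0 / n)  # uniform mass on source tokens
    b = np.full(m, 1.0 / m)  # uniform mass on target tokens
    kernel = np.exp(-cost / reg)

    # Alternately rescale rows and columns to match the two marginals.
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)

    # Transport plan and its total cost, mapped back to the original scale.
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum() * scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source_emb = rng.normal(size=(20, 8))          # stand-in "source" embeddings
    target_emb = rng.normal(size=(25, 8)) + 10.0   # stand-in "target" embeddings, shifted
    print(sinkhorn_distance(source_emb, target_emb))
```

In an alignment setting, a quantity of this kind would serve as a loss on the embeddings, so that lowering it pulls the source and target embedding distributions together. Making that trainable requires a differentiable implementation in an autodiff framework, which this NumPy sketch does not provide.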
|Original language||English (US)|
|Title of host publication||ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||5|
|State||Published - Jan 1 2022|