Unsupervised Mitigation of Gender Bias by Character Components: A Case Study of Chinese Word Embedding

Xiuying Chen, Mingzhe Li, Rui Yan, Xin Gao, Xiangliang Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Scopus citations

Abstract

Word embeddings learned from massive text collections have demonstrated significant levels of discriminative biases. However, debiasing on the Chinese language, one of the most spoken languages, has been less explored. Meanwhile, existing literature relies on manually created supplementary data, which is time- and energy-consuming. In this work, we propose the first Chinese Gender-neutral word Embedding model (CGE) based on Word2vec, which learns gender-neutral word embeddings without any labeled data. Concretely, CGE utilizes and emphasizes the rich feminine and masculine information contained in radicals, i.e., a kind of component in Chinese characters, during the training procedure. This consequently alleviates discriminative gender biases. Experimental results show that our unsupervised method outperforms the state-of-the-art supervised debiased word embedding models without sacrificing the functionality of the embedding model.
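To make the role of radicals concrete, the following is a minimal sketch of how gender-bearing radicals might be detected in Chinese words. It is not the CGE model and is not taken from the paper: the lookup table `CHAR_TO_RADICAL`, the set `FEMININE_RADICALS`, and both helper functions are hypothetical names introduced purely for illustration, and the table covers only a handful of characters.

```python
# Illustrative sketch only: NOT the CGE training procedure from the paper.
# CHAR_TO_RADICAL is a tiny hand-written table (an assumption for this example);
# a real system would load a complete character-to-radical dictionary.
CHAR_TO_RADICAL = {
    "她": "女",  # she
    "妈": "女",  # mother
    "姐": "女",  # older sister
    "妹": "女",  # younger sister
    "他": "亻",  # he
    "们": "亻",  # plural marker
    "河": "氵",  # river
}

# Assumption: only the 女 ("woman") radical is treated as a gender cue here.
FEMININE_RADICALS = {"女"}


def radicals_of(word):
    """Return the radical of each character, or None if it is not in the table."""
    return [CHAR_TO_RADICAL.get(ch) for ch in word]


def has_gender_radical(word):
    """True if any character of the word carries a radical in FEMININE_RADICALS."""
    return any(r in FEMININE_RADICALS for r in radicals_of(word))


if __name__ == "__main__":
    for w in ["她们", "他们", "河"]:
        print(w, radicals_of(w), has_gender_radical(w))
```

A model along the lines the abstract describes would presumably use such radical information as a training signal rather than as a simple filter; how CGE does this is specified in the paper itself, not here.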

Original language: English (US)
Title of host publication: GeBNLP 2022 - 4th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop
Editors: Christian Hardmeier, Christine Basta, Marta R. Costa-Jussa, Gabriel Stanovsky, Hila Gonen
Publisher: Association for Computational Linguistics (ACL)
Pages: 121-128
Number of pages: 8
ISBN (Electronic): 9781955917681
State: Published - 2022
Event: 4th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2022 - Seattle, United States
Duration: Jul 15 2022 → …

Publication series

Name: GeBNLP 2022 - 4th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop

Conference

Conference: 4th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2022
Country/Territory: United States
City: Seattle
Period: 07/15/22 → …

Bibliographical note

Publisher Copyright:
© 2022 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • General Psychology
  • Gender Studies
