Cross-lingual gender prediction with multi-lingual embeddings and linguistic features


Most systems for gender profiling have focused on mono-lingual use cases. In this paper we implement and compare two systems for cross-lingual gender prediction: one is based on linguistic features, while the other leverages a multi-lingual embedding approach (XLM-RoBERTa). Moreover, we analyse which linguistic properties are most predictive of gender across languages. We find that XLM-RoBERTa performs best, with accuracy scores of up to 0.87. Classification on top of linguistic features did not consistently generalize cross-lingually. However, the linguistic feature analysis supported previously observed divergences between male and female language use across languages.
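
To give a rough idea of what a linguistic-feature system takes as input, here is a minimal sketch that computes a few surface features which do not depend on any one language. The function name `surface_features` and the three features are assumptions chosen for illustration; they are not the feature set used in the paper.

```python
import string
from typing import Dict


def surface_features(text: str) -> Dict[str, float]:
    """Compute a few language-independent surface features (illustrative only)."""
    # Whitespace tokenisation is a simplification; it is not the paper's preprocessing.
    tokens = text.split()
    # Strip ASCII punctuation from token edges and lowercase for type counting.
    words = [t.strip(string.punctuation).lower() for t in tokens]
    words = [w for w in words if w]
    if not words:
        return {"avg_token_len": 0.0, "type_token_ratio": 0.0, "punct_ratio": 0.0}
    n_punct = sum(ch in string.punctuation for ch in text)
    return {
        # Mean word length in characters.
        "avg_token_len": sum(len(w) for w in words) / len(words),
        # Lexical diversity: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Share of characters that are ASCII punctuation.
        "punct_ratio": n_punct / len(text),
    }
```

In a cross-lingual setup, vectors like these would be computed for texts in one language to train a classifier, and the same function would be applied unchanged to texts in another language at test time.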

The research paper is available here.
