Cross-lingual gender prediction with multi-lingual embeddings and linguistic features
Abstract:
Most systems for gender profiling have focused on mono-lingual use cases. In this paper, we implement and compare two systems for cross-lingual gender prediction: one is based on linguistic features, while the other leverages a multi-lingual embedding approach (XLM-RoBERTa). Moreover, we analyse which linguistic properties are most predictive of gender across languages. We find that XLM-RoBERTa performs best, with accuracy scores of up to 0.87. Classification on top of linguistic features did not consistently generalize cross-lingually. However, the linguistic feature analysis supported previously observed divergences between male and female language use across languages.
The research paper is available here.