In a new study, researchers at New York University (NYU) found that popular language models, including Google’s BERT and ALBERT and Facebook’s RoBERTa, reinforce harmful stereotypes about race, gender, socioeconomic status, religion, age, sexual orientation, and other attributes. While previous research has uncovered bias in many of the same models, this latest work suggests the biases are broader in scope than originally thought.
Pretrained language models like BERT and RoBERTa have achieved success across many natural language tasks. However, there’s evidence that these models amplify the biases present in the data sets they’re trained on, implicitly perpetuating harm through biased representations. AI researchers from MIT, Intel, and the Canadian initiative CIFAR have found high levels of bias in BERT, XLNet, OpenAI’s GPT-2, and RoBERTa. And researchers at the Allen Institute for AI claim that no current machine learning technique sufficiently protects against toxic outputs, highlighting the need for better training sets and model architectures.