BiasEdit: Debiasing Stereotyped Language Models via Model Editing

Xin Xu1 , Wei Xu2 , Ningyu Zhang1 ,

1Zhejiang University 2Georgia Institute of Technology

BiasEdit is an efficient model editing method that eliminates stereotyped bias from language models with small editor networks. It combines a debiasing loss, which guides edits on a small subset of parameters, with a remaining loss, which maintains the model's language modeling abilities during editing. Experiments show strong performance on debiasing and language ability preservation, as well as robustness to gender-attribute reversal and semantic generality.

Abstract

Previous studies have established that pre-trained language models inherently manifest various social biases. Although several debiasing strategies have been introduced, such as retraining a whole model with counterfactual data, prompt tuning, and representation projection, they often fall short of efficiently eliminating bias or directly altering the models' internal biased representations. To address these issues, we propose BiasEdit, an efficient model editing method that removes stereotyped bias from language models with small editor networks. It contains a debiasing loss, which guides the editor networks to conduct local edits on a small subset of parameters for debiasing, and a remaining loss, which preserves the original language modeling abilities of the model during editing. Experiments demonstrate the effectiveness and robustness of BiasEdit in eliminating bias compared to classical debiasing baselines, with little impact on the language modeling and general capabilities of the edited models. In addition, we conduct bias tracing and explore the effects of bias and of debiasing via editing on language models.



BiasEdit

Figure 1: Debiasing a language model with BiasEdit. s: stereotyped. a: anti-stereotyped. m: meaningless.


As shown in Figure 1, BiasEdit uses trained editor networks to produce parameter shifts that edit a small subset of a language model's parameters. During debiasing, the debiasing loss guides the editor networks to produce these parameter edits, while the remaining loss preserves the model's original language modeling abilities. After editing, the resulting unbiased language model remains robust in its general capabilities, under gender-attribute reversal, and across semantically equivalent contexts.
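The two objectives above can be sketched in PyTorch. This is a minimal illustration, not the paper's exact formulation: it assumes the debiasing loss equalizes the model's likelihoods of the stereotyped and anti-stereotyped continuations, and that the remaining loss is a KL divergence anchoring the edited model's predictions to the original model's. All function names and the weighting term `lam` are hypothetical.

```python
import torch
import torch.nn.functional as F

def debiasing_loss(logp_stereo, logp_anti):
    # Hypothetical form: push the edited model to assign equal
    # log-likelihood to stereotyped and anti-stereotyped continuations.
    return (logp_stereo - logp_anti).pow(2).mean()

def remaining_loss(edited_logits, original_logits):
    # Hypothetical form: KL divergence keeps the edited model's
    # next-token distribution close to the original model's
    # on neutral (e.g. meaningless-option) text.
    return F.kl_div(
        F.log_softmax(edited_logits, dim=-1),
        F.softmax(original_logits, dim=-1),
        reduction="batchmean",
    )

def total_loss(logp_s, logp_a, edited_logits, original_logits, lam=1.0):
    # Combined editing objective: debias while retaining language modeling.
    return debiasing_loss(logp_s, logp_a) + lam * remaining_loss(
        edited_logits, original_logits
    )
```

In this sketch the editor networks would be trained by backpropagating `total_loss` through the parameter shifts they produce; when the two continuations are equally likely and the edited model matches the original on neutral text, the loss is zero.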


Main Results





Ablation Study on the Remaining Loss




Edits on Different Blocks




Impacts on General Capabilities




Reversing Gender Attribute Words




Semantic Generality

BibTeX


@article{xin24biasedit,
  author       = {Xin Xu and Wei Xu and Ningyu Zhang},
  title        = {BiasEdit: Debiasing Stereotyped Language Models via Model Editing},
  year         = {2024},
  url          = {https://github.com/xxupiano/BiasEdit}
}

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.