Fairness Playground

Welcome to the fairness playground! Here you can explore the individual and group fairness of three BERT-based toxic text classification models. Unfortunately, in toxic text classification, models often learn to associate the names of historically harassed identities with toxicity. Explore the group and individual biases of a regular model trained to maximize performance (balanced accuracy, given the class imbalance), and compare the results to an individually fair model trained with inFairness and a group-fair model trained with TFCO. Try our pre-populated examples, or come up with your own tests!
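
For reference, here is a minimal sketch of the evaluation metric behind that comparison, assuming scikit-learn and a small hypothetical label set: balanced accuracy averages per-class recall, so a classifier cannot look good on an imbalanced dataset by always predicting the non-toxic class.

```python
# Hypothetical labels illustrating why balanced accuracy is used here:
# with heavy class imbalance, plain accuracy rewards always predicting "non-toxic".
from sklearn.metrics import accuracy_score, balanced_accuracy_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1]  # 0 = non-toxic, 1 = toxic (imbalanced)
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]  # misses half of the toxic examples

print(accuracy_score(y_true, y_pred))           # 0.875 -- looks strong
print(balanced_accuracy_score(y_true, y_pred))  # (1.0 + 0.5) / 2 = 0.75
```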

Group Fairness

Similar Performance on Groups of Individuals (sentences)

Filter our validation dataset to include only examples that contain at least one of the specified words.

See performance on the selected group
Type your own words/terms, or choose from one of the following examples.
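
Under the hood, this view amounts to a keyword filter followed by the same metric on the matching subset. A rough sketch, assuming plain Python lists of texts, labels, and model predictions rather than the playground's actual data structures:

```python
# Rough sketch of the group view: restrict the validation set to sentences
# mentioning any of the chosen terms, then score the model on that subset.
# `texts`, `labels`, and `predictions` are placeholders, not the app's internals.
from sklearn.metrics import balanced_accuracy_score

def group_balanced_accuracy(texts, labels, predictions, terms):
    """Balanced accuracy on examples containing at least one of `terms`."""
    terms = [t.lower() for t in terms]
    keep = [i for i, text in enumerate(texts)
            if any(term in text.lower() for term in terms)]
    return balanced_accuracy_score([labels[i] for i in keep],
                                   [predictions[i] for i in keep])

# Compare the subgroup to the full validation set:
# overall = balanced_accuracy_score(labels, predictions)
# group   = group_balanced_accuracy(texts, labels, predictions, ["muslim", "gay"])
```

A large gap between the subgroup score and the overall score signals a group-fairness problem for that set of terms.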

Individual Fairness

Similar Treatment of Similar Individuals (sentences)

Compare how swapping individual identity words affects a model's predictions on otherwise identical sentences.

Compare predictions on individual examples that differ by a single word
Type your own template and terms, or choose from one of the following examples.
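
Conceptually, each test fills one template with several identity terms and compares the model's outputs on the resulting near-identical sentences. A sketch, with a placeholder `toxicity_score` standing in for any of the three classifiers:

```python
# Sketch of the single-word-difference check: fill one template with each term
# and compare the model's toxicity scores. An individually fair model should
# give similar scores to these near-identical sentences.
def counterfactual_scores(template, terms, toxicity_score):
    """Return {term: score} for `template` filled with each term."""
    return {term: toxicity_score(template.format(term)) for term in terms}

# Hypothetical usage -- `model_score` is whatever callable returns P(toxic):
# scores = counterfactual_scores("I am a {} person.",
#                                ["straight", "gay", "christian", "muslim"],
#                                toxicity_score=model_score)
# spread = max(scores.values()) - min(scores.values())  # near 0 => similar treatment
```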