Welcome to the fairness playground! Here you can explore the individual and group fairness of three BERT-based toxic text classification models. Unfortunately, in toxic text classification, AI models often learn to associate the names of historically harassed identities with toxicity. Explore the corresponding group and individual biases of a regular model trained to maximize performance (balanced accuracy, given the class imbalance), and compare the results to an individually fair model trained with inFairness and a group-fair model trained with TFCO. Try our pre-populated examples, and come up with your own tests!
Similar Performance on Groups of Individuals (sentences)
Filter our validation dataset to show only examples containing at least one of the specified words.
Similar Treatment of Similar Individuals (sentences)
Compare how certain identity words affect each model's performance.