Natural Language Processing - Jigsaw Unintended Bias in Toxicity Classification Case Study

Nidhi Bansal
12 min read · Jan 13, 2020
Word Cloud

What is Natural Language Processing?

Natural Language Processing (NLP) is the field of AI concerned with enabling computers/intelligent systems to communicate using a natural language such as English.
Through NLP, computers can perform useful tasks with the natural languages humans use. The input and output of an NLP system can be:

  • Speech
  • Written Text

Here, we will discuss written text.

Processing of text means converting text to vectors.

Question: Why do we convert text to vectors?

Machine Learning algorithms are based on concepts from Linear Algebra and Statistics. Linear Algebra is applied to numerical data. So, once text is converted to vectors, all of these concepts can be applied easily.

Let's say we have text1, text2 and text3, and we convert them to vectors v1, v2 and v3 respectively. Calculate the geometric distance between v1 and v2, and between v1 and v3.

If Geometric Distance(v1, v2) > Geometric Distance(v1, v3)

it means v1 is closer to v3 than to v2, so we can say text1 and text3 are more similar.
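A tiny numpy sketch of this comparison (the vectors are illustrative placeholders, not real text vectors):

import numpy as np

# Toy vectors standing in for text1, text2, text3
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([4.0, 3.0, 0.0])
v3 = np.array([1.5, 0.0, 2.5])

dist_12 = np.linalg.norm(v1 - v2)   # geometric (Euclidean) distance between v1 and v2
dist_13 = np.linalg.norm(v1 - v3)   # geometric (Euclidean) distance between v1 and v3

# The smaller distance indicates the more similar pair of texts
print(dist_12, dist_13)   # dist_13 < dist_12, so text1 and text3 are more similar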

Text Pre-processing

Before converting text to vectors, there are some pre-processing steps.

  1. Begin by removing the HTML tags
  2. Remove any punctuation or a limited set of special characters like , or . or # etc.
  3. Check if the word is made up of English letters and is not alpha-numeric
  4. Check if the length of the word is greater than 2 (it is argued that there are no adjectives of 2 letters)
  5. Convert the word to lowercase (otherwise tasty and Tasty are considered different tokens)
  6. Remove stop words (e.g. a, the, am, are etc.)
  7. Expand contractions (e.g. "aren't" is expanded to "are not")
  8. Finally, stem the word (stemming treats words like tasty, taste, tastes as the same word and stores the single token taste for the whole set); a minimal code sketch of these steps follows below
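A minimal sketch of these steps in Python, assuming NLTK's stopword list and SnowballStemmer are available; contractions_map is a hypothetical (incomplete) dictionary for step 7:

import re
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer

stop_words = set(stopwords.words('english'))  # in practice you may keep negation words like 'not'
stemmer = SnowballStemmer('english')
contractions_map = {"aren't": "are not", "can't": "can not"}  # hypothetical, extend as needed

def preprocess(text):
    text = re.sub(r'<.*?>', ' ', text)               # 1. remove HTML tags
    text = text.lower()                              # 5. lowercase
    for contraction, expanded in contractions_map.items():
        text = text.replace(contraction, expanded)   # 7. expand contractions
    text = re.sub(r'[^a-z\s]', ' ', text)            # 2. drop punctuation / special characters / digits
    words = [stemmer.stem(w) for w in text.split()   # 8. stemming
             if w.isalpha() and len(w) > 2 and w not in stop_words]  # steps 3, 4, 6
    return ' '.join(words)

print(preprocess("Aren't these <b>pastas</b> tasty?"))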

Terminology

  1. The text or sentence of a single row entry is called a text document.
  2. The collection of these text documents is called a corpus.

Techniques to convert Text to vectors or Text Modeling

There are four techniques to convert text to vectors:
1. BOW (Bag of Words)
2. TF-IDF (Term Frequency-Inverse Document Frequency)
3. Avg Word2Vec
4. TF-IDF Word2Vec

1. BOW (Bag of Words)

In Bag of Words, we basically create a set of the unique words (of size ‘d’) from the corpus. A d-dimensional vector is then created for each document, based on whether each word is present in the document and how many times it occurs.

Let's take an example: the sentences "This pasta is very tasty" and "This pasta is not tasty" produce BOW vectors V1 and V2 that differ in only a couple of positions.

The distance between V1 and V2 is small, so the two sentences look similar even though their meanings are opposite (tasty vs. not tasty).
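A quick sketch of this effect with scikit-learn's CountVectorizer (the two sentences are illustrative):

from sklearn.feature_extraction.text import CountVectorizer
import numpy as np

corpus = ["this pasta is very tasty",
          "this pasta is not tasty"]

vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(corpus).toarray()

print(vectorizer.get_feature_names_out())        # the unique words (dimension d)
print(vectors)                                   # one count vector per document
print(np.linalg.norm(vectors[0] - vectors[1]))   # small distance despite opposite meanings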

Solution: bi-grams, tri-grams and so on.

In a bi-gram, we take every two consecutive words and treat the pair as a single token.
Let's see an example: for the sentences above, bi-grams such as "not tasty" and "very tasty" become separate tokens, so the two vectors are no longer nearly identical.

So, from this example we can see that performance has been improved.

Similarly, in tri-grams, a token is created from every 3 consecutive words.
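The same sketch with CountVectorizer's ngram_range parameter, which adds bi-grams to the vocabulary:

from sklearn.feature_extraction.text import CountVectorizer

corpus = ["this pasta is very tasty",
          "this pasta is not tasty"]

# ngram_range=(1, 2) keeps uni-grams and adds bi-grams; (1, 3) would add tri-grams as well
vectorizer = CountVectorizer(ngram_range=(1, 2))
vectors = vectorizer.fit_transform(corpus).toarray()

print(vectorizer.get_feature_names_out())   # now includes tokens like 'not tasty' and 'very tasty'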

2. TF-IDF (Term Frequency-Inverse Document Frequency)

The second technique to convert text to vectors is TF-IDF.
TF (Term Frequency) measures how many times a term/word occurs in a document.
IDF (Inverse Document Frequency) is the inverse of how frequently the word occurs across all the documents of the corpus.

Let's understand this:
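One standard formulation (the exact variant can differ between implementations) is:

TF(w, d) = (number of times word w occurs in document d) / (total number of words in d)
IDF(w) = log(N / n_w), where N is the number of documents in the corpus and n_w is the number of documents containing w
TF-IDF(w, d) = TF(w, d) * IDF(w)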

In BOW: the vector value of a word is its count in the document.
In TF-IDF: the vector value of a word is its TF-IDF value.
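A minimal sketch with scikit-learn's TfidfVectorizer (same illustrative sentences as above; note that scikit-learn uses a smoothed IDF and L2-normalizes each row by default):

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["this pasta is very tasty",
          "this pasta is not tasty"]

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(corpus).toarray()

print(vectorizer.get_feature_names_out())
print(vectors)   # words that appear in only one document ('not', 'very') get the highest weights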

Advantages of TF-IDF:
1. More importance is given to words that occur more frequently within a document (from TF).
2. Words that are rare in the corpus get more importance (from IDF).
3. Stop words get negligible importance.

Word Embedding

In this technique each word is converted into a vector with some number of dimensions, say ‘d’. This is called Word Embedding.
Word Embedding is a type of word representation that allows words with similar meaning to be understood by machine learning algorithms. Technically, it is a mapping of words into vectors of real numbers using a neural network, a probabilistic model, or dimensionality reduction on a word co-occurrence matrix.
There are various word embedding models available such as word2vec (Google), GloVe (Stanford) and fastText (Facebook).

Two advantages of word embeddings over BOW and TF-IDF:
1. Word embeddings take semantic similarity into consideration (e.g. words like tasty and delicious get similar vectors).
2. They capture relationships between words. For example:

Word Embedding (Source: Internet)

Here, the vector v(man) - v(woman) is approximately parallel to v(king) - v(queen).

Similarly, country-capital relationships and verb-tense relationships are taken care of.
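A hedged sketch of this analogy test with gensim, assuming a pretrained word2vec model has been downloaded locally (the file name below is a placeholder):

from gensim.models import KeyedVectors

# Load a pretrained model (placeholder path; substitute your local copy)
word_vectors = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# king - man + woman should rank 'queen' near the top
print(word_vectors.most_similar(positive=['king', 'woman'], negative=['man'], topn=3))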

3. Avg Word2Vec

From Word Embedding we get a vector for each word. In Avg Word2Vec, the average of the vectors of all the words in a document is taken as the vector of that document.
The formula for Avg Word2Vec is:
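AvgW2V(d) = (1 / n) * (w2v(w1) + w2v(w2) + ... + w2v(wn)), where w1, ..., wn are the n words of document d and w2v(wi) is the word vector of wi.

A minimal numpy sketch, assuming word_vectors is a dict-like mapping from a word to its embedding (for example the gensim KeyedVectors loaded above):

import numpy as np

def avg_word2vec(document, word_vectors, dim=300):
    """Average of the word vectors of all in-vocabulary words of a document."""
    vectors = [word_vectors[w] for w in document.split() if w in word_vectors]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)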

Avg Word2Vec is basically the average of the word vectors of the document.

4. TF-IDF Word2Vec

It is also called TF-IDF weighted Word2Vec. In TF-IDF Word2Vec, the TF-IDF weighted average of the word vectors in a document is taken as the vector of that document. The formula is as below:
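TFIDF-W2V(d) = (tfidf(w1) * w2v(w1) + ... + tfidf(wn) * w2v(wn)) / (tfidf(w1) + ... + tfidf(wn))

A corresponding numpy sketch, assuming word_vectors as above and tfidf_scores as a hypothetical dict-like mapping from a word to its TF-IDF value for the document:

import numpy as np

def tfidf_word2vec(document, word_vectors, tfidf_scores, dim=300):
    """TF-IDF weighted average of the word vectors of a document."""
    weighted_sum, weight_total = np.zeros(dim), 0.0
    for w in document.split():
        if w in word_vectors and w in tfidf_scores:
            weighted_sum += tfidf_scores[w] * word_vectors[w]
            weight_total += tfidf_scores[w]
    return weighted_sum / weight_total if weight_total else np.zeros(dim)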

In the formula above, if all the TF-IDF values are 1, TF-IDF Word2Vec reduces to Avg Word2Vec.

Jigsaw Unintended Bias in Toxicity Classification Case Study

Problem Description:

The Conversation AI team, a research initiative founded by Jigsaw and Google (both part of Alphabet), builds technology to protect voices in conversation.
A main area of focus is machine learning models that can identify toxicity in online conversations, where toxicity is defined as anything rude, disrespectful or otherwise likely to make someone leave a discussion.
They use data labeled by human raters to improve civility in online conversations across various toxic conversational attributes.

Context:
This is a Kaggle competition: https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/overview

Data
train.csv, test.csv (Download from Kaggle)

Output to be submitted
It follows the format of submission.csv (download from Kaggle)

Data overview

Attribute information:
* comment_text: text of individual comments
* target: toxicity label (to be predicted for the test data; target >= 0.5 is considered the positive (toxic) class)

When the Conversation AI team first built toxicity models, they found that the models incorrectly learned to associate the names of frequently attacked identities with toxicity. Models predicted a high likelihood of toxicity for comments containing those identities (e.g. “gay”), even when those comments were not actually toxic (such as “I am a gay woman”). This happens because training data was pulled from available sources where unfortunately, certain identities are overwhelmingly referred to in offensive ways.
There are some identity attributes which are taken care of:
‘male’, ‘female’,’homosexual_gay_or_lesbian’, ‘muslim’, ‘christian’, ‘jewish’,’white’, ‘black’, ‘psychiatric_or_mental_illness’

aux_columns: ‘severe_toxicity’, ‘obscene’, ‘identity_attack’, ‘insult’, ‘threat’, ‘sexual_explicit’

Step 1: Data Loading and Data Splitting

The first step is to load the train and test data. The shapes of the train and test data are:
Shape of train_data (1804874, 45), Shape of test_data (97320, 2)

Columns in train data are:
‘id’, ‘target’, ‘comment_text’,’severe_toxicity’,’obscene’,’identity_attack’,’insult’,’threat’,’sexual_explicit’,’male’, ‘female’,’homosexual_gay_or_lesbian’, ‘muslim’, ‘christian’, ‘jewish’,’white’, ‘black’, ‘psychiatric_or_mental_illness’

We will use the comment_text column, the target (output) column, the identity attribute columns and the aux attribute (aux output) columns.

Evaluation Metrics (these are used to calculate the weights)

A newly developed metric that combines several submetrics to balance overall performance with various aspects of unintended bias is used here.

First, we’ll define each submetric.

Overall AUC

This is the ROC-AUC for the full evaluation set.

Bias AUCs

To measure unintended bias, we again calculate the ROC-AUC, this time on three specific subsets of the test set for each identity, each capturing a different aspect of unintended bias. You can learn more about these metrics in Conversation AI’s recent paper Nuanced Metrics for Measuring Unintended Bias with Real Data in Text Classification.

Subgroup AUC: Here, we restrict the data set to only the examples that mention the specific identity subgroup. A low value in this metric means the model does a poor job of distinguishing between toxic and non-toxic comments that mention the identity.

BPSN (Background Positive, Subgroup Negative) AUC: Here, we restrict the test set to the non-toxic examples that mention the identity and the toxic examples that do not. A low value in this metric means that the model confuses non-toxic examples that mention the identity with toxic examples that do not, likely meaning that the model predicts higher toxicity scores than it should for non-toxic examples mentioning the identity.

BNSP (Background Negative, Subgroup Positive) AUC: Here, we restrict the test set to the toxic examples that mention the identity and the non-toxic examples that do not. A low value here means that the model confuses toxic examples that mention the identity with non-toxic examples that do not, likely meaning that the model predicts lower toxicity scores than it should for toxic examples mentioning the identity.
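For reference, the competition combines these submetrics into a single final score: a weighted sum of the overall AUC and the generalized power mean (with p = -5) of each of the three bias AUCs across the identity subgroups, with all four weights set to 0.25 (see the competition's Evaluation page for the exact definition).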

import numpy as np

# Defining data
x_data = data['comment_text']
y_aux_data = data[aux_columns].fillna(0).values

# Initialize weights for calculating loss
# Overall
weights = np.ones((len(data),)) / 4
# Subgroup
weights += (data[identity_attribute].fillna(0).values >= 0.5).sum(axis=1).astype(bool).astype(int) / 4
# Background Positive, Subgroup Negative
weights += (((data['target'].values >= 0.5).astype(bool).astype(int) +
             (data[identity_attribute].fillna(0).values < 0.5).sum(axis=1).astype(bool).astype(int)) > 1).astype(bool).astype(int) / 4
# Background Negative, Subgroup Positive
weights += (((data['target'].values < 0.5).astype(bool).astype(int) +
             (data[identity_attribute].fillna(0).values >= 0.5).sum(axis=1).astype(bool).astype(int)) > 1).astype(bool).astype(int) / 4

# Normalize them
# weights /= weights.mean()
loss_weight = 1.0 / weights.mean()

y_data = data['target'].values

# Binarize the target and identity columns at the 0.5 threshold
for column in identity_attribute + ['target']:
    data[column] = np.where(data[column] >= 0.5, True, False)

The data is split into train and test sets in a 2:1 ratio.
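A hedged sketch of such a split with scikit-learn (variable names mirror the snippet above; test_size=0.33 gives roughly a 2:1 ratio):

from sklearn.model_selection import train_test_split

# Split comments, targets, aux targets and per-example weights together
(x_train, x_test,
 y_train, y_test,
 y_aux_train, y_aux_test,
 weights_train, weights_test) = train_test_split(
    x_data, y_data, y_aux_data, weights, test_size=0.33, random_state=42)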

Step 2: Data Preprocessing

In the Preprocessing phase we do the following with the comment_text, in the order below:

  1. Begin by removing the HTML tags
  2. Remove any punctuation or a limited set of special characters like , or . or # etc.
  3. Check if the word is made up of English letters and is not alpha-numeric
  4. Check if the length of the word is greater than 2 (it is argued that there are no adjectives of 2 letters)
  5. Convert the word to lowercase
  6. Remove stop words
  7. Expand contractions
  8. Delete and isolate symbols
  9. Finally, apply Snowball stemming to the word (it was observed to work better than Porter stemming)

Below is example of comment text before and after preprocessing:

Before Preprocessing: Loving this collection. Cant wait till Season 2 is released. Should be any day now according to http://yeezy-season2.com/

After Preprocessing: loving collection not wait till season released day according

Step 3: Exploratory Data Analysis

The complete code is available on my GitHub profile (link shared at the end).

Univariate Analysis of data:

data.describe()

Analysis of the ‘target’ variable of the data:

We find that the data is highly imbalanced:

>>data['target'].value_counts()
0 1660540
1 144334
Name: target, dtype: int64

92% of the comments are non-toxic.

Distribution Plot of target:

Auxiliary label analysis of toxic comments for the attributes: ‘severe_toxicity’, ‘obscene’, ‘identity_attack’, ‘insult’, ‘threat’, ‘sexual_explicit’

Auxiliary label analysis

The ‘insult’ label appears most often.

Distribution plot of characters per comment:

Analysis:

  • The mean comment length is 297 characters
  • The shortest comment is 1 character
  • The longest comment is 1906 characters

Wordcloud to find most frequent words in comment text

Most frequent word is ‘Trump’
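A sketch of how such a word cloud can be generated with the wordcloud package (assuming the preprocessed comments are in x_train):

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Join all comments into one string and build the cloud
all_text = ' '.join(x_train.astype(str))
cloud = WordCloud(width=800, height=400, background_color='white').generate(all_text)

plt.figure(figsize=(12, 6))
plt.imshow(cloud, interpolation='bilinear')
plt.axis('off')
plt.show()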

Step 4: Text Featurization

In this step we will create word embeddings using GloVe vectors.

I am creating word vectors of length 300 using glove.840B.300d.txt.

With the help of a tokenizer, we get the number of unique tokens.

An embedding matrix is created for the unique tokens using the GloVe vectors.
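A minimal sketch of this step, assuming embeddings_index is a dict from word to its 300-dimensional GloVe vector loaded from glove.840B.300d.txt, and MAX_SEQUENCE_LENGTH is the padding length used later by the model:

import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Fit the tokenizer on the training comments to get the unique tokens
tokenizer = Tokenizer()
tokenizer.fit_on_texts(list(x_train))
print('Number of unique tokens:', len(tokenizer.word_index))

# Convert comments to padded integer sequences
train_padded = pad_sequences(tokenizer.texts_to_sequences(x_train), maxlen=MAX_SEQUENCE_LENGTH)
test_padded = pad_sequences(tokenizer.texts_to_sequences(x_test), maxlen=MAX_SEQUENCE_LENGTH)

# Build the embedding matrix: row i holds the GloVe vector of token i (zeros if out of vocabulary)
embedding_matrix = np.zeros((len(tokenizer.word_index) + 1, 300))
for word, i in tokenizer.word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector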

An example of a word vector of length 300:

In[20]:
embeddings_index["happy"]

Out[20]:

array([ 0.036775 ,  0.40917  , -0.52141  , -0.067184 ,  0.087702 ,
-0.048564 , 0.40947 , -0.42818 , 0.19304 , 2.3925 ,
-0.11441 , -0.22952 , -0.16061 , 0.035533 , -0.53179 ,
0.19764 , -0.48827 , 0.57439 , -0.064301 , 0.47053 ,
-0.29647 , -0.15927 , -0.052798 , 0.10121 , -0.054461 ,
0.036129 , -0.16118 , -0.34139 , 0.45834 , -0.20144 ,
-0.29067 , -0.51888 , -0.062106 , 0.14084 , 0.016413 ,
0.050826 , 0.13243 , -0.033663 , -0.42228 , -0.30086 ,
0.06202 , 0.26338 , 0.077223 , 0.27307 , 0.13392 ,
0.30183 , -0.16546 , 0.057011 , -0.0034585, -0.071113 ,
-0.27287 , -0.10297 , 0.07457 , -0.32104 , 0.36696 ,
0.27051 , -0.15776 , 0.2978 , -0.18988 , 0.097477 ,
0.035665 , -0.49749 , -0.52759 , -0.046148 , 0.021715 ,
-0.11047 , -0.18007 , 0.20295 , 0.15254 , -0.045976 ,
-0.21846 , -0.066865 , -0.21355 , 0.017509 , 0.66474 ,
0.25527 , 0.24864 , -0.094851 , -0.012857 , 0.46896 ,
0.052031 , 0.62488 , -0.12662 , 0.063972 , -0.15719 ,
-0.45907 , 0.32286 , -0.17502 , 0.64181 , 0.091587 ,
-0.075871 , 0.11718 , -0.13864 , 0.24951 , -0.40664 ,
0.08845 , -0.29196 , -0.51624 , -0.074847 , -0.012822 ,
-0.088844 , -0.19935 , 0.052734 , -0.13588 , 0.231 ,
-0.34368 , 0.30607 , -0.21223 , 0.08178 , 0.10097 ,
0.33585 , -0.17491 , 0.019115 , 0.15998 , 0.38803 ,
-0.35932 , 0.31682 , -0.18614 , 0.11732 , -0.068517 ,
0.50785 , -0.0035486, 0.20069 , 0.25218 , 0.38309 ,
0.19359 , 0.43857 , -0.29954 , -0.14219 , 0.087962 ,
-0.14229 , 0.10075 , -0.58986 , -0.12672 , 0.036944 ,
-0.050421 , -0.19875 , -0.051368 , -0.023402 , 0.08744 ,
-2.4938 , 0.15427 , 0.12373 , -0.0086429, -0.17007 ,
-0.519 , -0.29962 , 0.24369 , -0.20535 , -0.24942 ,
-0.079362 , 0.40986 , -0.10753 , 0.098907 , -0.063449 ,
0.05373 , 0.26206 , 0.13207 , -0.067694 , -0.56168 ,
-0.18867 , 0.14453 , -0.22469 , -0.28404 , 0.20909 ,
-0.46989 , 0.30992 , -0.13283 , 0.041392 , 0.11146 ,
0.17015 , -0.059407 , -0.16098 , -0.2211 , -0.0035877,
-0.22357 , -0.01852 , -0.23026 , -0.18824 , 0.32997 ,
0.16287 , -0.52067 , 0.17308 , -0.024264 , -0.041321 ,
-0.3241 , -0.44122 , -0.11114 , 0.22684 , -0.10883 ,
-0.1278 , -0.16696 , 0.051048 , -0.12131 , 0.18038 ,
0.19793 , 0.134 , -0.37113 , 0.36008 , 0.092685 ,
-0.30263 , 0.16565 , -0.10863 , -0.29565 , 0.26143 ,
0.13369 , -0.090181 , 0.021989 , -0.093353 , -0.20325 ,
-0.2008 , 0.20721 , 0.17208 , -0.20199 , 0.043315 ,
0.17768 , 0.57448 , -0.45917 , -0.077197 , 0.12051 ,
0.07209 , -0.095313 , 0.10973 , 0.22375 , 0.045804 ,
-0.13573 , 0.14041 , -0.11364 , -0.46605 , -0.43262 ,
-0.058678 , 0.19043 , -0.40867 , 0.30509 , 0.18542 ,
0.095309 , -0.42329 , -0.15225 , -0.13827 , 0.18119 ,
0.14755 , -0.053628 , 0.031298 , 0.65695 , -0.1717 ,
0.23649 , -0.34742 , -0.17438 , -0.085304 , 0.37687 ,
0.21322 , -0.13184 , -0.35197 , -0.14072 , 0.2332 ,
0.21014 , -0.14763 , 0.047515 , -0.27979 , 0.090331 ,
-0.15565 , 0.42803 , -0.019297 , 0.012198 , 0.036031 ,
-0.10396 , 0.11014 , 0.13458 , 0.2775 , 0.36225 ,
-0.35591 , -0.16877 , -0.41201 , 0.070133 , -0.27769 ,
0.13739 , -0.057831 , 0.19277 , 0.11131 , 0.53696 ,
0.0093424, -0.26107 , -0.38663 , 0.040653 , 0.18617 ,
0.26312 , 0.12212 , -0.030012 , 0.096286 , 0.47376 ,
-0.21633 , 0.10798 , -0.17703 , 0.22116 , 0.6726 ,
0.065036 , -0.017414 , -0.048585 , -0.090863 , 0.28591 ],
dtype=float32)

Step 5: Model Architecture

  1. Input Layer
  2. Embedding Layer (the embedding matrix is used as weights)
  3. Dropout
  4. 2 Bidirectional LSTM units
  5. Attention Layer
  6. Hidden Layer
  7. Output Layer

The model is created using the above architecture.

For compiling the model, a custom weighted loss together with ’binary_crossentropy’ is used, ’adam’ is the optimizer, and accuracy is used as the model's performance metric.

Code is as below:

# Model Architecture
from keras.models import Model
from keras.layers import Input, Dense, Embedding, SpatialDropout1D, add, concatenate
from keras.layers import CuDNNLSTM, Bidirectional, GlobalMaxPooling1D, GlobalAveragePooling1D

def build_model(embedding_matrix, num_aux_targets, loss_weight):
    """Model Architecture"""
    LSTM_UNITS = 128
    DENSE_HIDDEN_UNITS = 4 * LSTM_UNITS

    sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
    embedding_layer = Embedding(*embedding_matrix.shape,
                                weights=[embedding_matrix],
                                trainable=False)
    x = embedding_layer(sequence_input)
    x = SpatialDropout1D(0.3)(x)
    x = Bidirectional(CuDNNLSTM(LSTM_UNITS, return_sequences=True))(x)
    x = Bidirectional(CuDNNLSTM(LSTM_UNITS, return_sequences=True))(x)

    # Attention is a custom layer defined elsewhere in the notebook
    att = Attention(MAX_SEQUENCE_LENGTH)(x)

    hidden = concatenate([
        GlobalMaxPooling1D()(x),
        GlobalAveragePooling1D()(x),
    ])
    hidden = add([hidden, Dense(DENSE_HIDDEN_UNITS, activation='relu')(hidden)])
    hidden = add([hidden, Dense(DENSE_HIDDEN_UNITS, activation='relu')(hidden)])

    result = Dense(1, activation='sigmoid')(hidden)
    aux_result = Dense(num_aux_targets, activation='sigmoid')(result)

    # Compile model: custom weighted loss for the main output,
    # binary cross-entropy for the auxiliary outputs
    print('compiling model')
    model = Model(inputs=sequence_input, outputs=[result, aux_result])
    model.compile(loss=[custom_loss, 'binary_crossentropy'],
                  loss_weights=[loss_weight, 1.0],
                  optimizer='adam')
    return model
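A hypothetical training call could look as follows. This is only a sketch: it assumes (as in the public kernels this architecture is based on) that custom_loss reads the per-example weight from a second column stacked next to the main target, so the weights are concatenated into the main training label:

import numpy as np

# Stack the target with its per-example weight so custom_loss can use both (assumed convention)
y_train_combined = np.hstack([y_train[:, None], weights_train[:, None]])

model = build_model(embedding_matrix, y_aux_train.shape[-1], loss_weight)
model.fit(train_padded,
          [y_train_combined, y_aux_train],
          batch_size=512,
          epochs=4,
          validation_split=0.1)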

Step 6: Results

We predict on the validation data with our model.

The final metric score is 96.2946537314013 on the validation data.

The Kaggle score is 92.389 on the test data.

Screenshot of Kaggle score

Please check my GitHub link below for complete code:

References

https://www.appliedaicourse.com/

Future Work

Bidirectional Encoder Representations from Transformers (BERT) is a recent technique for NLP. Implementing a BERT model should further improve performance.

Please give your valuable feedback so that I can improve further, and clap if you like my blog.
