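One-hot encoding is widely used in NLP. In this tutorial, we will show how to create a one-hot encoding with scikit-learn's MultiLabelBinarizer. Because each sample here carries more than one label, an encoded row can contain more than one 1.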
1. Import the libraries
from sklearn.preprocessing import MultiLabelBinarizer
import numpy as np
2. Prepare the text data
y = [('Texas', 'Florida'), ('California', 'Alabama'), ('Texas', 'Florida'), ('Delware', 'Florida'), ('Texas', 'Alabama')]
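The data above is a list of tuples because MultiLabelBinarizer treats every sample as an iterable of labels. The sketch below uses small made-up inputs (the names samples and bare are illustrative and not part of the tutorial) to show what can go wrong if a bare string is passed as a sample: the string is itself iterable, so it is split into characters.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each sample is a tuple of labels, as in the tutorial's data.
samples = [('Texas', 'Florida'), ('California',)]
tuple_classes = list(MultiLabelBinarizer().fit(samples).classes_)

# A bare string is an iterable of characters, so each letter becomes a label.
bare = ['ab', 'bc']
char_classes = list(MultiLabelBinarizer().fit(bare).classes_)

print(tuple_classes)  # ['California', 'Florida', 'Texas']
print(char_classes)   # ['a', 'b', 'c']
```

Wrapping a single label in a one-element tuple, as in ('California',), keeps it intact.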
3. Create the one-hot encoding using MultiLabelBinarizer()
# One-hot encode data
one_hot = MultiLabelBinarizer()
one_hot.fit_transform(y)
Run this code and you will get the following one-hot encoding:
array([[0, 0, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [0, 0, 1, 1, 0],
       [1, 0, 0, 0, 1]])
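One way to check the encoded array is to decode it again. The sketch below repeats the tutorial's data (spelling included) and calls inverse_transform, which turns each 0/1 row back into a tuple of labels. The variable names encoded and decoded are introduced here for illustration only.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [('Texas', 'Florida'), ('California', 'Alabama'), ('Texas', 'Florida'),
     ('Delware', 'Florida'), ('Texas', 'Alabama')]

one_hot = MultiLabelBinarizer()
encoded = one_hot.fit_transform(y)

# Decode the 0/1 rows back into label tuples.
# Labels come back in sorted class order, not in their original order.
decoded = one_hot.inverse_transform(encoded)

for original, row, labels in zip(y, encoded, decoded):
    print(original, row, labels)
```

Each decoded tuple holds the same labels as the original sample, although the order inside the tuple may differ.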
4. View the label for each one-hot column
print(one_hot.classes_)
Run this code and you will see the labels, listed in the same order as the columns of the encoded array:
['Alabama' 'California' 'Delware' 'Florida' 'Texas']
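Each entry of classes_ names one column of the encoded array, so the two can be combined. The following is a minimal sketch, assuming the same data as above. The helper names column_of and counts are hypothetical: the first maps every label to its column index, the second sums each column to count how many samples carry that label.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [('Texas', 'Florida'), ('California', 'Alabama'), ('Texas', 'Florida'),
     ('Delware', 'Florida'), ('Texas', 'Alabama')]

one_hot = MultiLabelBinarizer()
encoded = one_hot.fit_transform(y)

# Column i of `encoded` corresponds to one_hot.classes_[i].
column_of = {label: i for i, label in enumerate(one_hot.classes_)}

# Summing a column counts the samples that contain that label.
counts = dict(zip(one_hot.classes_, encoded.sum(axis=0).tolist()))

print(column_of)
print(counts)
```

For example, 'Florida' sits in column 3 and appears in three of the five samples.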