Naive Bayes Algorithm for Binary Classification with Laplace Smoothing.
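Concretely, in the standard multinomial formulation, a tokenized tweet $x = (w_1, \dots, w_n)$ is assigned the class with the highest log posterior, and Laplace (add-one) smoothing keeps a single unseen word from zeroing out a class score:

$$\hat{y} = \underset{c \in \{0, 1\}}{\arg\max}\ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c), \qquad P(w \mid c) = \frac{\text{count}(w, c) + 1}{\sum_{w' \in V} \text{count}(w', c) + |V|}$$

where $V$ is the training vocabulary.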

The data comes from the Kaggle competition Real or Not? NLP with Disaster Tweets.

Data Sample:

print(f"Two  tweets == {x_train[:2]}")
print(f"Tweet Label == {y_train[:2]}")
Two  tweets == [['Our', 'Deeds', 'are', 'the', 'Reason', 'of', 'this', '#earthquake', 'May', 'ALLAH', 'Forgive', 'us', 'all'], ['Forest', 'fire', 'near', 'La', 'Ronge', 'Sask.', 'Canada']]
Tweet Label == [1, 1]
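The tweets appear to be whitespace-tokenized, with hashtags and punctuation left attached ('#earthquake', 'Sask.'). A minimal sketch of such a tokenizer, assuming plain whitespace splitting (the repo's actual preprocessing may differ):

def tokenize(tweet):
    # Plain whitespace split; keeps hashtags and trailing punctuation
    # attached, matching the samples shown above.
    return tweet.split()

tokenize("Forest fire near La Ronge Sask. Canada")
['Forest', 'fire', 'near', 'La', 'Ronge', 'Sask.', 'Canada']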

class NaiveBayes[source]

NaiveBayes()

Naive Bayes Algorithm for Binary Classification

NaiveBayes.fit[source]

NaiveBayes.fit(X, y)

Train Naive Bayes.

Args:

X (nested list): nested list of tokenized samples.
y (list): list of corresponding labels.
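A minimal sketch of what fit might compute, assuming class log-priors plus Laplace-smoothed word log-likelihoods (hypothetical helper names, not the repo's actual internals):

import math
from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    # Hypothetical stand-in for NaiveBayes.fit; illustrates the training step only.
    classes = list(dict.fromkeys(y))               # unique labels in first-seen order
    vocab = {w for sample in X for w in sample}
    log_priors = {c: math.log(y.count(c) / len(y)) for c in classes}
    tokens_per_class = defaultdict(list)
    for sample, label in zip(X, y):
        tokens_per_class[label].extend(sample)
    log_likelihoods = {}
    for c in classes:
        counts = Counter(tokens_per_class[c])
        total = sum(counts.values()) + len(vocab)  # add |V| for add-one smoothing
        # Laplace smoothing: every vocabulary word contributes count + 1
        log_likelihoods[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return classes, vocab, log_priors, log_likelihoods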

NaiveBayes.predict[source]

NaiveBayes.predict(X)

Predict the labels for samples.

Args:

X (nested list): nested list of tokenized samples.

Returns:

list of predicted labels.
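Prediction then scores each class in log space. A sketch pairing with the hypothetical fit_naive_bayes above; words outside the training vocabulary are simply skipped here, one common convention (the repo may handle them differently):

def predict_naive_bayes(X, classes, vocab, log_priors, log_likelihoods):
    # Hypothetical stand-in for NaiveBayes.predict.
    predictions = []
    for sample in X:
        scores = {
            c: log_priors[c] + sum(
                log_likelihoods[c][w] for w in sample if w in vocab
            )
            for c in classes
        }
        predictions.append(max(scores, key=scores.get))  # highest log posterior wins
    return predictions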
from scratch.models.naive_bayes import NaiveBayes
nb = NaiveBayes()
nb.fit(x_train, y_train)      # estimate class priors and smoothed word likelihoods
nb.classes
[1, 0]
nb.vocab_length               # number of unique tokens seen during training
24501
predictions = nb.predict(x_test)
accuracy(y_test, predictions)
0.7329246935201401
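The accuracy helper is not imported in the snippet above and is presumably defined elsewhere in the repo; a drop-in equivalent is just the fraction of matching labels:

def accuracy(y_true, y_pred):
    # Fraction of predictions that equal the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)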