This exercise concerns the classification of spam email.
Create a corpus of spam email and another of non-spam email. Examine each
corpus and decide what features appear to be useful for classification:
unigram words? bigrams? message length, sender, time of arrival? Then
train a classification algorithm (decision tree, naive Bayes, SVM,
logistic regression, or some other algorithm of your choosing) on a
training set and report its accuracy on a test set.
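One possible starting point is sketched below, assuming naive Bayes is the chosen algorithm. It is a minimal, standard-library-only multinomial naive Bayes with add-one (Laplace) smoothing. Each message becomes a bag of unigram words, bigrams, and a coarse message-length bucket. Sender and time-of-arrival features are left out because they need parsed headers (for example, via Python's `email` package). The helper names (`features`, `NaiveBayes`, `train_test_split`, `accuracy`) and the length thresholds (10 and 50 words) are hypothetical choices made for illustration. The `SPAM` and `HAM` lists are made-up placeholders, not a real corpus. Substitute your own collected messages before reporting any accuracy figure.

```python
import math
import random
import re
from collections import Counter

# Hypothetical placeholder messages; replace with your own spam / non-spam corpora.
SPAM = ["win a free prize now", "free money click now",
        "claim your free prize", "cheap pills buy now"]
HAM = ["meeting moved to monday", "lunch tomorrow at noon",
       "draft of the report attached", "see you at the meeting"]

def tokenize(text):
    """Lowercase the message and split it into word tokens (no underscores)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def features(text):
    """Unigram counts, bigram counts, and one message-length bucket feature."""
    words = tokenize(text)
    feats = Counter(words)
    feats.update(f"{a}_{b}" for a, b in zip(words, words[1:]))
    n = len(words)  # assumed thresholds: <10 short, <50 medium, else long
    bucket = "short" if n < 10 else "medium" if n < 50 else "long"
    feats[f"__len_{bucket}"] += 1
    return feats

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        self.totals = Counter()
        for text, label in zip(messages, labels):
            f = features(text)
            self.counts[label].update(f)
            self.totals[label] += sum(f.values())
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.priors[c] / self.n)  # log prior
            for feat, k in features(text).items():
                if feat not in self.vocab:  # ignore features never seen in training
                    continue
                p = (self.counts[c][feat] + 1) / (self.totals[c] + v)
                score += k * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best

def train_test_split(messages, labels, test_frac=0.25, seed=0):
    """Shuffle deterministically and hold out test_frac of the data."""
    pairs = list(zip(messages, labels))
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * (1 - test_frac))
    train, test = pairs[:cut], pairs[cut:]
    return ([m for m, _ in train], [y for _, y in train],
            [m for m, _ in test], [y for _, y in test])

def accuracy(model, messages, labels):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(model.predict(m) == y for m, y in zip(messages, labels))
    return correct / len(labels)

if __name__ == "__main__":
    msgs = SPAM + HAM
    labels = ["spam"] * len(SPAM) + ["ham"] * len(HAM)
    x_tr, y_tr, x_te, y_te = train_test_split(msgs, labels)
    model = NaiveBayes().fit(x_tr, y_tr)
    print("test accuracy:", accuracy(model, x_te, y_te))
```

With a corpus this small the printed accuracy means little. The point of the sketch is the pipeline shape: extract features, fit on the training split, and score on the held-out split. To compare feature sets (for example, unigrams alone versus unigrams plus bigrams), change `features` and rerun on the same split.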