Natural Language Processing
N-gram character models
A sequence of n written symbols is called an n-gram; the special cases are “unigram” for a 1-gram, “bigram” for a 2-gram, and “trigram” for a 3-gram. A model of the probability distribution over n-letter sequences is accordingly called an n-gram model.
Character Level Unigrams :-
Character Level Bigrams :-
Character Level Trigrams :-
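The three cases above can be sketched in a few lines of Python. This is only a minimal illustration: the function names char_ngrams and ngram_distribution are hypothetical, and the probabilities are plain relative frequencies estimated from a single string, with no smoothing.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every contiguous n-character substring of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_distribution(text, n):
    """Estimate P(n-gram) as the relative frequency of each n-gram in text."""
    grams = char_ngrams(text, n)
    counts = Counter(grams)
    total = len(grams)
    # An empty text (or n longer than the text) yields an empty distribution.
    return {gram: count / total for gram, count in counts.items()}


unigrams = char_ngrams("hello", 1)  # ['h', 'e', 'l', 'l', 'o']
bigrams = char_ngrams("hello", 2)   # ['he', 'el', 'll', 'lo']
trigrams = char_ngrams("hello", 3)  # ['hel', 'ell', 'llo']
```

A string of length m has m - n + 1 character n-grams, so longer n-grams give fewer and sparser observations from the same text.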
Text Classification
Text classification is the task of deciding which of a predefined set of classes a given text belongs to. Language identification, genre classification, sentiment analysis, and spam detection are examples of text classification.
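As one possible illustration, language identification can be approached with character n-grams: build one n-gram count table per language, then pick the language whose table gives the text the highest probability. The sketch below assumes a bag of character bigrams, add-one smoothing, and equal prior probability for every class. The function names and the tiny training texts are made up for the example and are far too small for real use.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return every contiguous n-character substring of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(samples, n=2):
    """Map each class label to a Counter of the n-grams in its training text."""
    return {label: Counter(char_ngrams(text.lower(), n))
            for label, text in samples.items()}


def classify(models, text, n=2):
    """Return the label whose n-gram counts give text the highest log-probability."""
    # Vocabulary size for add-one smoothing: every n-gram seen in any class,
    # plus one extra slot standing in for unseen n-grams.
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    v = len(vocab) + 1

    grams = char_ngrams(text.lower(), n)
    best_label, best_score = None, -math.inf
    for label, counts in models.items():
        total = sum(counts.values())
        # Sum of smoothed log-probabilities; class priors are assumed equal.
        score = sum(math.log((counts[g] + 1) / (total + v)) for g in grams)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this example only.
samples = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso y el gato",
}
models = train(samples)
```

With these toy models, classify(models, "the dog") selects "english" and classify(models, "el gato") selects "spanish", because bigrams such as "th" and "el" occur in only one of the two training texts.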