Natural Language Processing

N-gram character models

A sequence of written symbols of length n is called an n-gram, with the special cases “unigram” for a 1-gram, “bigram” for a 2-gram, and “trigram” for a 3-gram. A model of the probability distribution of n-letter sequences is thus called an n-gram model.
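A minimal sketch of extracting the overlapping character n-grams of a text and estimating an n-gram model by relative frequency (maximum likelihood). It assumes no start/end padding and no smoothing; the function names `char_ngrams` and `ngram_model` are hypothetical.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> list[str]:
    """Return every contiguous (overlapping) sequence of n characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_model(text: str, n: int) -> dict[str, float]:
    """Estimate P(sequence) for each observed n-letter sequence by relative frequency."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

if __name__ == "__main__":
    text = "banana"
    for n in (1, 2, 3):  # unigrams, bigrams, trigrams
        print(n, char_ngrams(text, n))
    # 1 ['b', 'a', 'n', 'a', 'n', 'a']
    # 2 ['ba', 'an', 'na', 'an', 'na']
    # 3 ['ban', 'ana', 'nan', 'ana']
    print(ngram_model(text, 2))  # 'an' and 'na' each get 0.4, 'ba' gets 0.2
```

Unsmoothed relative frequencies give probability zero to any n-gram not seen in the training text, which is why practical models usually add padding symbols and some form of smoothing.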

Character Level Unigrams :-

Character Level Bigrams :-

Character Level Trigrams :-
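Taken together, the unigram, bigram, and trigram models differ only in how many preceding characters each character is conditioned on. Under the standard Markov assumption (stated here as an assumption, with characters before c_1 taken to be a start-of-text padding symbol), the probability of a character sequence c_{1:N} factors as follows, with the trigram conditional estimated from counts:

```latex
\begin{aligned}
\text{unigram:}\quad & P(c_{1:N}) \approx \prod_{i=1}^{N} P(c_i) \\
\text{bigram:}\quad  & P(c_{1:N}) \approx \prod_{i=1}^{N} P(c_i \mid c_{i-1}) \\
\text{trigram:}\quad & P(c_{1:N}) \approx \prod_{i=1}^{N} P(c_i \mid c_{i-2}, c_{i-1}) \\
& \hat{P}(c_i \mid c_{i-2}, c_{i-1}) = \frac{\mathrm{count}(c_{i-2}\, c_{i-1}\, c_i)}{\mathrm{count}(c_{i-2}\, c_{i-1})}
\end{aligned}
```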

Text Classification

Text classification is the task of deciding, given a text of some kind, which of a predefined set of classes it belongs to. Language identification, genre classification, sentiment analysis, and spam detection are all examples of text classification.
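Language identification is commonly handled with character n-gram models of the kind described under N-gram character models. The following is a minimal sketch, assuming one character trigram model per language, add-one (Laplace) smoothing, and a uniform prior over languages, so the predicted label is the one whose model assigns the text the highest probability. The names `TrigramLM`, `train`, `identify`, and `PAD` are hypothetical, and the two training strings are invented toy data.

```python
import math
from collections import Counter

PAD = "\x02"  # hypothetical padding symbol marking the start of a text

class TrigramLM:
    """Character trigram model with add-one (Laplace) smoothing."""

    def __init__(self, text: str, vocab_size: int):
        padded = PAD * 2 + text
        # Count each trigram and the two-character context preceding its last character.
        self.tri = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
        self.ctx = Counter(padded[i:i + 2] for i in range(len(padded) - 2))
        self.vocab_size = vocab_size

    def log_prob(self, text: str) -> float:
        """Sum of log P(c_i | c_{i-2}, c_{i-1}) over the characters of text."""
        padded = PAD * 2 + text
        total = 0.0
        for i in range(len(padded) - 2):
            tri, ctx = padded[i:i + 3], padded[i:i + 2]
            total += math.log((self.tri[tri] + 1) / (self.ctx[ctx] + self.vocab_size))
        return total

def train(corpora: dict[str, str]) -> dict[str, TrigramLM]:
    """Fit one model per label, sharing one vocabulary size so scores are comparable."""
    vocab = set(PAD).union(*corpora.values())
    vocab_size = len(vocab) + 1  # +1 leaves probability mass for unseen characters
    return {label: TrigramLM(text, vocab_size) for label, text in corpora.items()}

def identify(text: str, models: dict[str, TrigramLM]) -> str:
    """Pick the label whose model gives text the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))

if __name__ == "__main__":
    # Made-up toy snippets; real systems train on far larger corpora.
    models = train({
        "en": "the cat sat on the mat and the dog ate the bone",
        "de": "der hund und die katze sind in der stadt",
    })
    print(identify("the dog", models))    # en
    print(identify("die katze", models))  # de
```

Working in log-probabilities avoids numerical underflow when many small per-character probabilities are multiplied, and smoothing keeps a single unseen trigram from driving a language's score to zero.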

Language :