[15] Text III: Advanced Text Processing


[15.1] Intro

Text processing - both humanistic (analytical) and artistic (generative) uses.
Contents:
-counting words;
-gathering word-based statistics;
-counting words of certain types;
-using simple word lists and the lexical resource WordNet.

[15.2] Words and Sentences

words as tokens - every occurrence counts; the token count is the total number of words in a text;
words as types - each distinct word counts once; the type count is the number of unique words.

word count = view of words as tokens
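
The token/type distinction can be sketched in a few lines of Python. This is a minimal illustration only: the sample sentence is made up, and splitting on whitespace is assumed to be a good enough way to find the words here.

```python
# Made-up sample text; splitting on whitespace is a simplifying assumption.
text = "the cat sat on the mat and the dog sat too"

tokens = text.split()   # every occurrence of a word is one token
types = set(tokens)     # a set keeps each distinct word only once

token_count = len(tokens)  # the ordinary "word count"
type_count = len(types)    # the number of unique words

print(token_count, type_count)  # prints: 11 8
```

"the" appears three times and "sat" twice, so the text has 11 tokens but only 8 types.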

tokenisation - the segmentation step that determines what the words in a text are
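
Why tokenisation is a decision and not a given can be seen by comparing two simple approaches. The sketch below uses only the standard library; the `tokenize` function and its regular expression are illustrative assumptions, not the tokenizer used in the book.

```python
import re

def tokenize(text):
    """Return lowercase word tokens, keeping apostrophes inside words.

    Hypothetical helper for illustration; real tokenizers handle many
    more cases (hyphens, numbers, abbreviations, non-ASCII letters).
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

sentence = "Don't panic: it's only a test, isn't it?"

naive = sentence.split()    # punctuation stays glued to words: 'panic:', 'test,'
words = tokenize(sentence)  # punctuation is dropped, contractions stay whole

print(naive)
print(words)
```

Both approaches find eight tokens in this sentence, but they disagree about what those tokens are: whitespace splitting yields 'panic:' and 'it?', while the regular expression yields 'panic' and 'it'.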

[15.3] Adjective Counting with Part-of-Speech Tagging

We’re looking for a way to identify the adjectives in this sentence (and any sentence) so that we can count them.
[] == list, several things delimited by commas
() == pair, e.g. ('element', 'element')
'' == string

NN - noun; IN - preposition; JJ - adjective; DT - determiner; VBZ - verb, third-person singular present (e.g. "is", a form of "be").
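
Once a sentence is tagged, counting adjectives is a matter of walking the list of (word, tag) pairs. In the sketch below the tagged list is written by hand for illustration, not produced by a real tagger, and `count_adjectives` is a hypothetical helper name. It assumes Penn Treebank style tags, where comparative (JJR) and superlative (JJS) adjectives also begin with JJ.

```python
# Hand-written (word, tag) pairs for illustration; not real tagger output.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("is", "VBZ"),
    ("in", "IN"), ("the", "DT"), ("old", "JJ"), ("barn", "NN"),
]

def count_adjectives(tagged_words):
    """Count pairs whose tag starts with JJ (covers JJ, JJR and JJS)."""
    return sum(1 for _word, tag in tagged_words if tag.startswith("JJ"))

adjectives = [word for word, tag in tagged if tag.startswith("JJ")]

print(count_adjectives(tagged))  # prints: 2
print(adjectives)                # prints: ['quick', 'old']
```

The same loop works for any other category by changing the tag prefix, for example "NN" for nouns.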

[15.4] Sentence Counting with a Tokenizer