from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
text1
<Text: Moby Dick by Herman Melville 1851>
text6
<Text: Monty Python and the Holy Grail>
text1.concordance("white")
Displaying 25 of 281 matches: once a whale in Spitzbergen that was white all over ." -- A VOYAGE TO GREENLAND a swinging sign over the door with a white painting upon it , faintly represent deeply brown and burnt , making his white teeth dazzling by the contrast ; whi ed to me . I remembered a story of a white man -- a whaleman too -- who , falli ver heard of a hot sun ' s tanning a white man into a purplish yellow one . How while yet afloat . And ever , as the white moon shows her affrighted face from o all his ivory teeth , like so many white bolts , upon his prison . Then Jonah ld be so companionable ; as though a white man were anything more dignified tha our starboard hand till we opened a white church to the larboard , and then ke ened in the moonlight ; and like the white ivory tusks of some huge elephant , s needlessly , ye harpooneers ; good white cedar plank is raised full three per umility in looking up at him ; and a white man standing before him seemed a whi ite man standing before him seemed a white flag come to beg truce of a fortress l powers of discernment . So that no white sailor seriously contradicted him wh g grimness was owing to the barbaric white leg upon which he partly stood . It ant for sereneness , to send up mild white vapours among mild white hairs , not end up mild white vapours among mild white hairs , not among torn iron - grey l are whales hereabouts ! If ye see a white one , split your lungs for him ! " W something queer about that , eh ? A white whale -- did ye mark that , man ? Lo he whales , making more gay foam and white water generally than any other of th pard - like look , being of a milk - white ground colour , dotted with round an o separate colours , black above and white below . The white comprises part of , black above and white below . The white comprises part of his head , and the ike a mute , maned sea - lion on the white coral beach , surrounded by his warl Alas ! Dough - Boy ! 
hard fares the white waiter who waits upon cannibals . No
text3.concordance("lived")
Displaying 25 of 38 matches: ay when they were created . And Adam lived an hundred and thirty years , and be ughters : And all the days that Adam lived were nine hundred and thirty yea and nd thirty yea and he died . And Seth lived an hundred and five years , and bega ve years , and begat Enos : And Seth lived after he begat Enos eight hundred an welve years : and he died . And Enos lived ninety years , and begat Cainan : An years , and begat Cainan : And Enos lived after he begat Cainan eight hundred ive years : and he died . And Cainan lived seventy years and begat Mahalaleel : rs and begat Mahalaleel : And Cainan lived after he begat Mahalaleel eight hund years : and he died . And Mahalaleel lived sixty and five years , and begat Jar s , and begat Jared : And Mahalaleel lived after he begat Jared eight hundred a and five yea and he died . And Jared lived an hundred sixty and two years , and o years , and he begat Eno And Jared lived after he begat Enoch eight hundred y and two yea and he died . And Enoch lived sixty and five years , and begat Met ; for God took him . And Methuselah lived an hundred eighty and seven years , , and begat Lamech . And Methuselah lived after he begat Lamech seven hundred nd nine yea and he died . And Lamech lived an hundred eighty and two years , an ch the LORD hath cursed . And Lamech lived after he begat Noah five hundred nin naan shall be his servant . And Noah lived after the flood three hundred and fi xad two years after the flo And Shem lived after he begat Arphaxad five hundred at sons and daughters . And Arphaxad lived five and thirty years , and begat Sa ars , and begat Salah : And Arphaxad lived after he begat Salah four hundred an begat sons and daughters . And Salah lived thirty years , and begat Eber : And y years , and begat Eber : And Salah lived after he begat Eber four hundred and begat sons and daughters . 
And Eber lived four and thirty years , and begat Pe y years , and begat Peleg : And Eber lived after he begat Peleg four hundred an
text5.concordance("lol")
Displaying 25 of 822 matches: ast PART 24 / m boo . 26 / m and sexy lol U115 boo . JOIN PART he drew a girl w ope he didnt draw a penis PART ewwwww lol & a head between her legs JOIN JOIN s a bowl i got a blunt an a bong ...... lol JOIN well , glad it worked out my cha e " PART Hi U121 in ny . ACTION would lol @ U121 . . . but appearently she does 30 make sure u buy a nice ring for U6 lol U7 Hi U115 . ACTION isnt falling for didnt ya hear !!!! PART JOIN geeshhh lol U6 PART hes deaf ppl here dont get it es nobody here i wanna misbeahve with lol JOIN so read it . thanks U7 .. Im hap ies want to chat can i talk to him !! lol U121 !!! forwards too lol JOIN ALL PE k to him !! lol U121 !!! forwards too lol JOIN ALL PErvs ... redirect to U121 ' loves ME the most i love myself JOIN lol U44 how do u know that what ? jerkett ng wrong ... i can see it in his eyes lol U20 = fiance Jerketts lmao wtf yah I cooler by the minute what 'd I miss ? lol noo there too much work ! why not ?? that mean I want you ? U6 hello room lol U83 and this .. has been the grammar the rule he 's in PM land now though lol ah ok i wont bug em then someone wann flight to hell :) lmao bbl maybe PART LOL lol U7 it was me , U83 hahah U83 ! 80 ht to hell :) lmao bbl maybe PART LOL lol U7 it was me , U83 hahah U83 ! 808265 082653953 K-Fed got his ass kicked .. Lol . ACTION laughs . i got a first class . i got a first class ticket to hell lol U7 JOIN any texas girls in here ? any . whats up U155 i was only kidding . lol he 's a douchebag . Poor U121 i 'm bo ??? sits with U30 Cum to my shower . lol U121 . ACTION U1370 watches his nads ur nad with a stick . ca u U23 ewwww lol *sniffs* ewwwwww PART U115 ! owww spl ACTION is resisting . ur female right lol U115 beeeeehave Remember the LAst tim pm's me . charge that is 1.99 / min . lol @ innocent hahah lol .... yeah LOLOLO is 1.99 / min . lol @ innocent hahah lol .... yeah LOLOLOLLL U12 thats not nic s . lmao no U115 Check my record . 
:) Lol lick em U7 U23 how old r u lol Way to
text1.similar("white")
same the old whale long great other first large good red whole very best dead polar sperm particular fine flying
text1.common_contexts(["white", "whale"])
the_bone a_and the_head the_for
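To make the idea behind `concordance` concrete, here is a minimal keyword-in-context sketch in plain Python. It is not how NLTK implements `Text.concordance`; the function name `concordance_lines`, the `width` parameter, and the toy token list are all assumptions made for illustration.

```python
def concordance_lines(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively so 'White' and 'white' both match.
        if token.lower() == word.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [token] + right))
    return lines

# Hypothetical toy token list, not taken from any NLTK corpus.
toy_tokens = "the white whale swam past the white ship".split()
kwic = concordance_lines(toy_tokens, "white", width=2)
for line in kwic:
    print(line)
```

The context is measured in tokens here for simplicity, whereas the output above is cut to a fixed character width.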
text1.generate()
long , from one to the top - mast , and no coffin and went out a sea captain -- this peaking of the whales . , so as to preserve all his might had in former years abounding with them , they toil with their lances , strange tales of Southern whaling . at once the bravest Indians he was , after in vain strove to pierce the profundity . ? then ?" a levelled flame of pale , And give no chance , watch him ; though the line , it is to be gainsaid . have been
'long , from one to the top - mast , and no coffin and went out a sea\ncaptain -- this peaking of the whales . , so as to preserve all his\nmight had in former years abounding with them , they toil with their\nlances , strange tales of Southern whaling . at once the bravest\nIndians he was , after in vain strove to pierce the profundity . ?\nthen ?" a levelled flame of pale , And give no chance , watch him ;\nthough the line , it is to be gainsaid . have been'
len(text1) # number of tokens; a token is a sequence of characters treated as a unit (a word or a punctuation mark)
260819
set(text1) # the vocabulary: every distinct token appears once, no matter how often it occurs in the text
{'Dr', 'juices', 'needlessly', 'palisades', 'Malay', 'scientific', 'Smeer', 'Jonas', 'wrapper', 'contrasting', 'Specksioneer', 'Mammoth', 'Terra', 'condensed', 'incorruptible', 'recent', 'ineffable', 'strong', 'abashed', 'glimpses', 'jurisprudence', 'Tarsus', 'hugging', 'heaps', 'emotion', 'traces', 'outraged', 'dams', 'liquid', 'terrier', 'sank', 'Fishes', 'Belubed', 'bird', 'feverishly', 'eulogy', 'instead', 'wrapall', 'Caw', 'mustering', 'stings', 'STEPS', 'burrower', 'Liberties', 'rockings', 'lumber', 'smaller', 'unnecessary', 'stowed', 'uncouth', 'approaches', 'Dives', 'ELIZA', 'disproved', 'Kick', 'corroborated', 'foibles', 'able', 'czar', 'syllable', 'rehearsing', 'charmingly', 'mutineers', 'Phaedon', 'unmistakable', 'eastwards', 'wast', 'images', 'probationary', 'legislative', 'sally', 'amphitheatrical', 'Buoyed', 'slumbers', 'heartless', 'circular', 'balanced', 'Whosoever', 'shifting', 'Fleet', 'uncontaminated', 'imperceptibly', 'fanning', 'noiseless', 'unwarrantable', 'andirons', 'splice', 'massive', 'operation', 'asylum', 'Mississippi', 'incorporate', 'Rhyme', 'Midnight', 'extracted', 'embodied', 'Dominic', 'changing', 'entirely', 'while', 'citation', 'executioner', 'Deliberately', 'roasted', 'sloped', 'Den', 'CAPTORS', 'eats', 'tune', 'swelled', 'Cap', 'ears', 'parti', 'hunger', 'mentioned', 'managing', 'enjoining', 'Ecclesiastes', 'together', 'colony', 'willing', 'hose', 'crushing', 'Ledyard', 'girls', 'entrenched', 'Bud', 'seething', 'cut', 'slippery', 'molten', 'proved', 'remain', 'badger', 'selectest', 'hones', 'independence', 'progenitors', 'sordidness', 'solely', 'wedged', 'nightmare', 'reduced', 'plumping', 'Samson', ',--"', 'reigneth', 'diminish', 'genius', 'keeled', 'piercing', 'essayed', '8', 'premeditated', 'wonst', 'Shaking', 'bountifully', 'biased', 'Consider', 'Oft', 'hisself', 'laying', 'Napoleon', 'anaconda', 'degrees', 'vain', 'bulky', 'cock', 'careens', 'flies', 'whispers', 'chuckled', 'groan', 'oaken', 'moons', 'canted', 'advance', 
'mocked', 'mow', '3', 'experienced', 'incorruption', 'slid', 'forthing', 'frenzied', '.--"', 'owned', 'CROW', 'transpired', 'explained', 'cumbrous', 'widest', 'landsman', 'verdure', 'conceives', 'certain', 'troops', 'Elsewhere', 'Commodores', 'heathen', 'hunch', 'houseless', 'abominate', 'linear', 'predestinating', 'Whose', 'tragedies', 'Bachelor', 'dilated', 'declines', 'Blinding', 'scrutiny', 'whaleboning', 'brokenly', 'Shan', 'disordered', 'candidate', 'metal', 'overcome', 'upside', 'weakened', 'watch', 'hie', 'retarded', 'deploy', 'sleepers', 'Ireland', 'SOME', 'stock', 'erections', 'interferes', 'corresponds', 'England', 'significant', 'coiled', 'dignified', 'preternaturalness', 'gaff', 'trio', 'negative', 'touched', 'feast', 'label', 'waning', 'IS', 'programme', 'knife', 'thews', 'wolves', "?'--'", 'seductive', 'apparatus', 'libraries', 'hath', 'eagerly', 'comical', 'ruffles', 'soonest', 'hoar', 'rules', 'sparks', 'Shall', 'besooted', 'steered', 'Crossed', 'bucks', 'screwed', 'utilities', 'corrupt', 'thread', 'improvements', 'Exploring', 'masted', 'wished', 'Smuggled', 'perpendicularly', 'Hospital', 'face', 'lamb', 'unlikely', 'speaker', 'refer', 'glided', 'exception', 'materially', 'tunic', 'obscure', 'reproachfully', 'Hotel', 'bumpkin', 'affirms', 'nondescript', 'Tiger', 'graved', 'preceding', 'aleak', 'subtleness', 'numbed', 'unmoor', 'service', 'prophetic', 'incarcerated', 'flounders', 'coupled', 'interweavingly', 'Scorpio', 'hemp', 'TO', 'consisted', 'molasses', 'SHOALS', 'pleasant', 'VERBAL', 'cowhide', 'fresh', 'threshold', 'breeding', 'discover', 'Eternity', 'juicy', 'unicorns', 'Rig', 'wharf', '?', 'cliffs', 'abundant', 'unretracing', 'parents', 'dictionary', 'Justice', 'short', 'hard', '4TH', 'glazier', 'incites', 'salamander', 'inflicted', 'dived', 'whisperingly', 'Panting', 'administering', 'interflowing', 'tastefully', 'ABOUT', 'surveying', 'unconditional', 'cavalier', 'Himmalehs', 'lurches', 'footman', 'VEINS', 'crystallized', 'Requiem', 
'powerfully', 'Loaded', 'starts', 'vest', 'sake', 'canvas', 'earnestness', 'July', 'keeling', 'vapoured', 'reforming', 'orator', 'smite', 'predicted', 'hospitality', 'elasticity', 'crest', 'Further', 'crotch', 'navigator', 'Librarian', 'gallery', 'nobly', 'Midwifery', 'OFFICIO', 'pouch', 'Abel', 'Try', 'sixpence', 'similes', 'placing', 'forces', 'Affected', 'occurs', 'bony', 'wounds', 'prized', 'laugh', 'revelled', 'spawned', 'rains', 'HOW', 'Zeuglodon', 'delivery', 'mastership', 'China', 'effectually', 'vero', 'Push', 'begged', 'appeals', 'painfully', 'dies', 'why', 'course', 'eventually', 'torments', 'Cough', 'restricting', 'rarely', 'Herschel', 'squattings', 'rehearsed', 'SUDDEN', 'oftentimes', 'rods', 'cleft', 'sobbing', 'Bowditch', 'heels', 'jointed', 'consequent', 'polite', 'connect', 'parenthesize', 'lowering', 'explosion', 'WHEN', 'ERECTION', 'confused', 'alms', 'mannerly', 'delusions', 'presumption', 'lowly', 'soggy', 'invitingly', 'flourishings', 'hurt', 'Spring', 'mill', 'rascally', 'warringly', 'Pupella', 'Days', 'twice', 'conveys', 'RIGGING', 'admonishing', 'behead', 'ignore', 'accompanies', 'Zoned', 'eddied', 'enduring', 'Mungo', 'lad', 'tenement', 'pocket', 'pyramid', 'shipyards', 'gradations', 'quaint', 'Depend', 'impatience', 'sheered', 'walrus', 'bags', 'guests', 'clamor', 'lightning', 'futures', 'paramount', 'migratory', 'kindle', 'putting', 'breeching', 'joiner', 'obtains', 'shudderingly', 'prescribed', 'Unhinge', 'abode', 'manhandle', 'gudgeons', 'exceed', 'Smeerenberg', 'silently', 'assailant', 'industry', 'fearfulness', 'tray', 'bloodshed', 'comparatively', 'throbbing', 'shields', 'speeding', 'eely', 'inwreathing', 'Fountain', 'ha', 'sensitive', 'Manhatto', 'connexion', 'firmer', 'nutmeg', 'fetch', 'Suspended', 'employs', 'big', 'crushed', 'kindhearted', 'Both', 'reason', 'thing', 'infinite', 'annihilation', 'combat', 'Western', 'Grampus', 'ISOLATO', '75', 'progressive', 'contrary', 'SULPHUR', 'resulting', 'BROWN', 'billow', 'interrupt', 
'supplemental', 'obliquity', 'unread', 'perfect', 'anointing', 'SPOUTING', 'Paris', 'Mate', 'blessing', 'pause', 'constellation', 'muffling', 'HO', 'arise', 'fronting', 'perilousness', 'settled', 'ME', 'defied', 'corpse', 'Ginger', 'constellations', 'pelvis', 'merciful', 'economical', 'Hollanders', 'diligent', 'shuffling', 'buried', 'jurisdiction', 'slept', 'wages', 'steeds', '32', 'firkins', 'profoundly', 'outfit', 'embodiment', 'providential', 'curly', 'nurtured', '102', 'felt', 'mustn', 'attained', 'knee', 'RETAKING', 'curling', 'aisle', 'Russian', 'division', '1750', 'tidiest', 'balmed', 'crawl', 'footpads', 'boldly', 'hump', 'brackish', 'chapter', 'intermediate', 'miller', 'contributory', 'forewarnings', 'MY', 'commodores', 'abruptly', 'CHAPTERS', 'wealthy', 'luxurious', 'blood', 'Arched', 'frightens', 'writhed', 'timber', 'fish', 'statistically', 'mast', 'midst', 'messmates', 'irritated', 'rumpled', 'Europa', 'passing', 'habitude', 'ships', 'Hurriedly', '9', 'barbs', 'Dunkirk', 'honour', 'cherrying', 'engrossing', 'inventive', 'mark', 'bastions', 'cousin', 'rocket', 'STUBB', 'plaguey', 'Drink', 'resounded', 'entered', 'frame', 'tens', 'forge', 'fortune', 'observations', 'elemental', 'dentistical', 'starve', 'labelled', 'quivers', 'consists', 'becharmed', 'gazes', 'compendious', 'flock', 'Heave', 'chalices', 'bigot', 'conception', 'discovering', 'checkered', 'radiating', 'reversing', 'village', 'individually', 'unpoetical', 'hamstring', 'bowes', 'Falsehood', 'equipped', 'subsided', 'lumps', 'Ethiopian', 'ATTACKED', 'contraband', 'mermaid', 'ERROMANGOAN', 'Look', 'distension', 'defiles', 'drove', 'atrocious', 'ushered', 'jury', 'filling', 'twigs', 'falls', 'blurred', 'dealers', 'monarch', 'ejaculation', 'how', 'prop', 'Indiaman', 'requiring', 'St', 'slantingly', 'pallidness', 'restless', 'Memory', 'supposition', 'spoils', 'overdone', 'featuring', 'tangle', 'Deuteronomy', 'waistband', 'Dante', 'highwaymen', 'turns', 'unalterable', 'constraint', 'persuade', 
'Reality', 'assistance', 'orthodoxy', 'blasting', 'hen', 'sure', 'Returning', 'knight', 'rips', 'snuffers', 'cable', 'THIS', 'admonitions', 'unalloyed', 'pampered', 'circumstances', 'inns', 'guineas', 'dreamt', 'suns', 'lege', 'colouring', 'Wet', 'excitement', 'shoving', 'Bess', 'Hartz', 'Corinthians', 'dismissed', 'sacrificial', 'Dar', 'relics', 'Time', 'whole', 'relation', 'United', 'splendors', 'Stood', 'interested', 'prejudices', 'sacred', 'Broke', 'ticking', 'India', 'inseparable', 'Vitus', 'accordingly', 'Muffled', 'skewered', 'cows', 'liked', 'caught', 'justly', 'sup', 'sick', 'intermission', 'interluding', 'alpacas', 'employ', 'Backs', 'rivers', 'reverberations', 'leaks', 'sorrow', 'furnace', 'enters', 'education', 'amazing', 'nose', 'raved', 'locking', 'imposing', 'shoemaker', 'badly', 'change', 'methodization', 'hits', 'knockings', 'knotty', 'Kentucky', 'envy', 'kindling', 'bottomed', 'weather', 'fright', 'Har', 'LL', 'belike', 'blubbering', 'haughty', 'reproach', 'autumnal', 'ample', 'verifications', 'seven', 'Squeeze', 'Methinks', 'hitching', 'guns', 'affidavit', 'napping', 'restrained', 'vortex', 'august', 'reef', 'chewed', 'animal', 'Sibbald', 'plentifully', 'swamps', 'simultaneous', 'Coppered', 'preposterous', 'From', 'Crish', 'chassee', 'ANIMAL', 'magistrate', 'inveterate', 'bedfellows', 'Chapel', 'legend', 'ween', 'punctual', 'ridges', 'luck', 'lift', 'Squall', 'fling', 'deceitfulness', 'scream', 'chock', 'shan', 'oxygenated', 'roared', 'Rondeletius', 'chances', 'napkin', 'trellised', 'decline', 'fancying', 'concussions', 'lazily', 'Traitors', 'foie', 'Rocks', 'moles', 'illimitably', 'twitch', '1840', 'In', 'loftiest', 'unneeded', 'Round', 'confining', 'Rope', 'proceeded', 'plain', 'caterpillar', 'twelfth', 'slapped', 'tornado', 'burghers', 'Bones', 'soap', 'seethings', 'crawled', 'Latin', 'dreadful', 'Cuba', 'terrible', 'lithe', 'wedge', 'Middle', 'unnaturally', 'witness', 'Painter', 'travelling', 'hears', 'unlacing', 'Silence', 'mumblings', 
'cocoanuts', 'centrally', 'throbs', 'tormented', 'bright', 'inconclusive', 'tips', 'jails', 'braver', 'rooms', 'ranks', 'tilt', 'boy', 'enabled', '---,', 'marred', 'expect', 'news', 'jingling', 'knocks', 'soils', 'Pegu', 'feelingly', 'fireplaces', 'instantaneously', 'cords', 'descriptively', 'feature', 'beside', 'Teneriffe', 'BELFAST', 'papers', 'swamp', 'unthinkingly', 'cathedrals', 'unexaggerating', 'daintiest', 'exhort', 'sizes', 'Walks', 'brevet', 'touching', 'Museum', 'conspicuous', 'Spermaceti', 'humane', 'ottomans', 'argument', 'sob', 'considerably', '86', 'sport', 'transition', 'caution', 'untouched', 'Saxon', 'rattling', 'yawing', 'latently', 'collateral', 'petrified', 'sheets', 'array', 'marvels', 'Miserable', 'piety', 'firmest', 'objected', 'drag', 'throned', 'picturesquely', 'blame', 'watchmakers', 'ST', 'scrawl', 'unprincipled', 'indispensableness', 'vacant', 'combined', 'frequency', 'toed', 'comrades', 'served', 'shrivelled', 'Sub', 'hastier', 'compassion', 'latitudes', 'easy', 'effulgent', 'commanded', 'Jago', 'never', 'trust', 'Glancing', 'bloom', 'Holloa', 'Slowly', 'sulphur', 'Del', 'maketh', 'racket', 'snore', 'stoopingly', 'consequence', 'latitude', 'Swimming', 'signification', 'Lesson', 'OAKUM', ...}
sorted(set(text3)) # sorted set of used vocabulary
['!', "'", '(', ')', ',', ',)', '.', '.)', ':', ';', ';)', '?', '?)', 'A', 'Abel', 'Abelmizraim', 'Abidah', 'Abide', 'Abimael', 'Abimelech', 'Abr', 'Abrah', 'Abraham', 'Abram', 'Accad', 'Achbor', 'Adah', 'Adam', 'Adbeel', 'Admah', 'Adullamite', 'After', 'Aholibamah', 'Ahuzzath', 'Ajah', 'Akan', 'All', 'Allonbachuth', 'Almighty', 'Almodad', 'Also', 'Alvah', 'Alvan', 'Am', 'Amal', 'Amalek', 'Amalekites', 'Ammon', 'Amorite', 'Amorites', 'Amraphel', 'An', 'Anah', 'Anamim', 'And', 'Aner', 'Angel', 'Appoint', 'Aram', 'Aran', 'Ararat', 'Arbah', 'Ard', 'Are', 'Areli', 'Arioch', 'Arise', 'Arkite', 'Arodi', 'Arphaxad', 'Art', 'Arvadite', 'As', 'Asenath', 'Ashbel', 'Asher', 'Ashkenaz', 'Ashteroth', 'Ask', 'Asshur', 'Asshurim', 'Assyr', 'Assyria', 'At', 'Atad', 'Avith', 'Baalhanan', 'Babel', 'Bashemath', 'Be', 'Because', 'Becher', 'Bedad', 'Beeri', 'Beerlahairoi', 'Beersheba', 'Behold', 'Bela', 'Belah', 'Benam', 'Benjamin', 'Beno', 'Beor', 'Bera', 'Bered', 'Beriah', 'Bethel', 'Bethlehem', 'Bethuel', 'Beware', 'Bilhah', 'Bilhan', 'Binding', 'Birsha', 'Bless', 'Blessed', 'Both', 'Bow', 'Bozrah', 'Bring', 'But', 'Buz', 'By', 'Cain', 'Cainan', 'Calah', 'Calneh', 'Can', 'Cana', 'Canaan', 'Canaanite', 'Canaanites', 'Canaanitish', 'Caphtorim', 'Carmi', 'Casluhim', 'Cast', 'Cause', 'Chaldees', 'Chedorlaomer', 'Cheran', 'Cherubims', 'Chesed', 'Chezib', 'Come', 'Cursed', 'Cush', 'Damascus', 'Dan', 'Day', 'Deborah', 'Dedan', 'Deliver', 'Diklah', 'Din', 'Dinah', 'Dinhabah', 'Discern', 'Dishan', 'Dishon', 'Do', 'Dodanim', 'Dothan', 'Drink', 'Duke', 'Dumah', 'Earth', 'Ebal', 'Eber', 'Edar', 'Eden', 'Edom', 'Edomites', 'Egy', 'Egypt', 'Egyptia', 'Egyptian', 'Egyptians', 'Ehi', 'Elah', 'Elam', 'Elbethel', 'Eldaah', 'EleloheIsrael', 'Eliezer', 'Eliphaz', 'Elishah', 'Ellasar', 'Elon', 'Elparan', 'Emins', 'En', 'Enmishpat', 'Eno', 'Enoch', 'Enos', 'Ephah', 'Epher', 'Ephra', 'Ephraim', 'Ephrath', 'Ephron', 'Er', 'Erech', 'Eri', 'Es', 'Esau', 'Escape', 'Esek', 'Eshban', 'Eshcol', 'Ethiopia', 
'Euphrat', 'Euphrates', 'Eve', 'Even', 'Every', 'Except', 'Ezbon', 'Ezer', 'Fear', 'Feed', 'Fifteen', 'Fill', 'For', 'Forasmuch', 'Forgive', 'From', 'Fulfil', 'G', 'Gad', 'Gaham', 'Galeed', 'Gatam', 'Gather', 'Gaza', 'Gentiles', 'Gera', 'Gerar', 'Gershon', 'Get', 'Gether', 'Gihon', 'Gilead', 'Girgashites', 'Girgasite', 'Give', 'Go', 'God', 'Gomer', 'Gomorrah', 'Goshen', 'Guni', 'Hadad', 'Hadar', 'Hadoram', 'Hagar', 'Haggi', 'Hai', 'Ham', 'Hamathite', 'Hamor', 'Hamul', 'Hanoch', 'Happy', 'Haran', 'Hast', 'Haste', 'Have', 'Havilah', 'Hazarmaveth', 'Hazezontamar', 'Hazo', 'He', 'Hear', 'Heaven', 'Heber', 'Hebrew', 'Hebrews', 'Hebron', 'Hemam', 'Hemdan', 'Here', 'Hereby', 'Heth', 'Hezron', 'Hiddekel', 'Hinder', 'Hirah', 'His', 'Hitti', 'Hittite', 'Hittites', 'Hivite', 'Hobah', 'Hori', 'Horite', 'Horites', 'How', 'Hul', 'Huppim', 'Husham', 'Hushim', 'Huz', 'I', 'If', 'In', 'Irad', 'Iram', 'Is', 'Isa', 'Isaac', 'Iscah', 'Ishbak', 'Ishmael', 'Ishmeelites', 'Ishuah', 'Isra', 'Israel', 'Issachar', 'Isui', 'It', 'Ithran', 'Jaalam', 'Jabal', 'Jabbok', 'Jac', 'Jachin', 'Jacob', 'Jahleel', 'Jahzeel', 'Jamin', 'Japhe', 'Japheth', 'Jared', 'Javan', 'Jebusite', 'Jebusites', 'Jegarsahadutha', 'Jehovahjireh', 'Jemuel', 'Jerah', 'Jetheth', 'Jetur', 'Jeush', 'Jezer', 'Jidlaph', 'Jimnah', 'Job', 'Jobab', 'Jokshan', 'Joktan', 'Jordan', 'Joseph', 'Jubal', 'Judah', 'Judge', 'Judith', 'Kadesh', 'Kadmonites', 'Karnaim', 'Kedar', 'Kedemah', 'Kemuel', 'Kenaz', 'Kenites', 'Kenizzites', 'Keturah', 'Kiriathaim', 'Kirjatharba', 'Kittim', 'Know', 'Kohath', 'Kor', 'Korah', 'LO', 'LORD', 'Laban', 'Lahairoi', 'Lamech', 'Lasha', 'Lay', 'Leah', 'Lehabim', 'Lest', 'Let', 'Letushim', 'Leummim', 'Levi', 'Lie', 'Lift', 'Lo', 'Look', 'Lot', 'Lotan', 'Lud', 'Ludim', 'Luz', 'Maachah', 'Machir', 'Machpelah', 'Madai', 'Magdiel', 'Magog', 'Mahalaleel', 'Mahalath', 'Mahanaim', 'Make', 'Malchiel', 'Male', 'Mam', 'Mamre', 'Man', 'Manahath', 'Manass', 'Manasseh', 'Mash', 'Masrekah', 'Massa', 'Matred', 'Me', 'Medan', 
'Mehetabel', 'Mehujael', 'Melchizedek', 'Merari', 'Mesha', 'Meshech', 'Mesopotamia', 'Methusa', 'Methusael', 'Methuselah', 'Mezahab', 'Mibsam', 'Mibzar', 'Midian', 'Midianites', 'Milcah', 'Mishma', 'Mizpah', 'Mizraim', 'Mizz', 'Moab', 'Moabites', 'Moreh', 'Moreover', 'Moriah', 'Muppim', 'My', 'Naamah', 'Naaman', 'Nahath', 'Nahor', 'Naphish', 'Naphtali', 'Naphtuhim', 'Nay', 'Nebajoth', 'Neither', 'Night', 'Nimrod', 'Nineveh', 'Noah', 'Nod', 'Not', 'Now', 'O', 'Obal', 'Of', 'Oh', 'Ohad', 'Omar', 'On', 'Onam', 'Onan', 'Only', 'Ophir', 'Our', 'Out', 'Padan', 'Padanaram', 'Paran', 'Pass', 'Pathrusim', 'Pau', 'Peace', 'Peleg', 'Peniel', 'Penuel', 'Peradventure', 'Perizzit', 'Perizzite', 'Perizzites', 'Phallu', 'Phara', 'Pharaoh', 'Pharez', 'Phichol', 'Philistim', 'Philistines', 'Phut', 'Phuvah', 'Pildash', 'Pinon', 'Pison', 'Potiphar', 'Potipherah', 'Put', 'Raamah', 'Rachel', 'Rameses', 'Rebek', 'Rebekah', 'Rehoboth', 'Remain', 'Rephaims', 'Resen', 'Return', 'Reu', 'Reub', 'Reuben', 'Reuel', 'Reumah', 'Riphath', 'Rosh', 'Sabtah', 'Sabtech', 'Said', 'Salah', 'Salem', 'Samlah', 'Sarah', 'Sarai', 'Saul', 'Save', 'Say', 'Se', 'Seba', 'See', 'Seeing', 'Seir', 'Sell', 'Send', 'Sephar', 'Serah', 'Sered', 'Serug', 'Set', 'Seth', 'Shalem', 'Shall', 'Shalt', 'Shammah', 'Shaul', 'Shaveh', 'She', 'Sheba', 'Shebah', 'Shechem', 'Shed', 'Shel', 'Shelah', 'Sheleph', 'Shem', 'Shemeber', 'Shepho', 'Shillem', 'Shiloh', 'Shimron', 'Shinab', 'Shinar', 'Shobal', 'Should', 'Shuah', 'Shuni', 'Shur', 'Sichem', 'Siddim', 'Sidon', 'Simeon', 'Sinite', 'Sitnah', 'Slay', 'So', 'Sod', 'Sodom', 'Sojourn', 'Some', 'Spake', 'Speak', 'Spirit', 'Stand', 'Succoth', 'Surely', 'Swear', 'Syrian', 'Take', 'Tamar', 'Tarshish', 'Tebah', 'Tell', 'Tema', 'Teman', 'Temani', 'Terah', 'Thahash', 'That', 'The', 'Then', 'There', 'Therefore', 'These', 'They', 'Thirty', 'This', 'Thorns', 'Thou', 'Thus', 'Thy', 'Tidal', 'Timna', 'Timnah', 'Timnath', 'Tiras', 'To', 'Togarmah', 'Tola', 'Tubal', 'Tubalcain', 'Twelve', 'Two', 
'Unstable', 'Until', 'Unto', 'Up', 'Upon', 'Ur', 'Uz', 'Uzal', 'We', 'What', 'When', 'Whence', 'Where', 'Whereas', 'Wherefore', 'Which', 'While', 'Who', 'Whose', 'Whoso', 'Why', 'Wilt', 'With', 'Woman', 'Ye', 'Yea', 'Yet', 'Zaavan', 'Zaphnathpaaneah', 'Zar', 'Zarah', 'Zeboiim', 'Zeboim', 'Zebul', 'Zebulun', 'Zemarite', 'Zepho', 'Zerah', 'Zibeon', 'Zidon', 'Zillah', 'Zilpah', 'Zimran', 'Ziphion', 'Zo', 'Zoar', 'Zohar', 'Zuzims', 'a', 'abated', 'abide', 'able', 'abode', 'abomination', 'about', 'above', 'abroad', 'absent', 'abundantly', 'accept', 'accepted', 'according', 'acknowledged', 'activity', 'add', 'adder', 'afar', 'afflict', 'affliction', 'afraid', 'after', 'afterward', 'afterwards', 'aga', 'again', 'against', 'age', 'aileth', 'air', 'al', 'alive', 'all', 'almon', 'alo', 'alone', 'aloud', 'also', 'altar', 'altogether', 'always', 'am', 'among', 'amongst', 'an', 'and', 'angel', 'angels', 'anger', 'angry', 'anguish', 'anointedst', 'anoth', 'another', 'answer', 'answered', 'any', 'anything', 'appe', 'appear', 'appeared', 'appease', 'appoint', 'appointed', 'aprons', 'archer', 'archers', 'are', 'arise', 'ark', 'armed', 'arms', 'army', 'arose', 'arrayed', 'art', 'artificer', 'as', 'ascending', 'ash', 'ashamed', 'ask', 'asked', 'asketh', 'ass', 'assembly', 'asses', 'assigned', 'asswaged', 'at', 'attained', 'audience', 'avenged', 'aw', 'awaked', 'away', 'awoke', 'back', 'backward', 'bad', 'bade', 'badest', 'badne', 'bak', 'bake', 'bakemeats', 'baker', 'bakers', 'balm', 'bands', 'bank', 'bare', 'barr', 'barren', 'basket', 'baskets', 'battle', 'bdellium', 'be', 'bear', 'beari', 'bearing', 'beast', 'beasts', 'beautiful', 'became', 'because', 'become', 'bed', 'been', 'befall', 'befell', 'before', 'began', 'begat', 'beget', 'begettest', 'begin', 'beginning', 'begotten', 'beguiled', 'beheld', 'behind', 'behold', 'being', 'believed', 'belly', 'belong', 'beneath', 'bereaved', 'beside', 'besides', 'besought', 'best', 'betimes', 'better', 'between', 'betwixt', 'beyond', 
'binding', 'bird', 'birds', 'birthday', 'birthright', 'biteth', 'bitter', 'blame', 'blameless', 'blasted', 'bless', 'blessed', 'blesseth', 'blessi', 'blessing', 'blessings', 'blindness', 'blood', 'blossoms', 'bodies', 'boldly', 'bondman', 'bondmen', 'bondwoman', 'bone', 'bones', 'book', 'booths', 'border', 'borders', 'born', 'bosom', 'both', 'bottle', 'bou', 'boug', 'bough', 'bought', 'bound', 'bow', 'bowed', 'bowels', 'bowing', 'boys', 'bracelets', 'branches', 'brass', 'bre', 'breach', 'bread', 'breadth', 'break', 'breaketh', 'breaking', 'breasts', 'breath', 'breathed', 'breed', 'brethren', 'brick', 'brimstone', 'bring', 'brink', 'broken', 'brook', 'broth', 'brother', 'brought', 'brown', 'bruise', 'budded', 'build', 'builded', 'built', 'bulls', 'bundle', 'bundles', 'burdens', 'buried', 'burn', 'burning', 'burnt', 'bury', 'buryingplace', 'business', 'but', 'butler', 'butlers', 'butlership', 'butter', 'buy', 'by', 'cakes', 'calf', 'call', 'called', 'came', 'camel', 'camels', 'camest', 'can', 'cannot', 'canst', 'captain', 'captive', 'captives', 'carcases', 'carried', 'carry', 'cast', 'castles', 'catt', 'cattle', 'caught', 'cause', 'caused', 'cave', 'cease', 'ceased', 'certain', 'certainly', 'chain', 'chamber', 'change', 'changed', 'changes', 'charge', 'charged', 'chariot', 'chariots', 'chesnut', 'chi', 'chief', 'child', 'childless', 'childr', 'children', 'chode', 'choice', 'chose', 'circumcis', 'circumcise', 'circumcised', 'citi', 'cities', 'city', 'clave', 'clean', 'clear', 'cleave', 'clo', 'closed', 'clothed', 'clothes', 'cloud', 'clusters', 'co', 'coat', 'coats', 'coffin', 'cold', ...]
len(set(text1)) # vocabulary size: how many distinct tokens are in the text?
19317
len(set(text1)) / len(text1) # distinct tokens make up about 7.4% of all tokens
0.07406285585022564
text1.count("white")
191
text1.count("whale")
906
text1.count("whale") / len(text1)
0.003473673313677301
# about 0.35% of the tokens in the text are the word "whale"
def lexical_diversity(text):
    # ratio of distinct tokens to total tokens
    return len(set(text)) / len(text)
lexical_diversity(text1)
0.07406285585022564
def percentage(count, total):
    # share of `count` in `total`, expressed as a percentage
    return 100 * count / total
percentage(906, 260819)
0.3473673313677301
percentage(text1.count("whale"), len(text1))
0.3473673313677301
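The figures above treat 'But' and 'but' as different tokens and count punctuation as vocabulary. The sketch below shows one possible way to normalize before measuring diversity; the helper name `normalized_diversity` and the toy token list are assumptions for illustration, not part of `nltk.book`.

```python
def normalized_diversity(tokens):
    """Lexical diversity over lowercased, purely alphabetic tokens."""
    words = [t.lower() for t in tokens if t.isalpha()]
    if not words:
        # Avoid dividing by zero when no alphabetic tokens remain.
        return 0.0
    return len(set(words)) / len(words)

# Hypothetical toy tokens, not taken from any NLTK corpus.
toy_tokens = ['The', 'whale', ',', 'the', 'White', 'whale', '!']

raw = len(set(toy_tokens)) / len(toy_tokens)  # 6 distinct tokens out of 7
normalized = normalized_diversity(toy_tokens)  # 3 distinct words out of 5
print(raw, normalized)
```

Whether to fold case or drop punctuation depends on the question being asked; for stylistic analysis the raw tokens may be the better basis.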
# 1.2.1 Lists
sent1 = ['Call', 'me', 'Ishmael', '.']
sent1
['Call', 'me', 'Ishmael', '.']
len(sent1)
4
lexical_diversity(sent1)
1.0
# 100% diversity: every token in sent1 occurs exactly once
sent2
['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.']
sent1
['Call', 'me', 'Ishmael', '.']
sent9
['THE', 'suburb', 'of', 'Saffron', 'Park', 'lay', 'on', 'the', 'sunset', 'side', 'of', 'London', ',', 'as', 'red', 'and', 'ragged', 'as', 'a', 'cloud', 'of', 'sunset', '.']
sentence1 = ['I', 'am', 'not', 'a', 'computer', ',', 'am', 'I', '?']
sorted(sentence1)
[',', '?', 'I', 'I', 'a', 'am', 'am', 'computer', 'not']
set(sentence1)
{',', '?', 'I', 'a', 'am', 'computer', 'not'}
len(sentence1)
9
len(set(sentence1))
7
sentence1.count('I')
2
sentence1.count('not')
1
['I', 'am', 'not', 'a', 'computer'] + [',', 'am', 'I', '?'] # adding two lists concatenates them into a new list
['I', 'am', 'not', 'a', 'computer', ',', 'am', 'I', '?']
sent1 + sent4
['Call', 'me', 'Ishmael', '.', 'Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the', 'House', 'of', 'Representatives', ':']
sentence1.append("Yes!") # append modifies the list in place; the output below suggests this cell was run three times
sentence1
['I', 'am', 'not', 'a', 'computer', ',', 'am', 'I', '?', 'Yes!', 'Yes!', 'Yes!']
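The difference between `+` and `append` is easy to miss in a notebook, where a cell can be run more than once. The following sketch uses a hypothetical list named `words` to show that `+` builds a new list while `append` changes the existing one each time it is called.

```python
words = ['I', 'am', 'not', 'a', 'computer']

# Concatenation returns a new list and leaves `words` untouched.
combined = words + ['?']
length_after_concat = len(words)  # still 5

# append mutates `words` in place and returns None.
# Calling it three times mimics re-running a notebook cell three times.
for _ in range(3):
    words.append('Yes!')

print(combined)
print(words)
```

Re-running a cell that contains `append` therefore keeps growing the list, which is one plausible explanation for the repeated 'Yes!' entries above.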
# 1.2.2 Indexing Lists
text1[98]
'others'
text1.index('awake')
16672
text1[100:107] # slice of indices 100 through 106 (the 101st to 107th tokens); the end index 107 is excluded
['and', 'to', 'teach', 'them', 'by', 'what', 'name']
sent = ['word1', 'word2', 'word3']
sent[0]
'word1'
sent[2]
'word3'
sent[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_4878/2879615376.py in <module>
----> 1 sent[4]

IndexError: list index out of range
sent[:3] # from the beginning of the list up to, but not including, index 3
['word1', 'word2', 'word3']
sent[0:] # from index 0 to the end of the list
['word1', 'word2', 'word3']
sentence2 = ['I', 'just', 'called', 'to', 'say', 'I', 'love', 'you']
sentence2
['I', 'just', 'called', 'to', 'say', 'I', 'love', 'you']
sentence2[5:]
['I', 'love', 'you']
sentence2[:3]
['I', 'just', 'called']
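The slicing rules used above can be summarized in a short, self-contained sketch. The variable names are chosen for illustration only; the behavior shown is standard Python list slicing.

```python
tokens = ['I', 'just', 'called', 'to', 'say', 'I', 'love', 'you']

# A slice [m:n] covers indices m .. n-1, so it has n - m elements.
middle = tokens[2:5]

# Negative indices count from the end of the list.
last_two = tokens[-2:]

# Splitting at the same index gives two parts that rebuild the original.
head, tail = tokens[:3], tokens[3:]

# index() reports the first position of a value.
first_I = tokens.index('I')

# A slice that runs past the end is silently truncated ...
beyond = tokens[6:100]

# ... whereas a single out-of-range index raises IndexError.
try:
    tokens[100]
    raised = False
except IndexError:
    raised = True

print(middle, last_two, first_I, beyond, raised)
```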
# 1.2.4 Strings
name = 'Monty'
name[0]
'M'
name[:4]
'Mont'
name * 3
'MontyMontyMonty'
''.join(['I ', 'love ', 'you.'])
'I love you.'
'i luv ya'.split()
['i', 'luv', 'ya']
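In the `join` call above the spaces are stored inside the list items. A more common pattern, sketched below with made-up example strings, is to keep the words clean and put the separator in the string that `join` is called on.

```python
words = ['I', 'love', 'you.']

# The string before .join is the separator placed between the items.
sentence = ' '.join(words)

# split() with no argument splits on any run of whitespace
# and ignores leading and trailing whitespace.
round_trip = sentence.split()
messy = '  i   luv ya '.split()

# split with an explicit separator keeps empty fields.
fields = 'a,b,,c'.split(',')

print(sentence, round_trip, messy, fields)
```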
# 1.3
saying = ['I', 'just', 'called', 'to', 'say', 'I', 'love', 'you']
set(saying)
{'I', 'called', 'just', 'love', 'say', 'to', 'you'}
tokens = set(saying)
sorted(tokens)
['I', 'called', 'just', 'love', 'say', 'to', 'you']
sorted_tokens = sorted(tokens)
sorted_tokens[-2:]
['to', 'you']
# 1.3.1 Frequency Distributions
fdist1 = FreqDist(text1)
print(fdist1)
<FreqDist with 19317 samples and 260819 outcomes>
fdist1.most_common(50)
[(',', 18713), ('the', 13721), ('.', 6862), ('of', 6536), ('and', 6024), ('a', 4569), ('to', 4542), (';', 4072), ('in', 3916), ('that', 2982), ("'", 2684), ('-', 2552), ('his', 2459), ('it', 2209), ('I', 2124), ('s', 1739), ('is', 1695), ('he', 1661), ('with', 1659), ('was', 1632), ('as', 1620), ('"', 1478), ('all', 1462), ('for', 1414), ('this', 1280), ('!', 1269), ('at', 1231), ('by', 1137), ('but', 1113), ('not', 1103), ('--', 1070), ('him', 1058), ('from', 1052), ('be', 1030), ('on', 1005), ('so', 918), ('whale', 906), ('one', 889), ('you', 841), ('had', 767), ('have', 760), ('there', 715), ('But', 705), ('or', 697), ('were', 680), ('now', 646), ('which', 640), ('?', 637), ('me', 627), ('like', 624)]
fdist1['whale']
906
fdist1.hapaxes()
['Herman', 'Melville', ']', 'ETYMOLOGY', 'Late', 'Consumptive', 'School', 'threadbare', 'lexicons', 'mockingly', 'flags', 'mortality', 'signification', 'HACKLUYT', 'Sw', 'HVAL', 'roundness', 'Dut', 'Ger', 'WALLEN', 'WALW', 'IAN', 'RICHARDSON', 'KETOS', 'GREEK', 'CETUS', 'LATIN', 'WHOEL', 'ANGLO', 'SAXON', 'WAL', 'HWAL', 'SWEDISH', 'ICELANDIC', 'BALEINE', 'BALLENA', 'FEGEE', 'ERROMANGOAN', 'Librarian', 'painstaking', 'burrower', 'grub', 'Vaticans', 'stalls', 'higgledy', 'piggledy', 'gospel', 'promiscuously', 'commentator', 'belongest', 'sallow', 'Pale', 'Sherry', 'loves', 'bluntly', 'Subs', 'thankless', 'Hampton', 'Court', 'hie', 'refugees', 'pampered', 'Michael', 'Raphael', 'unsplinterable', 'GENESIS', 'JOB', 'JONAH', 'punish', 'ISAIAH', 'soever', 'cometh', 'incontinently', 'perisheth', 'PLUTARCH', 'MORALS', 'breedeth', 'Whirlpooles', 'Balaene', 'arpens', 'PLINY', 'Scarcely', 'TOOKE', 'LUCIAN', 'TRUE', 'catched', 'OCTHER', 'VERBAL', 'TAKEN', 'MOUTH', 'ALFRED', '890', 'gudgeon', 'retires', 'MONTAIGNE', 'APOLOGY', 'RAIMOND', 'SEBOND', 'Nick', 'RABELAIS', 'cartloads', 'STOWE', 'ANNALS', 'LORD', 'BACON', 'Touching', 'ork', 'DEATH', 'sovereignest', 'bruise', 'HAMLET', 'leach', 'Mote', 'availle', 'returne', 'againe', 'worker', 'Dinting', 'paine', 'thro', 'maine', 'FAERIE', 'Immense', 'til', 'DAVENANT', 'PREFACE', 'GONDIBERT', 'spermacetti', 'Hosmannus', 'Nescio', 'VIDE', 'Spencer', 'Talus', 'flail', 'threatens', 'jav', 'lins', 'WALLER', 'SUMMER', 'ISLANDS', 'Commonwealth', 'Civitas', 'OPENING', 'SENTENCE', 'HOBBES', 'LEVIATHAN', 'Silly', 'Mansoul', 'chewing', 'sprat', 'PILGRIM', 'PROGRESS', 'Created', 'PARADISE', 'LOST', '---"', 'Hugest', 'Stretched', 'Draws', 'FULLLER', 'PROFANE', 'HOLY', 'STATE', 'DRYDEN', 'ANNUS', 'MIRABILIS', 'aground', 'EDGE', 'TEN', 'SPITZBERGEN', 'PURCHAS', 'wantonness', 'fuzzing', 'vents', 'HERBERT', 'INTO', 'ASIA', 'AFRICA', 'SCHOUTEN', 'SIXTH', 'CIRCUMNAVIGATION', 'Elbe', 'ducat', 'herrings', 'GREENLAND', 'Several', 'Fife', 'Anno', '1652', 
'Pitferren', 'SIBBALD', 'FIFE', 'KINROSS', 'Myself', 'Sperma', 'ceti', 'fierceness', 'RICHARD', 'STRAFFORD', 'LETTER', 'BERMUDAS', 'PHIL', 'TRANS', '1668', 'PRIMER', 'COWLEY', '1729', '"...', 'frequendy', 'insupportable', 'disorder', 'ULLOA', 'SOUTH', 'AMERICA', 'sylphs', 'petticoat', 'Oft', 'Tho', 'RAPE', 'LOCK', 'NAT', 'wales', 'JOHNSON', 'COOK', 'dung', 'lime', 'juniper', 'UNO', 'VON', 'TROIL', 'LETTERS', 'BANKS', 'SOLANDER', '1772', 'Nantuckois', 'JEFFERSON', 'MEMORIAL', 'MINISTER', 'REFERENCE', 'PARLIAMENT', 'SOMEWHERE', 'guarding', 'protecting', 'robbers', 'BLACKSTONE', 'Rodmond', 'suspends', 'attends', 'FALCONER', 'Bright', 'roofs', 'domes', 'rockets', 'Around', 'unwieldy', 'COWPER', 'VISIT', 'LONDON', 'HUNTER', 'DISSECTION', 'SMALL', 'SIZED', 'aorta', 'gushing', 'PALEY', 'THEOLOGY', 'mammiferous', 'hind', 'BARON', 'CUVIER', 'COLNETT', 'PURPOSE', 'EXTENDING', 'SPERMACETI', 'Floundered', 'chace', 'peopling', 'Gather', 'Led', 'instincts', 'trackless', 'Assaulted', 'voracious', 'spiral', 'MONTGOMERY', 'WORLD', 'FLOOD', 'Paean', 'fatter', 'Flounders', 'CHARLES', 'LAMB', 'TRIUMPH', '1690', 'OBED', 'Susan', 'HAWTHORNE', 'TWICE', 'bespeak', 'raal', 'COOPER', 'PILOT', 'Berlin', 'Gazette', 'ECKERMANN', 'CONVERSATIONS', 'GOETHE', 'ESSEX', 'WAS', 'ATTACKED', 'FINALLY', 'DESTROYED', 'OWEN', 'CHACE', 'FIRST', 'SAID', 'VESSEL', 'YORK', '1821', 'piping', 'dimmed', 'phospher', 'ELIZABETH', 'OAKES', 'SMITH', 'amounted', '440', 'SCORESBY', 'Mad', 'agonies', 'endures', 'infuriated', 'rears', 'snaps', 'propelled', 'observers', 'opportunities', 'habitudes', 'BEALE', 'offensively', 'artful', 'mischievous', 'FREDERICK', 'DEBELL', '1840', 'October', 'Raise', 'ay', 'THAR', 'bowes', 'os', 'ROSS', 'ETCHINGS', 'CRUIZE', '1846', 'Globe', 'transactions', 'relate', 'HUSSEY', 'SURVIVORS', 'parried', 'MISSIONARY', 'JOURNAL', 'TYERMAN', 'boldest', 'persevering', 'REPORT', 'DANIEL', 'SPEECH', 'SENATE', 'APPLICATION', 'ERECTION', 'BREAKWATER', 'CAPTORS', 'WHALEMAN', 'ADVENTURES', 'BIOGRAPHY', 
'GATHERED', 'HOMEWARD', 'COMMODORE', 'PREBLE', 'REV', 'CHEEVER', 'MUTINEER', 'BROTHER', 'ANOTHER', 'MCCULLOCH', 'COMMERCIAL', 'reciprocal', 'clews', 'SOMETHING', 'UNPUBLISHED', 'CURRENTS', 'Pedestrians', 'recollect', 'gateways', 'VOYAGER', 'ARCTIC', 'NEWSPAPER', 'TAKING', 'RETAKING', 'HOBOMACK', 'MIRIAM', 'FISHERMAN', 'appliance', 'RIBS', 'TRUCKS', 'Terra', 'Del', 'Fuego', 'DARWIN', 'NATURALIST', ";--'", '!\'"', 'WHARTON', 'Loomings', 'spleen', 'regulating', 'circulation', 'Whenever', 'drizzly', 'hypos', 'philosophical', 'Cato', 'Manhattoes', 'reefs', 'downtown', 'gazers', 'Circumambulate', 'Corlears', 'Coenties', 'Slip', 'Whitehall', 'Posted', 'sentinels', 'spiles', 'pier', 'lath', 'counters', 'desks', 'loitering', 'shady', 'Inlanders', 'lanes', 'alleys', 'attract', 'dale', 'dreamiest', 'shadiest', 'quietest', 'enchanting', 'Saco', 'crucifix', 'Deep', 'mazy', 'Tiger', 'Tennessee', 'Rockaway', 'Persians', 'deity', 'Narcissus', 'ungraspable', 'hazy', 'quarrelsome', 'offices', 'abominate', 'toils', 'trials', 'barques', 'schooners', 'broiling', 'buttered', 'judgmatically', 'peppered', 'reverentially', 'idolatrous', 'dotings', 'ibis', 'roasted', 'bake', 'plumb', 'Van', 'Rensselaers', 'Randolphs', 'Hardicanutes', 'lording', 'tallest', 'decoction', 'Seneca', 'Stoics', 'Testament', 'promptly', 'rub', 'infliction', 'BEING', 'PAID', 'urbane', 'ills', 'monied', 'consign', 'prevalent', 'violate', 'Pythagorean', 'commonalty', 'police', 'surveillance', 'programme', 'solo', 'CONTESTED', 'ELECTION', 'PRESIDENCY', 'UNITED', 'STATES', 'ISHMAEL', 'BLOODY', 'AFFGHANISTAN', 'managers', 'genteel', 'comedies', 'farces', 'cunningly', 'disguises', 'cajoling', 'unbiased', 'freewill', 'discriminating', 'overwhelming', 'undeliverable', 'itch', 'forbidden', 'ignoring', 'lodges', 'Carpet', 'Bag', 'Manhatto', 'candidates', 'penalties', 'Tyre', 'Carthage', 'imported', 'cobblestones', 'bitingly', 'shouldering', 'price', 'fervent', 'asphaltic', 'pavement', 'flinty', 'projections', 'soles', 'Too', 
'cheapest', 'cheeriest', 'invitingly', 'particles', 'peer', 'Angel', 'Doom', 'wailing', 'gnashing', 'Wretched', 'entertainment', 'Moving', 'emigrant', 'poverty', 'creak', 'lodgings', 'zephyr', 'hob', 'toasting', 'observest', 'sashless', 'glazier', 'reasonest', 'chinks', 'crannies', 'lint', 'chattering', 'shiverings', 'cob', 'redder', 'Orion', 'glitters', 'conservatories', 'president', 'temperance', 'blubbering', 'straggling', 'wainscots', 'reminding', 'oilpainting', 'besmoked', 'defaced', 'unequal', 'crosslights', 'hags', 'delineate', 'bewitched', 'ponderings', 'boggy', 'soggy', 'squitchy', 'froze', 'heath', 'icebound', 'represents', 'Horner', 'foundered', 'clubs', 'harvesting', 'hacking', 'horrifying', 'Mixed', 'Nathan', 'Swain', 'corkscrew', 'Blanco', 'sojourning', 'fireplaces', 'duskier', 'cockpits', 'rarities', 'Projecting', 'Within', 'shelves', 'flasks', 'bustles', 'deliriums', 'Abominable', 'tumblers', 'cylinders', 'goggling', 'deceitfully', 'tapered', 'Parallel', 'pecked', 'footpads', 'Fill', 'shilling', 'examining', 'SKRIMSHANDER', 'accommodated', 'unoccupied', 'haint', 'pose', 'whalin', 'decidedly', 'objectionable', 'wander', 'Battery', 'ruminating', 'adorning', 'potatoes', 'sartainty', 'diabolically', 'steaks', 'undress', 'looker', 'rioting', 'Grampus', 'seed', 'Feegees', 'tramping', 'Enveloped', 'bedarned', 'eruption', 'officiating', 'brimmers', 'complained', 'potion', 'colds', 'catarrhs', 'liquor', 'arrantest', 'topers', 'obstreperously', 'aloof', 'desirous', 'hilarity', 'coffer', 'Southerner', 'mountaineers', 'Alleghanian', 'missed', 'supernaturally', 'congratulate', 'multiply', 'bachelor', 'abominated', 'tidiest', 'bedwards', 'shan', 'tablecloth', 'Skrimshander', 'bump', 'spraining', 'eider', 'yoking', 'rickety', 'whirlwinds', 'knockings', 'dismissed', 'popped', 'cherishing', 'chuckled', 'chuckle', 'mightily', 'catches', 'bamboozingly', 'overstocked', 'toothpick', 'rayther', 'BROWN', 'slanderin', 'farrago', 'BROKE', 'Sartain', 'Mt', 'Hecla', 
'persist', 'mystifying', 'unsay', 'criminal', 'Wall', 'purty', 'sarmon', 'rips', 'tellin', 'bought', 'balmed', 'curios', 'sellin', 'inions', 'fooling', 'idolators', 'Depend', 'reg', 'lar', 'spliced', 'Johnny', 'sprawling', 'Arter', 'glim', 'jiffy', 'irresolute', 'vum', 'WON', 'Folding', 'scrutiny', 'porcupine', 'moccasin', 'ponchos', 'parade', 'rainy', 'remembering', 'commended', 'cobs', 'Nod', 'footfall', 'unlacing', 'blackish', 'plasters', 'inkling', 'Placing', 'crammed', 'scalp', 'mildewed', 'Ignorance', 'parent', 'nonplussed', 'undressing', 'checkered', 'Thirty', 'frogs', 'quaked', 'wrapall', 'dreadnaught', 'fumbled', 'Remembering', 'manikin', 'tenpin', 'andirons', 'jambs', 'bricks', 'appropriate', 'applying', 'hastier', 'withdrawals', 'antics', 'devotee', 'extinguishing', 'unceremoniously', 'bagged', 'sportsman', 'woodcock', 'uncomfortableness', 'deliberating', 'puffed', 'sang', 'Stammering', 'conjured', 'responses', 'debel', 'flourishing', 'Angels', 'flourishings', 'peddlin', 'sleepe', 'grunted', 'gettee', 'motioning', 'comely', 'insured', 'Counterpane', 'parti', 'triangles', 'interminable', 'caper', 'supperless', '21st', 'hemisphere', 'sigh', 'Sixteen', 'ached', 'coaches', 'stockinged', 'slippering', 'misbehaviour', 'unendurable', 'stepmothers', 'misfortunes', 'steeped', 'shudderingly', 'confounding', 'soberly', 'recurred', 'predicament', 'unlock', 'bridegroom', 'clasp', 'hugged', 'rouse', 'snore', 'scratch', 'Throwing', 'expostulations', 'unbecomingness', 'matrimonial', 'dawning', 'overture', 'innate', 'compliment', 'civility', 'rudeness', 'toilette', 'dressing', 'donning', 'gaspings', 'booting', 'caterpillar', 'outlandishness', 'manners', 'education', 'undergraduate', 'dreamt', 'cowhide', 'pinched', 'curtains', 'indecorous', 'contented', 'restricting', 'donned', 'lathering', 'unsheathes', 'whets', 'Rogers', 'cutlery', 'Afterwards', 'baton', 'Breakfast', 'pleasantly', 'bountifully', 'laughable', 'bosky', 'unshorn', 'gowns', 'toasted', 'lingers', 'tarried', 
'barred', 'Grub', 'Park', 'assurance', 'polish', 'occasioned', 'embarrassed', 'bashfulness', 'duelled', 'winking', 'tastes', 'sheepishly', 'bashful', 'icicle', 'admirer', 'cordially', 'grappling', 'genteelly', 'eschewed', 'undivided', '6', 'circulating', 'nondescripts', 'Chestnut', 'jostle', 'Regent', 'Lascars', 'Bombay', 'Apollo', 'Feegeeans', 'Tongatobooarrs', 'Erromanggoans', 'Pannangians', 'Brighggians', 'weekly', 'Vermonters', 'stalwart', 'frames', 'felled', 'strutting', 'wester', 'bombazine', 'cloak', 'mow', 'gloves', 'joins', 'outfit', 'waistcoats', 'Hay', 'Seed', 'tract', 'dearest', 'pave', 'eggs', 'patrician', 'parks', 'scraggy', 'scoria', 'Herr', 'dowers', 'nieces', 'reservoirs', 'maples', 'bountiful', 'proffer', 'passer', 'cones', 'blossoms', 'superinduced', 'carnation', 'Salem', 'sweethearts', 'Puritanic', 'Whaleman', 'Wrapping', 'Each', 'quote', 'TALBOT', 'Near', 'Desolation', '1st', 'SISTER', 'ROBERT', 'WILLIS', 'ELLERY', 'NATHAN', 'COLEMAN', 'WALTER', 'CANNY', 'SETH', 'GLEIG', 'Forming', 'ELIZA', '31st', 'MARBLE', 'SHIPMATES', 'EZEKIEL', 'HARDY', 'AUGUST', '3d', '1833', 'WIDOW', 'Shaking', 'glazed', 'Affected', 'relatives', 'unhealing', 'sympathetically', 'wounds', 'bleed', 'blanks', ...]
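What `FreqDist` reports (samples, outcomes, most common items, hapaxes) can be reproduced on a toy example with the standard library's `collections.Counter`, on which NLTK 3's `FreqDist` is built. The sentence below is made up for illustration and is not taken from `text1`.

```python
from collections import Counter

# Toy token list; not from the NLTK corpora.
tokens = "the whale saw the ship and the ship saw the sea".split()
counts = Counter(tokens)

n_samples = len(counts)                 # number of distinct tokens
n_outcomes = sum(counts.values())       # total number of tokens
most_frequent = counts.most_common(1)   # list of (token, count) pairs
hapaxes = sorted(w for w, c in counts.items() if c == 1)  # tokens seen exactly once

print(n_samples, n_outcomes)
print(most_frequent)
print(hapaxes)
```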
# 1.3.2 Fine-grained Selection of Words
Vocabulary = set(text1)
long_words = [word for word in Vocabulary if len(word) > 15]
sorted(long_words)
['CIRCUMNAVIGATION', 'Physiognomically', 'apprehensiveness', 'cannibalistically', 'characteristically', 'circumnavigating', 'circumnavigation', 'circumnavigations', 'comprehensiveness', 'hermaphroditical', 'indiscriminately', 'indispensableness', 'irresistibleness', 'physiognomically', 'preternaturalness', 'responsibilities', 'simultaneousness', 'subterraneousness', 'supernaturalness', 'superstitiousness', 'uncomfortableness', 'uncompromisedness', 'undiscriminating', 'uninterpenetratingly']
long_words = [word for word in Vocabulary if len(word) > 16]
sorted(long_words)
['cannibalistically', 'characteristically', 'circumnavigations', 'comprehensiveness', 'indispensableness', 'preternaturalness', 'subterraneousness', 'superstitiousness', 'uncomfortableness', 'uncompromisedness', 'uninterpenetratingly']
fdist1 = FreqDist(text1)
sorted(word for word in set(text1) if len(word) > 7 and fdist1[word] < 7) # words longer than 7 characters that occur fewer than 7 times in the text
['ADDITIONAL', 'ADVANCING', 'ADVENTURES', 'AFFGHANISTAN', 'ALGERINE', 'APPLICATION', 'APPROACHING', 'ASCENDING', 'ATTACKED', 'ATTITUDES', 'Abednego', 'Abjectus', 'Abominable', 'Accessory', 'According', 'Accordingly', 'Accursed', 'Achilles', 'Acushnet', 'Admirals', 'Advancement', 'Adventures', 'Advocate', 'Affected', 'Affidavit', 'Affrighted', 'Africans', 'Afterwards', 'Ahasuerus', 'Albatross', 'Albemarle', 'Albicore', 'Aldrovandi', 'Aldrovandus', 'Alexander', 'Alexanders', 'Algerine', 'Alleghanian', 'Alleghanies', 'Almanack', 'Almighty', 'Ambergriese', 'Ambergris', 'Americans', 'Americas', 'Amsterdam', 'Anacharsis', 'Anatomist', 'Andromeda', 'Animated', 'Anomalous', 'Antarctic', 'Antilles', 'Antiochus', 'Anything', 'Apoplexy', 'Aquarius', 'Archbishop', 'Archipelagoes', 'Arethusa', 'Aristotle', 'Arkansas', 'Aroostook', 'Arsacidean', 'Arsacides', 'Articles', 'Ashantee', 'Asiatics', 'Asphaltites', 'Assaulted', 'Assuming', 'Assuredly', 'Assyrian', 'Astronomy', 'Atlantics', 'Attached', 'Australia', 'Australian', 'Austrian', 'Availing', 'BERMUDAS', 'BIOGRAPHY', 'BLACKSMITH', 'BLACKSTONE', 'BREAKERS', 'BREAKWATER', 'Babylonian', 'Baltimore', 'Bartholomew', 'Basilosaurus', 'Bastille', 'Battering', 'Beelzebub', 'Befooled', 'Belisarius', 'Belshazzar', 'Bendigoes', 'Benjamin', 'Berkshire', 'Bibliographical', 'Biographical', 'Blacksmith', 'Blackstone', 'Blinding', 'Blocksburg', 'Bonapartes', 'Bonneterre', 'Bordeaux', 'Bourbons', 'Bowditch', 'Brahmins', 'Brandreth', 'Breakfast', 'Brighggians', 'Broadway', 'Bulwarks', 'Business', 'Butchers', 'CAPTAINS', 'CARPENTER', 'CAULKING', 'CHAPTERS', 'CHRONICLER', 'CIRCUMNAVIGATION', 'CLUSTERS', 'COMMERCIAL', 'COMMODORE', 'COMSTOCK', 'CONTESTED', 'CONTINUES', 'CONVERSATIONS', 'CRUISING', 'CURRENTS', 'Cachalot', 'Caesarian', 'Californian', 'Cambyses', 'Campagna', 'Canadian', 'Canaller', 'Cannibal', 'Cannibals', 'Canterbury', 'Capricornus', 'Captains', 'Carefully', 'Carpenter', 'Carthage', 'Caryatid', 'Castaway', 'Cathedral', 'Catholic', 
'Catskill', 'Cattegat', 'Certainly', 'Cervantes', 'Cetacean', 'Cetology', 'Champagne', 'Champollion', 'Charlemagne', 'Chartering', 'Cherries', 'Chestnut', 'Christendom', 'Christianity', 'Christmas', 'Circassian', 'Circumambulate', 'Clearing', 'Cleopatra', 'Cleveland', 'Clifford', 'Clinging', 'Cockatoo', 'Coenties', 'Coleridge', 'Colonies', 'Colossus', 'Columbus', 'Commanded', 'Commanders', 'Commodores', 'Commonly', 'Commonwealth', 'Companies', 'Comparing', 'Concerning', 'Congregation', 'Congregational', 'Conjuror', 'Connecticut', 'Consequently', 'Considering', 'Constable', 'Constantine', 'Constantinople', 'Consumptive', 'Continents', 'Contrasted', 'Conversation', 'Convulsively', 'Copenhagen', 'Coppered', 'Corinthians', 'Corkscrew', 'Corlears', 'Coronation', 'Corresponding', 'Counterpane', 'Crappoes', 'Crockett', 'Crossing', 'Crowding', 'Crozetts', 'Cruising', 'Cruppered', 'Crusaders', 'Cyclades', 'DAVENANT', 'DESTROYED', 'DICTIONARY', 'DISCOVERS', 'DISSECTION', 'DUODECIMOES', 'Damocles', 'Dardanelles', 'Darkness', 'Darmonodes', 'Dauphine', 'Decanter', 'Decapitation', 'December', 'Deliberately', 'Delightful', 'Deliverer', 'Denderah', 'Descartian', 'Descending', 'Desecrated', 'Desmarest', 'Desolation', 'Despairing', 'Despatch', 'Detached', 'Deuteronomy', 'Diminish', 'Discovery', 'Dorchester', 'Doubloon', 'Doubtless', 'Drinking', 'Dropping', 'Dunfermline', 'Duodecimo', 'Duodecimoes', 'Dutchman', 'ECKERMANN', 'ELECTION', 'ELIZABETH', 'EMBONPOINT', 'ERECTION', 'ERROMANGOAN', 'ETCHINGS', 'ETYMOLOGY', 'EXCHANGE', 'EXCHANGING', 'EXTENDING', 'EXTRACTS', 'Earthsman', 'Ecclesiastes', 'Eckerman', 'Eddystone', 'Edgewise', 'Egyptians', 'Ehrenbreitstein', 'Electors', 'Elephant', 'Elephanta', 'Elephants', 'Ellenborough', 'Elsewhere', 'Emblazonings', 'Emboldened', 'Emperors', 'Enderbies', 'Enderbys', 'Englander', 'Englishman', 'Englishmen', 'Entering', 'Entreaties', 'Enveloped', 'Ephesian', 'Epilogue', 'Equality', 'Equatorial', 'Erromanggoans', 'Erroneous', 'Esquimaux', 
'Eternities', 'Eternity', 'Ethiopian', 'Euclidean', 'Euroclydon', 'European', 'Evangelist', 'Evangelists', 'Excellent', 'Excepting', 'Exception', 'Expedition', 'Expeditions', 'Exploring', 'Extending', 'FALCONER', 'FISHERMAN', 'FOLLOWING', 'FORECASTLE', 'FORESAIL', 'FREDERICK', 'Falsehood', 'Farewell', 'Fashioned', 'February', 'Feegeeans', 'Ferdinando', 'Fernandes', 'Fiercely', 'Fisheries', 'Fishiest', 'Floating', 'Floundered', 'Flounders', 'Forecastle', 'Forehead', 'Foremost', 'Forthwith', 'Fountain', 'Frankfort', 'Franklin', 'Frederick', 'Frenchmen', 'Friesland', 'Frighted', 'Frobisher', 'Froissart', 'Furthermore', 'GATHERED', 'GENERALLY', 'GOLDSMITH', 'GONDIBERT', 'GREENLAND', 'Galleries', 'Gallipagos', 'Gardiner', 'Gentlemen', 'Geological', 'Gibraltar', 'Glancing', 'Glimpses', 'Golconda', 'Goldsmith', 'Gomorrah', 'Gracious', 'Granting', 'Greedily', 'Greenlanders', 'Greenlandmen', 'Greenwich', 'Grenadier', 'Growlands', 'Guernseyman', 'Gulfweed', 'HACKLUYT', 'HARPOONEERS', 'HATCHWAY', 'HAWTHORNE', 'HEREABOUTS', 'HOBOMACK', 'HOMEWARD', 'HORIZONTAL', 'Hackluyt', 'Hampshire', 'Handling', 'Hannibal', 'Hanoverian', 'Hardicanutes', 'Harmattans', 'Harpooneer', 'Harpoons', 'Hawaiian', 'Hearkening', 'Hedgehog', 'Herculaneum', 'Hercules', 'Herschel', 'Highland', 'Himmalehan', 'Himmalehs', 'Hindostan', 'Historians', 'Historically', 'Hitherto', 'Hogarthian', 'Hoisting', 'Hollanders', 'Holofernes', 'Honourary', 'Horrible', 'Hosmannus', 'Hospital', 'Hoveringly', 'Humiliation', 'Humpback', 'Hurriedly', 'Hydriote', 'Hyperborean', 'ICELANDIC', 'IMPOSING', 'ISOLATOES', 'Icebergs', 'Icelandic', 'Ignorance', 'Illinois', 'Immediately', 'Immemorial', 'Immortal', 'Impenetrable', 'Imperial', 'Impossible', 'Imprimis', 'Improving', 'Inasmuch', 'Indiaman', 'Indiamen', 'Indolence', 'Inferable', 'Inlanders', 'Innkeeper', 'Innocents', 'Inquisition', 'Inserting', 'Instances', 'Instantly', 'Insufferable', 'Insurance', 'Interweaving', 'Intolerably', 'Invisible', 'Iroquois', 'Isabella', 
'Islanders', 'Isolatoes', 'Israelites', 'JEFFERSON', 'Jinglers', 'Jonathan', 'Judgment', 'Jungfrau', 'Justinian', 'Kentuckian', 'Kentucky', 'Krusenstern', 'Krusensterns', 'LANTERNS', 'LASHINGS', 'LENGTHWISE', 'LEVIATHAN', 'LIGHTNING', 'LOUNGING', 'Labrador', 'Lacepede', 'Lackaday', 'Lamatins', 'Langsdorff', 'Laplander', 'Laplandish', 'Launched', 'Learning', 'Leicester', 'Leuwenhoeck', 'Levanter', 'Levelling', 'Leviathanic', 'Leviathanism', 'Liberties', 'Librarian', 'Lieutenant', 'Lighting', 'Lightning', 'Likewise', 'Linnaeus', 'Literally', 'Littleton', 'Loftiest', 'Lombardy', 'Loomings', 'Lothario', 'Louisiana', 'Loveliness', 'Lowering', 'MAINMAST', 'MCCULLOCH', 'MEANTIME', 'MEMORIAL', 'MINISTER', 'MIRABILIS', 'MISSIONARY', 'MONTAIGNE', 'MONTGOMERY', 'MUTINEER', 'MYSTERIOUS', 'Maccabees', 'Mackinaw', 'Macrocephalus', 'Maelstrom', 'Magnanimous', 'Magnitude', 'Manchester', 'Manhatto', 'Manhattoes', 'Manillas', 'Marchant', 'Massachusetts', 'Massacre', 'Mastodon', 'Measured', 'Measurement', 'Meddling', 'Melancthon', 'Melville', 'Mendanna', 'Mephistophelean', 'Merchant', 'Mesopotamian', 'Methinks', 'Methuselah', 'Michigan', 'Midnight', 'Midships', 'Midwifery', 'Mingling', 'Miserable', 'Mississippi', 'Mississippies', 'Missouri', 'Mistress', 'Mogulship', 'Moluccas', 'Monadnock', 'Monongahela', 'Monsieurs', 'Monsoons', 'Monstrous', 'Mordecai', 'Mountains', 'Mounttop', 'Mysteriously', 'Mysticetus', 'NARRATIVE', 'NARWHALE', 'NATURALIST', 'NEWSPAPER', 'Nantuckois', 'Napoleon', 'Narcissus', 'Narragansett', 'Narwhale', 'National', 'Neskyeuna', 'Netherlands', 'Newcastle', 'Newfoundland', 'Nicholas', 'Nightgown', 'Northern', 'Northman', 'Norwegian', 'November', 'OCTAVOES', 'Observatory', 'Oceanica', 'Olmstead', 'Opposite', 'Ordinaire', 'Oriental', 'Orientals', 'Originally', 'Overhearing', 'PARADISE', 'PARLIAMENT', 'PITCHING', 'PLUTARCH', 'PORPOISE', 'PORTUGUESE', 'PRESIDENCY', 'PREVIOUS', 'PROGRESS', 'Pacifics', 'Pactolus', 'Pandects', 'Pannangians', 'Pantheistic', 'Pantheists', 
'Pantheon', 'Paracelsan', 'Paracelsus', 'Paradise', 'Parallel', 'Parisians', 'Parliament', 'Partners', 'Patagonia', 'Patagonian', 'Patience', 'Pedestrians', 'Penetrating', 'Perchance', 'Persians', 'Peruvian', 'Peterson', 'Petrified', 'Philippe', 'Philippine', 'Philistine', 'Philistines', 'Philologically', 'Philopater', 'Phrenologist', 'Physeter', 'Physiognomically', 'Physiognomist', 'Physiognomy', 'Pictures', 'Pirohitee', 'Pitchpoling', 'Pitferren', 'Platonian', 'Platonic', 'Platonist', 'Platonists', 'Polynesia', 'Polynesian', 'Polynesians', 'Pontoppodan', 'Porpoises', 'Portugal', 'Portuguese', 'Possession', 'Possibly', 'Postscript', 'Pottowottamie', 'Pottsfich', 'Povelson', 'Praetorians', 'Prairies', 'Presbyterian', 'Presbyterians', 'Presently', 'Preserving', 'Pressing', 'Preternatural', 'Probably', 'Procopius', 'Prodigies', 'Prodromus', 'Projecting', 'Prometheus', 'Propontis', 'Protestant', 'Providence', 'Puritanic', 'Puritans', 'Pythagoras', 'Pythagorean', 'Quadrant', 'Quakeress', 'Quakerish', 'Quakerism', 'Quitting', 'RABELAIS', 'RECLINING', 'REFERENCE', 'REMAINING', 'REPEATED', 'REPUBLICA', 'RESPECTABLE', 'RESUMING', 'RETAKING', 'RICHARDSON', 'Rabelais', 'Railroads', 'Ramadans', 'Randolphs', 'Receiving', 'Reckoning', 'Reference', 'Regarded', 'Regarding', 'Relieved', 'Remember', 'Remembering', 'Removing', 'Rensselaers', 'Republican', 'Respectively', 'Retreating', 'Retribution', 'Returning', 'Righteousness', 'Rinaldini', 'Ripplingly', 'Rockaway', 'Rokovoko', 'Rondeletius', 'Rousseau', 'SCHOUTEN', 'SCORESBY', 'SENTENCE', 'SHIPMATES', 'SHIPWRECK', 'SHRINKING', 'SICILIAN', 'SKRIMSHANDER', 'SNEEZING', 'SOLANDER', 'SOMETHING', 'SOMEWHERE', 'SPERMACETI', 'SPITZBERGEN', 'SPOUTING', 'SPOUTINGS', 'SPRINGING', 'STANDING', 'STARBUCK', 'STRAFFORD', 'SURVIVORS', 'Sagittarius', 'Salisbury', 'Sandwich', 'Saratoga', 'Saturday', 'Savesoul', 'Scandinavian', 'Scarcely', 'Schmerenburgh', 'Schoolmasters', 'Sciences', 'Scorning', 'Scorpion', 'Scotland', 'Scriptural', 'Scripture', 
'Scriptures', 'Secondly', 'Secretary', 'Securing', 'Semiramis', 'Senators', 'Seychelle', 'Shadrach', 'Shakespeare', 'Sheffield', 'Shetland', 'Shifting', 'Shipmate', 'Shipmates', 'Shooting', 'Shrouded', 'Siberian', 'Sicilian', 'Sideways', 'Silently', 'Skeleton', 'Skrimshander', 'Sleeping', 'Smeerenberg', 'Smithfield', 'Smuggled', 'Snatching', 'Snodhead', 'Societies', 'Socratic', 'Solander', 'Something', 'Sometimes', 'Sounding', 'Southerner', 'Sovereign', 'Spaniard', 'Spaniards', 'Spanishly', 'Specksioneer', 'Specksynder', 'Spermaceti', 'Spermacetti', 'Spitzbergen', 'Spurzheim', 'Squaring', 'Stammering', 'Standing', 'Starboard', 'Starting', 'Startled', 'Stealing', 'Steering', 'Stepping', 'Straightway', 'Strangest', 'Strength', 'Stretched', 'Strictly', 'Striking', 'Stripped', 'Stylites', 'Subtilize', 'Suddenly', 'Sullenly', 'Superior', 'Supplied', 'Supposing', 'Suppression', 'Surmises', 'Suspended', 'Swackhammer', 'Swimming', 'Symphony', 'Syracuse', 'TAMBOURINE', 'TASHTEGO', 'TERROREM', 'THEOLOGY', 'THRASHER', 'Tahitian', 'Tahitians', 'Tamerlane', 'Tanaquil', 'Tartarean', 'Tartarian', 'Temperance', 'Tempered', 'Teneriffe', 'Tennessee', 'Terrible', 'Tertiary', 'Testament', 'Thinking', 'Thorkill', 'Thrasher', 'Threading', 'Throttling', 'Throughout', 'Throwing', 'Thrusted', 'Thrusting', 'Thundering', 'Titanism', 'Tomahawk', 'Tongatobooarrs', 'Tormentoto', 'Touching', 'Trafalgar', 'Traitors', 'Tranquilly', 'Transported', 'Tuileries', 'Typhoons', 'UNPUBLISHED', 'UNWINDING', 'Ultimately', 'Unappalled', 'Uncommonly', 'Unconsciously', 'Unerringly', 'Unfitness', 'Unicornism', 'Unmindful', 'Unobserved', 'Unshored', 'Unwittingly', 'Upharsin', 'Uppermost', 'Vacantly', 'Valparaiso', 'Vancouver', 'Vaticans', 'Vehemently', 'Venetian', 'Venetianly', 'Vengeance', 'Vermonters', 'Versailles', 'Vesuvius', 'Vineyarder', 'Virginia', 'WHALEBONE', 'WHALEMAN', 'WHALESHIPS', 'WINDLASS', 'Washington', 'Watching', 'Wellington', 'Whalebone', 'Whaleman', 'Whalemen', 'Whatever', 'Wheelbarrow', 
'Whenever', 'Whereupon', 'Whirlpooles', 'Whitehall', 'Whiteness', 'Whitsuntide', 'Whosoever', 'Willoughby', 'Winnebago', 'Woebegone', 'Wonderfullest', 'Wondrous', 'Wrapping', 'Wretched', 'Wrinkled', 'Yorkshire', 'Zealanders', 'Zeuglodon', 'Zogranda', 'Zoroaster', '_____________', 'abandonedly', 'abandonment', 'abasement', 'abatement', 'abbreviate', 'abbreviation', 'abhorred', 'abhorrence', 'abhorrent', 'abhorring', 'abjectly', 'ablutions', 'abominable', 'abominate', 'abominated', 'abomination', 'aboriginal', 'aboriginally', 'aboriginalness', 'abortion', 'abortions', 'abounded', 'aboundingly', 'abridged', 'abruptly', 'absolute', 'absolutely', 'absorbed', 'absorbing', 'absorbingly', 'abstained', 'abstemious', 'abstinence', 'abstract', 'abstracted', 'abstraction', 'absurdly', 'abundance', 'abundant', 'abundantly', 'accelerate', 'accelerated', 'accelerating', 'accessible', 'accessory', 'accident', 'accidental', 'accidentally', 'accidents', 'accommodate', 'accommodated', 'accommodation', 'accompanied', 'accompanies', 'accompaniments', 'accompany', 'accompanying', 'accomplish', 'accomplishing', 'accomplishment', 'accordance', 'accordingly', 'accosted', 'accountable', 'accountants', 'accounting', 'accumulate', 'accumulated', 'accumulating', 'accuracy', 'accurate', ...]
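The direction of the frequency comparison decides whether the selection returns rare or frequent long words. The following sketch shows both variants on a made-up token list; the thresholds `min_length` and `max_count` are illustrative and are not the notebook's values.

```python
from collections import Counter

# Toy token list; not from the NLTK corpora.
tokens = ("the harpooneer and the harpooneer sighted the leviathan "
          "while the whalemen rested").split()
counts = Counter(tokens)

min_length = 7   # keep words strictly longer than this
max_count = 2    # boundary between "rare" and "frequent"

# Long words that occur fewer than max_count times.
rare_long = sorted(w for w in set(tokens)
                   if len(w) > min_length and counts[w] < max_count)

# Long words that occur at least max_count times.
frequent_long = sorted(w for w in set(tokens)
                       if len(w) > min_length and counts[w] >= max_count)

print(rare_long)
print(frequent_long)
```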
# 1.3.3 Collocations and Bigrams
list(bigrams(['I', 'just', 'called', 'to', 'say', 'I', 'love', 'you']))
[('I', 'just'), ('just', 'called'), ('called', 'to'), ('to', 'say'), ('say', 'I'), ('I', 'love'), ('love', 'you')]
list(bigrams(sentence2))
[('I', 'just'), ('just', 'called'), ('called', 'to'), ('to', 'say'), ('say', 'I'), ('I', 'love'), ('love', 'you')]
text1.collocations()
Sperm Whale; Moby Dick; White Whale; old man; Captain Ahab; sperm whale; Right Whale; Captain Peleg; New Bedford; Cape Horn; cried Ahab; years ago; lower jaw; never mind; Father Mapple; cried Stubb; chief mate; white whale; ivory leg; one hand
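Bigrams are just adjacent token pairs, so they can be sketched in plain Python with `zip`. The helper name `pairwise_bigrams` is hypothetical and is not an NLTK function. Counting the pairs gives raw bigram frequencies; `collocations()` does more than count raw pairs, so this sketch is only a starting point.

```python
from collections import Counter

def pairwise_bigrams(tokens):
    """Return the list of adjacent token pairs (hypothetical helper)."""
    return list(zip(tokens, tokens[1:]))

words = ['I', 'just', 'called', 'to', 'say', 'I', 'love', 'you']
pairs = pairwise_bigrams(words)
pair_counts = Counter(pairs)   # raw frequency of each adjacent pair

print(pairs)
print(pair_counts[('I', 'love')])
```

A list of n tokens always yields n - 1 bigrams, which is a quick sanity check on the output.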
# 1.3.4 Counting other things
[len(word) for word in sentence2]
[1, 4, 6, 2, 3, 1, 4, 3]
fdist = FreqDist(len(word) for word in sentence2)
print(fdist) # there are 5 distinct word lengths (1, 2, 3, 4, 6) and 8 words
<FreqDist with 5 samples and 8 outcomes>
fdist.most_common() # length 1 occurs twice, length 4 twice, and so on
[(1, 2), (4, 2), (3, 2), (6, 1), (2, 1)]
fdist.max() # length 1 is reported as the most frequent (lengths 3 and 4 are tied with it at 2 occurrences each)
1
fdist[1] # how often length 1 occurs
2
fdist.freq(1)
0.25
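The value returned by `fdist.freq(1)` is the count for length 1 divided by the total number of words. A minimal sketch of that arithmetic with `collections.Counter`, using illustrative variable names:

```python
from collections import Counter

words = ['I', 'just', 'called', 'to', 'say', 'I', 'love', 'you']

length_counts = Counter(len(w) for w in words)   # word length -> number of words
total = sum(length_counts.values())              # number of words counted

# Relative frequency of each length: count / total.
relative = {length: count / total for length, count in length_counts.items()}

print(length_counts)
print(relative[1])
```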
# 1.4
sentence2
['I', 'just', 'called', 'to', 'say', 'I', 'love', 'you']
[word for word in sentence2 if len(word) >= 3] # greater than or equal to
['just', 'called', 'say', 'love', 'you']
[word for word in sentence2 if len(word) != 3] # not equal to
['I', 'just', 'called', 'to', 'I', 'love']
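Numeric comparisons on `len(word)` can be combined with string tests in the same comprehension. The sketch below uses a made-up token list that includes punctuation and a number, so that the effect of `isalpha()` is visible.

```python
# Toy token list with punctuation and a number added for illustration.
words = ['I', 'just', 'called', 'to', 'say', 'I', 'love', 'you', '!', '1851']

at_least_three = [w for w in words if len(w) >= 3]   # length comparison only
alphabetic = [w for w in words if w.isalpha()]       # drops '!' and '1851'

# Case-folded vocabulary of alphabetic tokens, sorted.
lowercased_vocab = sorted(set(w.lower() for w in words if w.isalpha()))

print(at_least_three)
print(alphabetic)
print(lowercased_vocab)
```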
sorted(word for word in set(text1) if word.endswith('ing'))
['According', 'Anything', 'Assuming', 'Availing', 'Baling', 'Battering', 'Behring', 'Being', 'Blinding', 'Bobbing', 'Bring', 'Capting', 'Charing', 'Chartering', 'Clearing', 'Clinging', 'Closing', 'Coming', 'Comparing', 'Concerning', 'Considering', 'Corresponding', 'Crossing', 'Crowding', 'Cruising', 'Crying', 'Cutting', 'Dashing', 'Descending', 'Despairing', 'Ding', 'Dinting', 'Diving', 'Drawing', 'Drinking', 'Dropping', 'During', 'Dying', 'Entering', 'Espying', 'Excepting', 'Exploring', 'Extending', 'Fanning', 'Fasting', 'Fearing', 'Finding', 'Floating', 'Flying', 'Folding', 'Forming', 'Gaining', 'Gamming', 'Giving', 'Glancing', 'Gliding', 'Going', 'Granting', 'Halting', 'Handling', 'Having', 'Heading', 'Hearing', 'Hearkening', 'Hiding', 'Hoisting', 'Holding', 'Improving', 'Inserting', 'Interweaving', 'Issuing', 'Keeping', 'King', 'Leading', 'Leaning', 'Learning', 'Levelling', 'Lighting', 'Lightning', 'Looking', 'Lowering', 'Lying', 'Making', 'Marking', 'Meddling', 'Mingling', 'Morning', 'Moving', 'Nothing', 'Overhearing', 'Owing', 'Panting', 'Peering', 'Penetrating', 'Pitchpoling', 'Placing', 'Preserving', 'Pressing', 'Projecting', 'Pudding', 'Pulling', 'Pushing', 'Quitting', 'Raising', 'Receiving', 'Reckoning', 'Regarding', 'Remembering', 'Removing', 'Retreating', 'Returning', 'Rising', 'Running', 'Sailing', 'Scorning', 'Securing', 'Seeing', 'Seizing', 'Setting', 'Shaking', 'Shifting', 'Shooting', 'Sing', 'Sleeping', 'Snatching', 'Something', 'Sounding', 'Spring', 'Squaring', 'Stammering', 'Standing', 'Starting', 'Stealing', 'Steering', 'Stepping', 'Stowing', 'Striking', 'Supposing', 'Swimming', 'Swing', 'Taking', 'Thinking', 'Threading', 'Throttling', 'Throwing', 'Thrusting', 'Thundering', 'Touching', 'Turning', 'Tying', 'Walking', 'Wapping', 'Watching', 'Welding', 'Whaling', 'Winding', 'Wrapping', 'abating', 'abhorring', 'abiding', 'abounding', 'absorbing', 'accelerating', 'accompanying', 'accomplishing', 'according', 'accounting', 'accumulating', 
'acknowledging', 'adding', 'addressing', 'adhering', 'adjoining', 'adjusting', 'administering', 'admitting', 'admonishing', 'adopting', 'adoring', 'adorning', 'advancing', 'affecting', 'affording', 'agonizing', 'allaying', 'allowing', 'alluding', 'alluring', 'altering', 'alternating', 'amazing', 'amputating', 'amusing', 'animating', 'annihilating', 'announcing', 'anointing', 'anything', 'appalling', 'appearing', 'applying', 'approaching', 'approving', 'arboring', 'arching', 'arguing', 'arising', 'arranging', 'arriving', 'ascending', 'ascertaining', 'aspiring', 'assailing', 'assisting', 'assuaging', 'assuming', 'assuring', 'astonishing', 'attaching', 'attacking', 'attaining', 'attending', 'awaiting', 'babbling', 'backing', 'bagging', 'balancing', 'baling', 'banding', 'bantering', 'bartering', 'bathing', 'battering', 'bawling', 'beaching', 'bearing', 'beating', 'beckoning', 'becoming', 'bedevilling', 'befriending', 'begging', 'beginning', 'beheading', 'beholding', 'being', 'belaying', 'believing', 'belonging', 'bending', 'beseeching', 'bespattering', 'bespeaking', 'bestirring', 'bethinking', 'betokening', 'bewildering', 'bewitching', 'bidding', 'binding', 'biting', 'blackberrying', 'blackling', 'blanketing', 'blasting', 'blazing', 'bleaching', 'bleeding', 'blending', 'blessing', 'blinding', 'blotting', 'blowing', 'blubbering', 'blundering', 'boarding', 'boasting', 'boiling', 'bolstering', 'bolting', 'booming', 'booting', 'bordering', 'borrowing', 'bottling', 'bouncing', 'bounding', 'bowing', 'bowling', 'bowstring', 'boxing', 'bracing', 'braiding', 'braining', 'branding', 'breaching', 'breakfasting', 'breaking', 'breathing', 'breeching', 'breeding', 'breezing', 'brimming', 'bring', 'bringing', 'bristling', 'broiling', 'brooding', 'browsing', 'brushing', 'bubbling', 'buckling', 'budding', 'building', 'bumping', 'bundling', 'bunting', 'burning', 'bursting', 'burying', 'busying', 'butchering', 'buttoning', 'cajoling', 'caking', 'calculating', 'calling', 'canting', 
'capering', 'capping', 'capsizing', 'capturing', 'careening', 'carking', 'carrying', 'carving', 'cascading', 'casting', 'catching', 'caulking', 'causing', 'ceasing', 'ceiling', 'chalking', 'chancing', 'changing', 'characterizing', 'charging', 'chasing', 'chattering', 'chatting', 'cheating', 'cheering', 'cherishing', 'cherrying', 'chewing', 'chipping', 'choking', 'chopping', 'churning', 'circling', 'circulating', 'circumnavigating', 'circumventing', 'clanging', 'clanking', 'clapping', 'clashing', 'clattering', 'cleansing', 'clearing', 'cleaving', 'clenching', 'climbing', 'cling', 'clinging', 'clinking', 'closing', 'clothing', 'clotting', 'clustering', 'clutching', 'coalescing', 'coasting', 'coating', 'cobbling', 'coercing', 'coiling', 'collaring', 'collecting', 'colouring', 'combing', 'combining', 'comforting', 'coming', 'commanding', 'commenting', 'committing', 'communicating', 'communing', 'comparing', 'completing', 'composing', 'comprehending', 'comprising', 'concentrating', 'concerning', 'conciliating', 'concluding', 'concurring', 'condemning', 'condescending', 'conducting', 'confiding', 'confining', 'conflicting', 'confounding', 'connecting', 'conquering', 'consecrating', 'considering', 'consisting', 'constituting', 'consulting', 'consuming', 'containing', 'contemplating', 'contending', 'contenting', 'continuing', 'contracting', 'contrasting', 'controlling', 'convalescing', 'conveying', 'cooking', 'cooling', 'copying', 'corresponding', 'counteracting', 'countersinking', 'counting', 'courting', 'covering', 'cozening', 'cracking', 'crashing', 'crawling', 'creaking', 'creating', 'creeping', 'cringing', 'crossing', 'crouching', 'crowding', 'crowing', 'crowning', 'cruising', 'crunching', 'crushing', 'crying', 'cudgelling', 'cunning', 'curdling', 'curing', 'curling', 'cursing', 'curvetting', 'curving', 'cutting', 'cymballing', 'damning', 'dancing', 'dangling', 'daring', 'darkling', 'darting', 'dashing', 'dawning', 'dazzling', 'deadening', 'deadreckoning', 
'deafening', 'decanting', 'decapitating', 'deceiving', 'declaring', 'declining', 'dedicating', 'deepening', 'deliberating', 'demanding', 'denominating', 'denying', 'departing', 'depending', 'depicting', 'deprecating', 'deriding', 'descending', 'describing', 'descrying', 'deserving', 'desiring', 'despairing', 'destroying', 'determining', 'developing', 'devoting', 'devouring', 'dictating', 'digesting', 'digging', 'dilating', 'ding', 'dining', 'dinning', 'dipping', 'directing', 'disappearing', 'discharging', 'discoursing', 'discovering', 'discriminating', 'disentangling', 'disheartening', 'disinfecting', 'dismasting', 'dismembering', 'disobeying', 'disposing', 'disputing', 'dissembling', 'distinguishing', 'distrusting', 'disturbing', 'diverting', 'dividing', 'diving', 'dodging', 'dogging', 'doing', 'domineering', 'donning', 'doubling', 'doubting', 'dragging', 'drawing', 'dreaming', 'drenching', 'dressing', 'drifting', 'drinking', 'dripping', 'driving', 'drooping', 'dropping', 'drowning', 'drugging', 'ducking', 'dumpling', 'during', 'dusting', 'dwelling', 'dying', 'eating', 'eddying', 'edging', 'elevating', 'elucidating', 'eluding', 'embalming', 'embarking', 'embattling', 'emblazoning', 'embracing', 'emerging', 'emptying', 'encasing', 'enchanting', 'encircling', 'encountering', 'enduring', 'engaging', 'engendering', 'engineering', 'engraving', 'engrossing', 'enhancing', 'enjoining', 'enjoying', 'enkindling', 'enlightening', 'enlisting', 'ensuing', 'entangling', 'entering', 'entertaining', 'enticing', 'enveloping', 'erecting', 'erring', 'escaping', 'essaying', 'establishing', 'evening', 'everlasting', 'everything', 'evincing', 'exaggerating', 'examining', 'exasperating', 'excavating', 'exceeding', 'excepting', 'exchanging', 'exciting', 'exclaiming', 'excluding', 'exhaling', 'exhausting', 'exhibiting', 'exhilarating', 'existing', 'expanding', 'expending', 'expiring', 'exploding', 'exposing', 'expressing', 'extending', 'extinguishing', 'extorting', 'extracting', 
'exulting', 'eyeing', 'facilitating', 'facing', 'fading', 'fainting', 'falling', 'faltering', 'famishing', 'fancying', 'fanning', 'farthing', 'fastening', 'fasting', 'fattening', 'favouring', 'feasting', 'feathering', 'featuring', 'feeding', 'feeling', 'fencing', 'ferreting', 'festooning', 'fetching', 'fighting', 'filling', 'filliping', 'finding', 'fishing', 'fitting', 'fixing', 'flailing', 'flaming', 'flanking', 'flashing', 'flattening', 'flattering', 'fleeting', 'flickering', 'fling', 'flinging', 'flitting', 'floating', 'flogging', 'floundering', 'flourishing', 'flowering', 'flowing', 'fluking', 'fluttering', 'flying', 'foaming', 'fobbing', 'folding', 'following', 'fooling', 'forbearing', 'forbidding', 'foreboding', 'foregoing', 'forerunning', 'foreseeing', 'foreshadowing', 'forgetting', 'forging', 'forking', 'forming', 'forswearing', 'forthing', 'foundering', 'foundling', 'freebooting', 'freezing', 'freshening', 'fringing', 'fronting', 'fumbling', 'furnishing', 'fuzzing', 'gaining', 'gamming', 'gaping', 'gardening', 'gasping', 'gathering', 'gazing', 'generalizing', 'getting', 'gibbering', 'girdling', 'giving', 'glancing', 'glaring', 'gleaming', 'gliding', 'glimmering', 'glistening', 'glittering', 'gloating', 'glorying', 'glowing', 'gnashing', 'gnawing', 'goggling', 'going', 'goring', 'granting', 'grappling', 'grasping', 'grating', 'grazing', 'grinding', 'grinning', 'gripping', 'groping', 'growing', 'guarding', 'gulping', 'gurgling', 'gushing', 'hacking', 'hailing', 'halting', 'halving', 'hammering', 'hamstring', 'handing', 'handling', 'hanging', 'happening', 'harboring', 'harpooning', 'harpstring', 'harvesting', 'hastening', 'hauling', 'haunting', 'having', 'heading', 'hearing', 'heaving', 'heeding', 'heeling', 'helping', 'heralding', 'herding', 'herring', 'hiding', 'hinting', 'hissing', 'hitching', 'hitting', 'hobbling', 'hoisting', 'holding', 'hollowing', 'honing', 'honouring', 'hooking', 'hoping', 'horrifying', 'housekeeping', 'hovering', 'howling', 
'hugging', 'humming', 'hunting', 'hurling', 'hurrying', 'hurtling', 'igniting', 'ignoring', 'illuminating', 'imagining', 'imbibing', 'impairing', 'impaling', 'imparting', 'impeding', 'impelling', 'importing', 'imposing', 'improving', 'inclining', 'including', 'incommoding', 'increasing', 'inculcating', 'individualizing', 'inducing', 'indulging', 'infecting', 'infesting', 'ing', 'inhaling', 'inkling', 'inlaying', 'inquiring', 'inserting', 'insinuating', 'inspecting', 'insulting', 'intending', 'intensifying', 'interblending', 'interesting', 'interfering', 'interflowing', 'interfusing', 'interlacing', 'interluding', 'intermeddling', 'intermitting', 'interpreting', 'intersecting', 'intertwisting', 'intervening', 'inventing', 'investing', 'inviting', 'invoking', 'involving', 'inwreathing', 'issuing', 'jabbering', 'jamming', 'jeering', 'jerking', 'jetting', 'jingling', 'joining', 'joking', 'judging', 'juggling', 'jumping', 'keeling', 'keeping', 'kicking', 'kidnapping', 'killing', 'kindling', 'king', 'kneeling', 'knitting', 'knocking', 'knowing', 'landing', 'languishing', 'lashing', 'lasting', 'lathering', 'laughing', 'launching', 'laying', 'leading', 'leaking', 'leaning', 'leaping', 'learning', 'leaving', 'leering', 'letting', 'lifting', 'lighting', 'lightning', 'limping', 'lingering', 'lining', 'listening', 'living', 'loading', 'lobtailing', 'locking', 'loitering', 'longing', 'looking', 'looming', 'loosening', 'lording', 'losing', 'lounging', 'loving', 'lowering', 'lunging', 'lurching', 'lurking', 'lying', 'maddening', 'magnetizing', 'magnifying', 'maiming', 'maintaining', 'making', 'managing', 'manufacturing', 'marching', 'marking', 'marling', 'marring', 'marvelling', 'mastering', 'meaning', 'measuring', 'meddling', 'meeting', 'melting', 'menacing', 'mending', 'mentioning', 'merging', 'middling', 'migrating', 'milling', 'mimicking', 'mincing', 'mingling', 'misgiving', 'missing', 'mistaking', 'mistifying', 'mixing', 'moaning', 'mobbing', 'mocking', 'modifying', 
'molesting', 'mongering', 'monopolising', 'morning', 'mortemizing', 'motioning', 'mounting', 'mourning', 'moving', 'muffling', 'mumbling', 'murdering', 'murmuring', 'mustering', 'mutinying', 'muttering', 'mystifying', 'nailing', 'napping', 'narrating', 'nearing', 'needing', 'neighboring', 'nestling', 'nibbling', 'nothing', 'noticing', 'noting', 'notwithstanding', 'nourishing', 'nursing', 'obeying', 'obscuring', 'observing', 'obtruding', 'occupying', 'offering', 'officiating', 'offing', 'offspring', ...]
sorted(word for word in set(text1) if 'shm' in word)
['Englishman', 'Englishmen', 'Ishmael', 'accomplishment', 'astonishment', 'blandishments', 'embellishments', 'establishment', 'nourishment', 'punishment']
sorted(word for word in set(text7) if '-' in word and 'index' in word)
['Stock-index', 'index-arbitrage', 'index-fund', 'index-options', 'index-related', 'stock-index']
sorted(word for word in set(text3) if word.istitle() and len(word) > 10) # title-case and longer than 10 characters
['Abelmizraim', 'Allonbachuth', 'Beerlahairoi', 'Canaanitish', 'Chedorlaomer', 'Girgashites', 'Hazarmaveth', 'Hazezontamar', 'Ishmeelites', 'Jegarsahadutha', 'Jehovahjireh', 'Kirjatharba', 'Melchizedek', 'Mesopotamia', 'Peradventure', 'Philistines', 'Zaphnathpaaneah']
sorted(word for word in set(sentence2) if not word.islower())
['I']
sorted(t for t in set(text2) if 'cie' in t or 'cei' in t)
['ancient', 'ceiling', 'conceit', 'conceited', 'conceive', 'conscience', 'conscientious', 'conscientiously', 'deceitful', 'deceive', 'deceived', 'deceiving', 'deficiencies', 'deficiency', 'deficient', 'delicacies', 'excellencies', 'fancied', 'insufficiency', 'insufficient', 'legacies', 'perceive', 'perceived', 'perceiving', 'prescience', 'prophecies', 'receipt', 'receive', 'received', 'receiving', 'society', 'species', 'sufficient', 'sufficiently', 'undeceive', 'undeceiving']
# 1.4.2 Operating on Every Element
[word.upper() for word in text1]
['[', 'MOBY', 'DICK', 'BY', 'HERMAN', 'MELVILLE', '1851', ']', 'ETYMOLOGY', '.', '(', 'SUPPLIED', 'BY', 'A', 'LATE', 'CONSUMPTIVE', 'USHER', 'TO', 'A', 'GRAMMAR', 'SCHOOL', ')', 'THE', 'PALE', 'USHER', '--', 'THREADBARE', 'IN', 'COAT', ',', 'HEART', ',', 'BODY', ',', 'AND', 'BRAIN', ';', 'I', 'SEE', 'HIM', 'NOW', '.', 'HE', 'WAS', 'EVER', 'DUSTING', 'HIS', 'OLD', 'LEXICONS', 'AND', 'GRAMMARS', ',', 'WITH', 'A', 'QUEER', 'HANDKERCHIEF', ',', 'MOCKINGLY', 'EMBELLISHED', 'WITH', 'ALL', 'THE', 'GAY', 'FLAGS', 'OF', 'ALL', 'THE', 'KNOWN', 'NATIONS', 'OF', 'THE', 'WORLD', '.', 'HE', 'LOVED', 'TO', 'DUST', 'HIS', 'OLD', 'GRAMMARS', ';', 'IT', 'SOMEHOW', 'MILDLY', 'REMINDED', 'HIM', 'OF', 'HIS', 'MORTALITY', '.', '"', 'WHILE', 'YOU', 'TAKE', 'IN', 'HAND', 'TO', 'SCHOOL', 'OTHERS', ',', 'AND', 'TO', 'TEACH', 'THEM', 'BY', 'WHAT', 'NAME', 'A', 'WHALE', '-', 'FISH', 'IS', 'TO', 'BE', 'CALLED', 'IN', 'OUR', 'TONGUE', 'LEAVING', 'OUT', ',', 'THROUGH', 'IGNORANCE', ',', 'THE', 'LETTER', 'H', ',', 'WHICH', 'ALMOST', 'ALONE', 'MAKETH', 'THE', 'SIGNIFICATION', 'OF', 'THE', 'WORD', ',', 'YOU', 'DELIVER', 'THAT', 'WHICH', 'IS', 'NOT', 'TRUE', '."', '--', 'HACKLUYT', '"', 'WHALE', '.', '...', 'SW', '.', 'AND', 'DAN', '.', 'HVAL', '.', 'THIS', 'ANIMAL', 'IS', 'NAMED', 'FROM', 'ROUNDNESS', 'OR', 'ROLLING', ';', 'FOR', 'IN', 'DAN', '.', 'HVALT', 'IS', 'ARCHED', 'OR', 'VAULTED', '."', '--', 'WEBSTER', "'", 'S', 'DICTIONARY', '"', 'WHALE', '.', '...', 'IT', 'IS', 'MORE', 'IMMEDIATELY', 'FROM', 'THE', 'DUT', '.', 'AND', 'GER', '.', 'WALLEN', ';', 'A', '.', 'S', '.', 'WALW', '-', 'IAN', ',', 'TO', 'ROLL', ',', 'TO', 'WALLOW', '."', '--', 'RICHARDSON', "'", 'S', 'DICTIONARY', 'KETOS', ',', 'GREEK', '.', 'CETUS', ',', 'LATIN', '.', 'WHOEL', ',', 'ANGLO', '-', 'SAXON', '.', 'HVALT', ',', 'DANISH', '.', 'WAL', ',', 'DUTCH', '.', 'HWAL', ',', 'SWEDISH', '.', 'WHALE', ',', 'ICELANDIC', '.', 'WHALE', ',', 'ENGLISH', '.', 'BALEINE', ',', 'FRENCH', '.', 'BALLENA', ',', 'SPANISH', '.', 'PEKEE', '-', 
'NUEE', '-', 'NUEE', ',', 'FEGEE', '.', 'PEKEE', '-', 'NUEE', '-', 'NUEE', ',', 'ERROMANGOAN', '.', 'EXTRACTS', '(', 'SUPPLIED', 'BY', 'A', 'SUB', '-', 'SUB', '-', 'LIBRARIAN', ').', 'IT', 'WILL', 'BE', 'SEEN', 'THAT', 'THIS', 'MERE', 'PAINSTAKING', 'BURROWER', 'AND', 'GRUB', '-', 'WORM', 'OF', 'A', 'POOR', 'DEVIL', 'OF', 'A', 'SUB', '-', 'SUB', 'APPEARS', 'TO', 'HAVE', 'GONE', 'THROUGH', 'THE', 'LONG', 'VATICANS', 'AND', 'STREET', '-', 'STALLS', 'OF', 'THE', 'EARTH', ',', 'PICKING', 'UP', 'WHATEVER', 'RANDOM', 'ALLUSIONS', 'TO', 'WHALES', 'HE', 'COULD', 'ANYWAYS', 'FIND', 'IN', 'ANY', 'BOOK', 'WHATSOEVER', ',', 'SACRED', 'OR', 'PROFANE', '.', 'THEREFORE', 'YOU', 'MUST', 'NOT', ',', 'IN', 'EVERY', 'CASE', 'AT', 'LEAST', ',', 'TAKE', 'THE', 'HIGGLEDY', '-', 'PIGGLEDY', 'WHALE', 'STATEMENTS', ',', 'HOWEVER', 'AUTHENTIC', ',', 'IN', 'THESE', 'EXTRACTS', ',', 'FOR', 'VERITABLE', 'GOSPEL', 'CETOLOGY', '.', 'FAR', 'FROM', 'IT', '.', 'AS', 'TOUCHING', 'THE', 'ANCIENT', 'AUTHORS', 'GENERALLY', ',', 'AS', 'WELL', 'AS', 'THE', 'POETS', 'HERE', 'APPEARING', ',', 'THESE', 'EXTRACTS', 'ARE', 'SOLELY', 'VALUABLE', 'OR', 'ENTERTAINING', ',', 'AS', 'AFFORDING', 'A', 'GLANCING', 'BIRD', "'", 'S', 'EYE', 'VIEW', 'OF', 'WHAT', 'HAS', 'BEEN', 'PROMISCUOUSLY', 'SAID', ',', 'THOUGHT', ',', 'FANCIED', ',', 'AND', 'SUNG', 'OF', 'LEVIATHAN', ',', 'BY', 'MANY', 'NATIONS', 'AND', 'GENERATIONS', ',', 'INCLUDING', 'OUR', 'OWN', '.', 'SO', 'FARE', 'THEE', 'WELL', ',', 'POOR', 'DEVIL', 'OF', 'A', 'SUB', '-', 'SUB', ',', 'WHOSE', 'COMMENTATOR', 'I', 'AM', '.', 'THOU', 'BELONGEST', 'TO', 'THAT', 'HOPELESS', ',', 'SALLOW', 'TRIBE', 'WHICH', 'NO', 'WINE', 'OF', 'THIS', 'WORLD', 'WILL', 'EVER', 'WARM', ';', 'AND', 'FOR', 'WHOM', 'EVEN', 'PALE', 'SHERRY', 'WOULD', 'BE', 'TOO', 'ROSY', '-', 'STRONG', ';', 'BUT', 'WITH', 'WHOM', 'ONE', 'SOMETIMES', 'LOVES', 'TO', 'SIT', ',', 'AND', 'FEEL', 'POOR', '-', 'DEVILISH', ',', 'TOO', ';', 'AND', 'GROW', 'CONVIVIAL', 'UPON', 'TEARS', ';', 'AND', 'SAY', 'TO', 
'THEM', 'BLUNTLY', ',', 'WITH', 'FULL', 'EYES', 'AND', 'EMPTY', 'GLASSES', ',', 'AND', 'IN', 'NOT', 'ALTOGETHER', 'UNPLEASANT', 'SADNESS', '--', 'GIVE', 'IT', 'UP', ',', 'SUB', '-', 'SUBS', '!', 'FOR', 'BY', 'HOW', 'MUCH', 'THE', 'MORE', 'PAINS', 'YE', 'TAKE', 'TO', 'PLEASE', 'THE', 'WORLD', ',', 'BY', 'SO', 'MUCH', 'THE', 'MORE', 'SHALL', 'YE', 'FOR', 'EVER', 'GO', 'THANKLESS', '!', 'WOULD', 'THAT', 'I', 'COULD', 'CLEAR', 'OUT', 'HAMPTON', 'COURT', 'AND', 'THE', 'TUILERIES', 'FOR', 'YE', '!', 'BUT', 'GULP', 'DOWN', 'YOUR', 'TEARS', 'AND', 'HIE', 'ALOFT', 'TO', 'THE', 'ROYAL', '-', 'MAST', 'WITH', 'YOUR', 'HEARTS', ';', 'FOR', 'YOUR', 'FRIENDS', 'WHO', 'HAVE', 'GONE', 'BEFORE', 'ARE', 'CLEARING', 'OUT', 'THE', 'SEVEN', '-', 'STORIED', 'HEAVENS', ',', 'AND', 'MAKING', 'REFUGEES', 'OF', 'LONG', '-', 'PAMPERED', 'GABRIEL', ',', 'MICHAEL', ',', 'AND', 'RAPHAEL', ',', 'AGAINST', 'YOUR', 'COMING', '.', 'HERE', 'YE', 'STRIKE', 'BUT', 'SPLINTERED', 'HEARTS', 'TOGETHER', '--', 'THERE', ',', 'YE', 'SHALL', 'STRIKE', 'UNSPLINTERABLE', 'GLASSES', '!', 'EXTRACTS', '.', '"', 'AND', 'GOD', 'CREATED', 'GREAT', 'WHALES', '."', '--', 'GENESIS', '.', '"', 'LEVIATHAN', 'MAKETH', 'A', 'PATH', 'TO', 'SHINE', 'AFTER', 'HIM', ';', 'ONE', 'WOULD', 'THINK', 'THE', 'DEEP', 'TO', 'BE', 'HOARY', '."', '--', 'JOB', '.', '"', 'NOW', 'THE', 'LORD', 'HAD', 'PREPARED', 'A', 'GREAT', 'FISH', 'TO', 'SWALLOW', 'UP', 'JONAH', '."', '--', 'JONAH', '.', '"', 'THERE', 'GO', 'THE', 'SHIPS', ';', 'THERE', 'IS', 'THAT', 'LEVIATHAN', 'WHOM', 'THOU', 'HAST', 'MADE', 'TO', 'PLAY', 'THEREIN', '."', '--', 'PSALMS', '.', '"', 'IN', 'THAT', 'DAY', ',', 'THE', 'LORD', 'WITH', 'HIS', 'SORE', ',', 'AND', 'GREAT', ',', 'AND', 'STRONG', 'SWORD', ',', 'SHALL', 'PUNISH', 'LEVIATHAN', 'THE', 'PIERCING', 'SERPENT', ',', 'EVEN', 'LEVIATHAN', 'THAT', 'CROOKED', 'SERPENT', ';', 'AND', 'HE', 'SHALL', 'SLAY', 'THE', 'DRAGON', 'THAT', 'IS', 'IN', 'THE', 'SEA', '."', '--', 'ISAIAH', '"', 'AND', 'WHAT', 'THING', 'SOEVER', 
'BESIDES', 'COMETH', 'WITHIN', 'THE', 'CHAOS', 'OF', 'THIS', 'MONSTER', "'", 'S', 'MOUTH', ',', 'BE', 'IT', 'BEAST', ',', 'BOAT', ',', 'OR', 'STONE', ',', 'DOWN', 'IT', 'GOES', 'ALL', 'INCONTINENTLY', 'THAT', 'FOUL', 'GREAT', 'SWALLOW', 'OF', 'HIS', ',', 'AND', 'PERISHETH', 'IN', 'THE', 'BOTTOMLESS', 'GULF', 'OF', 'HIS', 'PAUNCH', '."', '--', 'HOLLAND', "'", 'S', 'PLUTARCH', "'", 'S', 'MORALS', '.', '"', 'THE', 'INDIAN', 'SEA', 'BREEDETH', 'THE', 'MOST', 'AND', 'THE', 'BIGGEST', 'FISHES', 'THAT', 'ARE', ':', 'AMONG', 'WHICH', 'THE', 'WHALES', 'AND', 'WHIRLPOOLES', 'CALLED', 'BALAENE', ',', 'TAKE', 'UP', 'AS', 'MUCH', 'IN', 'LENGTH', 'AS', 'FOUR', 'ACRES', 'OR', 'ARPENS', 'OF', 'LAND', '."', '--', 'HOLLAND', "'", 'S', 'PLINY', '.', '"', 'SCARCELY', 'HAD', 'WE', 'PROCEEDED', 'TWO', 'DAYS', 'ON', 'THE', 'SEA', ',', 'WHEN', 'ABOUT', 'SUNRISE', 'A', 'GREAT', 'MANY', 'WHALES', 'AND', 'OTHER', 'MONSTERS', 'OF', 'THE', 'SEA', ',', 'APPEARED', '.', 'AMONG', 'THE', 'FORMER', ',', 'ONE', 'WAS', 'OF', 'A', 'MOST', 'MONSTROUS', 'SIZE', '.', '...', 'THIS', 'CAME', 'TOWARDS', 'US', ',', 'OPEN', '-', 'MOUTHED', ',', 'RAISING', 'THE', 'WAVES', 'ON', 'ALL', 'SIDES', ',', 'AND', 'BEATING', 'THE', 'SEA', 'BEFORE', 'HIM', 'INTO', 'A', 'FOAM', '."', '--', 'TOOKE', "'", 'S', 'LUCIAN', '.', '"', 'THE', 'TRUE', 'HISTORY', '."', '"', 'HE', 'VISITED', 'THIS', 'COUNTRY', 'ALSO', 'WITH', 'A', 'VIEW', 'OF', 'CATCHING', 'HORSE', '-', 'WHALES', ',', 'WHICH', 'HAD', 'BONES', 'OF', 'VERY', 'GREAT', 'VALUE', 'FOR', 'THEIR', 'TEETH', ',', 'OF', 'WHICH', 'HE', 'BROUGHT', 'SOME', 'TO', 'THE', 'KING', '.', '...', 'THE', 'BEST', 'WHALES', 'WERE', 'CATCHED', 'IN', 'HIS', 'OWN', 'COUNTRY', ',', 'OF', 'WHICH', 'SOME', 'WERE', 'FORTY', '-', 'EIGHT', ',', 'SOME', 'FIFTY', 'YARDS', 'LONG', '.', 'HE', ...]
len(text1)
260819
len(set(text1))
19317
len(set(word.lower() for word in text1)) # don't double-count, e.g. 'This' and 'this'
17231
len(set(word.lower() for word in text1 if word.isalpha())) # drops punctuation and numbers (non-alphabetic tokens)
16948
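The four counts above can be reproduced on a small scale without loading the corpus. This is a minimal sketch, assuming a hand-made token list (`tokens` is hypothetical and only stands in for `text1`); it shows how each normalization step shrinks the vocabulary.

```python
# Hypothetical sample tokens standing in for text1 (not taken from the corpus)
tokens = ['This', 'is', 'a', 'test', '.', 'this', 'Test', 'is', 'short', '!']

# Distinct tokens exactly as they appear
raw_vocab = set(tokens)

# Fold case so that 'This' and 'this' count once
lower_vocab = set(tok.lower() for tok in tokens)

# Additionally keep only purely alphabetic tokens
alpha_vocab = set(tok.lower() for tok in tokens if tok.isalpha())

print(len(tokens), len(raw_vocab), len(lower_vocab), len(alpha_vocab))
# prints: 10 9 7 5
```

Each step can only keep the count the same or lower it, which matches the sequence 260819, 19317, 17231, 16948 seen for `text1`.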
# 1.4.3 Nested Code Blocks
word1 = 'cat'
if len(word1) < 5:
    print('word length is less than 5')
word length is less than 5
if len(word1) >= 5:
    print('word is greater than or equal to 5')
else:
    print('no')
no
for word in ['This', 'is', 'it']:
    print(word)
This
is
it
# 1.4.4 Looping with Conditions
for word in sentence2:
    if word.endswith('d'):
        print(word)
called
for word in sentence2:
    if word.islower():
        print(word, 'is a lowercase word')
    elif word.istitle():
        print(word, 'is a titlecase word')
    else:
        print(word, 'is punctuation')
I is a titlecase word
just is a lowercase word
called is a lowercase word
to is a lowercase word
say is a lowercase word
I is a titlecase word
love is a lowercase word
you is a lowercase word
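One caveat about the `else` branch: it catches every token that is neither lowercase nor titlecase, so all-caps words and numbers such as '1851' would also be reported as punctuation. A minimal sketch of a finer-grained version follows; the function name `classify` and the sample tokens are hypothetical, not part of the NLTK book.

```python
def classify(token):
    """Return a rough label for a token using only str methods."""
    if token.islower():
        return 'lowercase'
    elif token.istitle():
        return 'titlecase'
    elif token.isupper():
        return 'uppercase'
    elif token.isdigit():
        return 'number'
    else:
        return 'other'

# Hypothetical sample tokens
sample = ['I', 'called', 'MOBY', '1851', ',']
labels = [classify(tok) for tok in sample]
print(labels)
# prints: ['titlecase', 'lowercase', 'uppercase', 'number', 'other']
```

The order of the tests matters: 'I' is both titlecase and uppercase, and it is labeled titlecase here only because `istitle()` is checked first.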
tricky = sorted(word for word in set(text2) if 'cie' in word or 'cei' in word)
for word in tricky:
    print(word, end=' ') # prints a space instead of a newline after each word
ancient ceiling conceit conceited conceive conscience conscientious conscientiously deceitful deceive deceived deceiving deficiencies deficiency deficient delicacies excellencies fancied insufficiency insufficient legacies perceive perceived perceiving prescience prophecies receipt receive received receiving society species sufficient sufficiently undeceive undeceiving
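The same loop-plus-condition pattern can also sort the words into groups instead of printing them. The following is a minimal sketch, assuming a short hand-picked sample from the output above (the names `sample`, `cie_words` and `cei_words` are hypothetical).

```python
# A few words copied from the 'cie'/'cei' output above
sample = ['ancient', 'ceiling', 'conceit', 'receive', 'society', 'species']

cie_words = []
cei_words = []
for word in sample:
    if 'cie' in word:
        cie_words.append(word)
    elif 'cei' in word:
        cei_words.append(word)

print('cie:', ' '.join(cie_words))
print('cei:', ' '.join(cei_words))
# prints:
# cie: ancient society species
# cei: ceiling conceit receive
```

Replacing `sample` with `tricky` should split the full list the same way, assuming `text2` is loaded as above.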
# 1.5.5