from nltk.book import *
text1
<Text: Moby Dick by Herman Melville 1851>
text7
<Text: Wall Street Journal>
text1.concordance("monstrous")
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
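For intuition about what the concordance call above is doing, here is a minimal keyword-in-context sketch over a plain list of tokens. It is not NLTK's implementation: the name `simple_concordance` and the `width` parameter (context size in tokens, where NLTK works in characters) are made up for illustration. It matches case-insensitively, as the output above does (both "monstrous" and "Monstrous" are listed).

```python
def simple_concordance(tokens, word, width=3):
    """Return one line per match: `width` tokens of context on each side."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Brackets mark the matched token so it stands out in the line.
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Toy token list assembled from two of the matches shown above.
tokens = "one was of a most monstrous size . Of the Monstrous Pictures of Whales".split()
for line in simple_concordance(tokens, "monstrous"):
    print(line)
```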
text1.concordance("water")
Displaying 25 of 190 matches:
mighty whales which swim in a sea of water , and have a sea of oil swimming in
e aground in twelve or thirteen feet water ." -- THOMAS EDGE ' S TEN VOYAGES TO
n , and in wantonness fuzzing up the water through their pipes and vents , whic
the vault of heaven . " So fire with water to compare , The ocean serves on hig
n the bore than the main pipe of the water - works at London Bridge , and the w
r - works at London Bridge , and the water roaring in its passage through that
 being preserved by leaping into the water when they saw the onset was inevitab
denly a mighty mass emerged from the water , and shot up perpendicularly into t
ight of land . Look at the crowds of water - gazers there . Circumambulate the
ore crowds , pacing straight for the water , and seemingly bound for a dive . S
 No . They must get just as nigh the water as they possibly can without falling
, and he will infallibly lead you to water , if water there be in all that regi
ll infallibly lead you to water , if water there be in all that region . Should
 as every one knows , meditation and water are wedded for ever . But here is an
-- what is the one charm wanting ?-- Water -- there is not a drop of water ther
 ?-- Water -- there is not a drop of water there ! Were Niagara but a cataract
ike a Newfoundland dog just from the water , and sat up in bed , stiff as a pik
 stand centre table , dipped it into water and commenced lathering his face . I
 natives . But New Bedford beats all Water Street and Wapping . In these last -
ysters observing the sun through the water , and thinking that thick water the
 the water , and thinking that thick water the thinnest of air . Methinks my bo
 to the floor with the weight of the water it had absorbed . However , hat and
reitstein , with a perennial well of water within the walls . But the side ladd
ates ? Cadiz is in Spain ; as far by water , from Joppa , as Jonah could possib
 , sunk , too , beneath the ship ' s water - line , Jonah feels the heralding p
text2.concordance("affection")
Displaying 25 of 79 matches:
, however , and , as a mark of his affection for the three girls , he left them
t . It was very well known that no affection was ever supposed to exist between
deration of politeness or maternal affection on the side of the former , the tw
d the suspicion -- the hope of his affection for me may warrant , without impru
hich forbade the indulgence of his affection . She knew that his mother neither
rd she gave one with still greater affection . Though her late conversation wit
 can never hope to feel or inspire affection again , and if her home be uncomfo
m of the sense , elegance , mutual affection , and domestic comfort of the fami
, and which recommended him to her affection beyond every thing else . His soci
ween the parties might forward the affection of Mr . Willoughby , an equally st
 the most pointed assurance of her affection . Elinor could not be surprised at
he natural consequence of a strong affection in a young and ardent mind . This
 opinion . But by an appeal to her affection for her mother , by representing t
 every alteration of a place which affection had established as perfect with hi
e will always have one claim of my affection , which no other can possibly shar
f the evening declared at once his affection and happiness . " Shall we see you
ause he took leave of us with less affection than his usual behaviour has shewn
ness ." " I want no proof of their affection ," said Elinor ; " but of their en
onths , without telling her of his affection ;-- that they should part without
ould be the natural result of your affection for her . She used to be all unres
distinguished Elinor by no mark of affection . Marianne saw and listened with i
th no inclination for expense , no affection for strangers , no profession , an
till distinguished her by the same affection which once she had felt no doubt o
al of her confidence in Edward ' s affection , to the remembrance of every mark
 was made ? Had he never owned his affection to yourself ?" " Oh , no ; but if
text3.concordance("lived")
Displaying 25 of 38 matches:
ay when they were created . And Adam lived an hundred and thirty years , and be
ughters : And all the days that Adam lived were nine hundred and thirty yea and
nd thirty yea and he died . And Seth lived an hundred and five years , and bega
ve years , and begat Enos : And Seth lived after he begat Enos eight hundred an
welve years : and he died . And Enos lived ninety years , and begat Cainan : An
 years , and begat Cainan : And Enos lived after he begat Cainan eight hundred
ive years : and he died . And Cainan lived seventy years and begat Mahalaleel :
rs and begat Mahalaleel : And Cainan lived after he begat Mahalaleel eight hund
years : and he died . And Mahalaleel lived sixty and five years , and begat Jar
s , and begat Jared : And Mahalaleel lived after he begat Jared eight hundred a
and five yea and he died . And Jared lived an hundred sixty and two years , and
o years , and he begat Eno And Jared lived after he begat Enoch eight hundred y
 and two yea and he died . And Enoch lived sixty and five years , and begat Met
 ; for God took him . And Methuselah lived an hundred eighty and seven years ,
 , and begat Lamech . And Methuselah lived after he begat Lamech seven hundred
nd nine yea and he died . And Lamech lived an hundred eighty and two years , an
ch the LORD hath cursed . And Lamech lived after he begat Noah five hundred nin
naan shall be his servant . And Noah lived after the flood three hundred and fi
xad two years after the flo And Shem lived after he begat Arphaxad five hundred
at sons and daughters . And Arphaxad lived five and thirty years , and begat Sa
ars , and begat Salah : And Arphaxad lived after he begat Salah four hundred an
begat sons and daughters . And Salah lived thirty years , and begat Eber : And
y years , and begat Eber : And Salah lived after he begat Eber four hundred and
 begat sons and daughters . And Eber lived four and thirty years , and begat Pe
y years , and begat Peleg : And Eber lived after he begat Peleg four hundred an
text4.concordance("god")
Displaying 25 of 108 matches:
eliance on the protection of Almighty God , I shall forthwith commence the duti
humble , acknowledged dependence upon God and His overruling providence . We ha
great office I must humbly invoke the God of our fathers for wisdom and firmnes
d the same Bible and pray to the same God , and each invokes His aid against th
hat any men should dare to ask a just God ' s assistance in wringing their brea
offenses which , in the providence of God , must needs come , but which , havin
butes which the believers in a living God always ascribe to Him ? Fondly do we
war may speedily pass away . Yet , if God wills that it continue until all the
r all , with firmness in the right as God gives us to see the right , let us st
the prayers of the nation to Almighty God in behalf of this consummation . Fell
r , they have " followed the light as God gave them to see the light ." They ar
ess their fathers and their fathers ' God that the Union was preserved , that s
the support and blessings of Almighty God . Fellow citizens , in the presence o
ng the power and goodness of Almighty God , who presides over the destiny of na
expect the favor and help of Almighty God -- that He will give to me wisdom , s
 suggestion to enterprise and labor . God has placed upon our head a diadem and
urn than the pledge I now give before God and these witnesses of unreserved and
han human life can escape the laws of God and nature . Manifestly nothing is mo
and invoking the guidance of Almighty God . Our faith teaches that there is no
re is no safer reliance than upon the God of our fathers , who has so singularl
e the direction and favor of Almighty God . I should shrink from the duties thi
 devolve upon it , and in the fear of God will " take occasion by the hand and
 citizens and the aid of the Almighty God in the discharge of my responsible du
our heartstrings like some air out of God ' s own presence , where justice and
 forward - looking men , to my side . God helping me , I will not fail them , i
text5.concordance("lol")
Displaying 25 of 822 matches:
ast PART 24 / m boo . 26 / m and sexy lol U115 boo . JOIN PART he drew a girl w
ope he didnt draw a penis PART ewwwww lol & a head between her legs JOIN JOIN s
a bowl i got a blunt an a bong ...... lol JOIN well , glad it worked out my cha
e " PART Hi U121 in ny . ACTION would lol @ U121 . . . but appearently she does
30 make sure u buy a nice ring for U6 lol U7 Hi U115 . ACTION isnt falling for
 didnt ya hear !!!! PART JOIN geeshhh lol U6 PART hes deaf ppl here dont get it
es nobody here i wanna misbeahve with lol JOIN so read it . thanks U7 .. Im hap
ies want to chat can i talk to him !! lol U121 !!! forwards too lol JOIN ALL PE
k to him !! lol U121 !!! forwards too lol JOIN ALL PErvs ... redirect to U121 '
 loves ME the most i love myself JOIN lol U44 how do u know that what ? jerkett
ng wrong ... i can see it in his eyes lol U20 = fiance Jerketts lmao wtf yah I
cooler by the minute what 'd I miss ? lol noo there too much work ! why not ??
 that mean I want you ? U6 hello room lol U83 and this .. has been the grammar
 the rule he 's in PM land now though lol ah ok i wont bug em then someone wann
flight to hell :) lmao bbl maybe PART LOL lol U7 it was me , U83 hahah U83 ! 80
ht to hell :) lmao bbl maybe PART LOL lol U7 it was me , U83 hahah U83 ! 808265
082653953 K-Fed got his ass kicked .. Lol . ACTION laughs . i got a first class
 . i got a first class ticket to hell lol U7 JOIN any texas girls in here ? any
 . whats up U155 i was only kidding . lol he 's a douchebag . Poor U121 i 'm bo
 ??? sits with U30 Cum to my shower . lol U121 . ACTION U1370 watches his nads
 ur nad with a stick . ca u U23 ewwww lol *sniffs* ewwwwww PART U115 ! owww spl
ACTION is resisting . ur female right lol U115 beeeeehave Remember the LAst tim
pm's me . charge that is 1.99 / min . lol @ innocent hahah lol .... yeah LOLOLO
 is 1.99 / min . lol @ innocent hahah lol .... yeah LOLOLOLLL U12 thats not nic
s . lmao no U115 Check my record . :) Lol lick em U7 U23 how old r u lol Way to
text1.similar("monstrous")
true contemptible christian abundant few part mean careful puzzled mystifying passing curious loving wise doleful gamesome singular delightfully perilous fearless
text2.similar("monstrous")
very so exceedingly heartily a as good great extremely remarkably sweet vast amazingly
text2.common_contexts(["monstrous", "very"])
am_glad a_pretty a_lucky is_pretty be_glad
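Each item in the output above pairs the word before and the word after an occurrence, joined by an underscore; `am_glad` means both words were seen between "am" and "glad". A rough sketch of that idea on a plain token list follows. It is not NLTK's code: `contexts` and `shared_contexts` are hypothetical helper names, and NLTK's handling of text boundaries and its ordering of results may differ.

```python
def contexts(tokens, word):
    """Set of 'previous_next' pairs surrounding each occurrence of `word`."""
    target = word.lower()
    return {
        f"{tokens[i - 1].lower()}_{tokens[i + 1].lower()}"
        for i in range(1, len(tokens) - 1)  # skip the edges: no neighbour there
        if tokens[i].lower() == target
    }


def shared_contexts(tokens, first, second):
    """Contexts in which both words occur, in sorted order."""
    return sorted(contexts(tokens, first) & contexts(tokens, second))


# Made-up sentence fragments, not taken from text2.
tokens = ("I am monstrous glad and I am very glad "
          "a very pretty girl a monstrous pretty one").split()
print(shared_contexts(tokens, "monstrous", "very"))
```

When the intersection is empty, as for the word pairs tried later in this session, there is nothing to list.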
text3.similar("true")
the
text1.similar("true")
all in of that this on he not seen as there other see now known it take hand be for
text3.common_contexts(["true", "off"])
No common contexts were found
text3.common_contexts(["true", "tower"])
No common contexts were found
text1.common_contexts(["true", "tower"])
s_and
text2.common_contexts(["child", "coffee"])
No common contexts were found
text2.common_contexts(["child", "wind"])
No common contexts were found
text2.common_contexts(["fair", "wind"])
No common contexts were found
text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/numpy/core/__init__.py in <module>
     21 try:
---> 22     from . import multiarray
     23 except ImportError as exc:

/usr/local/lib/python3.7/dist-packages/numpy/core/multiarray.py in <module>
     11
---> 12 from . import overrides
     13 from . import _multiarray_umath

/usr/local/lib/python3.7/dist-packages/numpy/core/overrides.py in <module>
      6
----> 7 from numpy.core._multiarray_umath import (
      8     add_docstring, implement_array_function, _get_implementing_args)

ImportError: libf77blas.so.3: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/nltk/draw/dispersion.py in dispersion_plot(text, words, ignore_case, title)
     25     try:
---> 26         from matplotlib import pylab
     27     except ImportError as e:

/usr/local/lib/python3.7/dist-packages/matplotlib/__init__.py in <module>
    106 # definitions, so it is safe to import from it here.
--> 107 from . import _api, cbook, docstring, rcsetup
    108 from matplotlib.cbook import MatplotlibDeprecationWarning, sanitize_sequence

/usr/local/lib/python3.7/dist-packages/matplotlib/cbook/__init__.py in <module>
     27
---> 28 import numpy as np
     29

/usr/local/lib/python3.7/dist-packages/numpy/__init__.py in <module>
    149
--> 150     from . import core
    151     from .core import *

/usr/local/lib/python3.7/dist-packages/numpy/core/__init__.py in <module>
     47         __version__, exc)
---> 48     raise ImportError(msg)
     49 finally:

ImportError:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.7 from "/usr/bin/python3"
  * The NumPy version is: "1.21.2"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: libf77blas.so.3: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_6842/2208355114.py in <module>
----> 1 text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])

/usr/local/lib/python3.7/dist-packages/nltk/text.py in dispersion_plot(self, words)
    550         from nltk.draw import dispersion_plot
    551
--> 552         dispersion_plot(self, words)
    553
    554     def _train_default_ngram_lm(self, tokenized_sents, n=3):

/usr/local/lib/python3.7/dist-packages/nltk/draw/dispersion.py in dispersion_plot(text, words, ignore_case, title)
     29             "The plot function requires matplotlib to be installed."
     30             "See http://matplotlib.org/"
---> 31         ) from e
     32
     33     text = list(text)

ValueError: The plot function requires matplotlib to be installed.See http://matplotlib.org/
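The final ValueError says matplotlib is not installed, but the chained traceback points further down: matplotlib was found, its `import numpy` failed because NumPy's C extensions could not load `libf77blas.so.3`, and NLTK reported the resulting ImportError as a missing matplotlib. A small check like the one below, run before `dispersion_plot`, would surface the underlying import error directly. It is only a sketch: `check_plot_dependencies` is a hypothetical helper, not part of NLTK, and it uses the standard library only.

```python
import importlib


def check_plot_dependencies(modules=("numpy", "matplotlib")):
    """Try to import each module and report 'ok' or the import error text."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except ImportError as exc:
            # Covers a missing package as well as a broken install, such as
            # a shared library that cannot be opened.
            results[name] = f"failed: {exc}"
    return results


for name, status in check_plot_dependencies().items():
    print(f"{name}: {status}")
```

In the session above this would be expected to report a failure for numpy first, which is the dependency that needs repairing.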
text2.generate()
Building ngram index...
knew , had by this remembrance , and if , by rapid degrees , so long .
, she could live without one another , and in her rambles . at least
the last evening of a brother , could you know , from the first .
Dashwood ? this gentleman himself , and must put up with a kindness
which they are very much vexed at , for it -- and I shall keep it
entirely . admire it ; and it is , explain the grounds , or if any
place could give her ease , was a
'knew , had by this remembrance , and if , by rapid degrees , so long .\n, she could live without one another , and in her rambles . at least\nthe last evening of a brother , could you know , from the first .\nDashwood ? this gentleman himself , and must put up with a kindness\nwhich they are very much vexed at , for it -- and I shall keep it\nentirely . admire it ; and it is , explain the grounds , or if any\nplace could give her ease , was a'
text5.generate()
Building ngram index...
here !! - m - canada with cam girls pm me baby U109 .. get the hell
aint JOIN I am PART lmao U5 i hate to see me , U111 ? The Offspring -
Come Out Swinging ... . 2 ****** Welcome to #talkcity_adults ! the
heck is that a roaring column of flame streaks out of uniform . I dont
know how thats facilitated . chat ? .. but hey , a / f / FL pics ,
wanna chat pm me Whaaa ? amy have offended him ? ACTION this song is
so good night good u
'here !! - m - canada with cam girls pm me baby U109 .. get the hell\naint JOIN I am PART lmao U5 i hate to see me , U111 ? The Offspring -\nCome Out Swinging ... . 2 ****** Welcome to #talkcity_adults ! the\nheck is that a roaring column of flame streaks out of uniform . I dont\nknow how thats facilitated . chat ? .. but hey , a / f / FL pics ,\nwanna chat pm me Whaaa ? amy have offended him ? ACTION this song is\nso good night good u'
text1.generate()
Building ngram index...
long , from one to the top - mast , and no coffin and went out a sea
captain -- this peaking of the whales . , so as to preserve all his
might had in former years abounding with them , they toil with their
lances , strange tales of Southern whaling . at once the bravest
Indians he was , after in vain strove to pierce the profundity . ?
then ?" a levelled flame of pale , And give no chance , watch him ;
though the line , it is to be gainsaid . have been
'long , from one to the top - mast , and no coffin and went out a sea\ncaptain -- this peaking of the whales . , so as to preserve all his\nmight had in former years abounding with them , they toil with their\nlances , strange tales of Southern whaling . at once the bravest\nIndians he was , after in vain strove to pierce the profundity . ?\nthen ?" a levelled flame of pale , And give no chance , watch him ;\nthough the line , it is to be gainsaid . have been'
len(text3)
44764
set(text3)
{'darkne', 'ear', 'knowest', 'run', 'Whereas', 'awaked', 'Jabal', 'Haggi', 'trained', 'age', 'despised', 'occasion', 'grass', 'smelled', 'Horite', 'sleep', 'field', 'southward', '.)', 'Eden', 'chain', 'thi', 'afterward', 'until', 'welfare', 'thirteen', 'pleased', 'espied', 'deliver', 'foot', 'Oh', 'knoweth', 'therefore', 'twelve', 'birthright', 'surely', 'son', 'true', 'law', 'Chaldees', 'walking', 'clothes', 'Guni', 'Ellasar', 'Zemarite', 'add', 'Penuel', 'fame', 'pursued', 'Shobal', 'Shalt', 'stopped', 'hold', 'signs', 'Girgasite', 'grave', 'white', 'believed', 'Malchiel', 'Know', 'gift', 'dove', 'lightly', 'Adam', 'thy', 'Iscah', 'feared', 'beginning', 'separate', 'artificer', 'weapons', 'Japheth', 'getting', 'desired', 'thoroughly', 'See', 'burn', 'cave', 'abode', 'thereon', 'both', 'fifth', 'become', 'flee', 'That', 'laughed', 'asked', 'first', 'storehouses', 'behind', 'Allonbachuth', 'Bozrah', 'kill', 'red', 'commanding', 'stooped', 'bowels', 'shoulders', 'brethren', 'declare', 'Thou', 'Mesha', 'defiledst', 'bou', 'Shepho', 'Phara', 'tent', 'Eshban', 'hath', 'same', 'led', 'Egypt', 'myrrh', 'gathered', 'Jac', 'if', 'Come', 'bracelets', 'been', 'filled', 'heat', 'loud', 'doer', 'saidst', 'Tubalcain', 'abated', 'every', 'EleloheIsrael', 'discern', 'teeth', 'afar', 'husband', 'Cheran', 'colours', 'lifted', 'nine', 'Avith', 'mou', 'dress', 'erected', 'street', 'mountains', 'buy', 'Whose', 'Serug', 'hide', 'got', 'dwe', 'best', 'voice', 'rebelled', 'that', 'damsel', 'washed', 'Who', 'ass', 'Peradventure', 'foolishly', 'floor', 'Ethiopia', 'is', 'woman', 'shouldest', 'Isra', 'Galeed', 'greater', 'Kedar', 'Mibsam', 'firstborn', 'tower', 'closed', 'spoiled', 'seeing', 'goodly', 'Huppim', 'sle', 'Moab', 'Cana', 'established', 'rouse', 'old', 'light', 'nights', 'might', 'Mahanaim', 'sow', 'husbandman', 'passed', 'taken', 'provender', 'voi', 'servants', 'shekels', 'kindled', 'sheep', 'covenant', 'couch', 'Seir', 'seen', 'change', 'keeper', 'spoil', 'fat', 'hardly', 
'dipped', 'interpreted', 'acknowledged', 'Peniel', 'Egyptia', 'Emins', 'dost', 'caused', 'Uzal', 'too', 'Get', 'perceived', 'appear', 'men', 'sist', 'tents', 'fou', 'honour', 'born', 'menservants', 'prophet', 'past', 'Shuah', 'scattered', 'heap', 'throughout', 'hind', 'li', 'and', 'glory', 'sweet', 'why', 'Shalem', 'lamb', 'Korah', 'Stand', 'twice', 'Jetheth', 'Moreover', 'his', 'lesser', 'suck', 'Caphtorim', 'spare', 'Beerlahairoi', 'oil', 'governor', 'door', 'interpreter', 'sacks', 'year', 'hills', 'Hazezontamar', 'organ', 'couching', 'Tebah', 'forget', 'Jamin', 'hang', 'lien', 'length', 'elder', 'Pildash', 'magnified', 'We', 'reach', 'violence', 'pulled', 'Hazo', 'gold', 'han', 'restored', 'youngest', 'themselves', 'worthy', 'bought', 'neither', 'build', 'Onam', 'hard', 'plains', 'millions', 'planted', 'talking', 'Din', 'instead', 'nor', 'boys', 'communing', 'forgotten', 'whales', 'window', 'prospered', 'held', 'bodies', 'hastily', 'Jobab', 'imagined', 'sell', 'saddled', 'slay', 'establish', 'creepeth', 'Ezer', 'Manasseh', 'waters', 'besought', 'discreet', 'round', 'Arise', 'seem', 'whether', 'Medan', 'eat', 'golden', 'openly', 'interpret', '(', 'nigh', 'feet', 'Hebron', 'quiver', 'hollow', 'exceeding', 'conspired', 'little', 'wild', 'Reu', 'answer', 'Day', 'peop', 'part', 'ha', 'Zibeon', 'cast', 'birthday', 'forgat', 'ground', 'Mizpah', 'tongue', 'Jerah', 'Shinab', 'on', 'firstlings', 'peaceable', 'removing', 'e', 'rulers', 'sin', 'gotten', 'my', 'truly', 'household', 'withheld', 'Hemdan', 'sevens', 'Some', 'husba', 'ride', 'childr', 'beside', 'Ephra', 'Eshcol', 'ought', 'Hinder', 'whatsoever', 'sorrow', 'fury', 'own', 'wombs', 'Ophir', 'rain', 'tillest', 'deal', 'escape', 'take', 'Twelve', 'worship', 'ever', 'wrapped', 'magicians', 'weig', 'tabret', 'yoke', 'them', 'hou', 'foreskin', 'If', 'skins', 'youth', 'Babel', 'devoured', 'folk', 'knife', 'repenteth', 'Moriah', 'dwell', 'Zebulun', 'chief', 'punishment', 'moveth', 'Elishah', 'wondering', 'by', 'heav', 
'anoth', 'Discern', 'bird', 'discerned', 'Asshur', 'buryingplace', 'hearth', 'gather', 'bakers', 'conceal', 'rank', 'pleaseth', 'sporting', 'pitch', 'flaming', 'Hadar', 'lived', 'hired', 'departing', 'secret', 'eighty', 'gathering', 'father', 'Manahath', 'without', 'Wherefore', 'sun', 'bundles', 'wolf', 'Zerah', 'aga', 'serpent', 'Onan', 'bake', 'barr', 'hunter', 'war', 'Seeing', 'salvation', 'verified', 'counted', 'Woman', 'prosper', 'straitly', 'Becher', 'feeble', 'shoelatchet', 'songs', 'Nod', 'defiled', 'everlasting', 'Art', 'tender', 'told', 'knowing', 'smoke', 'Ashbel', 'Rebekah', 'room', 'stay', 'selfsame', 'Potipherah', 'very', 'a', 'sons', 'sinning', 'mules', 'whither', 'clean', 'knew', 'Jubal', 'chesnut', 'bade', 'former', 'could', 'carcases', 'season', 'maid', 'Neither', 'fail', 'till', 'Nineveh', 'meat', 'Kittim', 'Shelah', 'servant', 'physicians', 'would', 'slew', 'always', 'meanest', 'subtilty', 'midwife', 'Japhe', 'together', 'bulls', 'loose', 'dried', 'female', 'tenor', 'northward', 'alive', 'Zoar', 'trade', 'midst', 'While', 'royal', 'blessing', 'denied', 'kine', 'tiller', 'choice', 'officer', 'sixty', 'subtil', 'help', 'Zohar', 'childless', 'ourselves', 'longeth', 'seventeen', 'moving', 'Zo', 'child', 'Hamor', 'ki', 'tr', 'matter', 'better', 'sacrifices', 'four', 'stricken', 'Lot', 'with', 'beget', 'be', 'Do', 'vale', 'ford', 'sinew', 'hated', 'At', 'steal', 'Sered', 'bear', 'wise', 'sacrifice', 'content', 'horses', 'side', 'soul', 'lieth', 'fruits', 'overdrive', 'descending', 'Sabtech', 'either', 'mischief', 'store', 'maiden', 'Spirit', 'mount', 'Mahalaleel', 'raven', 'Shel', 'fallen', 'ours', 'curse', 'friend', 'so', 'speckled', 'Zidon', 'Mam', 'Gad', 'Areli', 'tithes', 'were', 'Where', 'Dishon', 'walk', 'dukes', 'what', 'given', 'bowing', 'pitcher', 'measures', 'harvest', 'some', 'Rehoboth', 'enter', 'rider', 'sinners', 'peace', 'lack', 'present', 'than', 'Temani', 'Abide', 'fellow', 'person', 'anything', 'goat', 'Mezahab', 'bondmen', 
'worshipped', 'A', 'fishes', 'endure', 'ey', 'Nahor', 'replenish', 'hadst', 'between', 'Levi', 'there', 'depart', 'Magog', 'haven', 'tempt', 'rise', 'cubits', 'nurse', 'comforted', 'intreated', 'said', 'submit', 'falsely', 'wi', 'With', 'turned', 'ruled', 'lan', 'these', 'From', 'sojourn', 'began', 'called', 'sand', 'butlers', 'you', 'having', 'stalk', 'early', 'elders', 'meant', 'Zarah', 'thigh', 'consumed', 'eaten', 'south', 'continued', 'knead', 'Ziphion', 'Asher', 'Amalek', 'To', 'death', 'Reuel', 'wander', 'Shimron', 'Kedemah', 'provision', 'slimepits', 'Hobah', 'ruler', 'still', 'favour', 'commandments', 'searched', 'Ham', 'handle', 'house', 'fed', 'handmaids', 'sanctified', 'conception', 'mocking', 'wherewith', 'receive', 'dreadful', 'lives', 'continually', 'By', 'wrong', 'fill', 'naked', 'way', 'errand', 'pledge', 'lie', 'Am', 'mist', 'befell', 'spee', 'herb', 'pilled', 'guiding', 'fine', 'Assyr', 'parted', 'hasted', 'urged', 'path', 'Gerar', 'mistress', 'whoredom', 'stood', 'Kemuel', 'Naamah', 'anointedst', 'winter', 'spake', 'Togarmah', 'Mash', 'proceedeth', 'Should', 'none', 'Phallu', 'womenservan', 'bosom', 'mercies', 'Jeush', 'fell', 'bare', 'Dumah', 'mayest', 'Kadmonites', 'asses', 'Hast', 'Unto', 'willing', 'troughs', 'charged', 'honourable', 'delight', 'wrestled', 'covered', 'Let', 'ye', 'eldest', 'breach', 'thereof', 'Haran', 'next', 'sheddeth', 'goats', 'shaved', 'castles', 'Gihon', 'to', 'Bilhan', 'forgive', 'lo', 'few', 'priests', 'pea', 'Moabites', 'plagues', 'Tema', 'ste', 'deeds', 'beguiled', 'beneath', 'feast', 'sheaves', 'clo', 'Padan', 'Aran', 'prince', 'thousand', 'servan', 'pit', 'Whoso', 'Damascus', 'Enoch', 'Admah', 'praise', 'fainted', 'stolen', 'inhabited', 'power', 'ungirded', 'weighed', 'cut', 'himself', ';)', 'Naphish', 'generation', 'let', 'sole', 'shut', 'silv', 'meeteth', 'well', 'obeyed', 'herds', 'dwelling', 'spilled', 'appease', 'destroy', 'charge', 'breathed', 'fath', 'Two', 'fire', 'Amorites', 'blameless', 'shield', 
'yearn', 'On', 'attained', 'Ashteroth', 'Peace', 'broth', 'desire', 'scatter', 'Spake', 'Ephrath', 'Shiloh', 'floc', 'pris', 'whensoever', 'Therefore', 'commune', 'onyx', 'Set', 'possession', 'Assyria', 'consent', 'I', 'removed', 'overtook', 'Kiriathaim', 'Duke', 'clusters', 'merciful', 'lodge', 'trees', 'gods', 'vow', 'ewe', 'least', 'This', 'nuts', 'dream', 'joined', 'Save', 'Reumah', 'forbid', 'betwixt', 'refrain', 'waited', 'wells', 'dreamer', 'curseth', 'pleasant', 'She', 'ring', 'long', 'Samlah', 'They', 'Amorite', 'honey', 'binding', 'enemies', 'commanded', 'Tell', 'foal', 'Laban', 'day', 'camel', 'Eri', 'Mibzar', 'countries', 'driven', 'heifer', 'command', 'budded', 'offerings', 'Ishbak', 'wherefore', 'goa', 'forward', 'cry', 'rest', 'seeth', 'wept', 'oak', 'Lehabim', 'Deborah', 'Hiddekel', 'under', 'pluckt', 'Abrah', 'conceive', 'thine', 'belly', 'Havilah', 'your', 'watered', 'Hamul', 'cities', 'breadth', 'good', 'seest', 'darkness', 'garment', 'communed', 'all', 'younge', 'gone', 'sakes', 'mead', 'horse', 'aloud', 'risen', 'drunken', 'sto', 'therein', 'wages', 'ashamed', 'rooms', 'Unstable', 'damsels', 'mightier', 'O', 'innocency', 'east', 'Hivite', 'played', 'droves', 'mourned', 'amongst', 'garments', 'seventeenth', 'Husham', 'shore', 'Eve', 'flesh', 'hear', 'leap', 'went', 'shear', 'money', 'burnt', 'Pathrusim', '!', 'armed', 'Sichem', 'nig', 'pilgrimage', 'liest', 'Pinon', 'met', 'deceived', 'nation', 'But', 'lads', 'may', 'understood', 'places', 'roughly', 'togeth', 'any', 'strive', 'corrupted', 'Our', 'bondwoman', 'reached', 'Melchizedek', 'find', 'love', 'altogether', 'Timna', 'brink', 'grisl', 'Thirty', 'saved', 'Magdiel', 'fair', 'forty', 'terror', 'abundantly', 'Rephaims', 'excel', 'Shuni', 'oa', 'whomsoever', 'dunge', 'only', 'out', ...}
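A set has no defined order, which is why the items above come back shuffled and can differ between runs. Wrapping the set in `sorted` gives a stable listing in code point order: punctuation first, then capitalized words, then lowercase ones. A toy illustration on a hand-written token list (the tokens are made up, not read from text3):

```python
# Eight tokens, six distinct: "the" and "Adam" each appear twice.
tokens = ["the", "Adam", ",", "lived", "the", "and", "Adam", "."]

vocab = sorted(set(tokens))   # de-duplicate, then order by code point

print(len(tokens))            # number of tokens, repeats included
print(len(vocab))             # number of distinct tokens
print(vocab)
```

Here `len(tokens)` counts every token while `len(vocab)` counts each distinct token once, the same distinction as between `len(text3)` and the size of `set(text3)`.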
sorted(set(text3))
['!', "'", '(', ')', ',', ',)', '.', '.)', ':', ';', ';)', '?', '?)', 'A', 'Abel', 'Abelmizraim', 'Abidah', 'Abide', 'Abimael', 'Abimelech', 'Abr', 'Abrah', 'Abraham', 'Abram', 'Accad', 'Achbor', 'Adah', 'Adam', 'Adbeel', 'Admah', 'Adullamite', 'After', 'Aholibamah', 'Ahuzzath', 'Ajah', 'Akan', 'All', 'Allonbachuth', 'Almighty', 'Almodad', 'Also', 'Alvah', 'Alvan', 'Am', 'Amal', 'Amalek', 'Amalekites', 'Ammon', 'Amorite', 'Amorites', 'Amraphel', 'An', 'Anah', 'Anamim', 'And', 'Aner', 'Angel', 'Appoint', 'Aram', 'Aran', 'Ararat', 'Arbah', 'Ard', 'Are', 'Areli', 'Arioch', 'Arise', 'Arkite', 'Arodi', 'Arphaxad', 'Art', 'Arvadite', 'As', 'Asenath', 'Ashbel', 'Asher', 'Ashkenaz', 'Ashteroth', 'Ask', 'Asshur', 'Asshurim', 'Assyr', 'Assyria', 'At', 'Atad', 'Avith', 'Baalhanan', 'Babel', 'Bashemath', 'Be', 'Because', 'Becher', 'Bedad', 'Beeri', 'Beerlahairoi', 'Beersheba', 'Behold', 'Bela', 'Belah', 'Benam', 'Benjamin', 'Beno', 'Beor', 'Bera', 'Bered', 'Beriah', 'Bethel', 'Bethlehem', 'Bethuel', 'Beware', 'Bilhah', 'Bilhan', 'Binding', 'Birsha', 'Bless', 'Blessed', 'Both', 'Bow', 'Bozrah', 'Bring', 'But', 'Buz', 'By', 'Cain', 'Cainan', 'Calah', 'Calneh', 'Can', 'Cana', 'Canaan', 'Canaanite', 'Canaanites', 'Canaanitish', 'Caphtorim', 'Carmi', 'Casluhim', 'Cast', 'Cause', 'Chaldees', 'Chedorlaomer', 'Cheran', 'Cherubims', 'Chesed', 'Chezib', 'Come', 'Cursed', 'Cush', 'Damascus', 'Dan', 'Day', 'Deborah', 'Dedan', 'Deliver', 'Diklah', 'Din', 'Dinah', 'Dinhabah', 'Discern', 'Dishan', 'Dishon', 'Do', 'Dodanim', 'Dothan', 'Drink', 'Duke', 'Dumah', 'Earth', 'Ebal', 'Eber', 'Edar', 'Eden', 'Edom', 'Edomites', 'Egy', 'Egypt', 'Egyptia', 'Egyptian', 'Egyptians', 'Ehi', 'Elah', 'Elam', 'Elbethel', 'Eldaah', 'EleloheIsrael', 'Eliezer', 'Eliphaz', 'Elishah', 'Ellasar', 'Elon', 'Elparan', 'Emins', 'En', 'Enmishpat', 'Eno', 'Enoch', 'Enos', 'Ephah', 'Epher', 'Ephra', 'Ephraim', 'Ephrath', 'Ephron', 'Er', 'Erech', 'Eri', 'Es', 'Esau', 'Escape', 'Esek', 'Eshban', 'Eshcol', 'Ethiopia', 
'Euphrat', 'Euphrates', 'Eve', 'Even', 'Every', 'Except', 'Ezbon', 'Ezer', 'Fear', 'Feed', 'Fifteen', 'Fill', 'For', 'Forasmuch', 'Forgive', 'From', 'Fulfil', 'G', 'Gad', 'Gaham', 'Galeed', 'Gatam', 'Gather', 'Gaza', 'Gentiles', 'Gera', 'Gerar', 'Gershon', 'Get', 'Gether', 'Gihon', 'Gilead', 'Girgashites', 'Girgasite', 'Give', 'Go', 'God', 'Gomer', 'Gomorrah', 'Goshen', 'Guni', 'Hadad', 'Hadar', 'Hadoram', 'Hagar', 'Haggi', 'Hai', 'Ham', 'Hamathite', 'Hamor', 'Hamul', 'Hanoch', 'Happy', 'Haran', 'Hast', 'Haste', 'Have', 'Havilah', 'Hazarmaveth', 'Hazezontamar', 'Hazo', 'He', 'Hear', 'Heaven', 'Heber', 'Hebrew', 'Hebrews', 'Hebron', 'Hemam', 'Hemdan', 'Here', 'Hereby', 'Heth', 'Hezron', 'Hiddekel', 'Hinder', 'Hirah', 'His', 'Hitti', 'Hittite', 'Hittites', 'Hivite', 'Hobah', 'Hori', 'Horite', 'Horites', 'How', 'Hul', 'Huppim', 'Husham', 'Hushim', 'Huz', 'I', 'If', 'In', 'Irad', 'Iram', 'Is', 'Isa', 'Isaac', 'Iscah', 'Ishbak', 'Ishmael', 'Ishmeelites', 'Ishuah', 'Isra', 'Israel', 'Issachar', 'Isui', 'It', 'Ithran', 'Jaalam', 'Jabal', 'Jabbok', 'Jac', 'Jachin', 'Jacob', 'Jahleel', 'Jahzeel', 'Jamin', 'Japhe', 'Japheth', 'Jared', 'Javan', 'Jebusite', 'Jebusites', 'Jegarsahadutha', 'Jehovahjireh', 'Jemuel', 'Jerah', 'Jetheth', 'Jetur', 'Jeush', 'Jezer', 'Jidlaph', 'Jimnah', 'Job', 'Jobab', 'Jokshan', 'Joktan', 'Jordan', 'Joseph', 'Jubal', 'Judah', 'Judge', 'Judith', 'Kadesh', 'Kadmonites', 'Karnaim', 'Kedar', 'Kedemah', 'Kemuel', 'Kenaz', 'Kenites', 'Kenizzites', 'Keturah', 'Kiriathaim', 'Kirjatharba', 'Kittim', 'Know', 'Kohath', 'Kor', 'Korah', 'LO', 'LORD', 'Laban', 'Lahairoi', 'Lamech', 'Lasha', 'Lay', 'Leah', 'Lehabim', 'Lest', 'Let', 'Letushim', 'Leummim', 'Levi', 'Lie', 'Lift', 'Lo', 'Look', 'Lot', 'Lotan', 'Lud', 'Ludim', 'Luz', 'Maachah', 'Machir', 'Machpelah', 'Madai', 'Magdiel', 'Magog', 'Mahalaleel', 'Mahalath', 'Mahanaim', 'Make', 'Malchiel', 'Male', 'Mam', 'Mamre', 'Man', 'Manahath', 'Manass', 'Manasseh', 'Mash', 'Masrekah', 'Massa', 'Matred', 'Me', 'Medan', 
'Mehetabel', 'Mehujael', 'Melchizedek', 'Merari', 'Mesha', 'Meshech', 'Mesopotamia', 'Methusa', 'Methusael', 'Methuselah', 'Mezahab', 'Mibsam', 'Mibzar', 'Midian', 'Midianites', 'Milcah', 'Mishma', 'Mizpah', 'Mizraim', 'Mizz', 'Moab', 'Moabites', 'Moreh', 'Moreover', 'Moriah', 'Muppim', 'My', 'Naamah', 'Naaman', 'Nahath', 'Nahor', 'Naphish', 'Naphtali', 'Naphtuhim', 'Nay', 'Nebajoth', 'Neither', 'Night', 'Nimrod', 'Nineveh', 'Noah', 'Nod', 'Not', 'Now', 'O', 'Obal', 'Of', 'Oh', 'Ohad', 'Omar', 'On', 'Onam', 'Onan', 'Only', 'Ophir', 'Our', 'Out', 'Padan', 'Padanaram', 'Paran', 'Pass', 'Pathrusim', 'Pau', 'Peace', 'Peleg', 'Peniel', 'Penuel', 'Peradventure', 'Perizzit', 'Perizzite', 'Perizzites', 'Phallu', 'Phara', 'Pharaoh', 'Pharez', 'Phichol', 'Philistim', 'Philistines', 'Phut', 'Phuvah', 'Pildash', 'Pinon', 'Pison', 'Potiphar', 'Potipherah', 'Put', 'Raamah', 'Rachel', 'Rameses', 'Rebek', 'Rebekah', 'Rehoboth', 'Remain', 'Rephaims', 'Resen', 'Return', 'Reu', 'Reub', 'Reuben', 'Reuel', 'Reumah', 'Riphath', 'Rosh', 'Sabtah', 'Sabtech', 'Said', 'Salah', 'Salem', 'Samlah', 'Sarah', 'Sarai', 'Saul', 'Save', 'Say', 'Se', 'Seba', 'See', 'Seeing', 'Seir', 'Sell', 'Send', 'Sephar', 'Serah', 'Sered', 'Serug', 'Set', 'Seth', 'Shalem', 'Shall', 'Shalt', 'Shammah', 'Shaul', 'Shaveh', 'She', 'Sheba', 'Shebah', 'Shechem', 'Shed', 'Shel', 'Shelah', 'Sheleph', 'Shem', 'Shemeber', 'Shepho', 'Shillem', 'Shiloh', 'Shimron', 'Shinab', 'Shinar', 'Shobal', 'Should', 'Shuah', 'Shuni', 'Shur', 'Sichem', 'Siddim', 'Sidon', 'Simeon', 'Sinite', 'Sitnah', 'Slay', 'So', 'Sod', 'Sodom', 'Sojourn', 'Some', 'Spake', 'Speak', 'Spirit', 'Stand', 'Succoth', 'Surely', 'Swear', 'Syrian', 'Take', 'Tamar', 'Tarshish', 'Tebah', 'Tell', 'Tema', 'Teman', 'Temani', 'Terah', 'Thahash', 'That', 'The', 'Then', 'There', 'Therefore', 'These', 'They', 'Thirty', 'This', 'Thorns', 'Thou', 'Thus', 'Thy', 'Tidal', 'Timna', 'Timnah', 'Timnath', 'Tiras', 'To', 'Togarmah', 'Tola', 'Tubal', 'Tubalcain', 'Twelve', 'Two', 
'Unstable', 'Until', 'Unto', 'Up', 'Upon', 'Ur', 'Uz', 'Uzal', 'We', 'What', 'When', 'Whence', 'Where', 'Whereas', 'Wherefore', 'Which', 'While', 'Who', 'Whose', 'Whoso', 'Why', 'Wilt', 'With', 'Woman', 'Ye', 'Yea', 'Yet', 'Zaavan', 'Zaphnathpaaneah', 'Zar', 'Zarah', 'Zeboiim', 'Zeboim', 'Zebul', 'Zebulun', 'Zemarite', 'Zepho', 'Zerah', 'Zibeon', 'Zidon', 'Zillah', 'Zilpah', 'Zimran', 'Ziphion', 'Zo', 'Zoar', 'Zohar', 'Zuzims', 'a', 'abated', 'abide', 'able', 'abode', 'abomination', 'about', 'above', 'abroad', 'absent', 'abundantly', 'accept', 'accepted', 'according', 'acknowledged', 'activity', 'add', 'adder', 'afar', 'afflict', 'affliction', 'afraid', 'after', 'afterward', 'afterwards', 'aga', 'again', 'against', 'age', 'aileth', 'air', 'al', 'alive', 'all', 'almon', 'alo', 'alone', 'aloud', 'also', 'altar', 'altogether', 'always', 'am', 'among', 'amongst', 'an', 'and', 'angel', 'angels', 'anger', 'angry', 'anguish', 'anointedst', 'anoth', 'another', 'answer', 'answered', 'any', 'anything', 'appe', 'appear', 'appeared', 'appease', 'appoint', 'appointed', 'aprons', 'archer', 'archers', 'are', 'arise', 'ark', 'armed', 'arms', 'army', 'arose', 'arrayed', 'art', 'artificer', 'as', 'ascending', 'ash', 'ashamed', 'ask', 'asked', 'asketh', 'ass', 'assembly', 'asses', 'assigned', 'asswaged', 'at', 'attained', 'audience', 'avenged', 'aw', 'awaked', 'away', 'awoke', 'back', 'backward', 'bad', 'bade', 'badest', 'badne', 'bak', 'bake', 'bakemeats', 'baker', 'bakers', 'balm', 'bands', 'bank', 'bare', 'barr', 'barren', 'basket', 'baskets', 'battle', 'bdellium', 'be', 'bear', 'beari', 'bearing', 'beast', 'beasts', 'beautiful', 'became', 'because', 'become', 'bed', 'been', 'befall', 'befell', 'before', 'began', 'begat', 'beget', 'begettest', 'begin', 'beginning', 'begotten', 'beguiled', 'beheld', 'behind', 'behold', 'being', 'believed', 'belly', 'belong', 'beneath', 'bereaved', 'beside', 'besides', 'besought', 'best', 'betimes', 'better', 'between', 'betwixt', 'beyond', 
'binding', 'bird', 'birds', 'birthday', 'birthright', 'biteth', 'bitter', 'blame', 'blameless', 'blasted', 'bless', 'blessed', 'blesseth', 'blessi', 'blessing', 'blessings', 'blindness', 'blood', 'blossoms', 'bodies', 'boldly', 'bondman', 'bondmen', 'bondwoman', 'bone', 'bones', 'book', 'booths', 'border', 'borders', 'born', 'bosom', 'both', 'bottle', 'bou', 'boug', 'bough', 'bought', 'bound', 'bow', 'bowed', 'bowels', 'bowing', 'boys', 'bracelets', 'branches', 'brass', 'bre', 'breach', 'bread', 'breadth', 'break', 'breaketh', 'breaking', 'breasts', 'breath', 'breathed', 'breed', 'brethren', 'brick', 'brimstone', 'bring', 'brink', 'broken', 'brook', 'broth', 'brother', 'brought', 'brown', 'bruise', 'budded', 'build', 'builded', 'built', 'bulls', 'bundle', 'bundles', 'burdens', 'buried', 'burn', 'burning', 'burnt', 'bury', 'buryingplace', 'business', 'but', 'butler', 'butlers', 'butlership', 'butter', 'buy', 'by', 'cakes', 'calf', 'call', 'called', 'came', 'camel', 'camels', 'camest', 'can', 'cannot', 'canst', 'captain', 'captive', 'captives', 'carcases', 'carried', 'carry', 'cast', 'castles', 'catt', 'cattle', 'caught', 'cause', 'caused', 'cave', 'cease', 'ceased', 'certain', 'certainly', 'chain', 'chamber', 'change', 'changed', 'changes', 'charge', 'charged', 'chariot', 'chariots', 'chesnut', 'chi', 'chief', 'child', 'childless', 'childr', 'children', 'chode', 'choice', 'chose', 'circumcis', 'circumcise', 'circumcised', 'citi', 'cities', 'city', 'clave', 'clean', 'clear', 'cleave', 'clo', 'closed', 'clothed', 'clothes', 'cloud', 'clusters', 'co', 'coat', 'coats', 'coffin', 'cold', ...]
len(set(text3))
2789
from __future__ import division
len(text3) / len(set(text3))
16.050197203298673
text3.count("smote")
5
100 * text4.count('a') / len(text4)
1.457973123627309
text5.count("lol")
704
100 * text5.count('lol') / len(text5)
1.5640968673628082
def lexical_diversity(text):
    return len(text) / len(set(text))
def percentage(count, total):
    return 100 * count / total
lexical_diversity(text3)
16.050197203298673
lexical_diversity(text5)
7.420046158918563
percentage(4, 5)
80.0
percentage(text4.count('a'), len(text4))
1.457973123627309
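The two helpers above can be tried without loading the NLTK corpora. The following is a minimal sketch, assuming a hand-made token list (`toy_tokens` is a hypothetical name, not part of the notebook). It also adds a guard for empty input that the original functions do not have.

```python
def lexical_diversity(tokens):
    """Average number of times each distinct token is used (tokens per type)."""
    if not tokens:  # guard added for this sketch; the notebook version would raise ZeroDivisionError
        return 0.0
    return len(tokens) / len(set(tokens))


def percentage(count, total):
    """Share of `count` in `total`, expressed as a percentage."""
    return 100 * count / total


# Hypothetical toy data, not taken from any NLTK text
toy_tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat']

diversity = lexical_diversity(toy_tokens)  # 6 tokens over 5 distinct tokens
share_of_the = percentage(toy_tokens.count('the'), len(toy_tokens))
print(diversity, share_of_the)
```

A value of 1.0 means every token is distinct, which is why the short example sentences in this notebook score 1.0.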
sent1 = ['Call', 'me', 'Ishmael', '.']
sent1
['Call', 'me', 'Ishmael', '.']
len(sent1)
4
lexical_diversity(sent1)
1.0
sent2
['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.']
sent3
['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
mysent = ['Smell', 'perfume', 'parfum', 'and', 'one', 'who', 'stink', '.']
mysent
['Smell', 'perfume', 'parfum', 'and', 'one', 'who', 'stink', '.']
sorted(mysent)
['.', 'Smell', 'and', 'one', 'parfum', 'perfume', 'stink', 'who']
len(mysent)
8
100 * mysent.count('Smell') / len(mysent)
12.5
lexical_diversity(mysent)
1.0
percentage(mysent.count('Smell'), len(mysent))
12.5
mysent + sent2
['Smell', 'perfume', 'parfum', 'and', 'one', 'who', 'stink', '.', 'The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.']
mysent.append("bee")
mysent
['Smell', 'perfume', 'parfum', 'and', 'one', 'who', 'stink', '.', 'bee']
text1.count('heaven')
40
text6[238]
'?'
text4[173]
'awaken'
text4.index('awaken')
173
text5[16715:16735]
['U86', 'thats', 'why', 'something', 'like', 'gamefly', 'is', 'so', 'good', 'because', 'you', 'can', 'actually', 'play', 'a', 'full', 'game', 'without', 'buying', 'it']
text6[1600:1625]
['We', "'", 're', 'an', 'anarcho', '-', 'syndicalist', 'commune', '.', 'We', 'take', 'it', 'in', 'turns', 'to', 'act', 'as', 'a', 'sort', 'of', 'executive', 'officer', 'for', 'the', 'week']
sent = ['word1', 'word2', 'word3', 'word4', 'word5', 'word6', 'word7', 'word8', 'word9', 'word10']
sent[0]
'word1'
sent[9]
'word10'
sent[10]
IndexError: list index out of range
sent[5:8]
['word6', 'word7', 'word8']
sent[5]
'word6'
sent[6]
'word7'
sent[7]
'word8'
sent[:3]
['word1', 'word2', 'word3']
sent[5:]
['word6', 'word7', 'word8', 'word9', 'word10']
text2[141525:]
['among', 'the', 'merits', 'and', 'the', 'happiness', 'of', 'Elinor', 'and', 'Marianne', ',', 'let', 'it', 'not', 'be', 'ranked', 'as', 'the', 'least', 'considerable', ',', 'that', 'though', 'sisters', ',', 'and', 'living', 'almost', 'within', 'sight', 'of', 'each', 'other', ',', 'they', 'could', 'live', 'without', 'disagreement', 'between', 'themselves', ',', 'or', 'producing', 'coolness', 'between', 'their', 'husbands', '.', 'THE', 'END']
sent[0] = 'First'
sent[9] = 'Last'
len(sent)
10
sent[1:9] = ['Second', 'Third']
sent
['First', 'Second', 'Third', 'Last']
sent[9]
IndexError: list index out of range
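The IndexError above follows from slice assignment changing the length of the list. A small self-contained sketch of the same steps, with a guarded lookup added as an assumption of this example (the notebook itself lets the error surface):

```python
# Same ten placeholder tokens as in the cells above
sent = ['word%d' % i for i in range(1, 11)]

sent[0] = 'First'   # index assignment replaces one item; the length stays 10
sent[9] = 'Last'
length_before = len(sent)

sent[1:9] = ['Second', 'Third']   # slice assignment swaps 8 items for 2
length_after = len(sent)

# Guarded access instead of letting sent[9] raise IndexError
index = 9
item = sent[index] if index < len(sent) else None
print(length_before, length_after, sent, item)
```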
mysent[2:]
['parfum', 'and', 'one', 'who', 'stink', '.', 'bee']
len(mysent)
9
mysent[10]
IndexError: list index out of range
mysent
['Smell', 'perfume', 'parfum', 'and', 'one', 'who', 'stink', '.', 'bee']
mysent[6]
'stink'
mysent[6] = 'stinks'
mysent
['Smell', 'perfume', 'parfum', 'and', 'one', 'who', 'stinks', '.', 'bee']
sent1
['Call', 'me', 'Ishmael', '.']
my_sent = ['Bravely', 'bold', 'Sir', 'Robin', ',', 'rode', 'forth', ..., 'from', 'Camelot', '.']
noun_phrase = my_sent[1:4]
noun_phrase
['bold', 'Sir', 'Robin']
wOrDs = sorted(noun_phrase)
wOrDs
['Robin', 'Sir', 'bold']
one = 'two'
two = 3
not = 'Camelot'
SyntaxError: invalid syntax
len(set(text1))
19317
vocab = set(text1)
vocab_size = len(vocab)
vocab_size
19317
123abc = 'bold'
SyntaxError: invalid syntax
abc123 = 'bold'
abc123
'bold'
'bold'
'bold'
name = 'Monty' #assign string to variable
name[0] #index a string
'M'
name[:4] #slice a string
'Mont'
name * 2
'MontyMonty'
name + '!'
'Monty!'
name + ' !'
'Monty !'
'?' + name + '!'
'?Monty!'
'My ' + name + ' Python!'
'My Monty Python!'
' '.join(['Monty', 'Python']) #join words of a list to make a string
'Monty Python'
'Monty Python'.split() #split string into list
['Monty', 'Python']
'My Monty Python!'.split()
['My', 'Monty', 'Python!']
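The last result shows that `split()` only breaks on whitespace, so punctuation that touches a word stays attached to it. A short sketch of the round trip between a token list and a string (the `words` list is made up for this example):

```python
words = ['My', 'Monty', 'Python', '!']

joined = ' '.join(words)   # tokens separated by single spaces
pieces = joined.split()    # splitting on whitespace recovers the same list

# Punctuation written directly after a word is not separated by split()
attached = 'My Monty Python!'.split()
print(joined, pieces, attached)
```

Separating the punctuation from the word needs a tokenizer rather than `split()`.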
saying = ['After', 'all', 'is', 'said', 'and', 'done',
          'more', 'is', 'said', 'than', 'done']
tokens = set(saying)
tokens = sorted(tokens)
sorted(tokens)
['After', 'all', 'and', 'done', 'is', 'more', 'said', 'than']
tokens[-2:]
['said', 'than']
tokens[-3]
'more'
tokens[0:-3]
['After', 'all', 'and', 'done', 'is']
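Negative indices count from the end of the list, so each expression above has an equivalent written with `len()`. A minimal sketch using the same sorted tokens:

```python
tokens = ['After', 'all', 'and', 'done', 'is', 'more', 'said', 'than']
n = len(tokens)

last_two = tokens[n - 2:]            # same as tokens[-2:]
third_from_end = tokens[n - 3]       # same as tokens[-3]
all_but_last_three = tokens[:n - 3]  # same as tokens[0:-3]
print(last_two, third_from_end, all_but_last_three)
```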
from nltk.book import *
fdist1 = FreqDist(text1)
fdist1
FreqDist({',': 18713, 'the': 13721, '.': 6862, 'of': 6536, 'and': 6024, 'a': 4569, 'to': 4542, ';': 4072, 'in': 3916, 'that': 2982, ...})
vocabulary1 = list(fdist1.keys())  # keys() is a view in Python 3; a list can be sliced (insertion order; use fdist1.most_common(50) for the most frequent)
vocabulary1[:50]
fdist1['whale']
906
fdist2 = FreqDist(text2)
fdist2
FreqDist({',': 9397, 'to': 4063, '.': 3975, 'the': 3861, 'of': 3565, 'and': 3350, 'her': 2436, 'a': 2043, 'I': 2004, 'in': 1904, ...})
vocabulary2 = list(fdist2.keys())  # keys() is a view in Python 3; a list can be sliced (insertion order; use fdist2.most_common(50) for the most frequent)
vocabulary2[:50]
fdist2['she']
1333
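The lookups above treat the frequency distribution like a dictionary of counts. As far as I know, `FreqDist` in NLTK 3 is built on `collections.Counter`, so the same ideas can be sketched with the standard library alone. This is an illustration on a toy list, not the notebook's own code:

```python
from collections import Counter

saying = ['After', 'all', 'is', 'said', 'and', 'done',
          'more', 'is', 'said', 'than', 'done']

counts = Counter(saying)

said_count = counts['said']        # count of one token, like fdist1['whale']
missing_count = counts['whale']    # absent tokens count as 0 instead of raising KeyError
top_three = [word for word, _ in counts.most_common(3)]  # most frequent tokens first
print(said_count, missing_count, top_three)
```

`most_common(n)` is the sliceable, frequency-ordered view of the vocabulary that `keys()` does not provide in Python 3.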
fdist1.plot(50, cumulative=True)
ImportError: libf77blas.so.3: cannot open shared object file: No such file or directory (raised while importing the NumPy 1.21.2 C-extensions under Python 3.7)
ValueError: The plot function requires matplotlib to be installed. See http://matplotlib.org/
fdist1.hapaxes()
['Herman', 'Melville', ']', 'ETYMOLOGY', 'Late', 'Consumptive', 'School', 'threadbare', 'lexicons', 'mockingly', 'flags', 'mortality', 'signification', 'HACKLUYT', 'Sw', 'HVAL', 'roundness', 'Dut', 'Ger', 'WALLEN', 'WALW', 'IAN', 'RICHARDSON', 'KETOS', 'GREEK', 'CETUS', 'LATIN', 'WHOEL', 'ANGLO', 'SAXON', 'WAL', 'HWAL', 'SWEDISH', 'ICELANDIC', 'BALEINE', 'BALLENA', 'FEGEE', 'ERROMANGOAN', 'Librarian', 'painstaking', 'burrower', 'grub', 'Vaticans', 'stalls', 'higgledy', 'piggledy', 'gospel', 'promiscuously', 'commentator', 'belongest', 'sallow', 'Pale', 'Sherry', 'loves', 'bluntly', 'Subs', 'thankless', 'Hampton', 'Court', 'hie', 'refugees', 'pampered', 'Michael', 'Raphael', 'unsplinterable', 'GENESIS', 'JOB', 'JONAH', 'punish', 'ISAIAH', 'soever', 'cometh', 'incontinently', 'perisheth', 'PLUTARCH', 'MORALS', 'breedeth', 'Whirlpooles', 'Balaene', 'arpens', 'PLINY', 'Scarcely', 'TOOKE', 'LUCIAN', 'TRUE', 'catched', 'OCTHER', 'VERBAL', 'TAKEN', 'MOUTH', 'ALFRED', '890', 'gudgeon', 'retires', 'MONTAIGNE', 'APOLOGY', 'RAIMOND', 'SEBOND', 'Nick', 'RABELAIS', 'cartloads', 'STOWE', 'ANNALS', 'LORD', 'BACON', 'Touching', 'ork', 'DEATH', 'sovereignest', 'bruise', 'HAMLET', 'leach', 'Mote', 'availle', 'returne', 'againe', 'worker', 'Dinting', 'paine', 'thro', 'maine', 'FAERIE', 'Immense', 'til', 'DAVENANT', 'PREFACE', 'GONDIBERT', 'spermacetti', 'Hosmannus', 'Nescio', 'VIDE', 'Spencer', 'Talus', 'flail', 'threatens', 'jav', 'lins', 'WALLER', 'SUMMER', 'ISLANDS', 'Commonwealth', 'Civitas', 'OPENING', 'SENTENCE', 'HOBBES', 'LEVIATHAN', 'Silly', 'Mansoul', 'chewing', 'sprat', 'PILGRIM', 'PROGRESS', 'Created', 'PARADISE', 'LOST', '---"', 'Hugest', 'Stretched', 'Draws', 'FULLLER', 'PROFANE', 'HOLY', 'STATE', 'DRYDEN', 'ANNUS', 'MIRABILIS', 'aground', 'EDGE', 'TEN', 'SPITZBERGEN', 'PURCHAS', 'wantonness', 'fuzzing', 'vents', 'HERBERT', 'INTO', 'ASIA', 'AFRICA', 'SCHOUTEN', 'SIXTH', 'CIRCUMNAVIGATION', 'Elbe', 'ducat', 'herrings', 'GREENLAND', 'Several', 'Fife', 'Anno', '1652', 
'Pitferren', 'SIBBALD', 'FIFE', 'KINROSS', 'Myself', 'Sperma', 'ceti', 'fierceness', 'RICHARD', 'STRAFFORD', 'LETTER', 'BERMUDAS', 'PHIL', 'TRANS', '1668', 'PRIMER', 'COWLEY', '1729', '"...', 'frequendy', 'insupportable', 'disorder', 'ULLOA', 'SOUTH', 'AMERICA', 'sylphs', 'petticoat', 'Oft', 'Tho', 'RAPE', 'LOCK', 'NAT', 'wales', 'JOHNSON', 'COOK', 'dung', 'lime', 'juniper', 'UNO', 'VON', 'TROIL', 'LETTERS', 'BANKS', 'SOLANDER', '1772', 'Nantuckois', 'JEFFERSON', 'MEMORIAL', 'MINISTER', 'REFERENCE', 'PARLIAMENT', 'SOMEWHERE', 'guarding', 'protecting', 'robbers', 'BLACKSTONE', 'Rodmond', 'suspends', 'attends', 'FALCONER', 'Bright', 'roofs', 'domes', 'rockets', 'Around', 'unwieldy', 'COWPER', 'VISIT', 'LONDON', 'HUNTER', 'DISSECTION', 'SMALL', 'SIZED', 'aorta', 'gushing', 'PALEY', 'THEOLOGY', 'mammiferous', 'hind', 'BARON', 'CUVIER', 'COLNETT', 'PURPOSE', 'EXTENDING', 'SPERMACETI', 'Floundered', 'chace', 'peopling', 'Gather', 'Led', 'instincts', 'trackless', 'Assaulted', 'voracious', 'spiral', 'MONTGOMERY', 'WORLD', 'FLOOD', 'Paean', 'fatter', 'Flounders', 'CHARLES', 'LAMB', 'TRIUMPH', '1690', 'OBED', 'Susan', 'HAWTHORNE', 'TWICE', 'bespeak', 'raal', 'COOPER', 'PILOT', 'Berlin', 'Gazette', 'ECKERMANN', 'CONVERSATIONS', 'GOETHE', 'ESSEX', 'WAS', 'ATTACKED', 'FINALLY', 'DESTROYED', 'OWEN', 'CHACE', 'FIRST', 'SAID', 'VESSEL', 'YORK', '1821', 'piping', 'dimmed', 'phospher', 'ELIZABETH', 'OAKES', 'SMITH', 'amounted', '440', 'SCORESBY', 'Mad', 'agonies', 'endures', 'infuriated', 'rears', 'snaps', 'propelled', 'observers', 'opportunities', 'habitudes', 'BEALE', 'offensively', 'artful', 'mischievous', 'FREDERICK', 'DEBELL', '1840', 'October', 'Raise', 'ay', 'THAR', 'bowes', 'os', 'ROSS', 'ETCHINGS', 'CRUIZE', '1846', 'Globe', 'transactions', 'relate', 'HUSSEY', 'SURVIVORS', 'parried', 'MISSIONARY', 'JOURNAL', 'TYERMAN', 'boldest', 'persevering', 'REPORT', 'DANIEL', 'SPEECH', 'SENATE', 'APPLICATION', 'ERECTION', 'BREAKWATER', 'CAPTORS', 'WHALEMAN', 'ADVENTURES', 'BIOGRAPHY', 
'GATHERED', 'HOMEWARD', 'COMMODORE', 'PREBLE', 'REV', 'CHEEVER', 'MUTINEER', 'BROTHER', 'ANOTHER', 'MCCULLOCH', 'COMMERCIAL', 'reciprocal', 'clews', 'SOMETHING', 'UNPUBLISHED', 'CURRENTS', 'Pedestrians', 'recollect', 'gateways', 'VOYAGER', 'ARCTIC', 'NEWSPAPER', 'TAKING', 'RETAKING', 'HOBOMACK', 'MIRIAM', 'FISHERMAN', 'appliance', 'RIBS', 'TRUCKS', 'Terra', 'Del', 'Fuego', 'DARWIN', 'NATURALIST', ";--'", '!\'"', 'WHARTON', 'Loomings', 'spleen', 'regulating', 'circulation', 'Whenever', 'drizzly', 'hypos', 'philosophical', 'Cato', 'Manhattoes', 'reefs', 'downtown', 'gazers', 'Circumambulate', 'Corlears', 'Coenties', 'Slip', 'Whitehall', 'Posted', 'sentinels', 'spiles', 'pier', 'lath', 'counters', 'desks', 'loitering', 'shady', 'Inlanders', 'lanes', 'alleys', 'attract', 'dale', 'dreamiest', 'shadiest', 'quietest', 'enchanting', 'Saco', 'crucifix', 'Deep', 'mazy', 'Tiger', 'Tennessee', 'Rockaway', 'Persians', 'deity', 'Narcissus', 'ungraspable', 'hazy', 'quarrelsome', 'offices', 'abominate', 'toils', 'trials', 'barques', 'schooners', 'broiling', 'buttered', 'judgmatically', 'peppered', 'reverentially', 'idolatrous', 'dotings', 'ibis', 'roasted', 'bake', 'plumb', 'Van', 'Rensselaers', 'Randolphs', 'Hardicanutes', 'lording', 'tallest', 'decoction', 'Seneca', 'Stoics', 'Testament', 'promptly', 'rub', 'infliction', 'BEING', 'PAID', 'urbane', 'ills', 'monied', 'consign', 'prevalent', 'violate', 'Pythagorean', 'commonalty', 'police', 'surveillance', 'programme', 'solo', 'CONTESTED', 'ELECTION', 'PRESIDENCY', 'UNITED', 'STATES', 'ISHMAEL', 'BLOODY', 'AFFGHANISTAN', 'managers', 'genteel', 'comedies', 'farces', 'cunningly', 'disguises', 'cajoling', 'unbiased', 'freewill', 'discriminating', 'overwhelming', 'undeliverable', 'itch', 'forbidden', 'ignoring', 'lodges', 'Carpet', 'Bag', 'Manhatto', 'candidates', 'penalties', 'Tyre', 'Carthage', 'imported', 'cobblestones', 'bitingly', 'shouldering', 'price', 'fervent', 'asphaltic', 'pavement', 'flinty', 'projections', 'soles', 'Too', 
'cheapest', 'cheeriest', 'invitingly', 'particles', 'peer', 'Angel', 'Doom', 'wailing', 'gnashing', 'Wretched', 'entertainment', 'Moving', 'emigrant', 'poverty', 'creak', 'lodgings', 'zephyr', 'hob', 'toasting', 'observest', 'sashless', 'glazier', 'reasonest', 'chinks', 'crannies', 'lint', 'chattering', 'shiverings', 'cob', 'redder', 'Orion', 'glitters', 'conservatories', 'president', 'temperance', 'blubbering', 'straggling', 'wainscots', 'reminding', 'oilpainting', 'besmoked', 'defaced', 'unequal', 'crosslights', 'hags', 'delineate', 'bewitched', 'ponderings', 'boggy', 'soggy', 'squitchy', 'froze', 'heath', 'icebound', 'represents', 'Horner', 'foundered', 'clubs', 'harvesting', 'hacking', 'horrifying', 'Mixed', 'Nathan', 'Swain', 'corkscrew', 'Blanco', 'sojourning', 'fireplaces', 'duskier', 'cockpits', 'rarities', 'Projecting', 'Within', 'shelves', 'flasks', 'bustles', 'deliriums', 'Abominable', 'tumblers', 'cylinders', 'goggling', 'deceitfully', 'tapered', 'Parallel', 'pecked', 'footpads', 'Fill', 'shilling', 'examining', 'SKRIMSHANDER', 'accommodated', 'unoccupied', 'haint', 'pose', 'whalin', 'decidedly', 'objectionable', 'wander', 'Battery', 'ruminating', 'adorning', 'potatoes', 'sartainty', 'diabolically', 'steaks', 'undress', 'looker', 'rioting', 'Grampus', 'seed', 'Feegees', 'tramping', 'Enveloped', 'bedarned', 'eruption', 'officiating', 'brimmers', 'complained', 'potion', 'colds', 'catarrhs', 'liquor', 'arrantest', 'topers', 'obstreperously', 'aloof', 'desirous', 'hilarity', 'coffer', 'Southerner', 'mountaineers', 'Alleghanian', 'missed', 'supernaturally', 'congratulate', 'multiply', 'bachelor', 'abominated', 'tidiest', 'bedwards', 'shan', 'tablecloth', 'Skrimshander', 'bump', 'spraining', 'eider', 'yoking', 'rickety', 'whirlwinds', 'knockings', 'dismissed', 'popped', 'cherishing', 'chuckled', 'chuckle', 'mightily', 'catches', 'bamboozingly', 'overstocked', 'toothpick', 'rayther', 'BROWN', 'slanderin', 'farrago', 'BROKE', 'Sartain', 'Mt', 'Hecla', 
'persist', 'mystifying', 'unsay', 'criminal', 'Wall', 'purty', 'sarmon', 'rips', 'tellin', 'bought', 'balmed', 'curios', 'sellin', 'inions', 'fooling', 'idolators', 'Depend', 'reg', 'lar', 'spliced', 'Johnny', 'sprawling', 'Arter', 'glim', 'jiffy', 'irresolute', 'vum', 'WON', 'Folding', 'scrutiny', 'porcupine', 'moccasin', 'ponchos', 'parade', 'rainy', 'remembering', 'commended', 'cobs', 'Nod', 'footfall', 'unlacing', 'blackish', 'plasters', 'inkling', 'Placing', 'crammed', 'scalp', 'mildewed', 'Ignorance', 'parent', 'nonplussed', 'undressing', 'checkered', 'Thirty', 'frogs', 'quaked', 'wrapall', 'dreadnaught', 'fumbled', 'Remembering', 'manikin', 'tenpin', 'andirons', 'jambs', 'bricks', 'appropriate', 'applying', 'hastier', 'withdrawals', 'antics', 'devotee', 'extinguishing', 'unceremoniously', 'bagged', 'sportsman', 'woodcock', 'uncomfortableness', 'deliberating', 'puffed', 'sang', 'Stammering', 'conjured', 'responses', 'debel', 'flourishing', 'Angels', 'flourishings', 'peddlin', 'sleepe', 'grunted', 'gettee', 'motioning', 'comely', 'insured', 'Counterpane', 'parti', 'triangles', 'interminable', 'caper', 'supperless', '21st', 'hemisphere', 'sigh', 'Sixteen', 'ached', 'coaches', 'stockinged', 'slippering', 'misbehaviour', 'unendurable', 'stepmothers', 'misfortunes', 'steeped', 'shudderingly', 'confounding', 'soberly', 'recurred', 'predicament', 'unlock', 'bridegroom', 'clasp', 'hugged', 'rouse', 'snore', 'scratch', 'Throwing', 'expostulations', 'unbecomingness', 'matrimonial', 'dawning', 'overture', 'innate', 'compliment', 'civility', 'rudeness', 'toilette', 'dressing', 'donning', 'gaspings', 'booting', 'caterpillar', 'outlandishness', 'manners', 'education', 'undergraduate', 'dreamt', 'cowhide', 'pinched', 'curtains', 'indecorous', 'contented', 'restricting', 'donned', 'lathering', 'unsheathes', 'whets', 'Rogers', 'cutlery', 'Afterwards', 'baton', 'Breakfast', 'pleasantly', 'bountifully', 'laughable', 'bosky', 'unshorn', 'gowns', 'toasted', 'lingers', 'tarried', 
'barred', 'Grub', 'Park', 'assurance', 'polish', 'occasioned', 'embarrassed', 'bashfulness', 'duelled', 'winking', 'tastes', 'sheepishly', 'bashful', 'icicle', 'admirer', 'cordially', 'grappling', 'genteelly', 'eschewed', 'undivided', '6', 'circulating', 'nondescripts', 'Chestnut', 'jostle', 'Regent', 'Lascars', 'Bombay', 'Apollo', 'Feegeeans', 'Tongatobooarrs', 'Erromanggoans', 'Pannangians', 'Brighggians', 'weekly', 'Vermonters', 'stalwart', 'frames', 'felled', 'strutting', 'wester', 'bombazine', 'cloak', 'mow', 'gloves', 'joins', 'outfit', 'waistcoats', 'Hay', 'Seed', 'tract', 'dearest', 'pave', 'eggs', 'patrician', 'parks', 'scraggy', 'scoria', 'Herr', 'dowers', 'nieces', 'reservoirs', 'maples', 'bountiful', 'proffer', 'passer', 'cones', 'blossoms', 'superinduced', 'carnation', 'Salem', 'sweethearts', 'Puritanic', 'Whaleman', 'Wrapping', 'Each', 'quote', 'TALBOT', 'Near', 'Desolation', '1st', 'SISTER', 'ROBERT', 'WILLIS', 'ELLERY', 'NATHAN', 'COLEMAN', 'WALTER', 'CANNY', 'SETH', 'GLEIG', 'Forming', 'ELIZA', '31st', 'MARBLE', 'SHIPMATES', 'EZEKIEL', 'HARDY', 'AUGUST', '3d', '1833', 'WIDOW', 'Shaking', 'glazed', 'Affected', 'relatives', 'unhealing', 'sympathetically', 'wounds', 'bleed', 'blanks', ...]
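A hapax is a token that occurs exactly once in the text. The sketch below reproduces that idea with `collections.Counter` on a toy list. It is an assumption-laden illustration of what `hapaxes()` returns, not NLTK's implementation:

```python
from collections import Counter

saying = ['After', 'all', 'is', 'said', 'and', 'done',
          'more', 'is', 'said', 'than', 'done']

counts = Counter(saying)

# Keep only the tokens whose count is exactly 1
hapaxes = [word for word, n in counts.items() if n == 1]
print(sorted(hapaxes))
```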
V = set(text1)
long_words = [w for w in V if len(w) > 15 ]
sorted(long_words)
['CIRCUMNAVIGATION', 'Physiognomically', 'apprehensiveness', 'cannibalistically', 'characteristically', 'circumnavigating', 'circumnavigation', 'circumnavigations', 'comprehensiveness', 'hermaphroditical', 'indiscriminately', 'indispensableness', 'irresistibleness', 'physiognomically', 'preternaturalness', 'responsibilities', 'simultaneousness', 'subterraneousness', 'supernaturalness', 'superstitiousness', 'uncomfortableness', 'uncompromisedness', 'undiscriminating', 'uninterpenetratingly']
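The list comprehension used for `long_words` is shorthand for a loop with a condition. A minimal sketch showing both forms side by side, on a hypothetical five-word vocabulary rather than `text1`:

```python
# Hypothetical mini-vocabulary standing in for set(text1)
vocab = {'whale', 'circumnavigation', 'sea', 'uncomfortableness', 'ship'}

# Comprehension form, as in the cell above
long_words = [w for w in vocab if len(w) > 15]

# Equivalent explicit loop
long_words_loop = []
for w in vocab:
    if len(w) > 15:
        long_words_loop.append(w)

print(sorted(long_words))
```

Sets are unordered, so the result is sorted before display, as the notebook does.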
V2 = set(text5)
long_words_2 = [w for w in V2 if len(w) > 20]
sorted(long_words_2)
V3 = set(text3)
short_words_3 = [w for w in V3 if len(w) < 6]
sorted(short_words_3)
['!', "'", '(', ')', ',', ',)', '.', '.)', ':', ';', ';)', '?', '?)', 'A', 'Abel', 'Abide', 'Abr', 'Abrah', 'Abram', 'Accad', 'Adah', 'Adam', 'Admah', 'After', 'Ajah', 'Akan', 'All', 'Also', 'Alvah', 'Alvan', 'Am', 'Amal', 'Ammon', 'An', 'Anah', 'And', 'Aner', 'Angel', 'Aram', 'Aran', 'Arbah', 'Ard', 'Are', 'Areli', 'Arise', 'Arodi', 'Art', 'As', 'Asher', 'Ask', 'Assyr', 'At', 'Atad', 'Avith', 'Babel', 'Be', 'Bedad', 'Beeri', 'Bela', 'Belah', 'Benam', 'Beno', 'Beor', 'Bera', 'Bered', 'Bless', 'Both', 'Bow', 'Bring', 'But', 'Buz', 'By', 'Cain', 'Calah', 'Can', 'Cana', 'Carmi', 'Cast', 'Cause', 'Come', 'Cush', 'Dan', 'Day', 'Dedan', 'Din', 'Dinah', 'Do', 'Drink', 'Duke', 'Dumah', 'Earth', 'Ebal', 'Eber', 'Edar', 'Eden', 'Edom', 'Egy', 'Egypt', 'Ehi', 'Elah', 'Elam', 'Elon', 'Emins', 'En', 'Eno', 'Enoch', 'Enos', 'Ephah', 'Epher', 'Ephra', 'Er', 'Erech', 'Eri', 'Es', 'Esau', 'Esek', 'Eve', 'Even', 'Every', 'Ezbon', 'Ezer', 'Fear', 'Feed', 'Fill', 'For', 'From', 'G', 'Gad', 'Gaham', 'Gatam', 'Gaza', 'Gera', 'Gerar', 'Get', 'Gihon', 'Give', 'Go', 'God', 'Gomer', 'Guni', 'Hadad', 'Hadar', 'Hagar', 'Haggi', 'Hai', 'Ham', 'Hamor', 'Hamul', 'Happy', 'Haran', 'Hast', 'Haste', 'Have', 'Hazo', 'He', 'Hear', 'Heber', 'Hemam', 'Here', 'Heth', 'Hirah', 'His', 'Hitti', 'Hobah', 'Hori', 'How', 'Hul', 'Huz', 'I', 'If', 'In', 'Irad', 'Iram', 'Is', 'Isa', 'Isaac', 'Iscah', 'Isra', 'Isui', 'It', 'Jabal', 'Jac', 'Jacob', 'Jamin', 'Japhe', 'Jared', 'Javan', 'Jerah', 'Jetur', 'Jeush', 'Jezer', 'Job', 'Jobab', 'Jubal', 'Judah', 'Judge', 'Kedar', 'Kenaz', 'Know', 'Kor', 'Korah', 'LO', 'LORD', 'Laban', 'Lasha', 'Lay', 'Leah', 'Lest', 'Let', 'Levi', 'Lie', 'Lift', 'Lo', 'Look', 'Lot', 'Lotan', 'Lud', 'Ludim', 'Luz', 'Madai', 'Magog', 'Make', 'Male', 'Mam', 'Mamre', 'Man', 'Mash', 'Massa', 'Me', 'Medan', 'Mesha', 'Mizz', 'Moab', 'Moreh', 'My', 'Nahor', 'Nay', 'Night', 'Noah', 'Nod', 'Not', 'Now', 'O', 'Obal', 'Of', 'Oh', 'Ohad', 'Omar', 'On', 'Onam', 'Onan', 'Only', 'Ophir', 'Our', 'Out', 
'Padan', 'Paran', 'Pass', 'Pau', 'Peace', 'Peleg', 'Phara', 'Phut', 'Pinon', 'Pison', 'Put', 'Rebek', 'Resen', 'Reu', 'Reub', 'Reuel', 'Rosh', 'Said', 'Salah', 'Salem', 'Sarah', 'Sarai', 'Saul', 'Save', 'Say', 'Se', 'Seba', 'See', 'Seir', 'Sell', 'Send', 'Serah', 'Sered', 'Serug', 'Set', 'Seth', 'Shall', 'Shalt', 'Shaul', 'She', 'Sheba', 'Shed', 'Shel', 'Shem', 'Shuah', 'Shuni', 'Shur', 'Sidon', 'Slay', 'So', 'Sod', 'Sodom', 'Some', 'Spake', 'Speak', 'Stand', 'Swear', 'Take', 'Tamar', 'Tebah', 'Tell', 'Tema', 'Teman', 'Terah', 'That', 'The', 'Then', 'There', 'These', 'They', 'This', 'Thou', 'Thus', 'Thy', 'Tidal', 'Timna', 'Tiras', 'To', 'Tola', 'Tubal', 'Two', 'Until', 'Unto', 'Up', 'Upon', 'Ur', 'Uz', 'Uzal', 'We', 'What', 'When', 'Where', 'Which', 'While', 'Who', 'Whose', 'Whoso', 'Why', 'Wilt', 'With', 'Woman', 'Ye', 'Yea', 'Yet', 'Zar', 'Zarah', 'Zebul', 'Zepho', 'Zerah', 'Zidon', 'Zo', 'Zoar', 'Zohar', 'a', 'abide', 'able', 'abode', 'about', 'above', 'add', 'adder', 'afar', 'after', 'aga', 'again', 'age', 'air', 'al', 'alive', 'all', 'almon', 'alo', 'alone', 'aloud', 'also', 'altar', 'am', 'among', 'an', 'and', 'angel', 'anger', 'angry', 'anoth', 'any', 'appe', 'are', 'arise', 'ark', 'armed', 'arms', 'army', 'arose', 'art', 'as', 'ash', 'ask', 'asked', 'ass', 'asses', 'at', 'aw', 'away', 'awoke', 'back', 'bad', 'bade', 'badne', 'bak', 'bake', 'baker', 'balm', 'bands', 'bank', 'bare', 'barr', 'be', 'bear', 'beari', 'beast', 'bed', 'been', 'began', 'begat', 'beget', 'begin', 'being', 'belly', 'best', 'bird', 'birds', 'blame', 'bless', 'blood', 'bone', 'bones', 'book', 'born', 'bosom', 'both', 'bou', 'boug', 'bough', 'bound', 'bow', 'bowed', 'boys', 'brass', 'bre', 'bread', 'break', 'breed', 'brick', 'bring', 'brink', 'brook', 'broth', 'brown', 'build', 'built', 'bulls', 'burn', 'burnt', 'bury', 'but', 'buy', 'by', 'cakes', 'calf', 'call', 'came', 'camel', 'can', 'canst', 'carry', 'cast', 'catt', 'cause', 'cave', 'cease', 'chain', 'chi', 'chief', 'child', 
'chode', 'chose', 'citi', 'city', 'clave', 'clean', 'clear', 'clo', 'cloud', 'co', 'coat', 'coats', 'cold', 'colt', 'colts', 'come', 'comi', 'cool', 'corn', 'couch', 'could', 'cried', 'crown', 'cru', 'cry', 'cubit', 'cup', 'curse', 'cut', 'd', 'da', 'dale', 'dark', 'day', 'days', 'dea', 'dead', 'deal', 'dealt', 'death', 'deed', 'deeds', 'deep', 'dew', 'did', 'didst', 'die', 'died', 'dim', 'dine', 'do', 'doe', 'doer', 'doest', 'doeth', 'doing', 'done', 'door', 'dost', 'doth', 'doubt', 'dove', 'down', 'dowry', 'drank', 'draw', 'dread', 'dream', 'dress', 'drew', 'dried', 'drink', 'drove', 'dry', 'duke', 'dukes', 'dunge', 'dust', 'dwe', 'dwell', 'dwelt', 'e', 'ea', 'each', 'ear', 'early', 'ears', 'earth', 'east', 'eat', 'eaten', 'edge', 'eight', 'elder', 'else', 'empty', 'end', 'ended', 'enter', 'ev', 'even', 'ever', 'every', 'evil', 'ewe', 'ewes', 'excel', 'ey', 'eyed', 'eyes', 'fa', 'face', 'faces', 'fai', 'fail', 'fair', 'fall', 'fame', 'far', 'fast', 'fat', 'fath', 'fathe', 'fear', 'feast', 'fed', 'feed', 'feel', 'feet', 'fell', 'felt', 'fema', 'fetch', 'few', 'fie', 'field', 'fifth', 'fifty', 'fig', 'fill', 'find', 'fine', 'fir', 'fire', 'first', 'fish', 'five', 'fle', 'fled', 'flee', 'flesh', 'flo', 'floc', 'flock', 'flood', 'floor', 'fly', 'fo', 'foal', 'foals', 'folk', 'folly', 'food', 'foot', 'for', 'force', 'ford', 'form', 'forth', 'forty', 'fou', 'found', 'four', 'fowl', 'fowls', 'fro', 'from', 'frost', 'fruit', 'full', 'fury', 'gard', 'gat', 'gate', 'gave', 'get', 'ghost', 'gift', 'gifts', 'give', 'given', 'glory', 'go', 'goa', 'goat', 'goats', 'gods', 'goest', 'goeth', 'going', 'gold', 'gone', 'good', 'goods', 'got', 'gr', 'grace', 'grap', 'grass', 'grave', 'gray', 'gre', 'great', 'green', 'grew', 'grief', 'grisl', 'gro', 'grove', 'grow', 'grown', 'guard', 'h', 'ha', 'had', 'hadst', 'hairs', 'hairy', 'half', 'han', 'hand', 'hands', 'hang', 'hard', 'harm', 'harp', 'hast', 'haste', 'hate', 'hated', 'hath', 'have', 'haven', 'hazel', 'he', 'head', 'heads', 
'heap', 'hear', 'heard', 'heart', 'heat', 'heav', 'heed', 'heel', 'heels', 'heir', 'held', 'help', 'hence', 'her', 'herb', 'herd', 'herds', 'here', 'hid', 'hide', 'high', 'hil', 'hills', 'him', 'hind', 'hire', 'hired', 'his', 'hith', 'hold', 'home', 'honey', 'hor', 'horse', 'host', 'hotly', 'hou', 'hous', 'house', 'how', 'hunt', 'hurt', 'husba', 'if', 'ill', 'image', 'in', 'inn', 'into', 'ir', 'is', 'isles', 'issue', 'it', 'joint', 'jud', 'judge', 'just', 'keep', 'kept', 'ki', 'kid', 'kids', 'kill', 'kind', 'kinds', 'kine', 'king', 'kings', 'kiss', 'kn', 'knead', 'kneel', 'knees', 'knew', 'knife', 'know', 'known', 'la', 'lack', 'lad', 'lade', 'laded', 'laden', 'lads', 'laid', 'lamb', 'lambs', 'lamp', 'lan', 'land', 'lands', 'large', 'last', 'laugh', 'law', 'laws', 'lay', 'lead', 'leaf', 'lean', 'leap', 'least', 'leave', 'led', 'left', 'lest', 'let', 'li', 'lie', 'lien', 'liest', 'lieth', 'life', 'lift', 'light', 'like', 'linen', 'lion', 'live', 'lived', 'lives', 'lo', 'lodge', 'loins', 'long', 'look', 'loose', 'lord', 'lords', 'loss', 'loud', 'love', 'loved', 'lower', 'lying', 'm', 'ma', 'made', 'maid', 'make', 'male', 'males', 'man', 'many', 'mark', 'marry', 'mast', 'may', 'me', 'mead', 'meal', 'mean', 'meant', 'meat', 'meet', 'men', 'mercy', 'merry', 'mess', 'met', 'mi', 'midst', 'might', 'milch', 'milk', 'mind', 'mine', 'mirth', 'mist', 'mock', 'money', 'month', 'moon', 'more', 'most', 'mou', 'mount', 'mourn', 'mouth', 'moved', 'much', 'mules', 'must', 'my', 'myrrh', 'n', 'na', 'naked', 'name', 'named', 'names', 'nati', 'natio', 'ne', 'near', 'neck', 'needs', 'never', 'next', 'nig', 'nigh', 'night', 'nine', 'no', 'none', 'noon', 'nor', 'north', 'not', 'now', 'nurse', 'nuts', 'o', 'oa', 'oak', 'oath', 'obey', 'of', 'off', 'offer', 'oil', 'old', 'olive', 'on', 'one', 'ones', 'only', 'onyx', 'open', 'or', 'order', 'organ', 'oth', 'other', 'ou', 'ought', 'our', 'ours', 'out', 'over', 'own', 'oxen', 'part', 'parts', 'pass', 'past', 'path', 'pea', 'peace', 'peop', 
'piece', 'pit', 'pitch', ...]
fdist5 = FreqDist(text5)
sorted([w for w in set(text5) if len(w) > 7 and fdist5[w] > 7])
['#14-19teens', '#talkcity_adults', '((((((((((', '........', 'Question', 'actually', 'anything', 'computer', 'cute.-ass', 'everyone', 'football', 'innocent', 'listening', 'remember', 'seriously', 'something', 'together', 'tomorrow', 'watching']
fdist1 = FreqDist(text1)
sorted([w for w in set(text1) if len(w) > 9 and fdist1[w] > 10])
['Nantucketer', 'Nevertheless', 'additional', 'afterwards', 'altogether', 'appearance', 'blacksmith', 'circumstance', 'circumstances', 'completely', 'concerning', 'concluding', 'conscience', 'considerable', 'considered', 'considering', 'continually', 'convenient', 'countenance', 'difference', 'disappeared', 'encountered', 'especially', 'everything', 'exceedingly', 'experienced', 'forecastle', 'frequently', 'harpooneer', 'harpooneers', 'horizontal', 'immediately', 'impossible', 'indifferent', 'indirectly', 'indispensable', 'individual', 'infallibly', 'invariably', 'involuntarily', 'lengthwise', 'marvellous', 'monomaniac', 'naturalists', 'nevertheless', 'occasionally', 'originally', 'particular', 'peculiarities', 'perpendicular', 'previously', 'prodigious', 'remarkable', 'scientific', 'significant', 'simultaneously', 'spermaceti', 'straightway', 'subsequent', 'themselves', 'unaccountable', 'understand']
list(bigrams(['more', 'is', 'said', 'than', 'done']))
[('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
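# bigrams() yields its pairs lazily, so the pairs only appear once the result is consumed.
# As a rough sketch (plain Python, not NLTK's own implementation), the same adjacent
# pairs can be built by zipping the list against itself shifted by one position:

```python
# Pair each word with its successor; zip stops at the shorter sequence,
# so the last word never starts a pair.
words = ['more', 'is', 'said', 'than', 'done']
pairs = list(zip(words, words[1:]))
print(pairs)
```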
text4.collocations()
United States; fellow citizens; four years; years ago; Federal Government; General Government; American people; Vice President; God bless; Chief Justice; Old World; Almighty God; Fellow citizens; Chief Magistrate; every citizen; one another; fellow Americans; Indian tribes; public debt; foreign nations
text8.collocations()
would like; medium build; social drinker; quiet nights; non smoker; long term; age open; Would like; easy going; financially secure; fun times; similar interests; Age open; weekends away; poss rship; well presented; never married; single mum; permanent relationship; slim build
[len(w) for w in text1]
[1, 4, 4, 2, 6, 8, 4, 1, 9, 1, 1, 8, 2, 1, 4, 11, 5, 2, 1, 7, 6, 1, 3, 4, 5, 2, 10, 2, 4, 1, 5, 1, 4, 1, 3, 5, 1, 1, 3, 3, 3, 1, 2, 3, 4, 7, 3, 3, 8, 3, 8, 1, 4, 1, 5, 12, 1, 9, 11, 4, 3, 3, 3, 5, 2, 3, 3, 5, 7, 2, 3, 5, 1, 2, 5, 2, 4, 3, 3, 8, 1, 2, 7, 6, 8, 3, 2, 3, 9, 1, 1, 5, 3, 4, 2, 4, 2, 6, 6, 1, 3, 2, 5, 4, 2, 4, 4, 1, 5, 1, 4, 2, 2, 2, 6, 2, 3, 6, 7, 3, 1, 7, 9, 1, 3, 6, 1, 1, 5, 6, 5, 6, 3, 13, 2, 3, 4, 1, 3, 7, 4, 5, 2, 3, 4, 2, 2, 8, 1, 5, 1, 3, 2, 1, 3, 3, 1, 4, 1, 4, 6, 2, 5, 4, 9, 2, 7, 1, 3, 2, 3, 1, 5, 2, 6, 2, 7, 2, 2, 7, 1, 1, 10, 1, 5, 1, 3, 2, 2, 4, 11, 4, 3, 3, 1, 3, 3, 1, 6, 1, 1, 1, 1, 1, 4, 1, 3, 1, 2, 4, 1, 2, 6, 2, 2, 10, 1, 1, 10, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 6, 1, 3, 1, 5, 1, 4, 1, 7, 1, 5, 1, 9, 1, 5, 1, 7, 1, 7, 1, 6, 1, 7, 1, 7, 1, 5, 1, 4, 1, 4, 1, 5, 1, 5, 1, 4, 1, 4, 1, 11, 1, 8, 1, 8, 2, 1, 3, 1, 3, 1, 9, 2, 2, 4, 2, 4, 4, 4, 4, 11, 8, 3, 4, 1, 4, 2, 1, 4, 5, 2, 1, 3, 1, 3, 7, 2, 4, 4, 7, 3, 4, 8, 3, 6, 1, 6, 2, 3, 5, 1, 7, 2, 8, 6, 9, 2, 6, 2, 5, 7, 4, 2, 3, 4, 10, 1, 6, 2, 7, 1, 9, 3, 4, 3, 1, 2, 5, 4, 2, 5, 1, 4, 3, 8, 1, 8, 5, 10, 1, 7, 9, 1, 2, 5, 8, 1, 3, 9, 6, 8, 1, 3, 4, 2, 1, 2, 8, 3, 7, 7, 9, 1, 2, 4, 2, 3, 5, 4, 9, 1, 5, 8, 3, 6, 8, 2, 12, 1, 2, 9, 1, 8, 4, 1, 1, 3, 4, 2, 4, 3, 4, 13, 4, 1, 7, 1, 7, 1, 3, 4, 2, 9, 1, 2, 4, 7, 3, 11, 1, 9, 3, 3, 1, 2, 4, 4, 4, 1, 4, 5, 2, 1, 3, 1, 3, 1, 5, 11, 1, 2, 1, 4, 9, 2, 4, 8, 1, 6, 5, 5, 2, 4, 2, 4, 5, 4, 4, 4, 1, 3, 3, 4, 4, 4, 6, 5, 2, 3, 4, 1, 6, 1, 3, 4, 4, 3, 9, 5, 2, 3, 1, 3, 4, 4, 1, 8, 1, 3, 1, 3, 4, 9, 4, 5, 1, 3, 3, 2, 4, 7, 1, 4, 4, 4, 3, 5, 7, 1, 3, 2, 3, 10, 10, 7, 2, 4, 2, 2, 1, 3, 1, 4, 1, 3, 2, 3, 4, 3, 4, 5, 2, 4, 2, 6, 3, 5, 1, 2, 2, 4, 3, 4, 5, 2, 3, 4, 2, 9, 1, 5, 4, 1, 5, 5, 3, 7, 5, 3, 3, 9, 3, 2, 1, 3, 4, 4, 4, 5, 3, 3, 5, 2, 3, 5, 1, 4, 4, 4, 6, 1, 3, 4, 7, 3, 4, 4, 6, 3, 8, 3, 3, 5, 1, 7, 7, 1, 3, 6, 8, 2, 4, 1, 8, 7, 1, 7, 1, 3, 7, 1, 7, 4, 6, 1, 4, 2, 6, 3, 10, 6, 8, 2, 5, 1, 2, 5, 6, 14, 7, 1, 8, 1, 1, 3, 3, 7, 5, 6, 2, 2, 7, 1, 1, 
9, 6, 1, 4, 2, 5, 5, 3, 1, 3, 5, 5, 3, 4, 2, 2, 5, 2, 2, 3, 1, 1, 3, 3, 4, 3, 8, 1, 5, 4, 2, 7, 2, 5, 2, 2, 5, 1, 1, 5, 2, 3, 5, 1, 5, 2, 4, 9, 4, 4, 4, 4, 2, 4, 7, 2, 2, 6, 1, 1, 2, 4, 3, 1, 3, 4, 4, 3, 4, 1, 3, 5, 1, 3, 6, 5, 1, 5, 6, 9, 3, 8, 7, 1, 4, 9, 4, 7, 7, 1, 3, 2, 5, 4, 3, 6, 4, 2, 2, 3, 3, 2, 2, 6, 1, 3, 4, 5, 6, 7, 6, 6, 3, 5, 2, 4, 7, 1, 1, 5, 1, 2, 2, 5, 1, 4, 1, 2, 5, 1, 4, 2, 4, 3, 13, 4, 4, 5, 7, 2, 3, 1, 3, 9, 2, 3, 10, 4, 2, 3, 6, 2, 2, 7, 1, 1, 8, 1, 1, 6, 1, 1, 3, 6, 3, 8, 3, 4, 3, 3, 7, 6, 4, 3, 1, 5, 5, 3, 6, 3, 11, 6, 7, 1, 4, 2, 2, 4, 2, 6, 2, 4, 5, 2, 6, 2, 4, 2, 2, 7, 1, 1, 5, 1, 1, 8, 3, 2, 9, 3, 4, 2, 3, 3, 1, 4, 5, 7, 1, 5, 4, 6, 3, 5, 8, 2, 3, 3, 1, 8, 1, 5, 3, 6, 1, 3, 3, 2, 1, 4, 9, 4, 1, 3, 4, 4, 7, 2, 1, 4, 1, 7, 1, 7, 3, 5, 2, 3, 5, 1, 3, 7, 3, 3, 6, 3, 4, 1, 4, 2, 2, 5, 1, 1, 6, 1, 1, 3, 4, 7, 2, 1, 2, 7, 4, 7, 4, 4, 1, 4, 2, 8, 5, 1, 6, 1, 5, 3, 5, 2, 4, 5, 5, 3, 5, 5, 1, 2, 5, 2, 7, 4, 2, 3, 4, 1, 3, 3, 4, 6, 4, 7, 2, 3, 3, 7, 1, 2, 5, 4, 4, 5, 1, 5, 1, 4, 5, 5, 4, 1, 2, ...]
fdist = FreqDist([len(w) for w in text1])
fdist
FreqDist({3: 50223, 1: 47933, 4: 42345, 2: 38513, 5: 26597, 6: 17111, 7: 14399, 8: 9966, 9: 6428, 10: 3528, ...})
fdist.keys()
dict_keys([1, 4, 2, 6, 8, 9, 11, 5, 7, 3, 10, 12, 13, 14, 16, 15, 17, 18, 20])
fdist.items()
dict_items([(1, 47933), (4, 42345), (2, 38513), (6, 17111), (8, 9966), (9, 6428), (11, 1873), (5, 26597), (7, 14399), (3, 50223), (10, 3528), (12, 1053), (13, 567), (14, 177), (16, 22), (15, 70), (17, 12), (18, 1), (20, 1)])
fdist.max()
3
fdist[3]
50223
fdist.freq(3)
0.19255882431878046
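# FreqDist builds on collections.Counter, so the max()/freq() calls can be sketched
# with a plain Counter. The token list below is a made-up toy sample, not text1,
# and the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical toy tokens standing in for a corpus (assumption, not from text1).
tokens = ['Call', 'me', 'Ishmael', '.', 'Some', 'years', 'ago', '-', 'never', 'mind']

# Count how often each word length occurs, as FreqDist([len(w) for w in ...]) does.
length_counts = Counter(len(w) for w in tokens)

# Most frequent length and its count, analogous to fdist.max() and fdist[...].
most_common_length, count = length_counts.most_common(1)[0]

# Relative frequency, analogous to fdist.freq(...): count divided by total samples.
relative = count / sum(length_counts.values())

print(most_common_length, count, relative)
```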
sent7
['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']
[w for w in sent7 if len(w) < 4]
[',', '61', 'old', ',', 'the', 'as', 'a', '29', '.']
[w for w in sent7 if len(w) <= 4]
[',', '61', 'old', ',', 'will', 'join', 'the', 'as', 'a', 'Nov.', '29', '.']
[w for w in sent7 if len(w) == 4]
['will', 'join', 'Nov.']
[w for w in sent7 if len(w) != 4]
['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', '29', '.']
sorted([w for w in set(text1) if w.endswith('ableness')])
['comfortableness', 'honourableness', 'immutableness', 'indispensableness', 'indomitableness', 'intolerableness', 'palpableness', 'reasonableness', 'uncomfortableness']
sorted([term for term in set(text4) if 'gnt' in term])
['Sovereignty', 'sovereignties', 'sovereignty']
sorted([item for item in set(text6) if item.istitle()])
['A', 'Aaaaaaaaah', 'Aaaaaaaah', 'Aaaaaah', 'Aaaah', 'Aaaaugh', 'Aaagh', 'Aaah', 'Aaauggh', 'Aaaugh', 'Aaauugh', 'Aagh', 'Aah', 'Aauuggghhh', 'Aauuugh', 'Aauuuuugh', 'Aauuuves', 'Action', 'Actually', 'African', 'Ages', 'Aggh', 'Agh', 'Ah', 'Ahh', 'Alice', 'All', 'Allo', 'Almighty', 'Alright', 'Am', 'Amen', 'An', 'Anarcho', 'And', 'Angnor', 'Anthrax', 'Antioch', 'Anybody', 'Anyway', 'Apples', 'Aramaic', 'Are', 'Arimathea', 'Armaments', 'Arthur', 'As', 'Ask', 'Assyria', 'At', 'Attila', 'Augh', 'Autumn', 'Auuuuuuuugh', 'Away', 'Ay', 'Ayy', 'B', 'Back', 'Bad', 'Badon', 'Battle', 'Be', 'Beast', 'Bedevere', 'Bedwere', 'Behold', 'Between', 'Beyond', 'Black', 'Bloody', 'Blue', 'Bon', 'Bones', 'Book', 'Bors', 'Brave', 'Bravely', 'Bravest', 'Bread', 'Bridge', 'Bring', 'Bristol', 'Britain', 'Britons', 'Brother', 'Build', 'Burn', 'But', 'By', 'C', 'Caerbannog', 'Camaaaaaargue', 'Camelot', 'Castle', 'Chapter', 'Charge', 'Chaste', 'Cherries', 'Chicken', 'Chickennn', 'Chop', 'Christ', 'Churches', 'Cider', 'Clark', 'Clear', 'Come', 'Concorde', 'Consult', 'Cornwall', 'Could', 'Course', 'Court', 'Crapper', 'Cut', 'Dappy', 'Death', 'Defeat', 'Dennis', 'Did', 'Didn', 'Dingo', 'Dis', 'Divine', 'Do', 'Doctor', 'Does', 'Don', 'Dragon', 'Dramatically', 'Ecky', 'Ector', 'Eee', 'Eh', 'Enchanter', 'England', 'English', 'Erbert', 'Ere', 'Erm', 'Eternal', 'European', 'Even', 'Every', 'Everything', 'Ewing', 'Exactly', 'Excalibur', 'Excuse', 'Explain', 'Far', 'Farewell', 'Father', 'Fetchez', 'Fiends', 'Fine', 'First', 'Firstly', 'Five', 'Follow', 'For', 'Forgive', 'Forward', 'Found', 'Four', 'France', 'Frank', 'French', 'Gable', 'Galahad', 'Gallahad', 'Gawain', 'Get', 'Go', 'God', 'Good', 'Gorge', 'Grail', 'Great', 'Greetings', 'Grenade', 'Guards', 'Guy', 'Ha', 'Hah', 'Hallo', 'Halt', 'Hand', 'Hang', 'Have', 'Haw', 'He', 'Hee', 'Heee', 'Heh', 'Hello', 'Help', 'Herbert', 'Here', 'Hey', 'Hic', 'Hill', 'Himself', 'His', 'Hiyaah', 'Hiyah', 'Hiyya', 'Hm', 'Hmm', 'Ho', 'Hoa', 'Hold', 'Holy', 
'Honestly', 'Hoo', 'Hooray', 'How', 'Huh', 'Hurry', 'Huy', 'Huyah', 'Hya', 'Hyy', 'I', 'Idiom', 'Iesu', 'If', 'Iiiiives', 'Iiiives', 'In', 'Is', 'Isn', 'It', 'Ives', 'Jesus', 'Joseph', 'Just', 'Keep', 'King', 'Knight', 'Knights', 'Lady', 'Lake', 'Lancelot', 'Launcelot', 'Lead', 'Leaving', 'Let', 'Lie', 'Like', 'Listen', 'Loimbard', 'Look', 'Looks', 'Lord', 'Lucky', 'Make', 'Man', 'May', 'Maynard', 'Meanwhile', 'Mercea', 'Message', 'Midget', 'Mind', 'Mine', 'Mmm', 'Monsieur', 'More', 'Morning', 'Most', 'Mother', 'Mud', 'Must', 'My', 'N', 'Nador', 'Nay', 'Neee', 'Never', 'Ni', 'Nine', 'Ninepence', 'No', 'None', 'Not', 'Nothing', 'Now', 'Nu', 'O', 'Of', 'Off', 'Oh', 'Ohh', 'Old', 'Olfin', 'On', 'Once', 'One', 'Ooh', 'Oooh', 'Oooo', 'Oooohoohohooo', 'Oooooooh', 'Open', 'Or', 'Order', 'Other', 'Oui', 'Our', 'Over', 'Ow', 'Packing', 'Patsy', 'Pendragon', 'Peng', 'Perhaps', 'Peril', 'Picture', 'Pie', 'Piglet', 'Pin', 'Please', 'Practice', 'Prepare', 'Prince', 'Princess', 'Providence', 'Psalms', 'Pull', 'Pure', 'Put', 'Quick', 'Quickly', 'Quiet', 'Quite', 'Quoi', 'Rather', 'Really', 'Recently', 'Remove', 'Rheged', 'Ridden', 'Right', 'Riiight', 'Robin', 'Robinson', 'Roger', 'Round', 'Run', 'Running', 'S', 'Said', 'Saint', 'Saxons', 'Say', 'Schools', 'See', 'Seek', 'Shall', 'She', 'Shh', 'Shrubber', 'Shrubberies', 'Shut', 'Silence', 'Silly', 'Since', 'Sir', 'Skip', 'So', 'Sorry', 'Speak', 'Splendid', 'Spring', 'Stand', 'Stay', 'Steady', 'Stop', 'Summer', 'Supposing', 'Supreme', 'Surely', 'Swamp', 'Table', 'Tale', 'Tall', 'Tell', 'Thank', 'That', 'The', 'Thee', 'Then', 'There', 'Therefore', 'They', 'This', 'Those', 'Thou', 'Thpppppt', 'Thppppt', 'Thpppt', 'Thppt', 'Three', 'Throw', 'Thsss', 'Thursday', 'Thy', 'Til', 'Tim', 'Tis', 'To', 'Today', 'Together', 'Too', 'Torment', 'Tower', 'True', 'Try', 'Twenty', 'Two', 'U', 'Uh', 'Uhh', 'Ulk', 'Um', 'Umhm', 'Umm', 'Un', 'Unfortunately', 'Until', 'Use', 'Uther', 'Uugh', 'Uuh', 'Very', 'Victory', 'W', 'Waa', 'Wait', 'Walk', 'Wayy', 
'We', 'Welcome', 'Well', 'What', 'When', 'Where', 'Which', 'Who', 'Whoa', 'Why', 'Will', 'Winston', 'Winter', 'With', 'Woa', 'Wood', 'Would', 'Y', 'Yapping', 'Yay', 'Yeaaah', 'Yeaah', 'Yeah', 'Yes', 'You', 'Your', 'Yup', 'Zoot']
sorted([item for item in set(sent7) if item.isdigit()])
['29', '61']
sorted([item for item in set(text7) if item.isalnum()])
['0', '1', '10', '100', '101', '102', '103', '105', '106', '107', '108', '10th', '11', '110', '111', '114', '115', '118', '119', '11th', '12', '120', '125', '128', '13', '130', '132', '133', '135', '138', '139', '14', '140', '144', '145', '148', '149', '15', '150', '155', '16', '160', '1614', '1637', '17', '170', '175', '176', '177', '1787', '179', '18', '180', '184', '187', '188', '19', '190', '1901', '1903', '1917', '1920s', '1925', '1929', '1933', '1934', '1940s', '1948', '195', '1950s', '1953', '1955', '1956', '1960s', '1961', '1965', '1966', '1967', '1968', '1969', '1970', '1970s', '1971', '1972', '1973', '1975', '1976', '1977', '1979', '198', '1980', '1980s', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1990s', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '1st', '2', '20', '200', '2000', '2005', '2009', '2017', '2019', '2029', '203', '20s', '21', '210', '212', '214', '22', '220', '225', '227', '228', '23', '235', '24', '240', '241', '245', '25', '250', '257', '26', '260', '266', '27', '270', '274', '275', '28', '280', '282', '286', '29', '295', '29year', '3', '30', '300', '301', '3057', '306', '30s', '31', '310', '313', '32', '320', '321', '326', '33', '339', '34', '343', '35', '350', '353', '36', '360', '37', '370', '38', '380', '386', '388', '39', '397', '4', '40', '400', '405', '41', '415', '42', '420', '43', '430', '44', '445', '45', '450', '451', '454', '458', '46', '467', '47', '472', '48', '49', '490', '492', '5', '50', '500', '501', '51', '512', '52', '53', '534', '54', '55', '56', '57', '570', '576', '58', '59', '598', '6', '60', '600', '605', '609', '61', '62', '620', '63', '64', '644', '65', '666', '672', '68', '69', '692', '7', '70', '700', '701', '71', '72', '721', '722', '73', '730', '75', '750', '753', '76', '767', '77', '777', '778', '78', '79', '8', '80', '800', '83', '8300', '8300s', '847', '85', '850', '86', '879', '88', '89', '890', '9', '90', '900', '909', '913', '917', '92', 
'93', '94', '95', '960', '963', '98', '99', 'A', 'ABA', 'ABORTION', 'ACCEPTANCES', 'ACCOUNT', 'ACQUISITION', 'ADRs', 'AG', 'AGREES', 'AIDS', 'AMR', 'AN', 'AND', 'APPEARS', 'ASLACTON', 'ASSETS', 'ASSOCIATES', 'ASSOCIATION', 'Abbey', 'Abbot', 'About', 'Above', 'Abrupt', 'Absorbed', 'Academically', 'Acceptance', 'According', 'Account', 'Achievement', 'Ackerman', 'Acquisition', 'Act', 'Activity', 'Actually', 'Ad', 'Adam', 'Adams', 'Adds', 'Administration', 'Adolph', 'Adopting', 'Advance', 'Advanced', 'Advancing', 'Advice', 'Advocates', 'Aerojet', 'Aerospace', 'Affairs', 'Africa', 'African', 'After', 'Again', 'Against', 'Agency', 'Agnew', 'Agriculture', 'Ailes', 'Air', 'Airlines', 'Airways', 'Akerfeldt', 'Akio', 'Aktiebolaget', 'Al', 'Ala', 'Alan', 'Albany', 'Albert', 'Albuquerque', 'Alexander', 'Alfred', 'All', 'Alleghany', 'Allen', 'Allendale', 'Allergan', 'Alliance', 'Allied', 'Almost', 'Aloha', 'Along', 'Already', 'Also', 'Alstyne', 'Altair', 'Although', 'Altogether', 'Alurralde', 'Alvin', 'Always', 'Alysia', 'Alzheimer', 'Am', 'Amendment', 'America', 'American', 'Americana', 'Americans', 'Ames', 'Amin', 'Among', 'Amsterdam', 'An', 'Ana', 'Analysts', 'Ancient', 'And', 'Andean', 'Anderson', 'Andersson', 'Andrea', 'Andrew', 'Andy', 'Angeles', 'Angelo', 'Angels', 'Angier', 'Anglia', 'Anglian', 'Angola', 'Anku', 'Ann', 'Anne', 'Annualized', 'Another', 'Anthony', 'Antinori', 'Antitrust', 'Antonio', 'Any', 'Anything', 'Appeals', 'Appellate', 'Apple', 'Appropriations', 'April', 'Aptitude', 'Aquino', 'Arabia', 'Arabian', 'Arafat', 'Arbitrage', 'Arbitraging', 'Areas', 'Argentina', 'Argentine', 'Ariail', 'Arighi', 'Arizona', 'Ark', 'Arlington', 'Army', 'Arnold', 'Arraignments', 'Arthur', 'Article', 'Articles', 'Artist', 'As', 'Asada', 'Asher', 'Asia', 'Asian', 'Asians', 'Asked', 'Aslacton', 'Assets', 'Assistant', 'Associates', 'Association', 'Assuming', 'Assurance', 'At', 'Atlanta', 'Atlantic', 'Atsushi', 'Attorney', 'Attorneys', 'Attwood', 'Auctions', 'Audit', 'Auditors', 
'August', 'Aurora', 'Austin', 'Australia', 'Australian', 'Austria', 'Austrian', 'Authority', 'Automobile', 'Avenue', 'Average', 'Avon', 'Avrett', 'B', 'BALLOT', 'BANKERS', 'BILLS', 'BIRDS', 'BMP', 'BRAMALEA', 'BRIEFS', 'BTR', 'Babcock', 'Back', 'Backe', 'Backer', 'Backseat', 'Bailey', 'Baim', 'Baker', 'Baking', 'Baldwin', 'Ballot', 'Baltimore', 'Bancorp', 'Bangkok', 'Bank', 'Bankers', 'Banking', 'Bankruptcy', 'Banks', 'Banque', 'Bar', 'Barbados', 'Barbara', 'Barbaresco', 'Barclays', 'Barfield', 'Barge', 'Baris', 'Barnett', 'Barney', 'Barnum', 'Barrels', 'Barrett', 'Barron', 'Barth', 'Basham', 'Basic', 'Basin', 'Bass', 'Bates', 'Baton', 'Baum', 'Beach', 'Beall', 'Bears', 'Beatles', 'Beauty', 'Because', 'Before', 'Beginning', 'Behind', 'Beige', 'Beijing', 'Being', 'Beirut', 'Bell', 'Bellows', 'Bells', 'Belt', 'Bendectin', 'Bennett', 'Bentsen', 'Berger', 'Berlin', 'Berliner', 'Berman', 'Bermuda', 'Bernstein', 'Berson', 'Besides', 'Beta', 'Beth', 'Bethlehem', 'Betting', 'Beverly', 'Bew', 'Bhutto', 'Biaggi', 'Biedermann', 'Big', 'Bill', 'Billings', 'Birmingham', 'Biscayne', 'Black', 'Blackstone', 'Blanc', 'Blanchard', 'Blancs', 'Blue', 'Blunt', 'Board', 'Bob', 'Boca', 'Bodner', 'Boeing', 'Boesel', 'Bolduc', 'Bolivia', 'Bon', 'Bond', 'Bonds', 'Bonnell', 'Book', 'Boone', 'Boorse', 'Bordeaux', 'Borge', 'Borough', 'Boston', 'Both', 'Bougainville', 'Boulder', 'Bowery', 'Bowes', 'Bowl', 'Bowman', 'Brace', 'Bradford', 'Bradley', 'Brady', 'Braidwood', 'Bramalea', 'Brands', 'Braun', 'Brazil', 'Brazilian', 'Breakey', 'Brean', 'Breeden', 'Brenda', 'Brent', 'Bretz', 'Brian', 'Bribe', 'Bricklayers', 'Bridge', 'Bridges', 'Bridgeville', 'Brigham', 'Brisk', 'Britain', 'British', 'Britta', 'Broadcasting', 'Broken', 'Bromwich', 'Bronces', 'Bronx', 'Brooke', 'Brooklyn', 'Brooks', 'Brothers', 'Brown', 'Brownell', 'Brownstein', 'Brunei', 'Brunello', 'Brunswick', 'Brussels', 'Buckhead', 'Bucking', 'Budget', 'Bugs', 'Buick', 'Builders', 'Bumkins', 'Bund', 'Bunny', 'Bureau', 'Burgundies', 
'Burgundy', 'Burnham', 'Burt', 'Bush', 'Business', 'But', 'Butler', 'Buy', 'Buyers', 'Buying', 'By', 'Byron', 'CALL', 'CAMPAIGN', 'CAT', 'CBS', 'CDC', 'CDs', 'CEO', 'CEOs', 'CERTIFICATES', 'CHANGED', 'CIA', 'CLEARS', 'COLLECTING', 'COMMERCIAL', 'COMMUNICATIONS', 'COMPUTERS', 'COPPER', 'CORP', 'CS', 'CSV', 'CTB', 'CTBS', 'Cab', 'Cabbage', 'Cabernet', 'Cabernets', 'Calder', 'Caldor', 'Calif', 'California', 'Californian', 'Cambridge', 'Camille', 'Camilli', 'Campaign', 'Campbell', 'Campeau', 'Can', 'Canada', 'Canadian', 'Cancer', 'Candela', 'Candlestick', 'Canepa', 'Cannell', 'Capital', 'Capitol', 'Cara', 'Carballo', 'Carbide', 'Card', 'Care', 'Caribbean', 'Carl', 'Carla', 'Carlos', 'Carlton', 'Carney', 'Carnival', 'Carolina', 'Carrier', 'Carson', 'Carter', 'Cartons', 'Cartoonist', 'Cask', 'Cataracts', 'Caters', 'Cathedral', 'Catholic', 'Cathryn', 'Cedric', 'Cellar', 'Cellars', 'Center', 'Centerbank', 'Centers', 'Central', 'Century', 'Cerf', 'Certainly', 'Chabrol', 'Chadha', 'Chafic', 'Chairman', 'Chamber', 'Champagne', 'Champagnes', 'Champion', 'Chandler', 'Chaplin', 'Chapman', 'Chapter', 'Characters', 'Chardonnay', 'Chardonnays', 'Charities', 'Charles', 'Charlie', 'Charlotte', 'Charter', 'Chase', 'Chateau', 'Chatsworth', 'Cheese', 'Cheetham', 'ChemPlus', 'Chemical', 'Chevrolet', 'Chicago', 'Child', 'Chile', 'Chilean', 'Chilver', 'China', 'Chinchon', 'Chinese', 'Chiodo', 'Chivas', 'Choose', 'Christian', 'Christie', 'Christmas', 'Christopher', 'Chrysler', 'Chuck', 'Church', 'Cigna', 'Cincinnati', 'Circle', 'Circuit', 'Circulation', 'Circulations', 'Citadel', 'Citibank', 'Citicorp', 'Citing', 'Citizen', 'Citizens', 'City', 'Civil', 'Civilization', 'Claire', 'Clairton', 'Clara', 'Clarence', 'Clark', 'Class', 'Claude', 'Clays', 'Clemens', 'Cleveland', 'Clinton', 'Clive', 'Close', 'Closes', 'Club', 'Cluff', 'Co', 'Coast', 'Coconut', 'Code', 'Coincident', 'Cole', 'Coleco', 'Coleman', 'Collector', 'College', 'Colleges', 'Collins', 'Colonsville', 'Colony', 'Colorado', 
'Colorliner', 'Columbia', 'Columbus', 'Combo', 'Commerce', 'Commission', 'Commissions', 'Committee', 'Commodities', 'Commodity', 'Commodore', 'Common', 'Commonwealth', 'Communication', 'Communications', 'Communist', 'Communists', 'Community', 'Companies', 'Compare', 'Competes', 'Composer', 'Composite', 'Compound', 'Comprehensive', 'Compromises', 'Computer', 'Comtes', 'Concerned', 'Concurrent', 'Conduct', 'Confederation', 'Conference', 'Confidence', 'Confronted', 'Congress', 'Congressman', 'Coniston', 'Conn', 'Connecticut', 'Connections', 'Consent', 'Consequence', 'Consider', 'Consolidated', 'Constitution', 'Constitutional', 'Consumer', 'Containers', 'Contel', 'Continental', 'Continued', 'Continuing', 'Contra', 'Contracts', 'Contras', 'Control', 'Controls', 'Convention', 'Cooper', 'Coors', 'Copperweld', 'Corazon', 'CoreStates', 'Corn', 'Corp', 'Corporate', 'Corporations', 'Corps', 'Correll', 'Corrigan', 'Cosby', 'Cosmopolitan', 'Cote', 'Cotran', 'Cougar', 'Could', 'Council', 'Countries', 'Country', 'County', 'Court', 'Courter', 'Courts', 'Coxon', 'Craftsmen', 'Crane', 'Crash', 'Cray', 'Credit', 'Creek', 'Crew', 'Crime', 'Criminal', 'Cristal', 'Criticism', 'Critics', 'Cross', 'Crown', 'Cru', 'Crude', 'Cruise', 'Crum', 'Cullowhee', 'Cultural', 'Current', 'Currently', 'Curry', 'Customers', 'Cutrer', 'Czech', 'Czechoslovakia', 'DD', 'DDB', 'DEFENSE', 'DEPOSIT', 'DES', 'DIALING', 'DIAPER', 'DISCOUNT', 'DNA', 'DOONESBURY', 'DOT', 'DSM', 'Dahl', 'Daily', 'Daiwa', 'Dakota', 'Dakotas', 'Dale', 'Dallara', 'Dallas', 'Dan', 'Danforth', 'Daniel', 'Daniels', 'Danube', 'Danville', 'Danzig', 'Darkhorse', 'Darrell', 'Data', 'Datapoint', 'Davenport', ...]
sorted([w for w in set(text7) if '-' in w and 'index' in w])
['Stock-index', 'index-arbitrage', 'index-fund', 'index-options', 'index-related', 'stock-index']
sorted([wd for wd in set(text3) if wd.istitle() and len(wd) > 10])
['Abelmizraim', 'Allonbachuth', 'Beerlahairoi', 'Canaanitish', 'Chedorlaomer', 'Girgashites', 'Hazarmaveth', 'Hazezontamar', 'Ishmeelites', 'Jegarsahadutha', 'Jehovahjireh', 'Kirjatharba', 'Melchizedek', 'Mesopotamia', 'Peradventure', 'Philistines', 'Zaphnathpaaneah']
sorted([w for w in set(sent7) if not w.islower()])
[',', '.', '29', '61', 'Nov.', 'Pierre', 'Vinken']
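# Punctuation and digits pass the "not w.islower()" test because str.islower() is False
# for any string with no cased characters. A small sketch, with sent7 written out inline
# so it runs without NLTK, that adds isalpha() to keep only alphabetic tokens:

```python
# sent7 copied from the session so the example is self-contained.
sent7 = ['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join',
         'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']

# Same filter as in the session: also catches ',', '.', '29' and '61'.
not_lower = sorted(w for w in set(sent7) if not w.islower())

# Require purely alphabetic tokens first; 'Nov.' drops out because of its period.
capitalized = sorted(w for w in set(sent7) if w.isalpha() and not w.islower())

print(not_lower)
print(capitalized)
```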
sorted([t for t in set(text2) if 'cie' in t or 'cei' in t])
['ancient', 'ceiling', 'conceit', 'conceited', 'conceive', 'conscience', 'conscientious', 'conscientiously', 'deceitful', 'deceive', 'deceived', 'deceiving', 'deficiencies', 'deficiency', 'deficient', 'delicacies', 'excellencies', 'fancied', 'insufficiency', 'insufficient', 'legacies', 'perceive', 'perceived', 'perceiving', 'prescience', 'prophecies', 'receipt', 'receive', 'received', 'receiving', 'society', 'species', 'sufficient', 'sufficiently', 'undeceive', 'undeceiving']
sorted([w for w in set(text5) if 'stand' in w or len(w) < 20])
['', '!', '!!', '!!!', '!!!!', '!!!!!', '!!!!!!', '!!!!!!!', '!!!!!!!!', '!!!!!!!!!', '!!!!!!!!!!', '!!!!!!!!!!!', '!!!!!!!!!!!!!', '!!!!!!!!!!!!!!!!', '!!!!!!.', '!!!!!.', '!!!!....', '!!!.', '!!.', '!!...', '!.', '!...', '!=', '!?', '!??', '!???', '"', '"...', '"?', '"s', '#', '###', '####', '#14-19teens', '#40sPlus', '#prideIsland', '#prideisland', '#talkcity-20s', '#talkcity_adults', '$', '$$', '$27', '&', '&^', "'", "''", "'.", "'d", "'ello", "'ll", "'m", "'n'", "'re", "'s", "'ve", '(', '( o Y o )', '(((', '((((', '(((((', '((((((', '(((((((', '((((((((', '(((((((((', '((((((((((', '(((((((((((', '((((((((((((', '(((((((((((((', '((((((((((((((', '(((((((((((((((', '(((((((((((((((((', '((((((((((((((((((', '(((((..', '(*&(^', '(.', '(__I__)', ')', ')))', '))))', ')))))', ')))))))', '))))))))', ')))))))))', '))))))))))', ')))))))))))', '))))))))))))', ')))))))))))))', '))))))))))))))', ')))))))))))))))', ')))))))))))))))))', ')))))))))))))))))))', ')?', '*', '******', '*VBS*', '*WOW*', '*blush*', '*drools*', '*grins*', '*hugs*', '*smewchies*', '*sniffs*', '*spank*', '*waves*', '+', '+*+*+*+*', '++', ',', ',,', ',,,', ',,,,', ',,,,,', ',,,,,,,', ',,,,,,,,,,,', '-', '-(', '--', '-------------', '--------->', '-->', '-...)...-', '-17', '-21', '-6', '-_-', '-o', '-s', '-stay-', '.', '. .', '. . .', '. ...', '.(.', '.)', '.).', '..', '.. 
.', '..(..', '...', '....', '.....', '......', '.......', '........', '.........', '..........', '...........', '............', '.............', '................', '..................', '...................', '.45', '.:', '.;)', '.A.n.a.c.?.n.?.a.', '.op.', '.owner.', '/', '//', '0', '05.', '06.', '1', '1-900-anal-sex', '1.98', '1.99', '10', '100', '100%', '1012.', '1016.', '102.6', '10:49', '10th', '11', '12', '12%', '1200', '121.7', '1299', '13', '138', '14', '14-16', '147.7', '15', '16', '16.', '17', '18', '185', '18ST', '19', '1900', '1930', '1980', '1985', '1996', '1cos', '2', '2.3', '20', '20.', '2006', '20S', '20s', '21', '22', '220', '224', '23', '24', '246', '247', '25', '26', '27', '28', '280', '28147', '29', '29.88.', '295', '29803', '2:55', '2DAY', '2Pac', '2nd', '3', '30', '30.', '30.00.', '300', '31', '32', '33', '3333333', '33982', '34', '35', '36', '360', '37', '38', '39', '39.3', '396', '3:45', '3~<-..4@.', '4', '4.20', '41', '423', '43', '43.', '45', '45.5', '453', '46', '46.', '47', '47.', '49', '4:03', '5', '50', '51', '53', '55', '55%', '55.', '56', '56.', '57', '57401', '579', '59', '59%', '6', '60', '60s', '64.8', '65%', '68%', '69', '6:38', '6:41', '6:51', '6:53', '7', '70%', '700', '73%', '73042', '75', '75%', '76%', '77', '7:45', '8', '80', '8082653953', '818', '85%', '9', '9.53', '90', '92129', '92780', '93', '93%', '95953', '98.5', '98.6', '99', '99701', '99703', '9:10', ':', ':(', ':)', ':):):)', ':-(', ':-)', ':-@', ':-o', ':.', ':/', ':@', ':D', ':O', ':P', ':]', ':beer:', ':blush:', ':love:', ':o *', ':p', ':tongue:', ':|', ';', '; ..', ';)', ';-(', ';-)', ';0', ';]', ';p', '<', '<,', '<-', '<--', '<---', '<----', '<----------', '<3', "<3's", '<33', '<333', '<3333', '<33333', '<333333333', '<3333333333333333', '<33333333333333333', '<<', '<<<', '<<<<', '<<<<,', '<<<<<', '<<<<<<', '<<<<<<,', '<<<<<<<', '<<<<<<<<<<<<<<', '<empty>', '<perk>', '<~~~', '=', "='s", '=(', '=)', '=-\\', '=/', '=D', '=O', '=[', '=]', '=p', '>', '>.>', 
'>.>->', '>:->', '>>>', '>>>>>>>>>>', '>>>>>>>>>>>', '>>>>>>>>>>>>', '>?', '>_>', '?', '?!', '?!?!', '?!?!?', "?'", '?.', '?..', '?....', '??', '??!!', '??!?!??!', '???', '????', '?????', '??????', '???????', '????????', '?????????', '??@', '@', '@$$', "@-,'~", "@..3-,'~.", 'A', 'ABOUT', 'ACTION', 'AFK', 'AGAIN', 'AHAHH', 'AHAHHA', 'AHHAH', 'AI', 'AKDT', 'AKST', 'ALL', 'AM', 'AND', 'ANY', 'ANYONE', 'AOL.COM', 'ARE', 'AROUND', 'ASS', 'AWAY', 'Aberdeen', 'About', 'Ack', 'Actually', 'Added', 'Advisory', 'Again', 'Ah', 'Ahh', 'Ahhh', 'Ahhhh', 'Aiken', 'Alaska', 'Albany', 'Almost', 'Always', 'Amazingness', 'American', 'Americans', 'Amy', 'An', 'And', 'Any', 'Anyone', 'Anyway', 'Apocalypse', 'Apparently', 'Are', 'Ark', 'Arkansas', 'As', 'Ask', 'At', 'Average', 'Aw', 'Away', 'Aww', 'Awww', 'B', 'BE', 'BIG', 'BLONDES', 'BOOTS', 'BOOTY', 'BOY', 'BUT', 'BUt', 'BYE', 'Back', 'Barbieee', 'Barometer', 'Beach', 'Because', 'Been', 'Ben', 'Benjamin', 'Better', 'Bible', 'Biiiiiitch', 'Biographys', 'Birdgang', 'Bloooooooood', 'Bloooooooooood', 'Bloooooooooooood', 'Bone', 'Bonus', 'Books', 'Boone', 'Booyah', 'Borat', 'Born', 'Box', 'Boyz', 'Break', 'Breaking', 'Broken', 'Bud', 'Burger', 'But', 'Bwhaha', 'Bye', 'C', 'CA', 'CALI', 'CAN', 'CAPS', 'CDT', 'CHAT', 'CHATHIDE', 'CHIPS', 'CHOCO', 'CO', 'COM', 'COME', 'CSI', 'CST', 'CT', 'CUZ', 'California', 'Came', 'Can', 'CanEhda', 'Cardinals', 'Cardnials', 'Cards', 'Care', 'Carolina', 'Catterick', 'Ceiling', 'Chamillionaire', 'Change', 'Changing', 'Chat', 'Check', 'Checked', 'Cheeeez', 'Chica', 'Chickens', 'Children', 'China', 'Chingy', 'Chop', 'Chris', 'Christianity', 'Ciara', 'City', 'Cleveland', 'Clock', 'Coincidence', 'Come', 'Compliments', 'Connected', 'Connecticutt', 'Considerably', 'Constitution', 'Cookies', 'Cool', 'Could', 'Course', 'Covered', 'Cradle', 'Craig', 'Crazy', 'Cream', 'Cry', 'Ct', 'Ctrl', 'Cum', 'Current', 'Cute', 'Cyber', 'D', 'DAMN', 'DAamn', 'DELIGHTFUL', 'DETROIT', 'DING', 'DIRTY', 'DJ', 'DO', 'DOES', 'DOING', 
'DON', 'DONT', 'DOWNS', 'DVD', 'Dakota', 'Damn', 'Dang', 'Daniel', 'Daveeee', 'David', 'Dawn', 'Dawnstar', 'Days', 'Death', 'Deep', 'Define', 'Denver', 'Depends', 'Devil', 'Dew', 'Diary', 'Did', 'Diego', 'Dipset', 'Dixie', 'Do', 'Does', 'Doing', 'Dokken', 'Dolls', 'Dood', 'Down', 'Downy', 'Dr', 'Dr.', 'Dreams', 'Drew', 'Drive', 'Drop', 'Dude', 'Dustin', 'Dying', 'ELSE', 'ENOUGH', 'EST', 'EVEN', 'EVERYTHING', 'Earth', 'Easily', 'Eastern', 'Eddie', 'Edgewood', 'Eggs', 'Elev', 'Elle', 'End', 'Eticket', 'Evanescence', 'Even', 'Everyone', 'Everytime', 'Evil', 'Eyes', 'F', 'F5', 'FACE', 'FEMALE', 'FF', 'FINE', 'FL', 'FOLKS', 'FROM', 'Fade', 'Fails', 'Fairbanks', 'Favorite', 'Females', 'Fergalicious', 'Fergie', 'Fetish', 'Fighting', 'Figured', 'Filth', 'Finally', 'Finding', 'Finger', 'Fingers', 'First', 'Fisher', 'Fishers', 'Fix', 'Fixed', 'Flames', 'Flatts', 'Florida', 'Foley', 'Food', 'For', 'Fort', 'Foxwoods', 'FreesBee', 'Friday', 'From', 'Froogle', 'G', 'G-Mobile', 'GA', 'GIRL', 'GIRLS', 'GN', 'GNG', 'GOING', 'GOOD', 'GUYS', 'Gay', 'Geographic', 'Get', 'Ghetto', 'Girl', 'Go', 'God', 'Good', 'Gorda', 'Gosh', 'Gothic', 'Gracemont', 'Great', 'Greetings', 'GrlZ', 'Groups', 'Gs', 'Guess', 'Guy', 'H', 'H0rny', 'HAHA', 'HAHAHA', 'HALO', 'HAVE', 'HE', 'HELLO', 'HERE', 'HEY', 'HI', 'HOT', 'HOTT', 'HOW', 'HUGE', 'HUH', 'Ha', 'Haha', 'Hahaaaa', 'Hahhaa', 'Hail', 'Hallo', 'Hand', 'Hard', 'Harmony', 'Have', 'Hay', 'He', 'Hello', 'Help', 'Her', 'Here', 'Hero', 'Hey', 'Heya', 'Heys', 'Heyy', 'Heyyy', 'Heyyyyyyy', 'Hi', 'High', 'Highway', 'Hill', 'History', 'Hiya', 'Hmm', 'Hold', 'Holla', 'Holland', 'HolocaustYourMom', 'Holy', 'Home', 'Horace', 'Hott', 'How', 'Howdy', 'Hug', 'Hughes', 'Hugs', 'Huh', 'Humidity', 'Hummmm', 'Hungry', 'I', 'ID', 'IF', 'II', 'IL', 'IM', 'IN', 'INTERESTING', 'IRC', 'IS', 'IT', 'Ico', 'Id', 'If', 'Im', 'Ima', 'Images', 'In', 'In.', 'Indeed', 'Indiantown', 'Iowa', 'Is', 'It', 'Its', 'Ive', 'JESUS', 'JOIN', 'JRZ', 'JTo', 'JUST', 'Jam', 'James', 'Jane', 
'Jason', 'Jayse', 'Jerketts', 'Jess', 'Jesus', 'Jeter', 'Joe', 'Joey', 'John', 'Jon', 'Jones', 'Jonesboro', 'Jordison', 'Joshy', 'Judy', 'Just', 'Justin', 'K', 'K-Fed', 'KNOW', 'Kansas', 'Kellogs', 'Kent', 'Kentucky', 'Kewl', 'Kick', 'Kids', 'King', 'Kiss', 'Kittie', 'KoOL', 'Kold', 'LA', 'LATE', 'LATER', 'LAst', 'LIVE', 'LIX', 'LMAO', 'LOL', 'LOLOLOLLL', 'LONG', 'LONLEY', 'LOUD', 'LOUDER', 'LOVES', 'LOl', 'LPN', 'Ladies', 'Laguna', 'Lampert', 'Last', 'Laters', 'Lay', 'Lee', 'Leeches', 'Length', 'Let', 'Lets', 'Liam', 'Lies', 'Life', 'Like', 'Lil', 'Lime', 'Lion', 'Lithium', 'Little', 'Live', 'Lives', 'Living', 'Lmao', 'Lmfao', 'LoL', 'LoVe', 'Lol', 'London', 'Long', 'Look', 'Looking', 'Lord', 'Louisville', 'Lousiana', 'Love', 'Lovely', 'LuverZ', 'M', 'MAN', 'MATCH', 'MD', 'ME', 'MISHAP', 'MODE', 'MORE', 'MOUTH', 'MP3', 'MRIs', 'MSN', 'MUAH', 'MY', 'Maidstone', 'Male', 'Man', 'Maps', 'Marlaya', 'Martian', 'Marvin', 'Mary', 'Matt', 'Max', 'Maybe', 'Me', 'Meep', 'Meh', 'Memory', 'Men', 'Mercy', 'Messaging', 'Metallica', 'Michigan', 'Midwest', 'Mine', 'Mmm', 'Mo', 'Mom', 'Money', 'Mono', 'Morgan', 'Mp3', 'Ms', 'MsUtah', 'Muahz', 'Music', 'My', 'N', 'N"T', "N'T", 'NAME', 'NC', 'NICK', 'NIght', 'NO', 'NONE', 'NOT', 'NOTICE', 'NTMN', 'NY', 'Nadda', 'Naples', 'Nashville', 'Need', 'Nevermind', 'New', 'News', 'Nice', 'Niceeee', 'Niceeeee', 'Nickelback', 'Night', 'Niters', 'No', 'None', 'Nooo', 'Nooooooooooooooo', 'Nope', 'Norah', ...]
sorted([w for w in set(text4) if 'stand' in w or len(w) == 20])
['Notwithstanding', 'misunderstand', 'misunderstanding', 'outstanding', 'stand', 'standard', 'standards', 'standing', 'standpoint', 'stands', 'understand', 'understandable', 'understanding', 'understandings', 'understands']
sorted([w for w in set(text2) if 't' in w or w.startswith('Un')])
['About', 'Affecting', 'After', 'Against', 'Almost', 'Altogether', 'Amongst', 'Another', 'Anxiety', 'Astonished', 'Astonishment', 'At', 'Austen', 'Bartlett', 'Barton', 'Bath', 'Beautifully', 'Benevolent', 'Betty', 'Between', 'Bristol', 'But', 'Cartwright', 'Certainly', 'Charlotte', 'Christian', 'Christmas', 'Conduit', 'Constantia', 'Continual', 'Conversation', 'Cottage', 'Court', 'Courtland', 'Dartford', 'Dearest', 'Determined', 'Disappointed', 'Disappointment', 'Doctor', 'Domestic', 'Dorsetshire', 'Duty', 'East', 'Easter', 'Elliott', 'Engagement', 'Esteem', 'Excellent', 'Exert', 'Exeter', 'Extend', 'Extravagance', 'Fifteen', 'Fifty', 'Fortunately', 'Frosts', 'Gentleman', 'Get', 'Gilberts', 'Greatness', 'Hamlet', 'Hitherto', 'Honiton', 'Hunters', 'Impatient', 'Infirmity', 'Instead', 'Invited', 'It', 'Just', 'Kensington', 'Last', 'Let', 'Little', 'Longstaple', 'Margaret', 'Martha', 'Master', 'Middleton', 'Middletons', 'Mistress', 'Months', 'Morton', 'Most', 'Must', 'Neither', 'Newton', 'Not', 'Nothing', 'October', 'Opportunity', 'Opposition', 'Other', 'Others', 'Parliament', 'Pity', 'Plymouth', 'Portman', 'Pratt', 'Preparation', 'Prescriptions', 'Quite', 'Rather', 'Recollecting', 'Reflection', 'Relate', 'Restless', 'Robert', 'Saturday', 'Scotland', 'Scott', 'Sensibility', 'September', 'Short', 'Sit', 'Smith', 'Somerset', 'Somersetshire', 'Something', 'Sometimes', 'St', 'Stanhill', 'Steele', 'Steeles', 'Still', 'Strange', 'Street', 'Streets', 'Supported', 'That', 'Thunderbolts', 'Truth', 'Unaccountable', 'Undoubtedly', 'Ungracious', 'Vanity', 'Wait', 'Want', 'Watched', 'Westminster', 'Westons', 'Weymouth', 'What', 'Whatever', 'Whether', 'Whitakers', 'Whitwell', 'With', 'Within', 'Without', 'Writing', 'Yet', 'abatement', 'abilities', 'ability', 'ablest', 'about', 'abridgement', 'abruptly', 'abruptness', 'absent', 'absolute', 'absolutely', 'abstracted', 'abstraction', 'abstruse', 'absurdity', 'abundantly', 'accelerate', 'accent', 'accents', 'accept', 'acceptable', 
'acceptably', 'acceptance', 'accepted', 'accepting', 'accident', 'accidental', 'accidentally', 'accidently', 'accommodate', 'accommodating', 'accommodation', 'accommodations', 'accomplishment', 'accordant', 'accosted', 'account', 'accounted', 'accounts', 'accurately', 'accusation', 'accustom', 'accustomary', 'acknowledgment', 'acknowledgments', 'acquaintance', 'acquaintances', 'acquainted', 'acquisition', 'acquit', 'acquitted', 'acquitting', 'act', 'acted', 'acting', 'action', 'actions', 'active', 'acts', 'actual', 'actually', 'acute', 'acutely', 'acuteness', 'adapted', 'addition', 'additional', 'additions', 'adequate', 'adjusting', 'administer', 'administering', 'admiration', 'admit', 'admittance', 'admitted', 'admitting', 'adopt', 'adopted', 'advancement', 'advantage', 'advantageous', 'advantages', 'affability', 'affect', 'affectation', 'affected', 'affectedly', 'affecting', 'affection', 'affectionate', 'affectionately', 'affections', 'affects', 'affirmative', 'afflict', 'afflicted', 'afflicting', 'affliction', 'afflictions', 'affront', 'affronting', 'after', 'afternoon', 'afterward', 'afterwards', 'against', 'aggrandizement', 'aggravation', 'agitate', 'agitated', 'agitation', 'agreement', 'ailment', 'ailments', 'alacrity', 'alienated', 'alighted', 'alleviation', 'almost', 'alphabet', 'altar', 'alter', 'alteration', 'alterations', 'altered', 'altering', 'alternately', 'alternative', 'although', 'altogether', 'amazement', 'ambition', 'amendment', 'amidst', 'amongst', 'amount', 'amounted', 'amusement', 'amusements', 'ancient', 'animated', 'animating', 'animation', 'annihilation', 'annuities', 'annuity', 'another', 'anticipate', 'anticipated', 'anticipating', 'anticipation', 'anticipations', 'anxiety', 'anything', 'apartment', 'apartments', 'apothecary', 'apparent', 'apparently', 'appetite', 'appetites', 'application', 'appointed', 'appointment', 'approbation', 'appropriate', 'apricot', 'aptitude', 'ardent', 'argument', 'arguments', 'arrangement', 'arrangements', 
'art', 'artful', 'article', 'articulate', 'articulation', 'artificial', 'artless', 'artlessness', 'ascertain', 'ascertained', 'aspect', 'assent', 'assented', 'asserted', 'assertion', 'assertions', 'assiduities', 'assist', 'assistance', 'assisted', 'assisting', 'associate', 'associating', 'astonished', 'astonishing', 'astonishment', 'astray', 'at', 'ate', 'atmosphere', 'atone', 'atoned', 'atonement', 'atoning', 'attach', 'attached', 'attaching', 'attachment', 'attachments', 'attack', 'attacked', 'attacks', 'attainable', 'attained', 'attempt', 'attempted', 'attempting', 'attempts', 'attend', 'attendance', 'attendant', 'attendants', 'attended', 'attending', 'attention', 'attentions', 'attentive', 'attentively', 'attitude', 'attract', 'attracted', 'attraction', 'attractions', 'attractive', 'attribute', 'attributed', 'attributing', 'audacity', 'auditors', 'augment', 'augmented', 'augmenting', 'aunt', 'author', 'authorised', 'authority', 'authors', 'autumn', 'await', 'awaited', 'banditti', 'basket', 'bathed', 'beasts', 'beat', 'beauties', 'beautiful', 'beautifully', 'beauty', 'beneath', 'benefactress', 'benefit', 'benefited', 'benignant', 'bent', 'bequeath', 'bequeathed', 'bequest', 'beset', 'best', 'bestow', 'bestowed', 'bestowing', 'betray', 'betrayed', 'betraying', 'better', 'between', 'bewitching', 'birth', 'bit', 'bitch', 'bitter', 'bitterly', 'bitterness', 'blackest', 'blast', 'blasted', 'blights', 'blunt', 'bluntly', 'bluster', 'boast', 'boasting', 'boisterous', 'bonnet', 'both', 'bottom', 'bottoms', 'bout', 'breakfast', 'breakfasting', 'breast', 'breath', 'breathing', 'bright', 'brightened', 'brighter', 'brightness', 'brilliant', 'brother', 'brothers', 'brought', 'built', 'burnt', 'burst', 'bursts', 'bustle', 'but', 'butcher', 'calculate', 'calculated', 'calculation', 'candlelight', 'cannot', 'cant', 'capability', 'captivate', 'captivating', 'caricature', 'carpet', 'casement', 'cast', 'casts', 'catch', 'catching', 'cats', 'caught', 'caution', 'cautious', 
'cautiously', 'cautiousness', 'celebrated', 'celebration', 'centered', 'centre', 'certain', 'certainly', 'certainties', 'certainty', 'cessation', 'character', 'characters', 'chariot', 'charity', 'chat', 'chatty', 'cheat', 'cheated', 'circuit', 'circumspection', 'circumstance', 'circumstanced', 'circumstances', 'city', 'civilities', 'civility', 'climate', 'closet', 'clothes', 'coat', 'coats', 'collation', 'collect', 'collected', 'collecting', 'collection', 'combat', 'comfort', 'comfortable', 'comfortably', 'comforted', 'comforter', 'comforts', 'commendation', 'comment', 'comments', 'commiseration', 'commonest', 'communicate', 'communicated', 'communicating', 'communication', 'communicative', 'compact', 'comparative', 'comparatively', 'compassionate', 'competence', 'complaint', 'complaints', 'complete', 'completed', 'completely', 'completion', 'compliant', 'complicated', 'compliment', 'compliments', 'composition', 'compunction', 'concealment', 'conceit', 'conceited', 'concerto', 'conciliate', 'conciliation', 'condemnation', 'condition', 'conditioned', 'conditions', 'conduct', 'conducted', 'confidant', 'confidante', 'confident', 'confidential', 'confinement', 'confirmation', 'conformity', 'congratulate', 'congratulated', 'congratulating', 'congratulations', 'conjectural', 'conjecture', 'conjectured', 'conjectures', 'conjecturing', 'connect', 'connected', 'connection', 'connections', 'conquest', 'conquests', 'conscientious', 'conscientiously', 'consent', 'consented', 'consequent', 'consequently', 'considerate', 'considerately', 'consideration', 'considerations', 'consisted', 'consistency', 'consistent', 'consists', 'consolation', 'constancy', 'constant', 'constantly', 'consternation', 'constitution', 'constitutional', 'constrained', 'construction', 'consult', 'consultation', 'consulted', 'consumption', 'contained', 'containing', 'contempt', 'contemptible', 'contemptuous', 'contemptuously', 'contend', 'content', 'contented', 'contenting', 'contentment', 'contents', 
'continual', 'continually', 'continuance', 'continuation', 'continue', 'continued', 'continuing', 'contracted', 'contraction', 'contradict', 'contradicted', 'contradictory', 'contrary', 'contrast', 'contrasted', 'contribute', 'contributed', 'contributing', 'contribution', 'contrition', 'contrivance', 'contrived', 'contrives', 'contriving', 'control', 'controlled', 'convenient', 'conveniently', 'conversation', 'conversations', 'convert', 'conviction', 'cordiality', 'correct', 'correctness', 'cost', 'cote', 'cottage', 'cottages', 'countenance', 'counter', 'counteract', 'counteracted', 'countless', 'country', 'county', 'court', 'courted', 'courtesy', 'courting', 'covenant', 'covert', 'create', 'created', 'creating', 'creation', 'creature', 'creatures', 'credit', 'creditable', 'crept', 'critical', 'critique', 'cruelty', 'cultivated', 'curate', 'curiosity', 'curtsying', 'cut', 'cutlets', 'cutting', 'date', 'daughter', 'daughters', 'dealt', 'dearest', 'death', 'debate', 'debated', 'debating', 'debt', 'debts', 'deceitful', 'decent', 'decently', 'deception', 'declaration', 'deepest', 'defect', 'defective', 'defects', 'deficient', 'dejected', 'dejection', 'deliberate', 'deliberating', 'deliberation', 'delicate', 'delight', 'delighted', 'delightful', 'delightfully', 'delineated', 'demonstrations', 'denote', 'denoted', 'denoting', 'depart', 'departed', 'departing', 'departure', 'dependent', 'depravity', 'descendant', 'description', 'deserted', 'desertion', 'deserts', 'despatch', 'despatching', 'desperate', 'desperately', 'desperation', 'dessert', 'destination', 'destiny', 'destroy', 'destroyed', 'destroys', 'destruction', 'detail', 'detain', 'detaining', 'detected', 'detecting', 'detection', 'deter', 'determinate', 'determination', 'determine', 'determined', 'determining', 'deterred', 'detest', 'detestably', 'detested', 'detract', 'deviation', 'devoted', 'dictate', 'diction', 'different', 'differently', 'difficult', 'difficulties', 'difficulty', 'diffident', 'dignity', 
'dilatoriness', 'diminution', 'direct', 'directed', 'directing', 'direction', 'directions', 'directly', 'dirt', 'dirty', 'disadvantage', 'disagreement', 'disagreements', 'disappoint', 'disappointed', 'disappointing', 'disappointment', 'disappointments', 'disapprobation', 'disastrous', 'discernment', 'discontent', 'discontented', 'discontents', 'discreet', 'discretion', 'discrimination', 'disengagement', 'disgust', 'disgusted', 'disgusting', 'disinclination', 'disinherited', 'disinterested', 'disinterestedness', 'dismounted', 'disobedient', 'dispatch', 'dispatched', 'dispatches', 'dispirited', 'disposition', 'disproportion', 'dispute', 'disputes', 'disqualifications', 'disquiet', 'disrespectfully', 'dissatisfaction', 'dissatisfied', 'dissent', 'dissented', 'dissipated', 'dissipation', 'distance', 'distant', 'distinction', 'distinguish', 'distinguished', 'distinguishing', 'distractedly', 'distress', 'distressed', 'distresses', 'distressing', 'distrust', 'distrusts', 'disturb', 'disturbed', 'diverted', 'doat', 'doatingly', 'doctrine', 'domestic', 'doted', 'doting', 'dotted', 'doubt', 'doubted', 'doubtful', 'doubting', 'doubtingly', 'doubtless', 'doubts', 'drift', 'dropt', 'duets', 'duration', 'duties', 'duty', 'dwelt', 'earliest', 'earnest', 'earnestly', 'earnestness', 'earth', 'earthly', 'easiest', 'east', 'eat', 'eating', 'ebullition', 'eclat', 'ecstasy', 'ecstatic', 'editions', 'edtions', 'education', 'effect', 'effected', 'effecting', 'effects', 'effectual', 'effectually', 'effort', 'eight', 'eighteen', 'eighty', 'either', 'ejaculation', 'eldest', 'elect', 'election', 'elegant', 'elevated', 'eligibility', 'elucidation', 'embarrassment', 'embellishment', 'embellishments', 'embitter', 'emigrant', 'emotion', 'employment', 'employments', 'emptiness', 'encouragement', 'encouragements', 'encroachments', 'enforcement', 'enfranchisement', 'engagement', 'engagements', 'enjoyment', 'enjoyments', 'enlargement', 'enlightened', 'entanglement', 'enter', 'entered', 'entering', 
'entertain', 'entertained', 'entertainment', 'enthusiasm', 'enthusiastic', 'entire', 'entirely', 'entitled', 'entrance', 'entreat', 'entreated', 'entreaties', 'entreaty', 'entrusted', 'entry', 'enumeration', 'equality', 'essential', 'establish', 'established', 'establishing', 'establishment', 'estate', 'esteem', 'esteemed', 'esteeming', 'esteems', 'estimable', 'estimate', 'estimating', 'estimation', 'estranged', 'eternal', 'etiquette', 'event', 'events', 'eventually', 'everlasting', 'everything', 'evident', 'evidently', 'exact', 'exactly', 'exactness', 'examination', 'excellent', 'except', 'excepting', 'excite', 'excited', ...]
sorted([w for w in set(text2) if 'to' in w or w.startswith('Un')])
['Altogether', 'Astonished', 'Astonishment', 'Barton', 'Bristol', 'Doctor', 'Hitherto', 'Honiton', 'Kensington', 'Middleton', 'Middletons', 'Morton', 'Newton', 'October', 'Unaccountable', 'Undoubtedly', 'Ungracious', 'Westons', 'accustom', 'accustomary', 'altogether', 'astonished', 'astonishing', 'astonishment', 'atone', 'atoned', 'atonement', 'atoning', 'auditors', 'bestow', 'bestowed', 'bestowing', 'bottom', 'bottoms', 'concerto', 'contradictory', 'dilatoriness', 'explanatory', 'extolling', 'extort', 'extorted', 'extorting', 'foretold', 'history', 'hitherto', 'inheritor', 'into', 'intoxication', 'memento', 'meritorious', 'misunderstood', 'mosquitoes', 'mutton', 'narrator', 'necessitous', 'orator', 'partook', 'promontories', 'proprietor', 'rectory', 'restorative', 'restore', 'restored', 'restoring', 'retort', 'satisfactory', 'solicitous', 'stock', 'stockings', 'stocks', 'stole', 'stolen', 'stomach', 'stone', 'stood', 'stop', 'stopped', 'stopping', 'stopt', 'store', 'storm', 'stormy', 'story', 'stout', 'suitor', 'symptom', 'symptoms', 'tiptoe', 'to', 'today', 'together', 'toilet', 'told', 'tolerable', 'tolerably', 'toleration', 'tomorrow', 'tone', 'toned', 'tongue', 'tonight', 'too', 'took', 'toothpick', 'topic', 'tore', 'torment', 'tormented', 'torn', 'torrent', 'torture', 'tortured', 'total', 'totally', 'touch', 'touched', 'toward', 'towards', 'tower', 'town', 'understood', 'untouched', 'visitor', 'visitors', 'wanton', 'withstood']
sorted([w for w in set(text2) if 'stand' in w or w.startswith('Un')])
['Unaccountable', 'Undoubtedly', 'Ungracious', 'stand', 'standard', 'standing', 'understand', 'understanding']
sorted([w for w in set(text2) if 'stand' in w and w.startswith('Un')])
[]
sorted([w for w in set(text8) if len(w) != 17 and w.endswith('be')])
['Maybe', 'be', 'maybe']
sorted([w for w in set(text8) if len(w) != 2 and w.endswith('be')])
['Maybe', 'maybe']
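# The filters above combine tests with `or`, `and` and `!=`. A minimal sketch of the
# difference on a small hand-made list (the list `words` is an assumption for
# illustration; it is not taken from text2 or text8):

```python
# Hypothetical sample vocabulary standing in for set(text2) / set(text8).
words = ['Unaccountable', 'stand', 'understand', 'Understand', 'be', 'maybe', 'Maybe']

# `or` keeps a word if either test passes.
either = sorted(w for w in words if 'stand' in w or w.startswith('Un'))

# `and` keeps a word only if both tests pass.
both = sorted(w for w in words if 'stand' in w and w.startswith('Un'))

# `!=` excludes one specific length, here the two-letter word 'be'.
not_two = sorted(w for w in words if len(w) != 2 and w.endswith('be'))

print(either)
print(both)
print(not_two)
```

# With `and`, the result is empty whenever no single word satisfies both tests,
# which is what happened with text2 above.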
[len(w) for w in text1]
[1, 4, 4, 2, 6, 8, 4, 1, 9, 1, 1, 8, 2, 1, 4, 11, 5, 2, 1, 7, 6, 1, 3, 4, 5, 2, 10, 2, 4, 1, 5, 1, 4, 1, 3, 5, 1, 1, 3, 3, 3, 1, 2, 3, 4, 7, 3, 3, 8, 3, 8, 1, 4, 1, 5, 12, 1, 9, 11, 4, 3, 3, 3, 5, 2, 3, 3, 5, 7, 2, 3, 5, 1, 2, 5, 2, 4, 3, 3, 8, 1, 2, 7, 6, 8, 3, 2, 3, 9, 1, 1, 5, 3, 4, 2, 4, 2, 6, 6, 1, 3, 2, 5, 4, 2, 4, 4, 1, 5, 1, 4, 2, 2, 2, 6, 2, 3, 6, 7, 3, 1, 7, 9, 1, 3, 6, 1, 1, 5, 6, 5, 6, 3, 13, 2, 3, 4, 1, 3, 7, 4, 5, 2, 3, 4, 2, 2, 8, 1, 5, 1, 3, 2, 1, 3, 3, 1, 4, 1, 4, 6, 2, 5, 4, 9, 2, 7, 1, 3, 2, 3, 1, 5, 2, 6, 2, 7, 2, 2, 7, 1, 1, 10, 1, 5, 1, 3, 2, 2, 4, 11, 4, 3, 3, 1, 3, 3, 1, 6, 1, 1, 1, 1, 1, 4, 1, 3, 1, 2, 4, 1, 2, 6, 2, 2, 10, 1, 1, 10, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 6, 1, 3, 1, 5, 1, 4, 1, 7, 1, 5, 1, 9, 1, 5, 1, 7, 1, 7, 1, 6, 1, 7, 1, 7, 1, 5, 1, 4, 1, 4, 1, 5, 1, 5, 1, 4, 1, 4, 1, 11, 1, 8, 1, 8, 2, 1, 3, 1, 3, 1, 9, 2, 2, 4, 2, 4, 4, 4, 4, 11, 8, 3, 4, 1, 4, 2, 1, 4, 5, 2, 1, 3, 1, 3, 7, 2, 4, 4, 7, 3, 4, 8, 3, 6, 1, 6, 2, 3, 5, 1, 7, 2, 8, 6, 9, 2, 6, 2, 5, 7, 4, 2, 3, 4, 10, 1, 6, 2, 7, 1, 9, 3, 4, 3, 1, 2, 5, 4, 2, 5, 1, 4, 3, 8, 1, 8, 5, 10, 1, 7, 9, 1, 2, 5, 8, 1, 3, 9, 6, 8, 1, 3, 4, 2, 1, 2, 8, 3, 7, 7, 9, 1, 2, 4, 2, 3, 5, 4, 9, 1, 5, 8, 3, 6, 8, 2, 12, 1, 2, 9, 1, 8, 4, 1, 1, 3, 4, 2, 4, 3, 4, 13, 4, 1, 7, 1, 7, 1, 3, 4, 2, 9, 1, 2, 4, 7, 3, 11, 1, 9, 3, 3, 1, 2, 4, 4, 4, 1, 4, 5, 2, 1, 3, 1, 3, 1, 5, 11, 1, 2, 1, 4, 9, 2, 4, 8, 1, 6, 5, 5, 2, 4, 2, 4, 5, 4, 4, 4, 1, 3, 3, 4, 4, 4, 6, 5, 2, 3, 4, 1, 6, 1, 3, 4, 4, 3, 9, 5, 2, 3, 1, 3, 4, 4, 1, 8, 1, 3, 1, 3, 4, 9, 4, 5, 1, 3, 3, 2, 4, 7, 1, 4, 4, 4, 3, 5, 7, 1, 3, 2, 3, 10, 10, 7, 2, 4, 2, 2, 1, 3, 1, 4, 1, 3, 2, 3, 4, 3, 4, 5, 2, 4, 2, 6, 3, 5, 1, 2, 2, 4, 3, 4, 5, 2, 3, 4, 2, 9, 1, 5, 4, 1, 5, 5, 3, 7, 5, 3, 3, 9, 3, 2, 1, 3, 4, 4, 4, 5, 3, 3, 5, 2, 3, 5, 1, 4, 4, 4, 6, 1, 3, 4, 7, 3, 4, 4, 6, 3, 8, 3, 3, 5, 1, 7, 7, 1, 3, 6, 8, 2, 4, 1, 8, 7, 1, 7, 1, 3, 7, 1, 7, 4, 6, 1, 4, 2, 6, 3, 10, 6, 8, 2, 5, 1, 2, 5, 6, 14, 7, 1, 8, 1, 1, 3, 3, 7, 5, 6, 2, 2, 7, 1, 1, 
9, 6, 1, 4, 2, 5, 5, 3, 1, 3, 5, 5, 3, 4, 2, 2, 5, 2, 2, 3, 1, 1, 3, 3, 4, 3, 8, 1, 5, 4, 2, 7, 2, 5, 2, 2, 5, 1, 1, 5, 2, 3, 5, 1, 5, 2, 4, 9, 4, 4, 4, 4, 2, 4, 7, 2, 2, 6, 1, 1, 2, 4, 3, 1, 3, 4, 4, 3, 4, 1, 3, 5, 1, 3, 6, 5, 1, 5, 6, 9, 3, 8, 7, 1, 4, 9, 4, 7, 7, 1, 3, 2, 5, 4, 3, 6, 4, 2, 2, 3, 3, 2, 2, 6, 1, 3, 4, 5, 6, 7, 6, 6, 3, 5, 2, 4, 7, 1, 1, 5, 1, 2, 2, 5, 1, 4, 1, 2, 5, 1, 4, 2, 4, 3, 13, 4, 4, 5, 7, 2, 3, 1, 3, 9, 2, 3, 10, 4, 2, 3, 6, 2, 2, 7, 1, 1, 8, 1, 1, 6, 1, 1, 3, 6, 3, 8, 3, 4, 3, 3, 7, 6, 4, 3, 1, 5, 5, 3, 6, 3, 11, 6, 7, 1, 4, 2, 2, 4, 2, 6, 2, 4, 5, 2, 6, 2, 4, 2, 2, 7, 1, 1, 5, 1, 1, 8, 3, 2, 9, 3, 4, 2, 3, 3, 1, 4, 5, 7, 1, 5, 4, 6, 3, 5, 8, 2, 3, 3, 1, 8, 1, 5, 3, 6, 1, 3, 3, 2, 1, 4, 9, 4, 1, 3, 4, 4, 7, 2, 1, 4, 1, 7, 1, 7, 3, 5, 2, 3, 5, 1, 3, 7, 3, 3, 6, 3, 4, 1, 4, 2, 2, 5, 1, 1, 6, 1, 1, 3, 4, 7, 2, 1, 2, 7, 4, 7, 4, 4, 1, 4, 2, 8, 5, 1, 6, 1, 5, 3, 5, 2, 4, 5, 5, 3, 5, 5, 1, 2, 5, 2, 7, 4, 2, 3, 4, 1, 3, 3, 4, 6, 4, 7, 2, 3, 3, 7, 1, 2, 5, 4, 4, 5, 1, 5, 1, 4, 5, 5, 4, 1, 2, ...]
[w.upper() for w in text1]
['[', 'MOBY', 'DICK', 'BY', 'HERMAN', 'MELVILLE', '1851', ']', 'ETYMOLOGY', '.', '(', 'SUPPLIED', 'BY', 'A', 'LATE', 'CONSUMPTIVE', 'USHER', 'TO', 'A', 'GRAMMAR', 'SCHOOL', ')', 'THE', 'PALE', 'USHER', '--', 'THREADBARE', 'IN', 'COAT', ',', 'HEART', ',', 'BODY', ',', 'AND', 'BRAIN', ';', 'I', 'SEE', 'HIM', 'NOW', '.', 'HE', 'WAS', 'EVER', 'DUSTING', 'HIS', 'OLD', 'LEXICONS', 'AND', 'GRAMMARS', ',', 'WITH', 'A', 'QUEER', 'HANDKERCHIEF', ',', 'MOCKINGLY', 'EMBELLISHED', 'WITH', 'ALL', 'THE', 'GAY', 'FLAGS', 'OF', 'ALL', 'THE', 'KNOWN', 'NATIONS', 'OF', 'THE', 'WORLD', '.', 'HE', 'LOVED', 'TO', 'DUST', 'HIS', 'OLD', 'GRAMMARS', ';', 'IT', 'SOMEHOW', 'MILDLY', 'REMINDED', 'HIM', 'OF', 'HIS', 'MORTALITY', '.', '"', 'WHILE', 'YOU', 'TAKE', 'IN', 'HAND', 'TO', 'SCHOOL', 'OTHERS', ',', 'AND', 'TO', 'TEACH', 'THEM', 'BY', 'WHAT', 'NAME', 'A', 'WHALE', '-', 'FISH', 'IS', 'TO', 'BE', 'CALLED', 'IN', 'OUR', 'TONGUE', 'LEAVING', 'OUT', ',', 'THROUGH', 'IGNORANCE', ',', 'THE', 'LETTER', 'H', ',', 'WHICH', 'ALMOST', 'ALONE', 'MAKETH', 'THE', 'SIGNIFICATION', 'OF', 'THE', 'WORD', ',', 'YOU', 'DELIVER', 'THAT', 'WHICH', 'IS', 'NOT', 'TRUE', '."', '--', 'HACKLUYT', '"', 'WHALE', '.', '...', 'SW', '.', 'AND', 'DAN', '.', 'HVAL', '.', 'THIS', 'ANIMAL', 'IS', 'NAMED', 'FROM', 'ROUNDNESS', 'OR', 'ROLLING', ';', 'FOR', 'IN', 'DAN', '.', 'HVALT', 'IS', 'ARCHED', 'OR', 'VAULTED', '."', '--', 'WEBSTER', "'", 'S', 'DICTIONARY', '"', 'WHALE', '.', '...', 'IT', 'IS', 'MORE', 'IMMEDIATELY', 'FROM', 'THE', 'DUT', '.', 'AND', 'GER', '.', 'WALLEN', ';', 'A', '.', 'S', '.', 'WALW', '-', 'IAN', ',', 'TO', 'ROLL', ',', 'TO', 'WALLOW', '."', '--', 'RICHARDSON', "'", 'S', 'DICTIONARY', 'KETOS', ',', 'GREEK', '.', 'CETUS', ',', 'LATIN', '.', 'WHOEL', ',', 'ANGLO', '-', 'SAXON', '.', 'HVALT', ',', 'DANISH', '.', 'WAL', ',', 'DUTCH', '.', 'HWAL', ',', 'SWEDISH', '.', 'WHALE', ',', 'ICELANDIC', '.', 'WHALE', ',', 'ENGLISH', '.', 'BALEINE', ',', 'FRENCH', '.', 'BALLENA', ',', 'SPANISH', '.', 'PEKEE', '-', 
'NUEE', '-', 'NUEE', ',', 'FEGEE', '.', 'PEKEE', '-', 'NUEE', '-', 'NUEE', ',', 'ERROMANGOAN', '.', 'EXTRACTS', '(', 'SUPPLIED', 'BY', 'A', 'SUB', '-', 'SUB', '-', 'LIBRARIAN', ').', 'IT', 'WILL', 'BE', 'SEEN', 'THAT', 'THIS', 'MERE', 'PAINSTAKING', 'BURROWER', 'AND', 'GRUB', '-', 'WORM', 'OF', 'A', 'POOR', 'DEVIL', 'OF', 'A', 'SUB', '-', 'SUB', 'APPEARS', 'TO', 'HAVE', 'GONE', 'THROUGH', 'THE', 'LONG', 'VATICANS', 'AND', 'STREET', '-', 'STALLS', 'OF', 'THE', 'EARTH', ',', 'PICKING', 'UP', 'WHATEVER', 'RANDOM', 'ALLUSIONS', 'TO', 'WHALES', 'HE', 'COULD', 'ANYWAYS', 'FIND', 'IN', 'ANY', 'BOOK', 'WHATSOEVER', ',', 'SACRED', 'OR', 'PROFANE', '.', 'THEREFORE', 'YOU', 'MUST', 'NOT', ',', 'IN', 'EVERY', 'CASE', 'AT', 'LEAST', ',', 'TAKE', 'THE', 'HIGGLEDY', '-', 'PIGGLEDY', 'WHALE', 'STATEMENTS', ',', 'HOWEVER', 'AUTHENTIC', ',', 'IN', 'THESE', 'EXTRACTS', ',', 'FOR', 'VERITABLE', 'GOSPEL', 'CETOLOGY', '.', 'FAR', 'FROM', 'IT', '.', 'AS', 'TOUCHING', 'THE', 'ANCIENT', 'AUTHORS', 'GENERALLY', ',', 'AS', 'WELL', 'AS', 'THE', 'POETS', 'HERE', 'APPEARING', ',', 'THESE', 'EXTRACTS', 'ARE', 'SOLELY', 'VALUABLE', 'OR', 'ENTERTAINING', ',', 'AS', 'AFFORDING', 'A', 'GLANCING', 'BIRD', "'", 'S', 'EYE', 'VIEW', 'OF', 'WHAT', 'HAS', 'BEEN', 'PROMISCUOUSLY', 'SAID', ',', 'THOUGHT', ',', 'FANCIED', ',', 'AND', 'SUNG', 'OF', 'LEVIATHAN', ',', 'BY', 'MANY', 'NATIONS', 'AND', 'GENERATIONS', ',', 'INCLUDING', 'OUR', 'OWN', '.', 'SO', 'FARE', 'THEE', 'WELL', ',', 'POOR', 'DEVIL', 'OF', 'A', 'SUB', '-', 'SUB', ',', 'WHOSE', 'COMMENTATOR', 'I', 'AM', '.', 'THOU', 'BELONGEST', 'TO', 'THAT', 'HOPELESS', ',', 'SALLOW', 'TRIBE', 'WHICH', 'NO', 'WINE', 'OF', 'THIS', 'WORLD', 'WILL', 'EVER', 'WARM', ';', 'AND', 'FOR', 'WHOM', 'EVEN', 'PALE', 'SHERRY', 'WOULD', 'BE', 'TOO', 'ROSY', '-', 'STRONG', ';', 'BUT', 'WITH', 'WHOM', 'ONE', 'SOMETIMES', 'LOVES', 'TO', 'SIT', ',', 'AND', 'FEEL', 'POOR', '-', 'DEVILISH', ',', 'TOO', ';', 'AND', 'GROW', 'CONVIVIAL', 'UPON', 'TEARS', ';', 'AND', 'SAY', 'TO', 
'THEM', 'BLUNTLY', ',', 'WITH', 'FULL', 'EYES', 'AND', 'EMPTY', 'GLASSES', ',', 'AND', 'IN', 'NOT', 'ALTOGETHER', 'UNPLEASANT', 'SADNESS', '--', 'GIVE', 'IT', 'UP', ',', 'SUB', '-', 'SUBS', '!', 'FOR', 'BY', 'HOW', 'MUCH', 'THE', 'MORE', 'PAINS', 'YE', 'TAKE', 'TO', 'PLEASE', 'THE', 'WORLD', ',', 'BY', 'SO', 'MUCH', 'THE', 'MORE', 'SHALL', 'YE', 'FOR', 'EVER', 'GO', 'THANKLESS', '!', 'WOULD', 'THAT', 'I', 'COULD', 'CLEAR', 'OUT', 'HAMPTON', 'COURT', 'AND', 'THE', 'TUILERIES', 'FOR', 'YE', '!', 'BUT', 'GULP', 'DOWN', 'YOUR', 'TEARS', 'AND', 'HIE', 'ALOFT', 'TO', 'THE', 'ROYAL', '-', 'MAST', 'WITH', 'YOUR', 'HEARTS', ';', 'FOR', 'YOUR', 'FRIENDS', 'WHO', 'HAVE', 'GONE', 'BEFORE', 'ARE', 'CLEARING', 'OUT', 'THE', 'SEVEN', '-', 'STORIED', 'HEAVENS', ',', 'AND', 'MAKING', 'REFUGEES', 'OF', 'LONG', '-', 'PAMPERED', 'GABRIEL', ',', 'MICHAEL', ',', 'AND', 'RAPHAEL', ',', 'AGAINST', 'YOUR', 'COMING', '.', 'HERE', 'YE', 'STRIKE', 'BUT', 'SPLINTERED', 'HEARTS', 'TOGETHER', '--', 'THERE', ',', 'YE', 'SHALL', 'STRIKE', 'UNSPLINTERABLE', 'GLASSES', '!', 'EXTRACTS', '.', '"', 'AND', 'GOD', 'CREATED', 'GREAT', 'WHALES', '."', '--', 'GENESIS', '.', '"', 'LEVIATHAN', 'MAKETH', 'A', 'PATH', 'TO', 'SHINE', 'AFTER', 'HIM', ';', 'ONE', 'WOULD', 'THINK', 'THE', 'DEEP', 'TO', 'BE', 'HOARY', '."', '--', 'JOB', '.', '"', 'NOW', 'THE', 'LORD', 'HAD', 'PREPARED', 'A', 'GREAT', 'FISH', 'TO', 'SWALLOW', 'UP', 'JONAH', '."', '--', 'JONAH', '.', '"', 'THERE', 'GO', 'THE', 'SHIPS', ';', 'THERE', 'IS', 'THAT', 'LEVIATHAN', 'WHOM', 'THOU', 'HAST', 'MADE', 'TO', 'PLAY', 'THEREIN', '."', '--', 'PSALMS', '.', '"', 'IN', 'THAT', 'DAY', ',', 'THE', 'LORD', 'WITH', 'HIS', 'SORE', ',', 'AND', 'GREAT', ',', 'AND', 'STRONG', 'SWORD', ',', 'SHALL', 'PUNISH', 'LEVIATHAN', 'THE', 'PIERCING', 'SERPENT', ',', 'EVEN', 'LEVIATHAN', 'THAT', 'CROOKED', 'SERPENT', ';', 'AND', 'HE', 'SHALL', 'SLAY', 'THE', 'DRAGON', 'THAT', 'IS', 'IN', 'THE', 'SEA', '."', '--', 'ISAIAH', '"', 'AND', 'WHAT', 'THING', 'SOEVER', 
'BESIDES', 'COMETH', 'WITHIN', 'THE', 'CHAOS', 'OF', 'THIS', 'MONSTER', "'", 'S', 'MOUTH', ',', 'BE', 'IT', 'BEAST', ',', 'BOAT', ',', 'OR', 'STONE', ',', 'DOWN', 'IT', 'GOES', 'ALL', 'INCONTINENTLY', 'THAT', 'FOUL', 'GREAT', 'SWALLOW', 'OF', 'HIS', ',', 'AND', 'PERISHETH', 'IN', 'THE', 'BOTTOMLESS', 'GULF', 'OF', 'HIS', 'PAUNCH', '."', '--', 'HOLLAND', "'", 'S', 'PLUTARCH', "'", 'S', 'MORALS', '.', '"', 'THE', 'INDIAN', 'SEA', 'BREEDETH', 'THE', 'MOST', 'AND', 'THE', 'BIGGEST', 'FISHES', 'THAT', 'ARE', ':', 'AMONG', 'WHICH', 'THE', 'WHALES', 'AND', 'WHIRLPOOLES', 'CALLED', 'BALAENE', ',', 'TAKE', 'UP', 'AS', 'MUCH', 'IN', 'LENGTH', 'AS', 'FOUR', 'ACRES', 'OR', 'ARPENS', 'OF', 'LAND', '."', '--', 'HOLLAND', "'", 'S', 'PLINY', '.', '"', 'SCARCELY', 'HAD', 'WE', 'PROCEEDED', 'TWO', 'DAYS', 'ON', 'THE', 'SEA', ',', 'WHEN', 'ABOUT', 'SUNRISE', 'A', 'GREAT', 'MANY', 'WHALES', 'AND', 'OTHER', 'MONSTERS', 'OF', 'THE', 'SEA', ',', 'APPEARED', '.', 'AMONG', 'THE', 'FORMER', ',', 'ONE', 'WAS', 'OF', 'A', 'MOST', 'MONSTROUS', 'SIZE', '.', '...', 'THIS', 'CAME', 'TOWARDS', 'US', ',', 'OPEN', '-', 'MOUTHED', ',', 'RAISING', 'THE', 'WAVES', 'ON', 'ALL', 'SIDES', ',', 'AND', 'BEATING', 'THE', 'SEA', 'BEFORE', 'HIM', 'INTO', 'A', 'FOAM', '."', '--', 'TOOKE', "'", 'S', 'LUCIAN', '.', '"', 'THE', 'TRUE', 'HISTORY', '."', '"', 'HE', 'VISITED', 'THIS', 'COUNTRY', 'ALSO', 'WITH', 'A', 'VIEW', 'OF', 'CATCHING', 'HORSE', '-', 'WHALES', ',', 'WHICH', 'HAD', 'BONES', 'OF', 'VERY', 'GREAT', 'VALUE', 'FOR', 'THEIR', 'TEETH', ',', 'OF', 'WHICH', 'HE', 'BROUGHT', 'SOME', 'TO', 'THE', 'KING', '.', '...', 'THE', 'BEST', 'WHALES', 'WERE', 'CATCHED', 'IN', 'HIS', 'OWN', 'COUNTRY', ',', 'OF', 'WHICH', 'SOME', 'WERE', 'FORTY', '-', 'EIGHT', ',', 'SOME', 'FIFTY', 'YARDS', 'LONG', '.', 'HE', ...]
len(text1)
260819
len(set(text1))
19317
len(set([word.lower() for word in text1])) #capitalisation out
17231
len(set([word.lower() for word in text1 if word.isalpha()])) #nonalphabetic items out
16948
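# The four counts above shrink step by step: all tokens, distinct strings,
# distinct strings after lowercasing, and distinct alphabetic strings. A small
# sketch of the same steps, assuming a made-up token list instead of text1:

```python
# Hypothetical token list standing in for text1.
tokens = ['Call', 'me', 'Ishmael', '.', 'Call', 'ME', ',', 'call', '1851']

n_tokens = len(tokens)                          # every token, repeats included
n_types = len(set(tokens))                      # distinct strings, case-sensitive
n_lower = len(set(t.lower() for t in tokens))   # 'Call' and 'call' now count once
n_alpha = len(set(t.lower() for t in tokens if t.isalpha()))  # drops '.', ',' and '1851'

print(n_tokens, n_types, n_lower, n_alpha)
```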
word = 'cat'
if len(word) < 5:
...     print('word length is less than 5')
...
word length is less than 5
if len(word) >= 5:
...     print('word length is greater than or equal to 5')
...
for word in ['Call', 'me', 'Ishmael', '.']:
...     print(word)
...
Call
me
Ishmael
.
sent1 = ['Call', 'me', 'Ishmael', '.']
for xyzzy in sent1:
...     if xyzzy.endswith('l'):
...         print(xyzzy)
...
Call
Ishmael
for token in sent1:
...     if token.islower():
...         print(token, 'is a lowercase word')
...     elif token.istitle():
...         print(token, 'is a titlecase word')
...     else:
...         print(token, 'is punctuation')
...
Call is a titlecase word
me is a lowercase word
Ishmael is a titlecase word
. is punctuation
# the output starts with a titlecase word because the loop visits tokens in list order, and 'Call' comes first
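# One caveat about the if/elif/else chain above: the final `else` catches
# everything that is neither lowercase nor titlecase, not only punctuation.
# A small sketch, assuming a hypothetical helper `describe` and a made-up sentence:

```python
def describe(token):
    """Label a token using the same test order as the loop above."""
    if token.islower():
        return 'lowercase'
    elif token.istitle():
        return 'titlecase'
    else:
        # Reached by punctuation, but also by numbers and ALL-CAPS tokens.
        return 'other'

sent = ['Call', 'me', 'Ishmael', '.', 'NLTK', '1851']
labels = [(tok, describe(tok)) for tok in sent]
for tok, label in labels:
    print(tok, 'is', label)
```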
tricky = sorted([w for w in set(text2) if 'cie' in w or 'cei' in w])
for word in tricky:
    print(word, end=' ')
ancient ceiling conceit conceited conceive conscience conscientious conscientiously deceitful deceive deceived deceiving deficiencies deficiency deficient delicacies excellencies fancied insufficiency insufficient legacies perceive perceived perceiving prescience prophecies receipt receive received receiving society species sufficient sufficiently undeceive undeceiving
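# To get all matches on one line, an alternative to printing inside a loop is to
# build the line with str.join. A minimal sketch, assuming a short made-up word
# list in place of set(text2):

```python
# Hypothetical sample words; only some contain 'cie' or 'cei'.
words = ['ancient', 'ceiling', 'receive', 'science', 'friend']

tricky = sorted(w for w in words if 'cie' in w or 'cei' in w)

# join puts exactly one space between items and none at the end.
line = ' '.join(tricky)
print(line)
```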
# we disambiguate words using context
# identify 'who did what to whom' by finding the antecedent of each pronoun
# > anaphora resolution & semantic role labeling
# solving these language-understanding problems lets us move on to tasks such as
# > question answering & machine translation (MT)
from nltk.book import *
# babelize_shell is not provided by `from nltk.book import *` in this NLTK install, so the call fails:
babelize_shell() #???
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_6842/1520477239.py in <module>
----> 1 babelize_shell() #???

NameError: name 'babelize_shell' is not defined
#NLTK Babelizer : type 'help' for a list of commands.
#Babel> how long before the next flight to Alice Springs?
#Babel> german
#Babel> run
babelize_shell()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_6842/3344165984.py in <module>
----> 1 babelize_shell()

NameError: name 'babelize_shell' is not defined
from nltk.book import *
import nltk  # `from nltk.book import *` does not bind the name `nltk`; without this import the call raises NameError
nltk.chat.chatbots()
#prove hypothesis
12 / 4 + 1
4.0
26 ** 10
141167095653376
['Monty', 'Python'] * 20
['Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python']
3 * sent1
['Call', 'me', 'Ishmael', '.', 'Call', 'me', 'Ishmael', '.', 'Call', 'me', 'Ishmael', '.']
len(text2)
141576
len(set(text2))
6833
text5.collocations()
wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART; cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime Player; Player 12%; dont know; lez gurls; long time
len(set(text4)) # number of distinct words (types); len(text4) below counts all tokens
9913
len(text4)
149797
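# The two counts above (distinct words and total tokens) can be combined into a
# single ratio. A minimal sketch; the function name `lexical_diversity` and the
# sample list are assumptions for illustration and are not defined earlier in
# this session:

```python
def lexical_diversity(tokens):
    """Share of tokens that are distinct: len(set(tokens)) / len(tokens)."""
    return len(set(tokens)) / len(tokens)

# Hypothetical sample: 6 tokens, 4 distinct.
sample = ['to', 'be', 'or', 'not', 'to', 'be']
ratio = lexical_diversity(sample)
print(ratio)

# The same ratio from the two counts reported above for text4.
print(9913 / 149797)
```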
my_string = ['me', 'myself', 'I']
my_string
['me', 'myself', 'I']
for word in my_string:
    print(word)
me
myself
I
my_string + my_string
['me', 'myself', 'I', 'me', 'myself', 'I']
my_string * 3
['me', 'myself', 'I', 'me', 'myself', 'I', 'me', 'myself', 'I']
for word in my_string:
    print(word * 3)
mememe
myselfmyselfmyself
III
for word in my_string * 3:
    print(word)
me
myself
I
me
myself
I
me
myself
I
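# The two loops above differ in what `*` repeats: the list or each string.
# A short side-by-side sketch (the variable names are made up for illustration):

```python
parts = ['me', 'myself', 'I']

# list * int repeats the sequence of items.
repeated_list = parts * 2

# str * int repeats the characters of each item.
repeated_strings = [w * 2 for w in parts]

print(repeated_list)
print(repeated_strings)
```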
my_sent
['Bravely', 'bold', 'Sir', 'Robin', ',', 'rode', 'forth', Ellipsis, 'from', 'Camelot', '.']
my_sent = ['Bravely', 'bold', 'Sir', 'Robin', ',', 'rode', 'forth', 'Ellipsis', 'from', 'Camelot', '.']
' '.join(my_sent)
'Bravely bold Sir Robin , rode forth Ellipsis from Camelot .'
'Bravely bold Sir Robin , rode forth Ellipsis from Camelot .'.split()
['Bravely', 'bold', 'Sir', 'Robin', ',', 'rode', 'forth', 'Ellipsis', 'from', 'Camelot', '.']
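# join and split undo each other here only because no token contains whitespace.
# A small sketch of both cases, assuming made-up token lists:

```python
tokens = ['Bravely', 'bold', 'Sir', 'Robin', ',', 'rode', 'forth', 'from', 'Camelot', '.']

text = ' '.join(tokens)          # one string, tokens separated by single spaces
same_again = text.split()        # split on whitespace recovers the list

# A token that itself contains a space does not survive the round trip.
tokens_with_space = ['New York', 'is', 'big']
round_trip = ' '.join(tokens_with_space).split()

print(same_again == tokens)
print(round_trip)
```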
phrase1 = 'I am'
phrase2 = 'You are I'
len(phrase1 + phrase2)
13
len(phrase1) + len(phrase2)
13
"Monty Python"[6:12]
'Python'
["Monty", "Python"][1]
'Python'
["Monty", "Python"][0]
'Monty'
sent1[2][2]
'h'
phrase1[2]
'a'
sent1
['Call', 'me', 'Ishmael', '.']
sent1[0][3] # 'l': 1st word (index 0), 4th letter (index 3) of that word
'l'
sent1[2][1:4] #shm
'shm'
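# Double indexing reads left to right: the first index picks a word from the
# list, the second picks characters from that word. A minimal sketch using a
# copy of the sentence (the variable names are made up for illustration):

```python
sent = ['Call', 'me', 'Ishmael', '.']

first_word = sent[0]        # index 0 is the 1st item
fourth_letter = sent[0][3]  # index 3 is the 4th character of 'Call'
middle = sent[2][1:4]       # characters at indices 1, 2, 3 of 'Ishmael'
last_token = sent[-1]       # negative indices count from the end

print(first_word, fourth_letter, middle, last_token)
```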
sent3 #6&9
['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
sorted([w for w in set(text5) if w.startswith('b')])
['b', 'b-day', 'b/c', 'b4', 'babay', 'babble', 'babblein', 'babe', 'babes', 'babi', 'babies', 'babiess', 'baby', 'babycakeses', 'bachelorette', 'back', 'backatchya', 'backfrontsidewaysandallaroundtheworld', 'backroom', 'backup', 'bacl', 'bad', 'bag', 'bagel', 'bagels', 'bahahahaa', 'bak', 'baked', 'balad', 'balance', 'balck', 'ball', 'ballin', 'balls', 'ban', 'band', 'bandito', 'bandsaw', 'banjoes', 'banned', 'baord', 'bar', 'barbie', 'bare', 'barely', 'bares', 'barfights', 'barks', 'barn', 'barrel', 'base', 'bases', 'basically', 'basket', 'battery', 'bay', 'bbbbbyyyyyyyeeeeeeeee', 'bbiam', 'bbl', 'bbs', 'bc', 'be', 'beach', 'beachhhh', 'beam', 'beams', 'beanbag', 'beans', 'bear', 'bears', 'beat', 'beaten', 'beatles', 'beats', 'beattles', 'beautiful', 'because', 'beckley', 'become', 'bed', 'bedford', 'bedroom', 'beeeeehave', 'beeehave', 'been', 'beer', 'before', 'beg', 'begin', 'behave', 'behind', 'bein', 'being', 'beleive', 'believe', 'belive', 'bell', 'belly', 'belong', 'belongings', 'ben', 'bend', 'benz', 'bes', 'beside', 'besides', 'best', 'bet', 'betrayal', 'betta', 'better', 'between', 'beuty', 'bf', 'bi', 'biatch', 'bible', 'biebsa', 'bied', 'big', 'bigest', 'biggest', 'biiiatch', 'bike', 'bikes', 'bikini', 'bio', 'bird', 'birfday', 'birthday', 'bisexual', 'bishes', 'bit', 'bitch', 'bitches', 'bitdh', 'bite', 'bites', 'biyatch', 'biz', 'bj', 'black', 'blade', 'blah', 'blank', 'blankie', 'blazed', 'bleach', 'blech', 'bless', 'blessings', 'blew', 'blind', 'blinks', 'bliss', 'blocking', 'bloe', 'blood', 'blooded', 'bloody', 'blow', 'blowing', 'blowjob', 'blowup', 'blue', 'blueberry', 'bluer', 'blues', 'blunt', 'board', 'bob', 'bodies', 'body', 'boed', 'boght', 'boi', 'boing', 'boinked', 'bois', 'bomb', 'bone', 'boned', 'bones', 'bong', 'boning', 'bonus', 'boo', 'booboo', 'boobs', 'book', 'boom', 'boooooooooooglyyyyyy', 'boost', 'boot', 'bootay', 'booted', 'boots', 'booty', 'border', 'borderline', 'bored', 'boredom', 'boring', 'born', 'born-again', 'bosom', 
'boss', 'bossy', 'bot', 'both', 'bother', 'bothering', 'bottle', 'bought', 'bounced', 'bouncer', 'bouncers', 'bound', 'bout', 'bouts', 'bow', 'bowl', 'box', 'boy', 'boyfriend', 'boys', 'bra', 'brad', 'brady', 'brain', 'brakes', 'brass', 'brat', 'brb', 'brbbb', 'bread', 'break', 'breaks', 'breath', 'breathe', 'bred', 'breeding', 'bright', 'brightened', 'bring', 'brings', 'bro', 'broke', 'brooklyn', 'brother', 'brothers', 'brought', 'brown', 'brrrrrrr', 'bruises', 'brunswick', 'brwn', 'btw', 'bucks', 'buddyyyyyy', 'buff', 'buffalo', 'bug', 'bugs', 'buh', 'build', 'builds', 'built', 'bull', 'bulls', 'bum', 'bumber', 'bummer', 'bumped', 'bumper', 'bunch', 'bunny', 'burger', 'burito', 'burned', 'burns', 'burp', 'burpin', 'burps', 'burried', 'burryed', 'bus', 'buses', 'bust', 'busted', 'busy', 'but', 'butt', 'butter', 'butterscotch', 'button', 'buttons', 'buy', 'buying', 'bwahahahahahahahahahaha', 'by', 'byb', 'bye', 'byeee', 'byeeee', 'byeeeeeeee', 'byeeeeeeeeeeeee', 'byes']
range(10)
range(0, 10)
range(10,20)
range(10, 20)
range(10, 20, 2)
range(10, 20, 2)
range(20, 10, -2)
range(20, 10, -2)
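In Python 3, `range` returns a lazy range object, which is why the outputs above echo the call instead of listing numbers. A minimal sketch, using only the built-in `list`, that materializes the same four ranges (the variable names are illustrative):

```python
# Materialize each lazy range object to see the numbers it stands for.
first_ten = list(range(10))          # 0 up to, but not including, 10
ten_to_twenty = list(range(10, 20))  # explicit start and stop
evens = list(range(10, 20, 2))       # step of 2
countdown = list(range(20, 10, -2))  # negative step counts down

print(first_ten)
print(ten_to_twenty)
print(evens)
print(countdown)
```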
text9.index('sunset')
629
text9[628:632]
['the', 'sunset', 'side', 'of']
text9[620:640]
['PARK', 'THE', 'suburb', 'of', 'Saffron', 'Park', 'lay', 'on', 'the', 'sunset', 'side', 'of', 'London', ',', 'as', 'red', 'and', 'ragged', 'as', 'a']
text9[621:660]
['THE', 'suburb', 'of', 'Saffron', 'Park', 'lay', 'on', 'the', 'sunset', 'side', 'of', 'London', ',', 'as', 'red', 'and', 'ragged', 'as', 'a', 'cloud', 'of', 'sunset', '.', 'It', 'was', 'built', 'of', 'a', 'bright', 'brick', 'throughout', ';', 'its', 'sky', '-', 'line', 'was', 'fantastic', ',']
text9[621:644]
['THE', 'suburb', 'of', 'Saffron', 'Park', 'lay', 'on', 'the', 'sunset', 'side', 'of', 'London', ',', 'as', 'red', 'and', 'ragged', 'as', 'a', 'cloud', 'of', 'sunset', '.']
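`index` reports only the first occurrence of a token, and a slice `[m:n]` runs from position `m` up to but not including `n`. A small sketch of both points, assuming a hypothetical toy token list in place of `text9`:

```python
# Hypothetical toy token list standing in for text9.
tokens = ['the', 'sunset', 'side', 'of', 'London', ',',
          'a', 'cloud', 'of', 'sunset', '.']

# index() returns the position of the FIRST match only.
first = tokens.index('sunset')

# Slice from one token before the match up to (not including) first + 3.
window = tokens[first - 1:first + 3]

# To find every occurrence, scan with enumerate instead.
all_positions = [i for i, tok in enumerate(tokens) if tok == 'sunset']

print(first, window, all_positions)
```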
sorted(set(sent1))
['.', 'Call', 'Ishmael', 'me']
len(sent1)
4
len(sent8)
14
sent1 + sent8
['Call', 'me', 'Ishmael', '.', '25', 'SEXY', 'MALE', ',', 'seeks', 'attrac', 'older', 'single', 'lady', ',', 'for', 'discreet', 'encounters', '.']
sorted([len(sent1 + sent8)])
[18]
sorted(set([w.lower() for w in text1]))
['!', '!"', '!"--', "!'", '!\'"', '!)', '!)"', '!*', '!--', '!--"', "!--'", '"', '"\'', '"--', '"...', '";', '$', '&', "'", "',", "',--", "'-", "'--", "';", '(', ')', '),', ')--', ').', ').--', '):', ');', ');--', '*', ',', ',"', ',"--', ",'", ",'--", ',)', ',*', ',--', ',--"', ",--'", '-', '--', '--"', "--'", '--\'"', '--(', '---"', '---,', '.', '."', '."*', '."--', ".'", '.\'"', '.)', '.*', '.*--', '.,', '.--', '.--"', '...', '....', '.]', '000', '1', '10', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '11', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '12', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '13', '130', '131', '132', '133', '134', '135', '14', '144', '1492', '15', '150', '15th', '16', '1652', '1668', '1671', '1690', '1695', '16th', '17', '1726', '1729', '1750', '1772', '1775', '1776', '1778', '1779', '1788', '1791', '1793', '18', '180', '1807', '1819', '1820', '1821', '1825', '1828', '1833', '1836', '1839', '1840', '1842', '1846', '1850', '1851', '19', '1st', '2', '20', '2000', '200th', '21', '21st', '22', '23', '24', '25', '26', '27', '275th', '28', '29', '2nd', '3', '30', '31', '31st', '32', '33', '34', '35', '36', '37', '38', '39', '3d', '4', '40', '400', '41', '42', '43', '44', '440', '45', '46', '47', '48', '49', '4th', '5', '50', '500', '51', '52', '53', '54', '55', '550', '56', '57', '58', '59', '5th', '6', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '7', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '8', '80', '800', '81', '82', '83', '84', '85', '86', '87', '88', '89', '890', '9', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', ':', ':"-', ':--', ':--"', ":--'", ';', ';"', ';"--', ';"--(', ";'", ';*', ';--', ';--"', ";--'", '?', '?"', '?"--', "?'", "?'--'", '?--', '?--"', "?--'", '[', ']', '_____________', 'a', 'aback', 'abaft', 'abandon', 'abandoned', 'abandonedly', 'abandonment', 'abased', 'abasement', 'abashed', 'abate', 
'abated', 'abatement', 'abating', 'abbreviate', 'abbreviation', 'abeam', 'abed', 'abednego', 'abel', 'abhorred', 'abhorrence', 'abhorrent', 'abhorring', 'abide', 'abided', 'abiding', 'ability', 'abjectly', 'abjectus', 'able', 'ablutions', 'aboard', 'abode', 'abominable', 'abominate', 'abominated', 'abomination', 'aboriginal', 'aboriginally', 'aboriginalness', 'abortion', 'abortions', 'abound', 'abounded', 'abounding', 'aboundingly', 'about', 'above', 'abraham', 'abreast', 'abridged', 'abroad', 'abruptly', 'absence', 'absent', 'absolute', 'absolutely', 'absorbed', 'absorbing', 'absorbingly', 'abstained', 'abstemious', 'abstinence', 'abstract', 'abstracted', 'abstraction', 'absurd', 'absurdly', 'abundance', 'abundant', 'abundantly', 'academy', 'accelerate', 'accelerated', 'accelerating', 'accept', 'accessible', 'accessory', 'accident', 'accidental', 'accidentally', 'accidents', 'accommodate', 'accommodated', 'accommodation', 'accompanied', 'accompanies', 'accompaniments', 'accompany', 'accompanying', 'accomplish', 'accomplished', 'accomplishing', 'accomplishment', 'accordance', 'according', 'accordingly', 'accosted', 'account', 'accountable', 'accountants', 'accounted', 'accounting', 'accounts', 'accumulate', 'accumulated', 'accumulating', 'accuracy', 'accurate', 'accurately', 'accursed', 'accustomed', 'acerbities', 'ache', 'ached', 'achieve', 'achieved', 'achilles', 'acknowledges', 'acknowledging', 'acquaintance', 'acquaintances', 'acquainted', 'acquiesce', 'acquiesced', 'acquiescence', 'acquired', 'acre', 'acres', 'acridness', 'across', 'act', 'acted', 'actest', 'action', 'actions', 'actium', 'active', 'actively', 'activity', 'acts', 'actual', 'actually', 'actuated', 'acushnet', 'acute', 'acuteness', 'adam', 'adamite', 'adapted', 'add', 'added', 'adding', 'addition', 'additional', 'address', 'addressed', 'addressing', 'adds', 'adequate', 'adequately', 'adhering', 'adhesiveness', 'adieu', 'adieux', 'adios', 'adjacent', 'adjoining', 'adjust', 'adjusting', 
'admeasurement', 'admeasurements', 'administered', 'administering', 'admirable', 'admirably', 'admiral', 'admirals', 'admire', 'admirer', 'admirers', 'admit', 'admits', 'admitted', 'admitting', 'admonish', 'admonished', 'admonishing', 'admonitions', 'admonitory', 'ado', 'adolescence', 'adopt', 'adopted', 'adopting', 'adoption', 'adoration', 'adoring', 'adorned', 'adorning', 'adornment', 'adown', 'adrift', 'adroit', 'adroitly', 'adroop', 'adult', 'adulterer', 'advance', 'advanced', 'advancement', 'advances', 'advancing', 'advantage', 'advantages', 'advent', 'adventure', 'adventures', 'adventurous', 'adventurously', 'adverse', 'advert', 'advertised', 'advice', 'advised', 'advocate', 'aerated', 'aesthetically', 'aesthetics', 'afar', 'affair', 'affairs', 'affect', 'affected', 'affecting', 'affection', 'affectionate', 'affectionately', 'affghanistan', 'affidavit', 'affinities', 'affirm', 'affirmative', 'affirms', 'affixed', 'afflicted', 'afflictions', 'affluent', 'afford', 'afforded', 'affording', 'affords', 'affright', 'affrighted', 'affrights', 'affronted', 'afire', 'afloat', 'afoam', 'afore', 'aforesaid', 'aforethought', 'afoul', 'afraid', 'afresh', 'afric', 'africa', 'african', 'africans', 'aft', 'after', 'afternoon', 'afternoons', 'afterwards', 'again', 'againe', 'against', 'agassiz', 'age', 'aged', 'agencies', 'agency', 'agent', 'agents', 'ages', 'aggravate', 'aggregate', 'aggregated', 'aggregation', 'aggregations', 'aggrieved', 'aghast', 'agile', 'agitated', 'aglow', 'ago', 'agonies', 'agonized', 'agonizing', 'agonizingly', 'agony', 'agrarian', 'agree', 'agreeable', 'agreed', 'agrees', 'aground', 'ague', 'ah', 'ahab', 'ahabs', 'ahasuerus', 'ahaz', 'ahead', 'ahoy', 'aid', 'aides', 'ails', 'aim', 'aimed', 'aimlessly', 'ain', 'aint', 'air', 'airley', 'airth', 'aisle', 'ajar', 'akin', 'alabama', 'aladdin', 'alarm', 'alarmed', 'alarms', 'alas', 'alb', 'albatross', 'albatrosses', 'albemarle', 'albert', 'albicore', 'albino', 'alcoves', 'aldermen', 'aldrovandi', 
'aldrovandus', 'ale', 'aleak', 'alert', 'alewives', 'alexander', 'alexanders', 'alfred', 'algerine', 'algiers', 'alien', 'aliens', 'alights', 'alike', 'aliment', 'alive', 'all', 'allay', 'allaying', 'alleged', 'alleghanian', 'alleghanies', 'allegiance', 'allegorical', 'allegory', 'alley', 'alleys', 'allies', 'allotted', 'allow', 'allowance', 'allowances', 'allowed', 'allowing', 'allude', 'alluded', 'alluding', 'allured', 'allurements', 'allures', 'alluring', 'alluringly', 'allurings', 'allusion', 'allusions', 'almanac', 'almanack', 'almighty', 'almost', 'alms', 'aloft', 'alone', 'along', 'alongside', 'aloof', 'aloud', 'alow', 'alpacas', 'alpine', 'alps', 'already', 'also', 'altar', 'alter', 'altered', 'altering', 'alternate', 'alternately', 'alternating', 'although', 'altitude', 'altitudes', 'altogether', 'always', 'am', 'amain', 'amaze', 'amazement', 'amazing', 'amazingly', 'amber', 'ambergriese', 'ambergris', 'ambiguous', 'ambition', 'ambitious', 'amelia', 'amen', 'amend', 'america', 'american', 'americans', 'americas', 'amid', 'amidst', 'amittai', 'among', 'amongst', 'amorous', 'amount', 'amounted', 'amounts', 'amours', 'amphibious', 'amphitheatrical', 'ample', 'amplified', 'amplify', 'amputate', 'amputated', 'amputating', 'amputation', 'amputations', 'amsterdam', 'amuck', 'amusing', 'an', 'anacharsis', 'anaconda', 'anacondas', 'anak', 'analogical', 'analogies', 'analogous', 'analogy', 'analyse', 'analysed', 'analysis', 'analytic', 'anathemas', 'anatomical', 'anatomist', 'anatomy', 'ancestors', 'ancestress', 'ancestry', 'anchor', 'anchored', 'anchors', 'ancient', 'ancientest', 'and', 'andes', 'andirons', 'andrew', 'andromeda', 'anew', 'angel', 'angelo', 'angels', 'anger', 'angle', 'angles', 'anglo', 'angrily', 'angry', 'anguish', 'angular', 'angularly', 'animal', 'animals', 'animate', 'animated', 'animating', 'animation', 'animosity', 'ankers', 'ankles', 'annals', 'annawon', 'anne', 'annihilated', 'annihilating', 'annihilation', 'anno', 'announced', 
'announcement', 'announces', 'announcing', 'annual', 'annually', 'annuitants', 'annus', 'anoint', 'anointed', 'anointing', 'anoints', 'anomalous', 'anomalously', 'anomaly', 'anon', 'anonymous', 'another', 'answer', 'answered', 'answers', 'ant', 'antagonistic', 'antarctic', 'antecedent', 'antediluvian', 'antelope', 'antemosaic', 'anti', 'antichronical', 'anticipated', 'anticipatingly', 'anticipation', 'anticipative', 'antics', 'antidote', 'antilles', 'antiochus', 'antique', 'antiquities', 'antiquity', 'antlered', 'antlers', 'antony', 'ants', 'antwerp', 'anus', 'anvil', 'anxieties', 'anxiety', 'anxious', 'any', 'anybody', 'anyhow', 'anyone', 'anything', 'anyway', 'anyways', 'anywhere', 'aorta', 'apart', 'apartment', 'ape', 'apeak', 'apertures', 'apex', 'apollo', 'apology', 'apoplectic', 'apoplexy', 'apostolic', 'apothecary', 'apotheosis', 'appal', 'appalled', 'appalling', 'appallingly', 'appals', 'apparatus', 'apparel', 'apparelled', 'apparent', 'apparently', 'apparition', 'appeal', 'appeals', 'appear', 'appearance', 'appearances', 'appeared', 'appearing', 'appears', 'appellation', 'appellations', 'appellative', 'append', 'appendage', 'appetite', 'appetites', 'apple', 'appliance', 'applicable', 'application', 'applied', 'applies', 'apply', 'applying', 'appoint', 'appointed', 'appointments', 'apportioned', 'appreciative', 'apprehension', 'apprehensions', 'apprehensiveness', 'apprise', 'apprised', 'approach', 'approached', 'approaches', 'approaching', 'appropriate', 'appropriated', 'approval', 'approve', 'approved', 'approving', 'approvingly', 'approximate', 'apricot', 'april', 'apron', 'apt', 'aptitude', 'aptitudes', 'aquarius', 'arbitrary', 'arboring', 'arbours', 'arc', 'arch', 'archaeological', 'archaeologists', 'archangel', 'archangelic', 'archangelical', 'archangels', 'archbishop', 'archbishopric', 'arched', 'archer', 'arches', 'archiepiscopacy', 'arching', 'archipelagoes', 'architect', 'architects', 'architecture', 'archy', 'arctic', 'ardour', 'are', 'area', 
'arethusa', 'argo', 'argosy', 'argue', 'argued', 'arguing', 'argument', 'arguments', 'arid', 'aries', 'aright', 'arion', 'arise', 'arisen', 'arises', 'arising', 'aristotle', 'arithmetic', 'ark', 'arkansas', 'arkite', 'arm', 'armada', 'armed', 'armies', 'armor', 'arms', 'army', 'arnold', 'aroma', 'aromas', 'aromatic', 'aroostook', 'arose', 'around', 'arpens', ...]
sorted([w.lower() for w in set(text1)])
['!', '!"', '!"--', "!'", '!\'"', '!)', '!)"', '!*', '!--', '!--"', "!--'", '"', '"\'', '"--', '"...', '";', '$', '&', "'", "',", "',--", "'-", "'--", "';", '(', ')', '),', ')--', ').', ').--', '):', ');', ');--', '*', ',', ',"', ',"--', ",'", ",'--", ',)', ',*', ',--', ',--"', ",--'", '-', '--', '--"', "--'", '--\'"', '--(', '---"', '---,', '.', '."', '."*', '."--', ".'", '.\'"', '.)', '.*', '.*--', '.,', '.--', '.--"', '...', '....', '.]', '000', '1', '10', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '11', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '12', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '13', '130', '131', '132', '133', '134', '135', '14', '144', '1492', '15', '150', '15th', '16', '1652', '1668', '1671', '1690', '1695', '16th', '17', '1726', '1729', '1750', '1772', '1775', '1776', '1778', '1779', '1788', '1791', '1793', '18', '180', '1807', '1819', '1820', '1821', '1825', '1828', '1833', '1836', '1839', '1840', '1842', '1846', '1850', '1851', '19', '1st', '1st', '2', '20', '2000', '200th', '21', '21st', '22', '23', '24', '25', '26', '27', '275th', '28', '29', '2nd', '3', '30', '31', '31st', '32', '33', '34', '35', '36', '37', '38', '39', '3d', '3d', '4', '40', '400', '41', '42', '43', '44', '440', '45', '46', '47', '48', '49', '4th', '5', '50', '500', '51', '52', '53', '54', '55', '550', '56', '57', '58', '59', '5th', '6', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '7', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '8', '80', '800', '81', '82', '83', '84', '85', '86', '87', '88', '89', '890', '9', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', ':', ':"-', ':--', ':--"', ":--'", ';', ';"', ';"--', ';"--(', ";'", ';*', ';--', ';--"', ";--'", '?', '?"', '?"--', "?'", "?'--'", '?--', '?--"', "?--'", '[', ']', '_____________', 'a', 'a', 'aback', 'abaft', 'abandon', 'abandoned', 'abandonedly', 'abandonment', 'abased', 'abasement', 
'abashed', 'abashed', 'abate', 'abated', 'abatement', 'abating', 'abbreviate', 'abbreviation', 'abeam', 'abed', 'abednego', 'abel', 'abhorred', 'abhorrence', 'abhorrent', 'abhorring', 'abide', 'abided', 'abiding', 'ability', 'abjectly', 'abjectus', 'able', 'ablutions', 'aboard', 'aboard', 'abode', 'abominable', 'abominable', 'abominate', 'abominated', 'abomination', 'aboriginal', 'aboriginally', 'aboriginalness', 'abortion', 'abortions', 'abound', 'abounded', 'abounding', 'aboundingly', 'about', 'about', 'about', 'above', 'above', 'abraham', 'abreast', 'abridged', 'abroad', 'abruptly', 'absence', 'absent', 'absolute', 'absolutely', 'absorbed', 'absorbing', 'absorbingly', 'abstained', 'abstemious', 'abstinence', 'abstract', 'abstracted', 'abstraction', 'absurd', 'absurdly', 'abundance', 'abundant', 'abundantly', 'academy', 'accelerate', 'accelerated', 'accelerating', 'accept', 'accessible', 'accessory', 'accessory', 'accident', 'accidental', 'accidentally', 'accidents', 'accommodate', 'accommodated', 'accommodation', 'accompanied', 'accompanies', 'accompaniments', 'accompany', 'accompanying', 'accomplish', 'accomplished', 'accomplishing', 'accomplishment', 'accordance', 'according', 'according', 'accordingly', 'accordingly', 'accosted', 'account', 'account', 'accountable', 'accountants', 'accounted', 'accounting', 'accounts', 'accumulate', 'accumulated', 'accumulating', 'accuracy', 'accurate', 'accurately', 'accursed', 'accursed', 'accustomed', 'acerbities', 'ache', 'ached', 'achieve', 'achieved', 'achilles', 'acknowledges', 'acknowledging', 'acquaintance', 'acquaintances', 'acquainted', 'acquiesce', 'acquiesced', 'acquiescence', 'acquired', 'acre', 'acres', 'acridness', 'across', 'act', 'acted', 'actest', 'action', 'actions', 'actium', 'active', 'actively', 'activity', 'acts', 'actual', 'actually', 'actuated', 'acushnet', 'acute', 'acuteness', 'adam', 'adamite', 'adapted', 'add', 'added', 'adding', 'addition', 'additional', 'additional', 'address', 'addressed', 
'addressing', 'adds', 'adequate', 'adequately', 'adhering', 'adhesiveness', 'adieu', 'adieu', 'adieux', 'adios', 'adjacent', 'adjoining', 'adjust', 'adjusting', 'admeasurement', 'admeasurements', 'administered', 'administering', 'admirable', 'admirably', 'admiral', 'admiral', 'admirals', 'admirals', 'admire', 'admirer', 'admirers', 'admit', 'admits', 'admitted', 'admitting', 'admonish', 'admonished', 'admonishing', 'admonitions', 'admonitory', 'ado', 'adolescence', 'adopt', 'adopted', 'adopting', 'adoption', 'adoration', 'adoring', 'adorned', 'adorning', 'adornment', 'adown', 'adrift', 'adroit', 'adroitly', 'adroop', 'adult', 'adulterer', 'advance', 'advance', 'advanced', 'advancement', 'advances', 'advancing', 'advancing', 'advantage', 'advantages', 'advent', 'adventure', 'adventures', 'adventures', 'adventures', 'adventurous', 'adventurously', 'adverse', 'advert', 'advertised', 'advice', 'advised', 'advocate', 'advocate', 'aerated', 'aesthetically', 'aesthetics', 'afar', 'affair', 'affairs', 'affect', 'affected', 'affected', 'affecting', 'affection', 'affectionate', 'affectionately', 'affghanistan', 'affidavit', 'affidavit', 'affinities', 'affirm', 'affirmative', 'affirms', 'affixed', 'afflicted', 'afflictions', 'affluent', 'afford', 'afforded', 'affording', 'affords', 'affright', 'affrighted', 'affrighted', 'affrights', 'affronted', 'afire', 'afloat', 'afoam', 'afore', 'aforesaid', 'aforethought', 'afoul', 'afraid', 'afresh', 'afric', 'africa', 'africa', 'african', 'africans', 'aft', 'aft', 'after', 'after', 'after', 'afternoon', 'afternoons', 'afterwards', 'afterwards', 'again', 'again', 'againe', 'against', 'against', 'against', 'agassiz', 'age', 'aged', 'agencies', 'agency', 'agent', 'agents', 'ages', 'ages', 'aggravate', 'aggregate', 'aggregated', 'aggregation', 'aggregations', 'aggrieved', 'aghast', 'agile', 'agitated', 'aglow', 'ago', 'agonies', 'agonized', 'agonizing', 'agonizingly', 'agony', 'agrarian', 'agree', 'agreeable', 'agreed', 'agrees', 
'aground', 'ague', 'ah', 'ah', 'ahab', 'ahab', 'ahabs', 'ahasuerus', 'ahaz', 'ahead', 'ahoy', 'ahoy', 'aid', 'aides', 'ails', 'aim', 'aimed', 'aimlessly', 'ain', 'ain', 'aint', 'air', 'air', 'airley', 'airth', 'aisle', 'ajar', 'akin', 'akin', 'alabama', 'aladdin', 'alarm', 'alarmed', 'alarmed', 'alarms', 'alas', 'alas', 'alb', 'albatross', 'albatross', 'albatrosses', 'albemarle', 'albert', 'albicore', 'albino', 'alcoves', 'aldermen', 'aldrovandi', 'aldrovandus', 'ale', 'aleak', 'alert', 'alewives', 'alexander', 'alexanders', 'alfred', 'alfred', 'algerine', 'algerine', 'algiers', 'alien', 'aliens', 'alights', 'alike', 'alike', 'aliment', 'alive', 'alive', 'alive', 'all', 'all', 'all', 'allay', 'allaying', 'alleged', 'alleghanian', 'alleghanies', 'allegiance', 'allegorical', 'allegory', 'alley', 'alleys', 'allies', 'allotted', 'allow', 'allowance', 'allowances', 'allowed', 'allowing', 'allude', 'alluded', 'alluding', 'allured', 'allurements', 'allures', 'alluring', 'alluringly', 'allurings', 'allusion', 'allusions', 'almanac', 'almanack', 'almighty', 'almighty', 'almost', 'almost', 'alms', 'aloft', 'aloft', 'alone', 'alone', 'alone', 'along', 'alongside', 'aloof', 'aloud', 'alow', 'alpacas', 'alpine', 'alps', 'already', 'already', 'also', 'also', 'altar', 'alter', 'altered', 'altering', 'alternate', 'alternately', 'alternating', 'although', 'altitude', 'altitudes', 'altogether', 'always', 'am', 'am', 'am', 'amain', 'amaze', 'amazement', 'amazing', 'amazingly', 'amber', 'ambergriese', 'ambergris', 'ambergris', 'ambiguous', 'ambition', 'ambitious', 'amelia', 'amen', 'amend', 'america', 'america', 'american', 'americans', 'americas', 'amid', 'amidst', 'amittai', 'among', 'among', 'among', 'amongst', 'amorous', 'amount', 'amounted', 'amounts', 'amours', 'amphibious', 'amphitheatrical', 'ample', 'amplified', 'amplify', 'amputate', 'amputated', 'amputating', 'amputation', 'amputations', 'amsterdam', 'amuck', 'amusing', 'an', 'an', 'anacharsis', 'anaconda', 'anacondas', 
'anak', 'analogical', 'analogies', 'analogous', 'analogy', 'analyse', 'analysed', 'analysis', 'analytic', 'anathemas', 'anatomical', 'anatomist', 'anatomy', 'ancestors', 'ancestress', 'ancestry', 'anchor', 'anchored', 'anchors', 'anchors', 'ancient', 'ancientest', 'and', 'and', 'and', 'andes', 'andirons', 'andrew', 'andromeda', 'anew', 'angel', 'angel', 'angelo', 'angels', 'angels', 'anger', 'angle', 'angles', 'anglo', 'angrily', 'angry', 'anguish', 'angular', 'angularly', 'animal', 'animal', 'animals', 'animate', 'animated', 'animated', 'animating', 'animation', 'animosity', 'ankers', 'ankles', 'annals', 'annals', 'annawon', 'anne', 'annihilated', 'annihilating', 'annihilation', 'anno', 'announced', 'announcement', 'announces', 'announcing', 'annual', 'annually', 'annuitants', 'annus', 'anoint', 'anointed', 'anointing', 'anoints', 'anomalous', 'anomalous', 'anomalously', 'anomaly', 'anon', 'anonymous', 'another', 'another', 'another', 'answer', 'answer', 'answered', 'answers', 'ant', 'antagonistic', 'antarctic', 'antecedent', 'antediluvian', 'antelope', 'antemosaic', 'anti', 'antichronical', 'anticipated', 'anticipatingly', 'anticipation', 'anticipative', 'antics', 'antidote', 'antilles', 'antiochus', 'antique', 'antiquities', 'antiquity', 'antlered', 'antlers', 'antony', 'ants', 'antwerp', 'anus', 'anvil', 'anvil', 'anxieties', 'anxiety', 'anxious', 'any', 'any', 'any', 'anybody', 'anyhow', 'anyhow', 'anyone', 'anything', 'anything', 'anyway', 'anyways', 'anywhere', 'aorta', 'apart', 'apartment', 'ape', 'apeak', 'apertures', 'apex', 'apollo', 'apology', 'apoplectic', 'apoplexy', 'apoplexy', 'apostolic', 'apothecary', 'apotheosis', 'appal', 'appalled', 'appalling', 'appallingly', 'appals', 'apparatus', 'apparel', 'apparelled', 'apparent', 'apparently', 'apparition', 'appeal', 'appeals', 'appear', 'appearance', 'appearances', 'appeared', 'appearing', 'appears', 'appellation', 'appellations', 'appellative', 'append', 'appendage', 'appetite', 'appetites', 'apple', 
'appliance', 'applicable', 'application', 'application', 'applied', 'applied', 'applies', 'apply', 'apply', 'applying', 'appoint', 'appointed', 'appointments', 'apportioned', 'appreciative', 'apprehension', ...]
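The two expressions above differ only in where `set` is applied, and that is why the second listing contains repeats such as 'about', 'about', 'about': deduplicating before lowercasing leaves 'About', 'ABOUT', and 'about' as separate items that then collapse to the same string. A minimal sketch on a hypothetical toy word list:

```python
# Hypothetical toy word list with case variants.
words = ['The', 'the', 'THE', 'whale', 'Whale', '.']

# Lowercase first, then deduplicate: each word form appears once.
lower_then_set = sorted(set(w.lower() for w in words))

# Deduplicate first, then lowercase: case variants survive as duplicates.
set_then_lower = sorted(w.lower() for w in set(words))

print(lower_then_set)
print(set_then_lower)
```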
len(text2)
141576
text2[-2:]
['THE', 'END']
from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
fdist5 = FreqDist(text5)
sorted([w for w in set(text5) if len(w) == 4])
['!!!!', '!!!.', '!...', '!???', '"...', '####', '((((', '))))', ',,,,', '.. .', '....', '.op.', '1.98', '1.99', '100%', '10th', '1200', '1299', '18ST', '1900', '1930', '1980', '1985', '1996', '1cos', '2006', '2:55', '2DAY', '2Pac', '39.3', '3:45', '4.20', '45.5', '4:03', '64.8', '6:38', '6:41', '6:51', '6:53', '7:45', '9.53', '98.5', '98.6', '9:10', ':o *', '; ..', '<---', "<3's", '<333', '<<<<', '<~~~', '>:->', '?!?!', '??!!', '????', 'AKDT', 'AKST', 'AWAY', 'Ahhh', 'Away', 'Awww', 'Back', 'Been', 'Bone', 'Born', 'Boyz', 'CALI', 'CAPS', 'CHAT', 'COME', 'Came', 'Care', 'Chat', 'Chop', 'City', 'Come', 'Cool', 'Ctrl', 'Cute', 'DAMN', 'DING', 'DOES', 'DONT', 'Damn', 'Dang', 'Dawn', 'Days', 'Deep', 'Does', 'Dood', 'Down', 'Drew', 'Drop', 'Dude', 'ELSE', 'EVEN', 'Eggs', 'Elev', 'Elle', 'Even', 'Evil', 'Eyes', 'FACE', 'FINE', 'FROM', 'Fade', 'Food', 'Fort', 'From', 'GIRL', 'GOOD', 'GUYS', 'Girl', 'Good', 'Gosh', 'GrlZ', 'HAHA', 'HALO', 'HAVE', 'HERE', 'HOTT', 'HUGE', 'Haha', 'Hail', 'Hand', 'Hard', 'Have', 'Help', 'Here', 'Hero', 'Heya', 'Heys', 'Heyy', 'High', 'Hill', 'Hiya', 'Hold', 'Holy', 'Home', 'Hott', 'Hugs', 'Iowa', 'JOIN', 'JUST', 'Jane', 'Jess', 'Joey', 'John', 'Judy', 'Just', 'KNOW', 'Kent', 'Kewl', 'Kick', 'Kids', 'King', 'Kiss', 'KoOL', 'Kold', 'LATE', 'LAst', 'LIVE', 'LMAO', 'LONG', 'LOUD', 'Last', 'Lets', 'Liam', 'Lies', 'Life', 'Like', 'Lime', 'Lion', 'Live', 'Lmao', 'LoVe', 'Long', 'Look', 'Lord', 'Love', 'MODE', 'MORE', 'MRIs', 'MUAH', 'Male', 'Maps', 'Mary', 'Matt', 'Meep', 'Mine', 'Mono', 'NAME', 'NICK', 'NONE', 'NTMN', 'Need', 'News', 'Nice', 'None', 'Nooo', 'Nope', 'Nova', 'O.k.', 'OOPS', 'Ohhh', 'Ohio', 'Okay', 'Only', 'Oops', 'Over', 'PART', "PM's", 'PMSL', 'Paul', 'Phil', 'Poor', 'Pour', 'Prof', 'QUIT', "RN's", 'ROFL', 'ROOM', 'Rang', 'Reub', 'Rick', 'Road', 'Rock', 'Rofl', 'Room', 'Rule', 'Rush', 'Ruth', 'SEEN', 'SExy', 'SIZE', 'SOME', 'SSRI', 'STOP', 'Same', 'Sat.', 'Save', 'Seee', 'Sexy', 'Show', 'Slip', 'Song', 'Stop', 'Sure', 'Swim', 
'TALK', 'TEXT', 'THAT', 'THEY', 'TIME', 'TYPR', 'Take', 'Talk', 'Teck', 'Tell', 'That', 'Then', 'They', 'This', 'Tide', 'Tiff', 'Time', 'Tina', 'Tisk', 'Troy', 'Turn', 'Type', 'U100', 'U101', 'U102', 'U103', 'U104', 'U105', 'U106', 'U107', 'U108', 'U109', 'U110', 'U111', 'U112', 'U113', 'U114', 'U115', 'U116', 'U117', 'U118', 'U119', 'U120', 'U121', 'U122', 'U123', 'U126', 'U128', 'U129', 'U130', 'U132', 'U133', 'U134', 'U136', 'U137', 'U138', 'U139', 'U141', 'U142', 'U143', 'U144', 'U145', 'U146', 'U147', 'U148', 'U149', 'U150', 'U153', 'U154', 'U155', 'U156', 'U158', 'U163', 'U164', 'U165', 'U168', 'U169', 'U170', 'U172', 'U175', 'U181', 'U190', 'U196', 'U197', 'U219', 'U520', 'U542', 'U819', 'U820', 'U988', 'U989', 'Uhhh', 'Ummm', 'VBox', 'VVil', 'Very', 'WHEN', 'WHOA', 'WILL', 'WITH', 'Well', 'Werd', 'Were', 'West', 'What', 'When', 'Will', 'Wind', 'Wyte', 'YALL', 'YOUR', 'Yeah', 'Yoko', 'York', 'Your', 'able', 'abou', 'acid', 'adds', 'addy', 'ages', 'ahah', 'ahem', 'ahhh', 'aime', 'aint', 'akon', 'allo', 'ally', 'alot', 'also', 'amen', 'anal', 'anti', 'any1', 'area', 'argh', 'arms', 'army', 'asks', 'asss', 'aunt', 'away', 'awww', 'babe', 'babi', 'baby', 'back', 'bacl', 'ball', 'band', 'bare', 'barn', 'base', 'beam', 'bear', 'beat', 'been', 'beer', 'bein', 'bell', 'bend', 'benz', 'best', 'bied', 'bike', 'bird', 'bite', 'blah', 'blew', 'bloe', 'blow', 'blue', 'body', 'boed', 'bois', 'bomb', 'bone', 'bong', 'book', 'boom', 'boot', 'born', 'boss', 'both', 'bout', 'bowl', 'boys', 'brad', 'brat', 'bred', 'brwn', 'buff', 'bugs', 'bull', 'burp', 'bust', 'busy', 'butt', 'byes', 'caan', 'caca', 'cali', 'call', 'calm', 'came', 'cams', 'cant', 'caps', 'card', 'care', 'cars', 'case', 'cash', 'cast', 'cell', 'cepn', 'chat', 'chik', 'chip', 'chit', 'choc', 'ciao', 'city', 'clap', 'clay', 'club', 'clue', 'cmon', 'coat', 'cock', 'coem', 'cold', 'come', 'comp', 'cook', 'cool', 'cops', 'corn', 'cost', 'crap', 'crib', 'crop', 'cums', 'cure', 'cuss', 'cute', 'cyas', 'daft', 'damn', 
'dang', 'dark', 'date', 'dawg', 'days', 'dead', 'deaf', 'deal', 'dear', 'deep', 'deop', 'dick', 'died', 'dies', 'dint', 'dirt', 'disc', 'dman', 'docs', 'does', 'dogs', 'doin', 'dojn', 'doll', 'done', 'dont', 'door', 'dork', 'dotn', 'down', 'draw', 'drew', 'drop', 'drug', 'dude', 'duet', 'dumb', 'dump', 'dust', 'dyed', 'each', 'ears', 'east', 'easy', 'eats', 'ebay', 'eeek', 'eeww', 'elle', 'ello', 'else', 'enuf', 'eric', 'este', 'evah', 'even', 'ever', 'evil', 'ewww', "ex's", 'exit', 'eyes', 'face', 'fair', 'fake', 'fall', 'fart', 'fast', 'fawk', 'fear', 'feat', 'febe', 'feel', 'feet', 'felt', 'find', 'fine', 'fire', 'firs', 'fish', 'fits', 'five', 'flaw', 'flow', 'fock', 'food', 'fool', 'foot', 'form', 'four', 'free', 'from', 'frst', 'fuck', 'full', 'gags', 'gals', 'game', 'gawd', 'gays', 'gear', 'gees', 'geez', 'gets', 'ghet', 'gift', 'gimp', 'girl', 'giva', 'give', 'givs', 'glad', 'goes', 'goin', 'gold', 'golf', 'gone', 'good', 'goof', 'gooo', 'gosh', 'gray', 'grea', 'gret', 'grew', 'grin', 'grrl', 'grrr', 'guns', 'guts', 'guys', 'guyz', 'haaa', 'haha', 'hail', 'hair', 'half', 'hall', 'halo', 'hand', 'hang', 'hank', 'hard', 'hate', 'have', 'hawT', 'hawt', 'haze', 'hazy', 'head', 'heal', 'hear', 'heat', 'heck', 'heee', 'hehe', 'hell', 'help', 'herE', 'herd', 'here', 'heya', 'heyy', 'hgey', 'hick', 'hide', 'high', 'hiii', 'hill', 'hint', 'hiom', 'hits', 'hiya', 'hmmm', 'hmph', 'hogs', 'hola', 'hold', 'holy', 'home', 'hong', 'hook', 'hooo', 'hope', 'hots', 'hott', 'hour', 'howl', 'hows', 'howz', 'http', 'huge', 'hugs', 'humm', 'hump', 'hurr', 'hurt', 'icky', 'idea', 'idnt', 'imma', 'inch', 'into', 'isnt', "it's", 'itch', 'jack', 'jail', 'jeep', 'jeff', 'jerk', 'john', 'joke', 'jude', 'jump', 'junk', 'jush', 'just', 'keep', 'kent', 'kept', 'kewl', 'keys', 'kick', 'kids', 'kill', 'kina', 'kind', 'king', 'kiss', 'kmph', 'knee', 'knew', 'know', 'kold', 'kong', 'kool', 'lady', 'ladz', 'laid', 'lake', 'lala', 'lame', 'land', 'lapd', 'last', 'late', 'lawl', 'lazy', 'lead', 
'left', 'legs', 'lets', 'lick', 'lies', 'life', 'like', 'limp', 'line', 'lisa', 'list', 'live', 'lmao', 'lois', 'lol.', 'long', 'look', 'lool', 'lord', 'lose', 'loss', 'lost', 'lots', 'loud', 'love', 'ltnc', 'ltns', 'lube', 'luck', 'lung', 'lust', 'luvs', 'lyin', 'made', 'mahn', 'main', 'make', 'male', 'mama', 'mame', 'mami', 'mang', 'many', 'mark', 'mary', 'mass', 'mauh', 'mean', 'meat', 'meds', 'meet', 'mena', 'menu', 'mess', 'mike', 'mind', 'mine', 'mins', 'miss', 'mite', 'mkay', 'mmmm', 'mode', 'mofo', 'moms', 'mono', 'moon', 'more', 'most', 'move', 'much', 'must', 'n9ne', 'nada', 'nads', 'name', 'nana', 'nawp', 'nawt', 'near', 'neck', 'need', 'nerd', 'newp', 'next', 'nice', 'nick', 'nite', 'nods', 'none', 'nope', 'nose', 'note', 'noth', 'nude', 'nuff', 'numb', 'o.k.', 'offa', 'ogan', 'ohhh', 'ohio', 'ohwa', "ok'd", 'okay', 'okey', 'once', 'ones', 'only', 'ooer', 'oooh', 'oops', 'open', 'opps', 'orgy', 'orta', 'otay', 'ouch', 'out.', 'outa', 'outs', 'over', 'owww', 'page', 'paid', 'pain', 'pair', 'park', 'part', 'pasa', 'pass', 'past', 'peek', 'peel', 'perk', 'perv', 'pfft', 'phil', 'pick', 'pics', 'pies', 'piff', 'pigs', 'pimp', 'pine', 'pink', 'plan', 'play', 'plow', 'plus', "pm'n", "pm's", 'pmsl', 'poem', 'poll', 'poof', 'pool', 'poop', 'poor', 'poot', 'pope', 'pork', 'porn', 'post', 'pour', 'pray', 'prep', 'prob', 'puff', 'puke', 'pull', 'pure', 'push', 'puts', 'pwns', 'ques', 'quit', 'quiz', 'raed', 'rain', 'rang', 'rape', 'rats', 'read', 'real', 'rent', 'rest', 'ribs', 'rich', 'ride', 'ring', 'road', 'rock', 'rofl', 'roll', 'roof', 'room', 'root', 'rose', 'rubs', 'ruff', 'rule', 'runs', 'rush', 'saME', 'safe', 'said', 'salt', 'same', 'samn', 'sand', 'sang', 'sayn', 'says', 'scar', 'scuk', 'scum', 'sean', 'seat', 'seee', 'seem', 'seen', 'self', 'sell', 'send', 'sent', 'serg', 'seth', 'sets', 'sexi', 'sexs', 'sext', 'sexy', 'shes', 'shit', 'shop', 'shot', 'show', 'shup', 'shut', 'sick', 'side', 'sigh', 'sign', 'sing', 'sink', 'sips', 'site', 'sits', 'size', 
'skin', ...]
len(text5)
45010
# In Python 3, FreqDist.keys() returns a dict_keys view, which cannot be
# sliced (it raises TypeError), so convert it to a list first.
vocabulary5 = list(fdist5.keys())
# The list has fewer than 45010 entries, so this slice returns all of it.
vocabulary5[-45010:]
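In Python 3, `keys()` on a dictionary-like object returns a view that supports iteration and `len` but not indexing or slicing. A minimal sketch of the behaviour and two workarounds, using `collections.Counter` as a stand-in for `FreqDist` (NLTK 3's `FreqDist` subclasses `Counter`); the sample tokens are hypothetical:

```python
from collections import Counter

# Hypothetical chat-like tokens; Counter stands in for nltk.FreqDist.
counts = Counter(['lol', 'hi', 'lol', 'JOIN', 'hi', 'lol'])

keys_view = counts.keys()
try:
    keys_view[:2]          # views cannot be sliced
    sliced_view = True
except TypeError:
    sliced_view = False

# Workaround 1: copy the keys into a list, which can be sliced.
vocabulary = list(keys_view)
first_two = vocabulary[:2]

# Workaround 2: ask for the most frequent items directly.
most_common = counts.most_common(2)

print(sliced_view, first_two, most_common)
```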
vocab = set(text5)
vocab_size = len(vocab)
vocab_size
6066
sorted([w for w in set(text6) if w.istitle()])
  File "/tmp/ipykernel_13517/1787209606.py", line 2
    print sorted
    ^
IndentationError: unexpected indent
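# Each test is independent, so a token that satisfies several
# conditions is printed once per matching condition.
for token in text6:
    if token.endswith('ize'):
        print(token)
    if 'z' in token:
        print(token)
    if 'pt' in token:
        print(token)
    if token.istitle():
        print(token)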
Whoa Halt Who It I Arthur Uther Pendragon Camelot King Britons Saxons England Pull I Patsy We Camelot I What Ridden Yes You What You empty So We Mercea Where We Found In Mercea The What Well zone The Are Not They What A It It It A Well Will Arthur Court Camelot Listen In Please Am I I It African Oh African European That Oh I Will Camelot But African Oh So Wait Supposing No Well They What Well Bring Bring Bring Bring Bring Bring Bring Bring Ninepence Bring Bring Bring Bring Here Ninepence I What Nothing Here I Ere He Yes I He Well He I No You Oh I It I Oh I I Well I Well He No I Robinson They Well Thursday I I You Look I I Ah Not See Thursday Right All Who I Must Why He King Arthur King Arthur Old Man Man Sorry What I I I I Well I Man Well Dennis Well I Dennis Well I What I Well I Oh And By By If Dennis Oh How How I Arthur King Britons Who King The Britons Who Britons Well We Britons I I I You We A Oh That If Please I Who No Then We What I We We Yes But Yes I By Be But Be I Order Who Heh I Well I You Well The Lady Lake Excalibur Divine Providence I Arthur Excalibur That I Listen Supreme Be Well Shut I I I Shut Shut Ah Shut Oh Come Help I Bloody Oh Did Did That I Did You King Arthur Aaagh King Arthur Aaagh Ooh King Arthur Aagh Oh King Arthur Ooh Aaagh Agh Aaaaaah Aaaaaaaaah Umm You Sir Knight I Arthur King Britons I Camelot You Will You So Come Patsy None What None I Sir Knight I Then I King Britons I So Aaah Now Tis A Your No Well I You Come Huyah Hiyaah Aaaaaaaah Victory We Thee Lord Thy Hah Come What Have Eh You Sir Knight Oh Look You Yes I Look Just Look Chicken Chickennn Look I Right Right I You Come What I You The Black Knight Have Come Ooh All Come Patsy Oh Oh I Running You Come I Pie Iesu Pie Iesu Pie Iesu A A A A Pie Iesu A A A A We A A A A We A A Burn Burn Burn We We A A A We May Burn Burn Burn Burn How She Right Yeah Yeah Bring I I Uh They Augh We And It Well Well The And Yeah We Right Yeaaah Yeaah Did No No No No No No Yes Yes Yes Yeah A A A She What Well 
A I Burn Burn Burn Burn Burn Quiet Quiet Quiet Quiet There Are Ah What Tell Tell Tell Burn Burn Burn Burn Burn And More Shh Wood So B Good Heh Oh Oh So Build Ah Oh Oh True Uhh Does No No No It Throw The Throw What Bread Apples Uh Cider Uh Cherries Mud Churches Churches Lead Lead A Oooh Exactly So If And A A A A Here Use We Ohh Ohh Burn Burn Burn Burn Burn Burn Burn Burn Burn Ahh Ahh Right Remove A A A It Burn Burn Burn Burn Burn Burn Who I Arthur King Britons My Good Sir Knight Camelot Round Table My I What Bedevere Then I Sir Bedevere Knight Round Table The Sir Bedevere King Arthur Sir Lancelot Brave Sir Gallahad Pure Sir Robin Sir Lancelot Dragon Angnor Chicken Bristol Battle Badon Hill aptly Sir Not Together Knights Round Table And This amazes Sir Bedevere Explain Oh Look Camelot Camelot Camelot It Shh Knights I Let Camelot We We We With We Camelot We We Round Table Our But That We Camelot We In Quite Between Clark Gable It Camelot I Well Camelot It Right Right Arthur Arthur King Britons Oh One I Sorry And Every I I What I O Lord Well It Psalms Now Yes Lord Right Arthur King Britons Knights Round Table Good O Lord Course Behold Arthur Holy Grail Look Arthur That Arthur Holy Grail A A Lord God King Arthur Halt Hallo Hallo Allo Who It King Arthur Knights Round Table Who This Guy Loimbard Go God If Holy Grail Well I I Uh What He Are Oh I Well Of You English Well I French Why I What England Mind If Grail You English Go I Arthur King English Thpppppt Thpppppt Thppt Thppt Thppt Thppt What Now I empty I You Is No I Now I Fetchez Fetchez Quoi Fetchez Fetchez If I Jesus Christ Christ Ah Ohh Right Charge Charge Hey There And Run Run Thppppt Thppppt Fiends I No No Sir I C Quoi Un What A Oh Oui Hurry What Let Oh On Bon Over What Well Launcelot Galahad I French Not Who U Launcelot Galahad I Uh Ohh Oh Um Run Run Run Run Run Run Run Run Oh Haw Haw Picture Schools Action Defeat King Arthur The French Arthur Holy Grail Arthur Grail Now Launcelot Aaaah S Frank The Tale Sir Robin 
So Sir Robin Ewing Bravely Sir Robin Camelot He O Sir Robin He Brave Sir Robin He Or To And Sir Robin His And And And That Heh Looks Anarcho Oh Dennis We Halt Who He Sir Robin Sir Robin Shut Um I What To Shut Um I Sir Knight I Ah W I I Knight Round Table You Knight Round Table I In I Shall I Oh I Well I I Oh Oh Perhaps I And Oh Get I Oh Yes What Yapping You You What You Oh I Anyway Well Oh Oh All All We Yes Oh All All Right He So He Brave Sir Robin No Bravely I When No Yes Sir Robin I And I He All Bravest Sir Robin I Pie Iesu Heh Pie Iesu Wayy Ho Woa Heh Heh Wayy Wayy Forgive Oh Oooo The Tale Sir Galahad Open Open In King Arthur Hello Welcome Sir Knight Welcome Castle Anthrax The Castle Anthrax Yes Oh Oh You Holy Grail The The Grail It Oh Midget Crapper Yes O Zoot Prepare Oh Thank Thank Thank Thank Thank Away The Well I I What Sir Galahad Chaste Mine Zoot Just Zoot Oh Look In God Grail Oh You No I It Sir Galahad You Well I I Oh I We Oooh It We Nay Nay Come Come You Oh No Oh No Lie Well They Uh B Oh You Doctor Piglet Doctor Winston Practice Try Are We There Please We Look This I Back At Torment I Grail There I I I Hello Oh Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Zoot No I Zoot Dingo Oh I Where I Grail I Oh Oh Bad Zoot Well Oh Zoot She I It It Grail Oh Zoot She Do We It I At Well It Get Yes Yes Oh I Get Oh Zoot Oh And Castle Anthrax You A A You And And And And And Yes A A There And The The Well I Sir Galahad Oh Quick What Quick Why You No Silence temptress You Come We Look I Come Sir Galahad No Look I Yes Let Yes Let No Sir Galahad Come No Really Honestly I I Oh Let Yes Let No Quick Quick Please I There Yes He We We He Oh We You I I Yes You Look No Look I No Holy Grail Come Oh No It I No I Sir Launcelot Sir Galahad temptation Grail Meanwhile King Arthur Sir Bedevere Oh I I Get Oh On Arthur I Heh Hee And Grail Ha Heh Ha Ha Where Heh Old Hee He And Grail The Grail There 
Gorge Eternal Peril But Grail Where Grail Seek Bridge Death The Bridge Death Grail Heh Ha Hee Ni Ni Ni Ni Ni Ni Who We Knights Who Say Ni Ni No Not Knights Who Say Ni The Who We Ni Peng Neee Neee Those The Knights Who Say Ni Knights Ni Ni Ni Ni Ni Ni Ni Ow Ow Ow Agh We Well We A Ni Ni Ni Ni Ow Oh Please No We You O Knights Ni One Of And Yes Now Hmm Oh Great Hm Hmm Hm Hmm Ohh Ay Thsss Ayy Thsss Ayy Stop Stop Ay Stop Look Clear Go Go Go Go And Clear Hah Bloody The Tale Sir Launcelot One What No Not All This But Mother Father Father B Father I Listen I When I Other I I It So I That So I That But And But I I Rather I Stop Stop You I Now In Britain B I Listen Alice Herbert Erbert We We But I Don What She She She I I I Cut Cut Look Princess Lucky Guards Make Prince I Not Hic No Until I Until No No You And Hic Right We No Leaving Leaving Yes All Right Hic Right Oh Yes What Oh Look Uh You Alright Hic Right Oh I Uhh N No You Oh We But No Just Until No Just Just Hic Get Get All Right We Hic And What Make The Prince Yes Make Oh Hic Ah I You Is Hic Oh No Right Where We No I Oh I Right But Father Shut And And Hic Oh Well Concorde Thank Most And Over Good Steady And Uuh Come Concorde Message Concorde Concorde Speak To I Please I Tall Tower Swamp Castle At A A This Holy Grail Brave Concorde Uh I I Well I I I I I Oh I Actually I I No Concorde Stay I I Idiom Idiom No I Farewell Concorde I I Shall I Yeah Morning Morning Oooh ptoo Ha Hiyya Hey Hiyya Ha Ha Huy Uuh Aaah Ha And Aah Hiyah Aah Aaah Hyy Hya Hiyya Ha Now O Sir Launcelot Camelot I Oh I You Uh I I You Uh You I I I Well I Stop Stop Stop Stop Who I No Uh I Sir Launcelot He Father Well Did Uh Oh Sorry They Well I Um I Don Sir Launcelot I You Well I I Hurry Sir Launcelot Hurry Shut You Well I Didn You Oh Is You This Well I I Camelot I Camelot Are Camelot Hurry Sir Launcelot Uh I Knight King Arthur Very Camelot Uh Is Hurry I Would Well I I Um Oooh I I I Oh Oooh Well We There Oh Ha Hey Ha Hold Stop Hold Hold Hold Hold Hold Please 
Sorry Sorry You I I I Sorry Sorry He Hold Hold Please Hold This Sir Launcelot Court Camelot Hello He Please Please This Let We Unfortunately Herbert Oh Oh But I I For He Since He For S Uugh Oh And I And I Princess Sir Launcelot Camelot What Look The Prince Oooh The Prince He No I You Tall Tower No I How Well I Not Not No Stop He He Shut He Shut He Shut He Not He He He He Quickly He Come He He No It He I Oh Dramatically Dramatically But Heee Hoa Hoo What Excuse Could King Arthur Old Is Who The Knights Who Say Ni Aggh No Never We If I Agh Do Very If No Never No Ni Nu No Nu No Nu No You No Ni Ni That That You Ni Ohh Ni Ni Agh Ni Ni Ni Ni Ni Are Erm Oh There Nothing Even Did Yes Shrubberies I My Roger Shrubber I Ni No No No O Knights Ni May It I But What We Knights Who Say Ni Ni Shh Shh We Knights Who Say Ecky zoop zoo zhiv Ni Therefore What O Knights Knights Who Til Recently Said Ni Firstly Not Ni Then A A A Ni Shh Ni Ni Ni Shh Shh Then We Oh Cut It Aaaugh Aaaugh Augh Ohh Don What I Knights Ni How Aaaaugh You What Agh No No You No Not My Sir Robin Packing And And Yes Sir Robin My It Now Surely Holy Grail He Shut No No Far He Aaaaugh I Aaaaugh Uh No Aaaaugh Aaaaugh Stop The Oh Ow He Patsy Wait I I Ooh I And That Ohh Aaaaugh And Arthur Bedevere Sir Robin Beyond Launcelot Galahad Yay Yay In frozen Nador Robin Get Eee And Yay A Winter Spring Mmm Spring Summer Oh Ahh Summer Winter Oh And Winter Spring Summer Autumn Aah Oh Waa Until King Arthur Eh Oh See Oh Oh Knights Forward What I By There Tim Greetings Tim Enchanter Greetings King Arthur You I zoosh You Holy Grail That You O Tim Quite Oh Yes Holy Grail Our Holy Grail Yeah Yes It It Yeah Yup Yup Hm And Yes Yeah We We We Ages Umhm Uh Look Fine Um I I A A A Yes I Y Yes Yup That Yes Oh Oh Thank Ahh Oh Fine Thank Splendid Aah Look Yes I Holy Grail Oh Oh To Caerbannog Olfin Bedwere Rheged Holy Grail Where O Tim Follow But Follow Bones So What They Then Dis Behold Caerbannog Right Keep What W Too What There Where There What It 
You What You Well Ohh That You I I Look Get He Oh You I What He Go Bors Chop Right Silly One Look Aaaugh Jesus Christ I I I Oh Oh Well I Oh Do Right Oh Charge Aaaaugh Aaaugh Run Run Run Run Ha Ha Ha Ha Right How Gawain Ector And Bors That Three Three Three And That Would Oh Let It Like Well Have No We Holy Hand Grenade Yes The Holy Hand Grenade Antioch Tis Brother Maynard Brother Maynard Bring Holy Hand Grenade Pie Iesu Pie Iesu Pie Iesu Pie Iesu How I Consult Book Armaments Armaments Chapter Chapter Two Nine Twenty And Saint Attila O Lord And Lord Skip Brother And Lord First Holy Pin Then Three Four excepting Five Once Holy Hand Grenade Antioch Amen Amen Right One Three Three There Look What What Brother Maynard You It Aramaic Of Joseph Arimathea Course What It Here Joseph Arimathea He Holy Grail Castle What Castle What He Oh Well Look He Well Perhaps Oh Well No Just Aauuggghhh Aaauggh Do Camaaaaaargue Where France I Isn Saint Aauuuves Cornwall No Saint Ives Oh Saint Iiiives Iiiiives Oooohoohohooo No Aauuuuugh Aauuugh N No Oooooooh Oh Yes I Oooh My God It Black Beast Aaauugh Black Beast Aaauugh That That Run Run Run Run Run Keep Shh Shh Shh Shh Shh Shh Shh Shh We Aagh As Black Beast Arthur Ulk The The Holy Grail There The Bridge Death Oh Look There What He Bridge Death He Three Three He Three Three What Then Gorge Eternal Peril Oh I Who Sir Robin Yes Brave Sir Robin Hey I Why Launcelot Yes Let I I No No Hang Hang Hang Just Three Three And I Good Sir Launcelot God Stop Who Bridge Death Ask I What My Sir Launcelot Camelot What To Holy Grail What Blue Right Off Oh Thank That Stop Who Bridge Death Ask I What Sir Robin Camelot What To Holy Grail What Assyria I Auuuuuuuugh Stop What Sir Galahad Camelot What I Grail What Blue No Hee Stop What It Arthur King Britons What To Holy Grail What What An African European Huh I I Auuuuuuuugh How Well Launcelot Launcelot Launcelot Launcelot Launcelot Launcelot Launcelot Launcelot Launcelot The Castle Aaagh Our God Almighty God 
Thee Thou Jesus Christ Allo English Monsieur Arthur King So French How I Knights Camelot God Himself How English I So French I In Lord No English I If In God Agh Right That Yes Ha Walk Just And And English Thpppt Thpppt We Yes Stand French Dappy Today In God Hoo Ohh Holy Grail God Ha Charge Hooray S Yes They I Come Anybody All Come Back S Get Back Right Just Come Come Put Clear Come With Which Oh Come Put Get We Ahh Ooh Come Back Riiight Come Run Run Pull My Come Back Back Right Come Everything All That Just Christ
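The loop above prints a token every time any condition matches, so frequent words are repeated throughout the output and the four conditions are mixed together. One possible alternative is to collect the matches for each condition into its own set. The sketch below assumes a short made-up token list in place of `text6`, and the dictionary keys are hypothetical labels.

```python
# Hypothetical sample standing in for text6; not taken from the corpus.
tokens = ["Arthur", "amazes", "temptress", "Arthur", "zone", "aptly", "burn"]

# One set per condition, so each matching word is recorded only once.
matches = {
    "ends_with_ize": set(),
    "contains_z": set(),
    "contains_pt": set(),
    "titlecase": set(),
}

for token in tokens:
    if token.endswith("ize"):
        matches["ends_with_ize"].add(token)
    if "z" in token:
        matches["contains_z"].add(token)
    if "pt" in token:
        matches["contains_pt"].add(token)
    if token.istitle():
        matches["titlecase"].add(token)

# Print each group separately, in sorted order.
for label, words in matches.items():
    print(label, sorted(words))
```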
for token in sent1:
    if token.islower():
        print(token, 'is a lowercase word')
    elif token.istitle():
        print(token, 'is a titlecase word')
    else:
        print(token, 'is punctuation')