- mashup 
Mashup() is a function that compares two similar texts and produces a third text by choosing randomly between the originals wherever they differ. The words the two texts share stay in place; each difference is resolved by random choice.
A function that:
1. takes two similar texts (example: two different translations of a poem)
2. finds the fixed_words (the words shared by both texts) and uses them as the fixed text of the new piece
3. puts the results together into a new piece, randomly choosing between the differing options
4. produces HTML output that highlights the random choices made between the translations
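Step 2 can be sketched for a single pair of lines as follows. This is only a minimal illustration, assuming the fixed words are the entries that difflib.Differ marks as common to both word lists; the helper name fixed_words_in_line is hypothetical and is not part of the notebook.

```python
import difflib


def fixed_words_in_line(line_a, line_b):
    """Hypothetical helper: return the words difflib marks as common to both lines, in order."""
    differ = difflib.Differ()
    common = []
    for entry in differ.compare(line_a.split(), line_b.split()):
        # Differ prefixes each entry with a two-character code:
        # '  ' common to both, '- ' only in the first, '+ ' only in the second, '? ' hint line
        if entry.startswith('  '):
            common.append(entry[2:])
    return common


print(fixed_words_in_line("The glasses were empty", "So the glasses were empty"))
# prints ['glasses', 'were', 'empty']
```

The comparison is case-sensitive, so "The" and "the" count as different words.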
how to use: to use this function it is necessary to have two texts with the same number of lines, because it goes through the two texts and compares them line by line. It can also be used with lists of strings.
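The same-number-of-lines requirement could be checked before comparing anything. The sketch below is an assumption about how that might look: to_lines and check_same_length are hypothetical helpers, not functions from the notebook, and they accept either a string or a list of strings.

```python
def to_lines(text):
    """Hypothetical helper: accept a string or a list of strings and return a list of lines."""
    if isinstance(text, str):
        return text.splitlines()
    return list(text)


def check_same_length(text1, text2):
    """Hypothetical helper: return both texts as line lists, or raise if the line counts differ."""
    lines1, lines2 = to_lines(text1), to_lines(text2)
    if len(lines1) != len(lines2):
        raise ValueError(
            f"texts have {len(lines1)} and {len(lines2)} lines; they must match"
        )
    return lines1, lines2


lines1, lines2 = check_same_length("first line\nsecond line", ["line one", "line two"])
```

Raising an error early is one option; padding the shorter text with empty lines would be another.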
input: two texts that are similar but not exact copies of each other
output: a new text that showcases the differences // a new text made out of random choices // still not clear yet // extracted pdf/txt file?
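Since the output format is still open, here is one possible shape for the HTML variant, as a sketch only. It assumes each line has already been cut into (words, source) pairs, where source is "fixed", "text1" or "text2"; that data shape, the function name render_html_line and the class names are all assumptions made for illustration.

```python
from html import escape


def render_html_line(segments):
    """Hypothetical helper: join the segments of one line, wrapping random choices in a <span>."""
    parts = []
    for words, source in segments:
        if not words:
            continue  # skip empty slices
        text = escape(" ".join(words))
        if source == "fixed":
            parts.append(text)  # shared words stay unmarked
        else:
            # the class name records which text the chosen words came from
            parts.append(f'<span class="{escape(source)}">{text}</span>')
    return " ".join(parts)


print(render_html_line([(["So", "the"], "text2"), (["glasses", "were", "empty"], "fixed")]))
# prints <span class="text2">So the</span> glasses were empty
```

A stylesheet could then colour the text1 and text2 classes differently to highlight the choices.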
# define texts
text1 = '''
The glasses were empty
The bottle was shattered
The bed was wide open
The door was tight shuttered
Each shard was a star
Of bliss and of beauty
That flashed on the floor
All dusty and dirty
And I was dead drunk
Lit up wildly ablaze
You were drunk and alive
In a naked embrace!
'''

text2 = '''
So the glasses were empty
and the bottle broken
And the bed was wide open
and the door closed
And all of the glass stars
of happiness and beauty
were sparkling in the dust
of the poorly dusted room.
And I was dead drunk
And I was a bonfire
And you were alive, drunk,
all naked in my arms.
'''
import difflib
from random import choice


def mashup(text1, text2):  # take into account 2 texts
    text1 = text1.splitlines()  # split texts into lines
    text2 = text2.splitlines()

    # fixed_words: words that are the same in both texts, as a list of lists (one list of words per line)
    fixed_words = []

    for line_A, line_B in zip(text1, text2):  # first loop: read both texts line by line at the same time (zip)
        words_A = line_A.split()  # split lines into lists of words
        words_B = line_B.split()

        # Differ compares two sequences and marks each item:
        # '- ' only in the first, '+ ' only in the second, '  ' in both, '? ' hint line
        d = difflib.Differ()
        diff = d.compare(words_A, words_B)  # compare the two lists of words

        linelist = []  # fixed words of the current line

        for result in diff:  # second loop: go through the diff of the current line, entry by entry
            code, word = result.split(' ', 1)  # split each entry into its code ('-', '+', '?' or '') and the word
            word = word.strip()  # remove stray whitespace such as '\n' at the ends
            if code == '':  # an empty code means the word is found in both texts
                linelist.append(word)

        # linelist is filled inside the inner loop; it is added to fixed_words once per line
        fixed_words.append(linelist)

    length = len(fixed_words)  # number of line pairs that were compared
    for linenumber in range(length):
        cut_left1 = 0  # both lines start at position 0 (the left end of the line)
        cut_left2 = 0
        search_1 = 0  # where to start looking for the next fixed word, so repeated words are not found twice
        search_2 = 0
        words_1 = text1[linenumber].split()  # split the current line of each text into words
        words_2 = text2[linenumber].split()

        if len(fixed_words[linenumber]) > 0:  # the line has at least one fixed word
            for fixed_word in fixed_words[linenumber]:
                cut_right1 = words_1.index(fixed_word, search_1)  # position of the next fixed word, left to right
                cut_right2 = words_2.index(fixed_word, search_2)  # same in the second text
                slice_1 = words_1[cut_left1:cut_right1]  # words from the previous cut up to this fixed word
                slice_2 = words_2[cut_left2:cut_right2]
                print(choice([slice_1, slice_2]))  # choose one of the two slices at random
                cut_left1 = cut_right1  # the next slice starts at this fixed word
                cut_left2 = cut_right2
                search_1 = cut_right1 + 1
                search_2 = cut_right2 + 1
            slice_1 = words_1[cut_left1:]  # from the last fixed word to the end of the line
            slice_2 = words_2[cut_left2:]
            print(choice([slice_1, slice_2]))  # choose
        else:
            # no fixed words in this line: choose between the two whole lines
            slice_1 = words_1[cut_left1:]
            slice_2 = words_2[cut_left2:]
            print(choice([slice_1, slice_2]))  # choose
        print('--------')


mashup(text1, text2)
[]
--------
['So', 'the']
['glasses']
['were']
['empty']
--------
['and', 'the']
['bottle', 'was', 'shattered']
--------
['The']
['bed']
['was']
['wide']
['open']
--------
['and', 'the']
['door', 'was', 'tight', 'shuttered']
--------
['And', 'all', 'of', 'the', 'glass', 'stars']
--------
['of', 'happiness']
['and']
['beauty']
--------
['were', 'sparkling', 'in']
['the', 'dust']
--------
['All', 'dusty', 'and', 'dirty']
--------
[]
['And']
['I']
['was']
['dead']
['drunk']
--------
['And', 'I', 'was', 'a', 'bonfire']
--------
['And', 'you']
['were', 'alive,', 'drunk,']
--------
['In', 'a']
['naked', 'in', 'my', 'arms.']
--------
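The printed slices could also be joined back into lines of text. The sketch below is not the notebook's method: it swaps in difflib.SequenceMatcher.get_opcodes() to walk each pair of lines, keeps the equal runs as fixed text, and picks one side at random for every differing run. The name mashup_lines and the optional rng parameter are assumptions added for illustration.

```python
import difflib
import random


def mashup_lines(text1, text2, rng=None):
    """Hypothetical variant: return the mashup as a list of lines instead of printing slices."""
    if rng is None:
        rng = random.Random()  # pass random.Random(seed) for a repeatable result
    result = []
    for line_a, line_b in zip(text1.splitlines(), text2.splitlines()):
        words_a, words_b = line_a.split(), line_b.split()
        matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
        new_line = []
        for tag, a1, a2, b1, b2 in matcher.get_opcodes():
            if tag == "equal":
                new_line.extend(words_a[a1:a2])  # fixed words, shared by both texts
            else:
                # 'replace', 'delete' or 'insert': pick one side at random (a side may be empty)
                new_line.extend(rng.choice([words_a[a1:a2], words_b[b1:b2]]))
        result.append(" ".join(new_line))
    return result


print("\n".join(mashup_lines("The glasses were empty", "So the glasses were empty")))
```

Unlike the slices printed above, a fixed word here is never part of a random choice, so the result can differ from what mashup() would select for the same input.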