
Compare two text files (order does not matter) and output the words the two files have in common to a third file

I just started programming and I am trying to compare two files that look like this:

file1:
tootsie roll
apple
in the evening

file2:
hello world
do something
apple

output:
"Apple appears x times in file 1 and file 2"

I am honestly stumped. I have tried creating dictionaries, lists, tuples, and sets, and I cannot seem to get the output I want. The closest I got was having the lines output exactly as shown for file1/file2.

I have tried several snippets of code from here and I cannot seem to get any of them to output what I want. Any help would be greatly appreciated!!

Here is the last bit of code that I tried and it did not give me any output to my third file.

f1 = open("C:\\Users\\Cory\\Desktop\\try.txt", 'r')
f2 = open("C:\\Users\\Cory\\Desktop\\match.txt", 'r')
output = open("C:\\Users\\Cory\\Desktop\\output.txt", 'w')

file1 = set(f1)
file2 = set(f2)
file(word,freq)
for line in f2:
    word, freq = line.split()
    if word in words:
        output.write("Both files have the following words: " + file1.intersection(file2))
f1.close()
f2.close()
output.close()

You don't need all those loops. If the files are small (i.e., less than several hundred MB), you can work with them more directly:

words1 = f1.read().split()
words2 = f2.read().split()
words = set(words1) & set(words2)

words will then be a set containing every word the two files have in common. To ignore case, call lower() on the text before splitting it, e.g. f1.read().lower().split().
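As a minimal, self-contained sketch of the set intersection described above, with the case-insensitive variant applied: the function name common_words and the inline sample strings are made up for illustration and stand in for the text you would get from f1.read() and f2.read().

```python
def common_words(text1, text2):
    """Return the set of words that occur in both texts, ignoring case."""
    # lower() first so "Apple" and "apple" compare equal,
    # then split() breaks the text on any whitespace (spaces, newlines).
    words1 = text1.lower().split()
    words2 = text2.lower().split()
    # & is set intersection: keep only words present in both sets.
    return set(words1) & set(words2)


# Hypothetical sample data standing in for the contents of the two files.
sample1 = "tootsie roll\napple\nin the evening\n"
sample2 = "hello world\ndo something\nApple\n"

common = common_words(sample1, sample2)
print(common)
```

Note that this compares individual words, not whole lines, so a phrase such as "tootsie roll" is treated as two separate words.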

To have a count of each word as you mention in a comment, simply use the count() method:

with open('outfile.txt', 'w') as output:
    for word in words:
        output.write('{} appears {} times in f1 and {} times in f2.\n'.format(word, words1.count(word), words2.count(word)))
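Calling count() rescans the whole list for every word, which is fine for small files. If that ever becomes slow, one possible alternative (not part of the answer above) is collections.Counter, which tallies every word in a single pass. The sketch below assumes case-insensitive matching; the names count_common and format_report and the sample strings are hypothetical.

```python
from collections import Counter


def count_common(text1, text2):
    """Map each word found in both texts to (count in text1, count in text2)."""
    # Counter tallies every word in one pass over each text.
    counts1 = Counter(text1.lower().split())
    counts2 = Counter(text2.lower().split())
    # Dictionary key views support set operations such as intersection.
    shared = counts1.keys() & counts2.keys()
    # sorted() gives the report a stable, alphabetical order.
    return {word: (counts1[word], counts2[word]) for word in sorted(shared)}


def format_report(counts):
    """Build one report line per shared word."""
    return [
        '{} appears {} times in f1 and {} times in f2.'.format(word, c1, c2)
        for word, (c1, c2) in counts.items()
    ]


# Hypothetical sample data standing in for the contents of the two files.
sample1 = "apple pie\napple\nin the evening\n"
sample2 = "hello world\nthe Apple\n"

report = format_report(count_common(sample1, sample2))
for line in report:
    print(line)
```

To write the result to a file instead of printing it, the lines in report can be passed to output.write() inside the same with open(...) block used above, adding '\n' after each one.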
