Alright, I have a school assignment where I need to compare two files to one another. It's very simple: the program needs to show things like all of the unique words in these two files. For example:
file1: This is a test
file2: This is not a test
output: ["This", "is", "a", "test", "not"]
That's the output I expected from this little piece of code:
def unique_words(file_1, file_2):
    unique_words_list = []
    for word in file_1:
        unique_words_list.append(word)
    for word in file_2:
        if word not in file_1:
            unique_words_list.append(word)
    return unique_words_list
But that doesn't happen; sadly, this is the output:
['this\n', 'is\n', 'a\n', 'test', 'this\n', 'is\n', 'not\n', 'a\n', 'test']
I have multiple functions that work pretty much the same way and also have similar outputs. I know why the \n appears, but I don't know how to get rid of it. If anyone could help me get the right output, that would be a great help :)
The solution from Steampunkery is incorrect: (1) it doesn't handle files with more than one word per line, and (2) it doesn't account for repeated words in file1.txt (try it with file1.txt lines "word word word word"; you should get one "word" in the output, but you get four). Also, the for/if construct is unneeded.
Here is a compact and correct solution.
Contents of file1.txt:
the cat and the dog
the lime and the lemon
Contents of file2.txt:
the mouse and the bunny
dogs really like meat
The code:
def unique(infiles):
    words = set()
    for infile in infiles:
        # Close each file when done; split() breaks a line on any whitespace
        # and drops the trailing newline, so no separate strip() is needed.
        with open(infile, 'r') as f:
            for line in f:
                words.update(line.split())
    return words

print(unique(['file1.txt']))
print(unique(['file2.txt']))
print(unique(['file1.txt', 'file2.txt']))
The output (Python 2 repr; the order of elements within a set is arbitrary):
set(['and', 'lemon', 'the', 'lime', 'dog', 'cat'])
set(['and', 'like', 'bunny', 'the', 'really', 'mouse', 'dogs', 'meat'])
set(['and', 'lemon', 'like', 'mouse', 'dog', 'cat', 'bunny', 'the', 'really', 'meat', 'dogs', 'lime'])
Two lessons for Python learners:
set
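A set does not remember the order in which words were first seen, whereas the output expected in the question lists them in order of first appearance. If that order matters, one possible sketch is below. It assumes Python 3.7 or later (where a dict keeps insertion order); the function name unique_in_order and the in-memory sample lines are hypothetical, chosen only for illustration.

```python
def unique_in_order(line_groups):
    """Return each distinct word once, in order of first appearance.

    line_groups is an iterable of iterables of lines (open file objects
    or plain lists of strings both work).
    """
    seen = {}  # dict keys keep insertion order in Python 3.7+
    for lines in line_groups:
        for line in lines:
            # split() breaks on any whitespace and drops the trailing '\n'
            for word in line.split():
                seen.setdefault(word, None)
    return list(seen)


# Sample data taken from the question; real code would pass open files instead.
file1_lines = ["This is a test\n"]
file2_lines = ["This is not a test\n"]
print(unique_in_order([file1_lines, file2_lines]))
# ['This', 'is', 'a', 'test', 'not']
```

Note that the comparison is case-sensitive, so "This" and "this" would count as two different words unless each word is lowercased first.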
Here is a little snippet I wrote reusing some of your code:
#!/usr/bin/env python3.6

with open('file1.txt', 'r') as file1, open('file2.txt', 'r') as file2:
    file_1 = file1.readlines()
    file_1 = [line.rstrip() for line in file_1]
    file_2 = file2.readlines()
    file_2 = [line.rstrip() for line in file_2]

def unique_words(file_1, file_2):
    unique_words_list = file_1
    for word in file_2:
        if word not in unique_words_list:
            unique_words_list.append(word)
    return unique_words_list

print(unique_words(file_1, file_2))
This script assumes that you have two files named file1.txt and file2.txt in the same directory as the script. From your example, we are also assuming that each word is on its own line. Here's a walkthrough:
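The reason for the rstrip() calls can be seen in a small sketch. It uses io.StringIO as a stand-in for file1.txt (an assumption made so the example runs without any files on disk), with one word per line as in the output shown in the question.

```python
import io

# Stand-in for file1.txt: one word per line, last line without a newline.
fake_file = io.StringIO("this\nis\na\ntest")

# readlines() keeps the line terminator on every line that has one.
raw_lines = fake_file.readlines()
print(raw_lines)    # ['this\n', 'is\n', 'a\n', 'test']

# rstrip() removes trailing whitespace, including the newline.
clean_lines = [line.rstrip() for line in raw_lines]
print(clean_lines)  # ['this', 'is', 'a', 'test']
```

Because rstrip() only trims the end of each line, a line holding several words stays one string; calling split() on the line instead would yield the individual words.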