Comparing two files with Python

Alright, I have a school assignment where I need to compare two files with each other. It's very simple: the program needs to show things like all of the unique words in the two files. For example:

file1: This is a test

file2: This is not a test

output: ["This", "is", "a", "test", "not"]

That's the output I expected from this little piece of code:

def unique_words(file_1, file_2):
    unique_words_list = []
    for word in file_1:
        unique_words_list.append(word)
    for word in file_2:
        if word not in file_1:
            unique_words_list.append(word)
    return unique_words_list

But that doesn't happen. Sadly, this is the output:

['this\n', 'is\n', 'a\n', 'test', 'this\n', 'is\n', 'not\n', 'a\n', 'test']

I have multiple functions that work pretty much the same way and produce similar output. I know why the \n appears, but I don't know how to get rid of it. If anyone could help me get the right output, that would be a great help :)
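A minimal sketch of where the `\n` comes from, assuming (the question doesn't show the call) that `file_1` and `file_2` are open file objects; `io.StringIO` stands in for the two files here. Iterating a file yields each line with its trailing newline, and a file object can only be iterated once, so after the first loop `word not in file_1` is always true. That would explain why every word of the second file, including `'this\n'`, ends up in the output above.

```python
import io

# Hypothetical stand-ins for the two files, one word per line as in the question.
file_1 = io.StringIO("this\nis\na\ntest")
file_2 = io.StringIO("this\nis\nnot\na\ntest")

first_pass = list(file_1)   # iterating a file yields lines *with* their '\n'
print(first_pass)           # ['this\n', 'is\n', 'a\n', 'test']

# file_1 is now exhausted, so a membership test has nothing left to search.
print('this\n' in file_1)   # False

# Stripping the newline from each line gives clean words.
clean = [line.rstrip('\n') for line in first_pass]
print(clean)                # ['this', 'is', 'a', 'test']
```

Reading the lines into a list once (and stripping them) before comparing avoids both problems.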

The solution from Steampunkery is incorrect: (1) it doesn't handle files with more than one word per line, and (2) it doesn't account for repeated words in file1.txt (try a file1.txt whose four lines each read "word": the output should contain "word" once, but it contains it four times). The for/if construct is also unnecessary.
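A small sketch of the two failure cases, using hypothetical in-memory lists of lines in place of file1.txt (the function names are illustrative only). Stripping each line keeps repeated words and keeps a multi-word line as a single string, while splitting every line and collecting the words in a set handles both.

```python
def line_based(lines):
    # One entry per line with the newline removed: the one-word-per-line approach.
    return [line.rstrip() for line in lines]


def word_based(lines):
    # Split every line into words and keep each distinct word once.
    return {word for line in lines for word in line.split()}


repeated = ["word\n", "word\n", "word\n", "word\n"]
multi = ["the cat and the dog\n"]

print(line_based(repeated))       # ['word', 'word', 'word', 'word']
print(line_based(multi))          # ['the cat and the dog']
print(word_based(repeated))       # {'word'}
print(sorted(word_based(multi)))  # ['and', 'cat', 'dog', 'the']
```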

Here is a compact and correct solution.

Contents of file1.txt:

the cat and the dog
the lime and the lemon

Contents of file2.txt:

the mouse and the bunny
dogs really like meat

The code:

def unique(infiles):
    """Return the set of distinct words found in all of the given files."""
    words = set()
    for infile in infiles:
        with open(infile, 'r') as f:  # close each file when done
            # split() with no argument splits on any whitespace and drops the '\n'
            words.update(word for line in f for word in line.split())
    return words

print(unique(['file1.txt']))
print(unique(['file2.txt']))
print(unique(['file1.txt', 'file2.txt']))

The output (Python 3; set ordering is arbitrary and may differ from run to run):

{'and', 'lemon', 'the', 'lime', 'dog', 'cat'}
{'and', 'like', 'bunny', 'the', 'really', 'mouse', 'dogs', 'meat'}
{'and', 'lemon', 'like', 'mouse', 'dog', 'cat', 'bunny', 'the', 'really', 'meat', 'dogs', 'lime'}

Two lessons for Python learners:

  1. Use the tools the language gives you, like set
  2. Think about input conditions that break your algorithm
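One caveat with lesson 1: a set does not remember the order in which words first appeared, whereas the question's expected output lists them in reading order. If that order matters, one standard option (not part of the answer above) is `dict.fromkeys`, since dict keys are unique and, from Python 3.7 on, keep insertion order. A sketch working on strings rather than files; the name `unique_in_order` is hypothetical:

```python
def unique_in_order(*texts):
    """Return the distinct words of all texts, in order of first appearance."""
    # dict keys are unique and (Python 3.7+) preserve insertion order.
    seen = dict.fromkeys(word for text in texts for word in text.split())
    return list(seen)


print(unique_in_order("This is a test", "This is not a test"))
# ['This', 'is', 'a', 'test', 'not']

# Lesson 2: try the inputs that broke the line-based approach.
print(unique_in_order("word\nword\nword\nword", "the cat\nthe dog"))
# ['word', 'the', 'cat', 'dog']
```

To use it with files, pass each file's contents, e.g. `open('file1.txt').read()`.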

Here is a little snippet I wrote reusing some of your code:

#!/usr/bin/env python3.6

# Read both files, stripping the trailing newline from every line.
with open('file1.txt', 'r') as file1, open('file2.txt', 'r') as file2:
    file_1 = [line.rstrip() for line in file1]
    file_2 = [line.rstrip() for line in file2]


def unique_words(file_1, file_2):
    # Start from a copy of the first file's words so the caller's list is not modified.
    unique_words_list = list(file_1)
    # Append each word from the second file that isn't already in the list.
    for word in file_2:
        if word not in unique_words_list:
            unique_words_list.append(word)
    return unique_words_list


print(unique_words(file_1, file_2))

This script assumes that you have two files named file1.txt and file2.txt in the current working directory (usually the directory you run the script from). Based on your example, it also assumes that each word is on its own line. Here's a walkthrough:

  1. Open both files and read their lines into lists, removing the newlines with a list comprehension.
  2. Define a function that adds all the words from the first file to a list, then appends each word from the second file that is not already in that list.
  3. Print the result of calling that function on the two lists read earlier.
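On step 2: in Python, assigning a list to another name (`result = file_1`) does not copy it, so appending to the result also changes the caller's list. A sketch contrasting that with an explicit copy via `list(...)`; both function names are hypothetical and work on in-memory lists rather than the files:

```python
def unique_words_aliased(file_1, file_2):
    result = file_1          # same list object as file_1, not a copy
    for word in file_2:
        if word not in result:
            result.append(word)
    return result


def unique_words_copied(file_1, file_2):
    result = list(file_1)    # shallow copy; file_1 is left untouched
    for word in file_2:
        if word not in result:
            result.append(word)
    return result


b = ['this', 'is', 'not', 'a', 'test']

a = ['this', 'is', 'a', 'test']
unique_words_aliased(a, b)
print(a)  # ['this', 'is', 'a', 'test', 'not']  (a was modified)

c = ['this', 'is', 'a', 'test']
print(unique_words_copied(c, b))  # ['this', 'is', 'a', 'test', 'not']
print(c)  # ['this', 'is', 'a', 'test']  (unchanged)
```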
