
Compare two files and find matching words in Python

I have two files. The first one contains terms and their frequencies:

table 2
apple 4
pencil 89

The second file is a dictionary:

abroad
apple
bread
...

I want to check whether the first file contains any words from the second file. For example, both the first file and the second file contain "apple". I am new to Python. I tried something, but it does not work. Could you help me? Thank you.

for line in dictionary:
    words = line.split()
    print words[0]

for line2 in test:
    words2 = line2.split()
    print words2[0]

Something like this:

with open("file1") as f1, open("file2") as f2:
    # file1 is the dictionary file: build a set of its words
    words = set(line.strip() for line in f1)

    # Why a set? Lookups take O(1) time on average, so the overall cost is O(N).

    # file2 is the "word frequency" file: loop over its lines
    for line in f2:
        if not line.strip():       # skip blank lines
            continue
        word, freq = line.split()  # split the line into word and frequency
        if word in words:          # the word is also in the dictionary
            print(word)

output:

apple
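
The same idea can be wrapped in a small function. This is only a minimal sketch for Python 3: the name matching_words and both path parameters are made up for illustration, and it assumes the word is always the first column of the frequency file.

```python
def matching_words(dictionary_path, freq_path):
    """Return the words of the frequency file that also occur in the dictionary file."""
    # Build a set of dictionary words, ignoring blank lines.
    with open(dictionary_path) as dict_file:
        known = {line.strip() for line in dict_file if line.strip()}

    matches = []
    with open(freq_path) as freq_file:
        for line in freq_file:
            parts = line.split()
            if not parts:          # blank line: nothing to check
                continue
            if parts[0] in known:  # first column is the word; the rest is ignored
                matches.append(parts[0])
    return matches
```

With the sample data from the question, matching_words("dictionary.txt", "frequencies.txt") would return ['apple'] (both file names are placeholders). Returning a list instead of printing makes the result easier to reuse or test.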

This may help you:

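# file1.txt holds "word frequency" lines, so keep only the first token of each line
with open('file1.txt') as f:
    file1 = set(line.split()[0] for line in f if line.strip())

# file2.txt holds one dictionary word per line
with open('file2.txt') as f:
    file2 = set(line.strip() for line in f)

# The intersection contains the words present in both files
for word in file1 & file2:
    if word:
        print(word)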

Here's what you should do:

  • First, you need to put all the dictionary words in some place where you can easily look them up. If you don't do that, you'd have to read the whole dictionary file every time you want to check one single word in the other file.

  • Second, you need to check whether each word in the other file is among the words you extracted from the dictionary file.

For the first part, you need to use either a list or a set. The difference between the two is that a list keeps the order in which you put the items into it. A set is unordered, which is fine here because it doesn't matter which word you read first from the dictionary file. A set is also faster when you look up an item: it finds the item by hashing, while a list has to be scanned from the start.

To check whether an item is in a set, use item in my_set, which evaluates to either True or False.
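
The difference between the two containers can be seen in a short Python 3 sketch. The sample words and the names words_list and words_set are made up for illustration.

```python
words_list = ["bread", "apple", "abroad", "apple"]  # sample words, invented for this example
words_set = set(words_list)

# A list keeps insertion order and duplicates.
print(words_list)              # ['bread', 'apple', 'abroad', 'apple']

# A set stores each word once; its iteration order is not guaranteed.
print(len(words_set))          # 3

# The membership test is written the same way for both containers...
print("apple" in words_list)   # True
print("apple" in words_set)    # True
print("table" in words_set)    # False
# ...but the list is scanned item by item, while the set uses a hash lookup.
```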

I put your first file (the two-column list) in try.txt and the single-column word list in try_match.txt:

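# Collect the words (first column) of the two-column file in a set
words = set()
with open('try.txt', 'r') as f:
    for line in f:
        if line.strip():           # skip blank lines
            a, b = line.split()
            words.add(a)

# Print every word of the single-column file that is also in the set
with open('try_match.txt', 'r') as f_match:
    for line in f_match:
        parts = line.split()
        if parts and parts[0] in words:
            print(parts[0])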
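
If the frequency of each matching word is needed as well, a dict can take the place of the plain collection of words. The following is a hedged Python 3 sketch, not part of the answer above: it uses in-memory lists instead of files so that it is self-contained, and the names freq_lines, dictionary_lines, frequencies and found are assumptions chosen for illustration.

```python
freq_lines = ["table 2", "apple 4", "pencil 89"]   # sample data from the question
dictionary_lines = ["abroad", "apple", "bread"]

# Map each word to its frequency; dict lookups are hash-based, like set lookups.
frequencies = {}
for line in freq_lines:
    parts = line.split()
    if len(parts) == 2:            # ignore blank or malformed lines
        frequencies[parts[0]] = int(parts[1])

# Keep every dictionary word that also has a frequency entry.
found = {}
for line in dictionary_lines:
    word = line.strip()
    if word in frequencies:
        found[word] = frequencies[word]

print(found)  # {'apple': 4}
```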
