
How to get the words that exactly match words in a file in Python

infile = open('file.txt', 'r')
string = infile.read()

def extract_edu(string):
    with open('totaleducation.txt', 'r') as totaledu:
        edu_set=[]
        for edu in totaledu:
            if edu in string:
                print(edu)
                edu_set.append(edu)
    return edu_set

I want to extract the words from string that match an entry in the totaleducation file. It returns correctly if the entry is a single word like BCA, but when I try to extract something like MCA (Master of Computer Application), it ignores this line.

The string is just the text of a document file, such as ACADEMICS:

Year
Degree
Institute/College
University
CGPA/Percentage
2016
MSc (Computer-Science)
South Asian University
South Asian University
6.6/9
2012
BCA
Ignou, Patna
IGNOU
65
2009
Class XII(Science)
BSSRPP Inter College Deoria
BHSIE
61
2006
Class X
Buxar High School
BSEB
67.8
and totaleducation.txt looks like this:
MSc
BCA
MCA
Master's of Science

After long discussions, we clarified that the question is:

I have one text file called totaleducation.txt and another text/CSV file called sample.txt, and I want to find every word in totaleducation.txt that also exists in sample.txt.

For this, you should read totaleducation.txt line by line and check whether each of those lines exists in any line of sample.txt.

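def match():
    # Collect every entry of totaleducation.txt found in some line of sample.txt.
    words = []
    with open('totaleducation.txt', 'r') as f1:
        for edu in f1:
            edu = edu.strip('\n')
            if not edu:
                continue  # a blank line would match every line of sample.txt
            with open('sample.txt', 'r') as f2:
                for string in f2:
                    if edu in string.strip('\n'):
                        words.append(edu)
    return words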

Calling match() gives you every entry of totaleducation.txt that appears in at least one line of sample.txt.

Pay attention to the .strip('\n'). Say file1 contains 'MSc' and file2 contains 'MSc (Computer-Science)'. If you omit the .strip('\n'), the check fails to find 'MSc' in 'MSc (Computer-Science)', because the two lines are actually 'MSc\n' and 'MSc (Computer-Science)\n', and the first is not a substring of the second.
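To see the effect of the trailing newline directly, here is a small check. The two strings are stand-ins for lines as a file iterator would yield them, and the variable names are only illustrative:

```python
# Lines as they come out of a file iterator, trailing newline included.
edu_line = 'MSc\n'
sample_line = 'MSc (Computer-Science)\n'

# Without stripping, the newline is part of the text being searched for.
raw_match = edu_line in sample_line

# After stripping the newline from both sides, the substring test succeeds.
stripped_match = edu_line.strip('\n') in sample_line.strip('\n')

print(raw_match)       # False
print(stripped_match)  # True
```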

A second way of doing the same thing, if your files are small enough to fit in memory:

education = []
with open('totaleducation.txt', 'r') as f1:
    for line in f1:
        education.append(line.strip('\n'))

sample = []
with open('sample.txt', 'r') as f2:
    for line in f2:
        sample.append(line.strip('\n'))

match = [e for e in education for s in sample if e in s]
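Note that the `in` test above is a substring test, so an entry such as MCA would also be reported for a line containing a longer word like IMCA, and an entry that appears in several lines is listed several times. If whole-word matches are what you need, one possible sketch uses a regular expression with lookarounds. The function name find_whole_matches and the 'IMCA Institute' line are hypothetical, made up for illustration; this is one option under those assumptions, not the only way to do it:

```python
import re

def find_whole_matches(entries, lines):
    """Return entries that occur in some line as a whole word or phrase."""
    found = []
    for entry in entries:
        entry = entry.strip()
        if not entry or entry in found:
            continue  # skip blank lines and entries already reported
        # (?<!\w) and (?!\w) reject matches glued to other word characters,
        # so 'MCA' does not match inside 'IMCA'.
        pattern = re.compile(r'(?<!\w)' + re.escape(entry) + r'(?!\w)')
        if any(pattern.search(line) for line in lines):
            found.append(entry)
    return found

education = ['MSc', 'BCA', 'MCA', "Master's of Science", '']
# 'IMCA Institute' is a made-up line to show the difference from a plain `in` test.
sample = ['MSc (Computer-Science)', 'BCA', 'IMCA Institute']

print(find_whole_matches(education, sample))  # ['MSc', 'BCA']
```

With a plain `e in s` test, MCA would have been reported as well because of the IMCA line.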
