infile = open('file.txt', 'r')
string = infile.read()
def extract_edu(string):
    with open('totaleducation.txt', 'r') as totaledu:
        edu_set=[]
        for edu in totaledu:
            if edu in string:
                print(edu)
                edu_set.append(edu)
    return edu_set
I want to extract the entries from string that match an entry in the totaleducation file. It returns correctly when the entry is a single word like BCA, but when I try to extract something like MCA (Master of Computer Application) it ignores that line.
string is just the text of a document file, for example:
ACADEMICS:
Year
Degree
Institute/College
University
CGPA/Percentage
2016
MSc (Computer-Science)
South Asian University
South Asian University
6.6/9
2012
BCA
Ignou, Patna
IGNOU
65
2009
Class XII(Science)
BSSRPP Inter College Deoria
BHSIE
61
2006
Class X
Buxar High School
BSEB
67.8
and totaleducation.txt looks like this:
MSc
BCA
MCA
Master's of Science
After a long discussion, we clarified that the question is: I have one text file called totaleducation.txt and another text/CSV file called sample.txt, and I want to find every entry in totaleducation.txt that also exists in sample.txt.
To do this, read totaleducation.txt line by line and check whether each of those lines occurs in any line of sample.txt.
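def match():
    words = []
    with open('totaleducation.txt', 'r') as f1:
        for edu in f1:
            # Re-open sample.txt for every entry so it is scanned from the start
            with open('sample.txt', 'r') as f2:
                for string in f2:
                    # Drop the trailing newlines before the substring test
                    if edu.strip('\n') in string.strip('\n'):
                        words.append(edu.strip('\n'))
    return words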
Calling match() will give you all of the entries of totaleducation.txt that also exist in any line of sample.txt (an entry is listed once for each line it appears in).
Pay attention to the .strip('\n'). Say file1 contains 'MSc' and file2 contains 'MSc (Computer-Science)'. If you omit the .strip('\n'), the check that 'MSc' is in 'MSc (Computer-Science)' fails, because the two lines are actually 'MSc\n' and 'MSc (Computer-Science)\n', and the first is not a substring of the second.
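The newline issue is easy to reproduce without any files. This is a minimal sketch with two hard-coded strings standing in for lines read from the two files (the variable names are only illustrative):

```python
# Lines read from a file keep their trailing newline.
edu_line = "MSc\n"
sample_line = "MSc (Computer-Science)\n"

# Without stripping, Python looks for "MSc" followed directly by a newline.
raw_match = edu_line in sample_line

# After stripping, it is a plain substring test on the visible text.
stripped_match = edu_line.strip("\n") in sample_line.strip("\n")

print(raw_match)       # False
print(stripped_match)  # True
```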
A second way of doing the same thing, provided your files are small enough to fit in memory:
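# Read every entry of totaleducation.txt into a list
education = []
with open('totaleducation.txt', 'r') as f1:
    for line in f1:
        education.append(line.strip('\n'))
# Read every line of sample.txt into a list
sample = []
with open('sample.txt', 'r') as f2:
    for line in f2:
        sample.append(line.strip('\n'))
# Keep each entry once for every sample line that contains it
match = [e for e in education for s in sample if e in s]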
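One thing to be aware of: the list comprehension above repeats an entry once for every sample line that contains it, and an empty line in totaleducation.txt would match everything, because '' is a substring of any string. If that is not what you want, a possible variant is sketched below. The function name find_matches and the in-memory sample data are assumptions made for illustration, not part of the original code:

```python
def find_matches(education, sample):
    """Return each non-empty education entry found in at least one sample line."""
    # any() stops at the first matching line, so each entry is kept at most once.
    return [e for e in education if e and any(e in s for s in sample)]


# Hypothetical data standing in for the stripped lines of the two files.
education = ["MSc", "BCA", "MCA", "Master's of Science", ""]
sample = ["2016", "MSc (Computer-Science)", "South Asian University", "2012", "BCA"]

result = find_matches(education, sample)
print(result)  # ['MSc', 'BCA']
```

The result keeps the order of totaleducation.txt, which a set-based approach would not guarantee.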