I have two CSV files: one contains sentences with abbreviations, and the other is a list of abbreviations and their expansions. I want to identify each abbreviation in the first CSV file and replace it with its expansion. This is how the CSV files look:
Sample of the first file:
vp academic
vp finance and administration
vp academic and student affairs
vp corporate services and external relat. ....
Sample of the second file:
elect'l. : electrical
vp. : vice president
...
This is my code:
import csv

with open('firstFile.csv', 'rb') as sentence, \
     open('secondFile.csv', 'rb') as word, \
     open('new.csv', 'wb') as out:
    reader = csv.reader(sentence)
    reader2 = csv.reader(word)
    abbr_list = list(reader2)
    filewriter = csv.writer(out, delimiter=' ')
    result = ''
    for row in reader:
        for i in range(0, 1453):
            temp = abbr_list[i][0]
            temp1 = abbr_list[i][1]
            if temp in row[0]:
                result = row[0].replace(temp, temp1)
                row[0] = result
        filewriter.writerow(row)
However, the result I get is not what I was expecting:
Result file:
vice president academic
vice president financiale and administrategytegyyion
vice president academic and student affairs
vice president corporate services and executivecutiveternal relatin
Can someone help me to correct my code?
Your string replacement (row[0].replace) is not checking whether it matches an entire word. Thus, it's matching 'strat' and turning 'administration' into 'administrategyion', then changing it again into 'administrategyegyion' with the next replacement, etc.
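The substring behaviour is easy to reproduce in isolation. This is a minimal sketch that assumes the abbreviation list contains an entry mapping 'strat' to 'strategy' (the real entry in the second file is not shown in the question):

```python
# str.replace works on raw substrings, so it also rewrites the middle of a word.
# The 'strat' -> 'strategy' pair is an assumed entry, used only for illustration.
word = "administration"
expanded = word.replace("strat", "strategy")
print(expanded)  # administrategyion
```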
You can either switch to the re module to use regular expressions for string replacement, or you can include spaces in the match (e.g. row[0].replace(' '+temp+' ', ' '+temp1+' ')), but be aware that the spaces approach will fail if the match is at the start or end of the string.
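One possible way to do the regular-expression version is sketched below. It is not the only approach: the function name expand_abbreviations and the sample dictionary entries are hypothetical, and it assumes the abbreviations have already been loaded into a dict. It uses lookarounds instead of \b so that abbreviations ending in punctuation (such as 'relat.') can still match, and it does all replacements in a single pass so an expansion is never rewritten by a later abbreviation.

```python
import re

def expand_abbreviations(sentence, abbreviations):
    """Replace abbreviations that appear as whole tokens in `sentence`.

    `abbreviations` is a dict mapping abbreviation -> expansion
    (a hypothetical structure; build it from the second CSV file).
    """
    if not abbreviations:
        return sentence
    # Try longer abbreviations first so they win over shorter ones
    # that happen to be a prefix of them.
    keys = sorted(abbreviations, key=len, reverse=True)
    # (?<!\w) and (?!\w) require that the match is not glued to a
    # letter, digit, or underscore on either side.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    # One pass over the sentence: each match is replaced exactly once.
    return pattern.sub(lambda m: abbreviations[m.group(0)], sentence)

# Assumed sample entries, for illustration only.
abbreviations = {
    "vp": "vice president",
    "strat": "strategy",
    "fin": "financial",
    "ex": "executive",
}
print(expand_abbreviations("vp finance and administration", abbreviations))
```

With these sample entries, 'vp' is expanded while 'fin' inside 'finance' and 'strat' inside 'administration' are left alone, because neither stands as a separate token.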