
How to identify and replace words in a CSV file in Python

I have two CSV files: one contains sentences with abbreviations, and the other is a list of abbreviations and their expansions. I want to identify each abbreviation in the first CSV file and replace it with its expansion. This is what the CSV files look like:

Sample of the first file:

vp academic

vp finance and administration

vp academic and student affairs

vp corporate services and external relat.

....

Sample of the second file:

elect'l. : electrical

vp. : vice president

...

This is my code:

import csv

with open('firstFile.csv', 'rb') as sentence, \
        open('secondFile.csv', 'rb') as word, \
        open('new.csv', 'wb') as out:
    reader = csv.reader(sentence)
    reader2 = csv.reader(word)
    abbr_list = list(reader2)
    filewriter = csv.writer(out, delimiter=' ')

    result = ''
    for row in reader:
        for i in range(0, 1453):
            temp = abbr_list[i][0]
            temp1 = abbr_list[i][1]
            if temp in row[0]:
                result = row[0].replace(temp, temp1)
                row[0] = result

        filewriter.writerow(row)

However, the result I get is not what I was expecting.

Result file:

vice president academic

vice president financiale and administrategytegyyion

vice president academic and student affairs

vice president corporate services and executivecutiveternal relatin

Can someone help me correct my code?

Your string replacement (row[0].replace) does not check whether it matches an entire word. Thus, it matches 'strat' and turns 'administration' into 'administrategyion', then changes it again into 'administrategyegyion' with the next replacement, and so on.
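A minimal illustration of that substring behavior, assuming the abbreviation list contains an entry that maps 'strat' to 'strategy' (the sentence is taken from the sample file; the variable name is only for illustration):

```python
# str.replace works on raw substrings, not on whole words.
text = "vp finance and administration"

# Assumed abbreviation entry: 'strat' -> 'strategy'.
# 'strat' occurs inside 'administration', so it is replaced there too.
text = text.replace("strat", "strategy")

print(text)  # vp finance and administrategyion
```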

You can either switch to the re module and use regular expressions for the replacement, or you can include spaces as part of the match (e.g. row[0].replace(' '+temp+' ',' '+temp1+' ')), but be aware that the spaces approach will fail if the match is at the start or end of the string.
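One way to sketch the regular-expression approach is shown below. The function name expand_abbreviations and the two-entry dictionary are hypothetical, chosen for illustration; in the real script the pairs would come from the second CSV file. Instead of \b, the pattern uses the lookarounds (?<!\w) and (?!\w), on the assumption that some abbreviations end in punctuation such as a period, where \b would not behave as a word boundary.

```python
import re

def expand_abbreviations(text, abbreviations):
    """Replace whole-word occurrences of each abbreviation in text."""
    for abbr, expansion in abbreviations.items():
        # (?<!\w) and (?!\w) require that no word character touches the
        # match, so 'strat' inside 'administration' is left alone.
        # re.escape keeps characters like '.' or "'" literal.
        pattern = r"(?<!\w)" + re.escape(abbr) + r"(?!\w)"
        # A function replacement avoids backslash handling in the expansion.
        text = re.sub(pattern, lambda match: expansion, text)
    return text

# Hypothetical abbreviation table for illustration only.
abbreviations = {"vp": "vice president", "strat": "strategy"}

print(expand_abbreviations("vp finance and administration", abbreviations))
# vice president finance and administration

# For comparison, the space-padded replace misses a match at the start:
print("vp academic".replace(" vp ", " vice president "))
# vp academic
```

Note that the sample abbreviation file lists "vp." with a trailing period while the sentences contain "vp" without one, so the keys may need normalizing before matching; that step is not covered here.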
