
Python regex to get some/not all quote marks out of csv file

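I have a .csv file in which every field is enclosed in double quotes, but some fields contain stray double quotes. UPDATE: that was slightly inaccurate. I am including two rows, and the second one is the problem. In the original version of this question I left off the closing double quote at the end of the row, which turned out to matter for the first solution: it works, but it also removes the quote just before the \n: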

"20135025373","25","2013-08-24 00:00:00","WOOD","CHRISTY","","","2679 W. LONG CIRCLE","","LITTLETON","CO","80120","","3510862","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""

"20135025373","10","2013-08-24 00:00:00","DAVIS","JOHN","","","2822 THIRD"","","BOULDER","CO","80304","","3510863","2013-09-03 00:00:00","Monetary (Itemized)","Credit/Debit Card","Individual","","Issue Committee","A WHOLE LOT OF PEOPLE FOR JOHN MORSE","","","","N","N","0","STATEWIDE",""

I tried this code, but it also removes the quotes at the beginning and end of each line.

import re

with open('ColoSOS/2014_ContData.csv') as old, open('2014contx.csv', 'w') as new:
    new.writelines(re.sub(r'(?<!,)"(?!,)', '', line) for line in old)

Any ideas appreciated!

If you can use the csv module, first take a look at the related question on removing in-field quotes in a csv file.

If you want to do it with a regular expression, I think this should be enough:

re.sub(r'(?<=[^,])"(?=[^,])', '', line)

See the working demo.
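To make the behaviour of this pattern concrete, here is a minimal sketch run on a shortened, hypothetical version of the problem row. The pattern removes a quote only when it has a non-comma character on both sides, so the opening quote of the line survives (nothing precedes it). A newline also counts as a non-comma character, though, which appears to be why the closing quote before \n is lost, as the question's update describes. `VARIANT_PATTERN` is an assumption added for illustration, not part of the answer: it simply excludes the newline as well.

```python
import re

# Pattern from the answer: a quote with a non-comma character on both sides.
ANSWER_PATTERN = re.compile(r'(?<=[^,])"(?=[^,])')
# Hypothetical variant: additionally keep a quote that is followed by a newline.
VARIANT_PATTERN = re.compile(r'(?<=[^,])"(?=[^,\n])')

# Shortened sample row with a stray quote after THIRD (illustrative only).
line = '"10","2822 THIRD"","","BOULDER",""\n'

# The answer's pattern drops the stray quote but also the final quote.
print(repr(ANSWER_PATTERN.sub('', line)))
# The variant drops only the stray quote.
print(repr(VARIANT_PATTERN.sub('', line)))
```

Neither pattern touches a quote at the very end of a file with no trailing newline, since the look-ahead needs a character to inspect.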

If you don't want to match the quotes at the beginning and end of the line, you can use this regex:

(?<!,|^)\"(?!,|$)

instead of:

(?<!,)"(?!,)

See the demo here: http://regex101.com/r/cI7mW5
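One caveat when moving this pattern from the demo into Python: as far as I know, the standard `re` module rejects a look-behind whose alternatives differ in width, and `,|^` mixes a one-character alternative with a zero-width one. A minimal sketch of one workaround is to split each alternation into separate assertions. The helper name `strip_inner_quotes` and the shortened sample row are hypothetical.

```python
import re

# Same idea as (?<!,|^)"(?!,|$), written as separate fixed-width assertions
# so that Python's re module accepts it.
INNER_QUOTE = re.compile(r'(?<!,)(?<!^)"(?!,)(?!$)')

def strip_inner_quotes(line):
    """Remove quotes that touch neither a comma nor a line edge."""
    return INNER_QUOTE.sub('', line)

# Shortened sample row with a stray quote after THIRD (illustrative only).
line = '"10","2822 THIRD"","","BOULDER",""\n'
print(repr(strip_inner_quotes(line)))
```

Because `$` in Python also matches just before a trailing newline, the closing quote of each line read from a file is left alone.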

Can you use the csv module instead of re? It may already have this kind of intelligence built in.

I'm rusty with csv. The following code is untested, but it may give you a starting point.

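import csv

with open('ColoSOS/2014_ContData.csv') as old, open('2014contx.csv', 'w') as new:
    # On Python 3, the csv docs recommend opening both files with newline=''.
    reader = csv.reader(old, delimiter=',', quotechar='"')
    # Re-quote every field on output, matching the input's style.
    # Note: csv.reader is not guaranteed to repair stray quotes inside fields.
    writer = csv.writer(new, quoting=csv.QUOTE_ALL)
    writer.writerows(reader)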

Reference: https://docs.python.org/2/library/csv.html
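Whether the csv module copes with the stray quote can be checked on a small sample before committing to it. In the sketch below, the function name `field_counts` and the two shortened rows are hypothetical. In my understanding of the default (non-strict) dialect, the reader does not raise on the malformed row but folds neighbouring fields together, so the row comes back with fewer fields than a clean one. Counting fields per row is therefore one possible way to flag rows that still need the regex cleanup.

```python
import csv

def field_counts(lines):
    """Return the number of fields csv.reader sees in each line."""
    reader = csv.reader(lines, delimiter=',', quotechar='"')
    return [len(row) for row in reader]

# Hypothetical shortened rows: the second has a stray quote after THIRD.
good = '"25","2679 W. LONG CIRCLE","","LITTLETON",""\n'
bad = '"10","2822 THIRD"","","BOULDER",""\n'

counts = field_counts([good, bad])
print(counts)
```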
