How to remove newline characters from inside a line without removing the newline at the end of the line in Python?
My input is a large CSV file whose rows look like this:
"7807371008","Sat Jan 16 00:07:46 +0000 2010","@bigg_robb welcome to the party life of politics","T 33.417474,-86.705343","al","23845121","1381","502","Wed Mar 11 22:38:27 +0000 2009","2468"
The output I want is a new file containing only the first and third columns, with all special characters removed:
7807371008, bigg robb welcome to the party life of politics
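However, some rows contain newline characters in the middle of the text, even though that is technically not the end of the row. In that case I get the error: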
IndexError: list index out of range
An example of such a row is:
"7807376607","Sat Jan 16 00:07:57 +0000 2010","RT @CBS8News:The commander of Gov. Riley's task
force on illegal gambling resigns after winning $2,300 at a MS casino.
gt;#conflictofinterest","Montgomery, Alabama","al","33358058","84","164","Mon Apr 20 00:48:37 +0000 2009","4509"
My code is:
import csv
import sys
import re
with open('al.csv') as f:
    for line in f:
        j = next(csv.reader([line]))
        id1 = j[0]
        id2 = re.sub('[^A-Za-z0-9\.]+',' ',id1)
        tt1 = j[2]
        tt2 = re.sub('[^A-Za-z0-9\.]+',' ',tt1)
        print id2.strip()+", "+tt2.lower()
How can I fix this? Please help.
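You should specify the comma , as your CSV delimiter (or whatever the correct delimiter is for your file). Also, do not loop over the file line by line and parse each line separately: a quoted field that contains a newline spans several physical lines, so a single line holds only part of the record and j[2] may not exist. Pass the file object to csv.reader and access the rows by looping over the reader object (spamreader), which keeps newlines inside quoted fields as part of the field: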
>>> import csv
>>> import re
>>> # Python 2: open the file in binary mode for the csv module
>>> with open('al.csv', 'rb') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=',')
...     for row in spamreader:
...         print re.sub('[^A-Za-z0-9\.]+',' ',row[2]) + row[0]
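For anyone on Python 3, here is a minimal sketch of the same idea that also produces the "id, text" format asked for in the question. The helper names clean and extract_columns are made up for illustration, and the in-memory sample (shortened to the first five columns of the two rows from the question) stands in for al.csv so the snippet runs on its own.

```python
import csv
import io
import re

# Same character class as in the question: keep letters, digits and '.'.
NON_ALNUM = re.compile(r'[^A-Za-z0-9.]+')


def clean(text):
    """Replace each run of other characters (including newlines) with one space."""
    return NON_ALNUM.sub(' ', text).strip()


def extract_columns(source):
    """Yield 'id, text' strings from a file-like object containing CSV data.

    csv.reader is given the whole file object, so a quoted field that
    spans several physical lines is still returned as a single field.
    """
    for row in csv.reader(source, delimiter=','):
        if len(row) < 3:
            continue  # skip blank or malformed rows instead of raising IndexError
        yield clean(row[0]) + ', ' + clean(row[2]).lower()


# Hypothetical stand-in for al.csv: two rows, the second with embedded newlines.
sample = io.StringIO(
    '"7807371008","Sat Jan 16 00:07:46 +0000 2010","@bigg_robb welcome to the party life of politics","T 33.417474,-86.705343","al"\n'
    '"7807376607","Sat Jan 16 00:07:57 +0000 2010","RT @CBS8News:The commander of Gov. Riley\'s task\n'
    'force on illegal gambling resigns after winning $2,300 at a MS casino.\n'
    'gt;#conflictofinterest","Montgomery, Alabama","al"\n'
)

results = list(extract_columns(sample))
for line in results:
    print(line)
```

With a real file, replace the sample with open('al.csv', newline=''), which is how the csv module documentation says to open files in Python 3, and write each result to the output file instead of printing it.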