
python - How to remove newline characters from within a line without removing the newline at the end of the line?

My input is a big CSV file with rows like:

"7807371008","Sat Jan 16 00:07:46 +0000 2010","@bigg_robb welcome to the party life of politics","T 33.417474,-86.705343","al","23845121","1381","502","Wed Mar 11 22:38:27 +0000 2009","2468"

My desired output is a new file containing only the first and third columns, with all special characters removed:

7807371008,  bigg robb welcome to the party life of politics

But some rows contain newline characters inside the text field, even though that is not technically the end of the row. In such cases, I get the error:

IndexError: list index out of range

An example of such rows is:

"7807376607","Sat Jan 16 00:07:57 +0000 2010","RT @CBS8News:The commander of Gov. Riley's task
force on illegal gambling resigns after winning $2,300 at a MS casino.
gt;#conflictofinterest","Montgomery, Alabama","al","33358058","84","164","Mon Apr 20 00:48:37 +0000 2009","4509"

My code is:

import csv
import sys
import re

with open('al.csv') as f:
    for line in f:

        j = next(csv.reader([line]))
        id1 = j[0]
        id2 = re.sub('[^A-Za-z0-9\.]+',' ',id1)
        tt1 = j[2]
        tt2 = re.sub('[^A-Za-z0-9\.]+',' ',tt1)
        print id2.strip()+", "+tt2.lower()

How do I resolve this? Please help.

You should specify the comma as the delimiter of your CSV file (or whichever delimiter is correct for your file). Also, don't loop over the lines of the file and parse each line on its own: a quoted field that contains a newline spans several physical lines, so a single line is not always a complete row. Instead, pass the file object to csv.reader and access the rows by looping over the reader object ( spamreader ):

>>> import csv
>>> import re
>>> with open('al.csv', 'rb') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=',')
...     for row in spamreader:
...         print re.sub('[^A-Za-z0-9\.]+',' ',row[2]) + row[0]
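
The snippet above uses Python 2 syntax. As a rough sketch of the same idea in Python 3, producing the "id, text" format asked for in the question, something like the following might work. The names clean and extract_columns are hypothetical helpers introduced only for this illustration, and the sample string is the multi-line row from the question. Unlike the question's code, this sketch also strips leading and trailing spaces from the text column.

```python
import csv
import io
import re

# Matches runs of anything other than letters, digits and '.'.
# Same character class as in the question, written as a raw string.
NON_ALNUM = re.compile(r'[^A-Za-z0-9.]+')


def clean(text):
    """Replace each run of special characters with one space, then trim the ends."""
    return NON_ALNUM.sub(' ', text).strip()


def extract_columns(infile, outfile):
    """Write the cleaned first and third columns of every CSV record to outfile.

    Hypothetical helper: infile is handed to csv.reader as a whole, so a
    quoted field that contains newlines is still read as part of one row.
    """
    for row in csv.reader(infile, delimiter=','):
        if len(row) < 3:
            continue  # skip blank or short records instead of raising IndexError
        outfile.write(clean(row[0]) + ', ' + clean(row[2]).lower() + '\n')


# The multi-line row from the question, held in memory for demonstration.
sample = (
    '"7807376607","Sat Jan 16 00:07:57 +0000 2010",'
    '"RT @CBS8News:The commander of Gov. Riley\'s task\n'
    'force on illegal gambling resigns after winning $2,300 at a MS casino.\n'
    'gt;#conflictofinterest","Montgomery, Alabama","al","33358058","84","164",'
    '"Mon Apr 20 00:48:37 +0000 2009","4509"\n'
)

result = io.StringIO()
extract_columns(io.StringIO(sample, newline=''), result)
print(result.getvalue(), end='')
```

With a real file, the same function could be called as extract_columns(src, dst) inside `with open('al.csv', newline='') as src, open('out.csv', 'w') as dst:`, where out.csv is an assumed output name. Opening the input with newline='' is what the csv module documentation recommends, so that newlines embedded in quoted fields reach the reader unchanged.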
