How to read wrongly formatted string in csv properly?
In my CSV file, one column contains a string that is ambiguous to parse. Because of that, I get 6 values in the list instead of 5 as output.
Code:
import csv
csv_data = csv.reader(file('test.csv'))
for row in csv_data:
    print row
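I tried replacing the double quote character (") with a space, so that the ordinary strings would at least come out without quotes, as shown below: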
for row in csv_data:
    print [r.replace('"', ' ') for r in row]  # This did not work as expected.
Input:
A row in the CSV file looks like this:
1,2,"text1", "Sample text ""present" in csv, as this",5
"Sample text "present" in csv, as this" # Error due to this value.
Output:
['1', '2', 'text1', 'Sample text present" in csv', 'as this', 5]
Expected output:
['1', '2', 'text1', 'Sample text "present" in csv, as this', 5]
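This is almost embarrassing, but it seems to work at least on the sample input shown in the question. It post-processes each row read by csv.reader, tries to detect when fields have been read incorrectly because of the bad formatting, and then corrects them.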
import csv

def read_csv(filename):
    # 'rb' is the Python 2 idiom; on Python 3 use open(filename, newline='').
    with open(filename, 'rb') as file:
        # quotechar=None makes the reader treat quotes as ordinary characters.
        for row in csv.reader(file, skipinitialspace=True, quotechar=None):
            newrow = []
            use_a = True
            for a, b in zip(row, row[1:]):
                # Detect bad formatting.
                if (a.startswith('"') and not a.endswith('"')
                        and not b.startswith('"') and b.endswith('"')):
                    # Join the misread fields back together.
                    newrow.append(', '.join((a, b)))
                    use_a = False
                else:
                    if use_a:
                        newrow.append(a)
                    else:
                        newrow.append(b)
                        use_a = True
            # Collapse doubled quotes and drop the surrounding ones.
            yield [field.replace('""', '"').strip('"') for field in newrow]

for row in read_csv('fmt_test2.csv'):
    print(row)
Output:
['1', '2', 'text1', 'Sample text "present" in csv, as this', '5']
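One caveat with the pairwise loop: it only emits the final field of a row when that field directly follows a merge, so a well-formed row such as 7,8,9 would lose its last value. Below is a minimal sketch of the same detect-and-rejoin idea written as an index-based loop for Python 3. The names merge_split_fields and read_rows are hypothetical, and the sketch assumes, like the answer, that a broken value was split at exactly one comma followed by a single space.

```python
import csv
import io


def merge_split_fields(row):
    """Rejoin neighbouring fields that look like one quoted value cut at a comma."""
    merged = []
    i = 0
    while i < len(row):
        field = row[i]
        nxt = row[i + 1] if i + 1 < len(row) else None
        if (nxt is not None
                and field.startswith('"') and not field.endswith('"')
                and not nxt.startswith('"') and nxt.endswith('"')):
            # Same heuristic as the answer: an opening quote with no closing
            # quote, followed by a field that only has a closing quote.
            merged.append(', '.join((field, nxt)))
            i += 2  # both halves are consumed
        else:
            merged.append(field)
            i += 1
    # Collapse doubled quotes and drop the surrounding ones.
    return [f.replace('""', '"').strip('"') for f in merged]


def read_rows(lines):
    """Yield repaired rows from any iterable of CSV text lines."""
    # QUOTE_NONE makes the reader treat quotes as ordinary characters,
    # so every comma splits and the repair step sees the raw pieces.
    reader = csv.reader(lines, skipinitialspace=True, quoting=csv.QUOTE_NONE)
    for row in reader:
        yield merge_split_fields(row)


# The sample row from the question plus one well-formed row.
sample = io.StringIO(
    '1,2,"text1", "Sample text ""present" in csv, as this",5\n'
    '7,8,9\n'
)
rows = list(read_rows(sample))
for row in rows:
    print(row)
```

This is still a heuristic. A value containing more than one embedded comma, or one that legitimately begins or ends with a quote character, would not be restored correctly; fixing the program that writes the file is the more reliable route when that is possible.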