How do I correctly read a malformed string in a CSV file?

Question

In my CSV file, one column contains a string whose quoting is ambiguous. As a result, I get 6 values in the output list instead of 5.

Code:

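import csv
csv_data = csv.reader(open('test.csv', newline=''))
for row in csv_data:
    print(row)

I tried replacing " with a space so that the value would at least come through as a plain string without quotes, as shown below:

for row in csv_data:
    print([r.replace('"', ' ') for r in row])  # This didn't work as expected.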

Input:

A line in the CSV file looks like this:

1,2,"text1", "Sample text ""present" in csv, as this",5

"Sample text "present" in csv, as this" # Error due to this value.

Output:

['1', '2', 'text1', 'Sample text present" in csv', 'as this', '5']

Expected output:

['1', '2', 'text1', 'Sample text "present" in csv, as this', '5']
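
The problem can be reproduced without a file by feeding the sample line to csv.reader from memory. The following is a minimal sketch for Python 3; the names SAMPLE and parse are made up for illustration and are not part of the question. With the default settings, the space in front of the fourth field's opening quote means the quote is not at the start of the field, so the reader treats it as ordinary data and splits on the comma inside the text.

```python
import csv
import io

# The sample line from the question, held in memory instead of test.csv.
SAMPLE = '1,2,"text1", "Sample text ""present" in csv, as this",5\n'

def parse(text, **options):
    """Parse CSV text with csv.reader and return the rows as lists (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text, newline=''), **options))

rows = parse(SAMPLE)
print(len(rows[0]))  # 6 fields rather than the 5 the question expects
print(rows[0])
```

Passing skipinitialspace=True to parse lets the reader recognise the opening quote, but the unbalanced quotes inside the value should still leave the field split in two, so a reader option alone is unlikely to be enough here.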

Answer 1

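This is almost embarrassing, but it seems to work, at least on the sample input shown in the question. It post-processes each row read by the csv reader, tries to detect fields that were misread because of the bad formatting, and then corrects them.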

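import csv

def read_csv(filename):
    # Text mode with newline='' is what the csv module expects on Python 3.
    with open(filename, newline='') as file:
        # Disable quote handling so the raw '"' characters survive in the fields.
        for row in csv.reader(file, skipinitialspace=True, quotechar=None,
                              quoting=csv.QUOTE_NONE):
            newrow = []
            i = 0
            while i < len(row):
                a = row[i]
                b = row[i + 1] if i + 1 < len(row) else None
                # Detect bad formatting: a field that opens a quote without
                # closing it, followed by a field that only closes a quote.
                if (b is not None
                        and a.startswith('"') and not a.endswith('"')
                        and not b.startswith('"') and b.endswith('"')):
                    # Join the misread fields back together and skip both.
                    newrow.append(', '.join((a, b)))
                    i += 2
                else:
                    # Keep the field as read (this also keeps the last field).
                    newrow.append(a)
                    i += 1
            # Collapse doubled quotes and drop the enclosing ones.
            yield [field.replace('""', '"').strip('"') for field in newrow]

for row in read_csv('fmt_test2.csv'):
    print(row)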

Output:

['1', '2', 'text1', 'Sample text "present" in csv, as this', '5']
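
To see why the startswith/endswith test in the answer can spot the broken field, it helps to look at the raw fields the reader returns once quote handling is switched off. This is only a sketch for Python 3: it reads the sample line from memory, and it passes quoting=csv.QUOTE_NONE to make that intent explicit, which is my choice rather than what the answer wrote.

```python
import csv
import io

# The sample line from the question, held in memory instead of a file.
SAMPLE = '1,2,"text1", "Sample text ""present" in csv, as this",5\n'

# With quoting disabled, every '"' stays in the field as ordinary data;
# skipinitialspace=True drops the blank that follows each comma.
reader = csv.reader(io.StringIO(SAMPLE, newline=''),
                    skipinitialspace=True, quoting=csv.QUOTE_NONE)
raw_fields = next(reader)
for field in raw_fields:
    print(repr(field))
```

The fourth raw field opens with a quote but does not end with one, and the fifth ends with a quote but does not open with one. That pair is the pattern the answer joins back together with ', ' before cleaning up the quotes.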

Answer 1
1 accepted 2019-06-15 10:31:57
