How to ignore commas in quotes when the entire field is not quoted while reading a CSV in Python?

I have data like the sample below. When reading it as CSV, I don't want a comma to count as a separator when it is inside quotes, even if the quotes are not immediately next to the separator (as in record #2). Records 1 and 3 are parsed correctly with a plain separator, but record 2 fails. I tried escapeCharacter, but it did not work. Input:

col1, col2, col3
a, b, c
a, b1 "b2, b3" b4, c
"a1, a2", b, c

Expected output for the 2nd record:

  1. a
  2. b1 "b2, b3" b4
  3. c

Actual output:

  1. a
  2. b1 "b2
  3. b3" b4

Updated

There might be a better solution, but off the top of my head this is the only approach I can think of.

If you look at the pattern, the pieces produced by splitting inside a quoted section are always next to each other. So, after splitting, we can re-join consecutive pieces, starting at one that contains a " and continuing until the closing " is found.

sample_strings = [
    'col1, col2, col3',
    'a, b, c',
    'a, b1 "b2, b3, test, test1, test2, test3" b4, c',
    '"a1, a2", b, c',
]


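for string in sample_strings:
    splitted = string.split(', ')
    result = []
    to_escape = -1  # index of the last piece already merged into a field
    for index, value in enumerate(splitted):
        if index <= to_escape:
            continue

        # An odd number of quotes means the split cut through a quoted part.
        # (Checking only for the presence of '"' would mis-handle a piece
        # such as '"b"' and could run past the end of the list.)
        if value.count('"') % 2 == 1:
            index = index + 1
            # Re-join the following pieces until the quotes are balanced
            while index < len(splitted) and value.count('"') % 2 == 1:
                value += ', ' + splitted[index]
                index += 1
            to_escape = index - 1

        result.append(value)

    print(result)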

Output:

['col1', 'col2', 'col3']
['a', 'b', 'c']
['a', 'b1 "b2, b3, test, test1, test2, test3" b4', 'c']
['"a1, a2"', 'b', 'c']
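
The same idea can also be written as a single regular-expression split. This is only a sketch, assuming the double quotes on each line are balanced; the names split_outside_quotes and OUTSIDE_QUOTES are made up for this illustration and do not come from any library.

```python
import re

# Match a comma (plus any whitespace after it) only when the rest of the
# line contains an even number of double quotes, i.e. the comma itself
# is not inside a quoted section.
OUTSIDE_QUOTES = re.compile(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(line):
    """Split a line on commas that are not enclosed in double quotes."""
    return OUTSIDE_QUOTES.split(line)


if __name__ == '__main__':
    for line in ['a, b, c', 'a, b1 "b2, b3" b4, c', '"a1, a2", b, c']:
        print(split_outside_quotes(line))
```

If a line has an unbalanced quote, commas before that quote are not treated as separators, so such input would need separate handling.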

Any chance you could change the delimiter when creating the CSV files? I usually use a semicolon instead of a comma to avoid issues like that.

You can also tell Python which delimiter to use: csv_reader = csv.reader(csv_file, delimiter=';')
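
As a rough sketch of that suggestion, assuming the file can be regenerated with semicolons and no space after each delimiter (the data string below is a hypothetical semicolon version of the question's records, read from memory instead of a real file):

```python
import csv
import io

# Hypothetical semicolon-delimited version of the question's data.
data = (
    'col1;col2;col3\n'
    'a;b;c\n'
    'a;b1 "b2, b3" b4;c\n'
    '"a1, a2";b;c\n'
)

# The commas are now ordinary characters, so only ';' separates fields.
rows = list(csv.reader(io.StringIO(data), delimiter=';'))
for row in rows:
    print(row)
```

Note that csv.reader strips the surrounding quotes from a field that is quoted as a whole, such as the first field of the last record, while quotes in the middle of a field are kept as they are.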
