Python parsing CSV string with possible sets

I have a CSV string where some of the items might be enclosed in {} with commas inside. I want to collect the string values in a list.

What is the most Pythonic way to collect the values in a list?

Example 1: 'a,b,c' , expected output ['a', 'b', 'c']

Example 2: '{aa,ab}, b, c' , expected output ['{aa,ab}', 'b', 'c']

Example 3: '{aa,ab}, {bb,b}, c' , expected output ['{aa,ab}', '{bb,b}', 'c']

I have tried s.split(','). It works for example 1 but breaks on examples 2 and 3, because it also splits on the commas inside the braces.
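For instance, a quick check with plain str.split on example 2 shows the braced item being cut in two (nothing here goes beyond the examples above):

```python
s = '{aa,ab}, b, c'
parts = s.split(',')
print(parts)  # ['{aa', 'ab}', ' b', ' c']
```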

I believe that this question (How to split but ignore separators in quoted strings, in python?) is very similar to my problem, but I can't figure out the proper regex syntax to use.

The solution is in fact very similar:

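import re
# A field is a run of non-comma, non-brace characters and/or {...} groups.
PATTERN = re.compile(r'''\s*((?:[^,{]|\{[^{]*\})+)\s*''')
data = '{aa,ab}, {bb,b}, c'
print(PATTERN.findall(data))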

This gives:

['{aa,ab}', '{bb,b}', 'c']
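For readers who find the one-line pattern hard to parse, here is the same expression laid out with re.VERBOSE and commented piece by piece. This is only a sketch: the name PATTERN_VERBOSE is made up for illustration, and the pattern itself is unchanged.

```python
import re

# Same pattern as above, spread over several lines; PATTERN_VERBOSE is a
# hypothetical name used only for this illustration.
PATTERN_VERBOSE = re.compile(r"""
    \s*               # skip whitespace before the field
    (                 # capture one field, made of one or more of:
      (?:
          [^,{]       #   a character that is neither a comma nor an opening brace
        | \{[^{]*\}   #   or an opening brace, text with no further opening brace, a closing brace
      )+
    )
    \s*               # skip whitespace after the field
""", re.VERBOSE)

for data in ('a,b,c', '{aa,ab}, b, c', '{aa,ab}, {bb,b}, c'):
    print(PATTERN_VERBOSE.findall(data))
```

Because the braced alternative excludes a further opening brace, the pattern as written does not handle nested groups such as {a,{b,c}}.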

A more readable way (at least to me) is to spell out what you are looking for: either something between braces { } or a run of word characters (letters, digits, underscores):

import re

examples = [
    'a,b,c',
    '{aa,ab}, b, c',
    '{aa,ab}, {bb,b}, c',
]

for example in examples:
    # Either a non-greedy {...} group or a run of word characters.
    print(re.findall(r'(\{.+?\}|\w+)', example))

It prints:

['a', 'b', 'c']
['{aa,ab}', 'b', 'c']
['{aa,ab}', '{bb,b}', 'c']

Note that a regex is not necessary; you can use pure Python:

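s = '{aa,ab}, {bb,b}, c'
# Indices of the commas outside any {...} group
# (as many '{' as '}' before them).
commas = [i for i, c in enumerate(s)
          if c == ',' and s[:i].count('{') == s[:i].count('}')]
# Slice between consecutive separators; a + 2 skips the ', ' separator,
# so this assumes every field is separated by a comma and one space.
print([s[a + 2:b] for a, b in zip([-2] + commas, commas + [None])])
# ['{aa,ab}', '{bb,b}', 'c']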
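The comprehension above recounts the braces before every comma. A single pass with a depth counter is one possible alternative; the sketch below is not part of the original answer, and the function name split_outside_braces is made up for illustration. It strips whitespace around each field, so it works with or without a space after the comma.

```python
def split_outside_braces(text):
    """Split on commas that are not inside {...}, stripping each field."""
    fields = []
    current = []
    depth = 0  # number of currently open braces
    for ch in text:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
        if ch == ',' and depth == 0:
            # A top-level comma ends the current field.
            fields.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current).strip())
    return fields

for example in ('a,b,c', '{aa,ab}, b, c', '{aa,ab}, {bb,b}, c'):
    print(split_outside_braces(example))
```

Since it counts depth rather than matching a single pair, this version also keeps nested groups together, assuming the braces are balanced.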

A simpler pure-Python approach, after replacing { and } with ", is to parse the string as quoted CSV:

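def parseCSV(string):
    # Split on commas, treating "..." as one field ("" is an escaped quote).
    results = []
    current = ''
    quoted = False   # currently inside a quoted field
    quoting = False  # previous character was a quote inside a quoted field

    for currentletter in string:
        if currentletter == '"':
            if quoted:
                if quoting:
                    # A doubled quote is a literal quote character.
                    current += currentletter
                    quoting = False
                else:
                    quoting = True
            else:
                quoted = True
                quoting = False
        else:
            shouldCheck = False

            if quoted:
                if quoting:
                    # The previous quote closed the quoted field.
                    quoted = False
                    quoting = False
                    shouldCheck = True
                else:
                    current += currentletter
            else:
                shouldCheck = True

            if shouldCheck:
                if currentletter == ',':
                    results.append(current)
                    current = ''
                else:
                    current += currentletter

    results.append(current)
    return results

# Usage: the quotes (former braces) are dropped and leading spaces are kept.
# parseCSV('{aa,ab}, {bb,b}, c'.replace('{', '"').replace('}', '"'))
# -> ['aa,ab', ' bb,b', ' c']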
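
The same brace-to-quote idea can also be handed to the standard library's csv module instead of a hand-written parser. This is a sketch, not part of the original answer: the helper name split_braced_csv is made up, and it assumes the input contains no double quotes of its own. Like any quote-based approach, it returns the fields without their braces.

```python
import csv

def split_braced_csv(text):
    # Hypothetical helper: turn { and } into double quotes so that csv
    # treats each braced group as a single quoted field.
    quoted = text.replace('{', '"').replace('}', '"')
    # skipinitialspace=True ignores the space after each comma, so a
    # quote that follows ', ' still opens a quoted field.
    return next(csv.reader([quoted], skipinitialspace=True))

print(split_braced_csv('{aa,ab}, {bb,b}, c'))  # ['aa,ab', 'bb,b', 'c']
```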

