Split string by comma, but ignore commas within brackets

I'm trying to split a string by commas using Python:

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"

But I want to ignore any commas within square brackets []. So the result for the string above would be:

["year:2020", "concepts:[ab553,cd779]", "publisher:elsevier"]

Does anybody have advice on how to do this? I tried to use re.split like so:

params = re.split(",(?![\w\d\s])", param)

But it is not working properly.

result = re.split(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])", subject, 0)
,                 # Match the character “,” literally
(?!               # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
   (?:               # Match the regular expression below
      [^,\[\]]          # Match any single character NOT present in the list below
                           # The literal character “,”
                           # The literal character “[”
                           # The literal character “]”
         +                 # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ,                 # Match the character “,” literally
   )
      *                 # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   [^,\[\]]          # Match any single character NOT present in the list below
                        # The literal character “,”
                        # The literal character “[”
                        # The literal character “]”
      +                 # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ]                 # Match the character “]” literally
)

Updated to support more than 2 items in brackets. E.g.:

year:2020,concepts:[ab553,cd779],publisher:elsevier,year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier
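To see the negative-lookahead pattern above in action, here is a minimal sketch that applies it to both example strings. The names `PATTERN`, `short` and `longer` are only illustrative and do not come from the answer.

```python
import re

# Pattern from the answer above: split on a comma unless it is followed by
# bracket-free, comma-separated items and then a closing "]".
PATTERN = re.compile(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])")

short = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
longer = (
    "year:2020,concepts:[ab553,cd779],publisher:elsevier,"
    "year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier"
)

short_parts = PATTERN.split(short)
longer_parts = PATTERN.split(longer)

print(short_parts)
# ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
print(longer_parts)
# ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier',
#  'year:2020', 'concepts:[ab553,cd779,xx345]', 'publisher:elsevier']
```

Compiling the pattern once is just a convenience here; calling `re.split` directly with the same raw string should behave the same way.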

You can work this out using a user-defined function instead of split:

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"


def split_by_commas(s):
    lst = list()
    last_bracket = ''
    word = ""
    for c in s:
        if c == '[' or c == ']':
            last_bracket = c
        if c == ',' and last_bracket == ']':
            lst.append(word)
            word = ""
            continue
        elif c == ',' and last_bracket == '[':
            word += c
            continue
        elif c == ',':
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst
main_lst = split_by_commas(s)

print(main_lst)

The result of running the above code:

['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']

This regex works on your example:

,(?=[^,]+?:)

Here, a positive lookahead matches a comma only when it is followed by one or more non-comma characters and then a colon. This correctly finds the <comma><key> pattern you are searching for. Of course, if the keys are allowed to contain commas, this would have to be adapted a little further.

You can check out the regexr here
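As a rough illustration of how that lookahead behaves with `re.split`, the sketch below runs it on the question's string and on a second, made-up input (`tricky`, an assumption that is not part of the original question) where a value inside the brackets contains a colon.

```python
import re

# Split on a comma only if a "key:" follows before the next comma.
KEY_LOOKAHEAD = re.compile(r",(?=[^,]+?:)")

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
parts = KEY_LOOKAHEAD.split(s)
print(parts)
# ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']

# Hypothetical input: a colon inside the brackets looks like a key,
# so the comma before it is treated as a separator.
tricky = "a:[x,y:z],b:1"
tricky_parts = KEY_LOOKAHEAD.split(tricky)
print(tricky_parts)
# ['a:[x', 'y:z]', 'b:1']
```

So the pattern relies on the key/value layout rather than on the brackets themselves, which is fine when bracketed values never contain colons.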

A pattern that uses only a lookahead to assert a closing bracket to the right does not check whether there is a matching opening bracket to the left.

Instead of using split, you could match either one or more repetitions of values between square brackets, or any run of characters other than a comma.

(?:[^,]*\[[^][]*])+[^,]*|[^,]+

Regex demo

import re

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
params = re.findall(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+", s)
print(params)

Output:

['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
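The difference between the lookahead-only split and the matching approach shows up on inputs outside the question's example. The sketch below compares the two patterns from this page on two made-up strings (`stray` and `empty` are assumptions for illustration): one with a closing bracket that has no opening bracket, and one with an empty field.

```python
import re

# Lookahead-only split pattern from the earlier answer.
SPLIT_PATTERN = re.compile(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])")
# Matching pattern from this answer.
MATCH_PATTERN = re.compile(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+")

# A stray "]" with no "[" on the left: the lookahead still refuses to split.
stray = "x,y]"
stray_split = SPLIT_PATTERN.split(stray)
stray_match = MATCH_PATTERN.findall(stray)
print(stray_split)  # ['x,y]']
print(stray_match)  # ['x', 'y]']

# An empty field: split keeps it, findall skips it because every
# alternative of the pattern needs at least one character.
empty = "a,,b"
empty_split = SPLIT_PATTERN.split(empty)
empty_match = MATCH_PATTERN.findall(empty)
print(empty_split)  # ['a', '', 'b']
print(empty_match)  # ['a', 'b']
```

Which behaviour is preferable depends on the data: if empty fields carry meaning, the `findall` approach would need adjusting before use.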

I adapted @Bemwa's solution (which didn't work for my use case):

def split_by_commas(s):
    lst = list()
    brackets = 0
    word = ""
    for c in s:
        if c == "[":
            brackets += 1
        elif c == "]":
            if brackets > 0:
                brackets -= 1
        elif c == "," and not brackets:
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst
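Counting bracket depth also copes with nested brackets, which the `last_bracket` version above would, as far as I can tell, split after the inner `]`. Here is a minimal sketch of the same idea using slicing instead of building each word character by character. The name `split_top_level` and its `sep`, `opener` and `closer` parameters are hypothetical, as is the nested test string.

```python
def split_top_level(text, sep=",", opener="[", closer="]"):
    """Split text on sep, skipping separators inside opener/closer pairs."""
    parts = []
    depth = 0   # current bracket nesting level
    start = 0   # index where the current field begins
    for i, ch in enumerate(text):
        if ch == opener:
            depth += 1
        elif ch == closer and depth > 0:
            # Unmatched closers are ignored, as in the adapted answer.
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


flat = split_top_level("year:2020,concepts:[ab553,cd779],publisher:elsevier")
nested = split_top_level("a:[x,[y,z],w],b:1")

print(flat)    # ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
print(nested)  # ['a:[x,[y,z],w]', 'b:1']
```

Slicing avoids repeated string concatenation, though for short parameter strings like these either style should be fine.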
