Split string by comma, but ignore commas within brackets
I am trying to split a string by commas using Python:
s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
But I want to ignore any commas inside square brackets []. So the result for the string above would be:
["year:2020", "concepts:[ab553,cd779]", "publisher:elsevier"]
Does anyone have a suggestion for how to do this? I tried using re.split like this:
params = re.split(",(?![\w\d\s])", param)
But it does not work correctly.
result = re.split(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])", subject)
,                 # Match the character “,” literally
(?!               # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
   (?:            # Match the regular expression below
      [^,\[\]]    # Match any single character NOT present in the list below
                  #    The literal character “,”
                  #    The literal character “[”
                  #    The literal character “]”
         +        # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ,           # Match the character “,” literally
   )
      *           # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   [^,\[\]]       # Match any single character NOT present in the list below
                  #    The literal character “,”
                  #    The literal character “[”
                  #    The literal character “]”
      +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ]              # Match the character “]” literally
)
Updated to support more than 2 items in the brackets, for example:
year:2020,concepts:[ab553,cd779],publisher:elsevier,year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier
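As a quick check, here is a minimal sketch that runs the regex above against both example strings. The variable names `pattern`, `short` and `longer` are only illustrative and are not part of the original answer.

```python
import re

# The split pattern from the answer above, compiled once for reuse.
pattern = re.compile(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])")

short = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
# The longer example: the same fields again, with three items in the brackets.
longer = short + ",year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier"

print(pattern.split(short))
# ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
print(pattern.split(longer))
# ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier',
#  'year:2020', 'concepts:[ab553,cd779,xx345]', 'publisher:elsevier']
```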
You can solve this with a user-defined function instead of split:
s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"

def split_by_commas(s):
    lst = list()
    last_bracket = ''  # most recent bracket character seen so far
    word = ""
    for c in s:
        if c == '[' or c == ']':
            last_bracket = c
        if c == ',' and last_bracket == ']':
            # comma after a closed bracket: end of a field
            lst.append(word)
            word = ""
            continue
        elif c == ',' and last_bracket == '[':
            # comma inside an open bracket: keep it
            word += c
            continue
        elif c == ',':
            # comma before any bracket has been seen: end of a field
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst

main_lst = split_by_commas(s)
print(main_lst)
Output of the code above:
['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
This regex works for your example:
,(?=[^,]+?:)
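Here we use a positive lookahead to find a comma that is followed by one or more non-comma characters and then a colon. This correctly finds the <comma><key> pattern you are looking for. Of course, if keys are allowed to contain commas, this would have to be adjusted further.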
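A small sketch of how this lookahead behaves with re.split. The names `key_boundary` and `tricky` are illustrative, and the second input is a made-up case (not from the question) showing that the pattern depends on the key: shape rather than on the brackets themselves.

```python
import re

# Split only on commas that are followed by "<non-commas>:" (the next key).
key_boundary = re.compile(r",(?=[^,]+?:)")

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
print(key_boundary.split(s))
# ['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']

# Hypothetical input: a colon inside the brackets looks like a new key,
# so the comma before it is treated as a field boundary.
tricky = "concepts:[ab553,x:cd779],year:2020"
print(key_boundary.split(tricky))
# ['concepts:[ab553', 'x:cd779]', 'year:2020']
```

If bracketed values can contain colons, one of the bracket-aware approaches in this thread is probably a safer choice.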
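A pattern that uses only a lookahead to assert a character on the right (such as a closing bracket) does not check that there is an accompanying character on the left (the opening bracket).
Instead of using split, you could also match either one or more repetitions of values between square brackets, or any characters other than a comma: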
(?:[^,]*\[[^][]*])+[^,]*|[^,]+
import re

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
params = re.findall(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+", s)
print(params)
Output
['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']
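To illustrate the point about lookahead-only patterns, here is a hedged sketch comparing the two approaches on an input with a stray closing bracket. The lookahead-only pattern below is a hypothetical example written for this comparison, not one taken from the other answers, and the input `stray` is made up.

```python
import re

stray = "a,b],c"  # a "]" with no matching "[" on the left

# Hypothetical lookahead-only split: skip commas that are followed by a "]"
# before any other bracket. It never checks that a "[" was opened.
lookahead_only = re.split(r",(?![^\[\]]*\])", stray)
print(lookahead_only)  # ['a,b]', 'c']

# The matching pattern from this answer requires a real "[...]" pair.
matched = re.findall(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+", stray)
print(matched)  # ['a', 'b]', 'c']

# Side effect of matching instead of splitting: empty fields are dropped.
print(re.findall(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+", "a,,b"))  # ['a', 'b']
```

One trade-off to keep in mind: because findall only returns non-empty matches, empty fields between consecutive commas disappear, whereas re.split would return them as empty strings.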
I adapted @Bemwa's solution (which did not work for my use case):
def split_by_commas(s):
    lst = list()
    brackets = 0  # current bracket nesting depth
    word = ""
    for c in s:
        if c == "[":
            brackets += 1
        elif c == "]":
            if brackets > 0:
                brackets -= 1
        elif c == "," and not brackets:
            # comma outside all brackets: end of a field
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst
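The answer does not say which inputs broke the original function. Nested brackets are one case where the two versions differ, so the sketch below compares them on a made-up nested string. Both functions are condensed restatements of the code in this thread; the names `split_last_bracket`, `split_depth` and `nested` are illustrative.

```python
def split_last_bracket(s):
    """Condensed form of the first function: remembers only the last bracket seen."""
    parts, word, last_bracket = [], "", ""
    for c in s:
        if c in "[]":
            last_bracket = c
        if c == "," and last_bracket != "[":
            parts.append(word)  # comma treated as a field boundary
            word = ""
        else:
            word += c
    parts.append(word)
    return parts


def split_depth(s):
    """Condensed form of the adapted function: tracks bracket nesting depth."""
    parts, word, depth = [], "", 0
    for c in s:
        if c == "[":
            depth += 1
        elif c == "]" and depth > 0:
            depth -= 1
        if c == "," and depth == 0:
            parts.append(word)  # comma outside all brackets
            word = ""
        else:
            word += c
    parts.append(word)
    return parts


nested = "k:[a,[b,c],d],x:1"  # hypothetical input with nested brackets
print(split_last_bracket(nested))  # ['k:[a,[b,c]', 'd]', 'x:1']
print(split_depth(nested))         # ['k:[a,[b,c],d]', 'x:1']
```

With a depth counter, a comma only ends a field when every opened bracket has been closed, which is why the inner "]" no longer triggers a split.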