Regex to split by comma, but ignore commas proceeding words near a colon

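I'm trying to split a string by commas in Python, but allow the user to include commas inside certain key:value pairs. Here are two examples of the strings I'm working with: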

title.search:The relation between visualization size, grouping, and user performance,publication_year:2020

author.id:c33432,title.search:The relation between visualization size, grouping, and user performance,publication_year:2020

I would like them to become:

["title.search:The relation between visualization size, grouping, and user performance", "publication_year:2020"]

["author.id:c33432", "title.search:The relation between visualization size, grouping, and user performance", "publication_year:2020"]

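What helps is that the part before the colon (the key) will always be written in one of three formats, for example:

  1. type
  2. author.id
  3. author.institutions.country_code

So it can be one word, two words separated by a period, or three words separated by periods.

Any ideas on whether this is possible?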

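From what I can see, you are trying to split on the commas that sit directly inside text, with no space around them. In that case the regex is \w,\w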
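Used directly with `re.split`, the bare pattern `\w,\w` would also consume the word character on each side of the comma. A minimal sketch of the same idea with lookarounds, so that only the comma is removed, might look like the following. It assumes separator commas are never followed by a space and commas inside values always are, which holds for the two sample strings but is not guaranteed in general. `TIGHT_COMMA` and `split_tight_commas` are hypothetical names used only for illustration.

```python
import re

# A comma with a word character directly on both sides (no surrounding space).
# The lookbehind and lookahead check the neighbours without consuming them.
TIGHT_COMMA = re.compile(r'(?<=\w),(?=\w)')


def split_tight_commas(text):
    """Split on commas that sit directly between two word characters."""
    return TIGHT_COMMA.split(text)


sample = ('author.id:c33432,title.search:The relation between visualization '
          'size, grouping, and user performance,publication_year:2020')

print(split_tight_commas(sample))
```

This approach breaks as soon as a user types a comma without a following space inside a value, so it is more fragile than splitting on the key pattern.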

Would you please try the following:

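#!/usr/bin/python

import re

samples = [
    'title.search:The relation between visualization size, grouping, and user performance,publication_year:2020',
    'author.id:c33432,title.search:The relation between visualization size, grouping, and user performance,publication_year:2020',
]

for text in samples:  # avoid naming the loop variable `str`, which shadows the built-in
    # Split on a comma only when a key followed by a colon comes right after it.
    parts = re.split(r',(?=\s*[\w.]+:)', text)
    print(parts)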

Output:

['title.search:The relation between visualization size, grouping, and user performance', 'publication_year:2020']
['author.id:c33432', 'title.search:The relation between visualization size, grouping, and user performance', 'publication_year:2020']

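The regex ,(?=\s*[\w.]+:) matches a comma followed by

  • zero or more whitespace characters
  • a sequence of word characters and/or dot characters
  • a colon character

in that order. Because the condition is a lookahead, only the comma itself is consumed, and the string is split at each comma that satisfies it.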
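Since the question states that a key is always one to three words separated by periods, the lookahead can be tightened to that shape. The following is a sketch under that assumption and is not part of the original answer. `KEY_SEPARATOR` and `parse_filters` are hypothetical names, and turning the pairs into a dict assumes each key appears at most once.

```python
import re

# A comma followed by optional whitespace, a key made of one to three
# dot-separated words, and a colon. Only the comma is consumed.
KEY_SEPARATOR = re.compile(r',(?=\s*\w+(?:\.\w+){0,2}:)')


def parse_filters(text):
    """Split text into key:value pairs and return them as a dict."""
    pairs = KEY_SEPARATOR.split(text)
    # Split each pair on the first colon only, so values may contain colons.
    return dict(pair.strip().split(':', 1) for pair in pairs)


sample = ('author.id:c33432,title.search:The relation between visualization '
          'size, grouping, and user performance,publication_year:2020')

print(parse_filters(sample))
```

Either pattern will still split inside a value that happens to contain a comma followed by a single word and a colon (for example "Design, part: two"), so if such values are possible, matching against an explicit list of allowed keys would be safer.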
