
regex match for words with spaces in between them

I have this regex pattern ([^\s|:]+):\s*([^\s|:]+), which works well for name:jones|location:london|age:23. How can I extend the pattern to cover keys and values that contain spaces, or words combined with numbers? For example: full name:jones hardy|city and dialling code :london 0044|age:23 years. The expected result is:

>>> ("full name", "jones hardy") ("city and dialling code", "london 0044")("age","23 years")

This situation seems like it calls for re.split.

>>> import re
>>> s = "full name:jones hardy|city and dialling " \
...     "code :london 0044|age:23 years"
>>> [tuple(re.split(r'\s*:\s*', t))
...  for t in re.split(r'\s*\|\s*', s)]
[('full name', 'jones hardy'),
 ('city and dialling code', 'london 0044'),
 ('age', '23 years')]
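The re.split approach above can also be written as a small script. This is a minimal sketch, not the answerer's code: the function name parse_pairs is made up for illustration, and it assumes every record contains at least one colon (a record without one raises ValueError when unpacking).

```python
import re


def parse_pairs(text):
    """Split 'key:value|key:value' text into a list of (key, value) tuples."""
    pairs = []
    # Split records on '|', swallowing any whitespace around the delimiter.
    for record in re.split(r"\s*\|\s*", text.strip()):
        # maxsplit=1 keeps any later ':' inside the value.
        key, value = re.split(r"\s*:\s*", record, maxsplit=1)
        pairs.append((key, value))
    return pairs


s = "full name:jones hardy|city and dialling code :london 0044|age:23 years"
print(parse_pairs(s))
```

Because the whitespace is consumed by the delimiters themselves, spaces around ':' and '|' never reach the keys or values.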

Alternatively, use re.findall with a lazy first group:

>>> import re
>>> s = "full name:jones hardy|city and dialling code :london 0044|age:23 years"
>>> r = r"([^|:]+?)\s*:\s*([^|:]+)"
>>> re.findall(r, s)
[('full name', 'jones hardy'), ('city and dialling code', 'london 0044'), ('age', '23 years')]

Because the first group is lazy, the \s* that follows it consumes the space at the end of 'city and dialling code ', so that space is eliminated from the key.

But if there are spaces before '|', they will not be eliminated, because the second group is greedy:

>>> s="full name:jones hardy |city and dialling code :london 0044|age:23 years"
>>> re.findall(r, s)
[('full name', 'jones hardy '), ('city and dialling code', 'london 0044'), ('age', '23 years')]

There will be a space at the end of 'jones hardy '.
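One simple way around the leftover space is to strip each captured group after matching. The following is a sketch under that assumption; PAIR and find_pairs are illustrative names, not part of the original answer.

```python
import re

# Same pattern as above: lazy key, greedy value.
PAIR = re.compile(r"([^|:]+?)\s*:\s*([^|:]+)")


def find_pairs(text):
    """Return (key, value) tuples with surrounding whitespace removed."""
    return [(key.strip(), value.strip()) for key, value in PAIR.findall(text)]


s = "full name:jones hardy |city and dialling code :london 0044|age:23 years"
print(find_pairs(s))
```

Stripping after the match also handles a space that follows '|', which would otherwise end up at the start of the next key.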

EDIT

r"\s*([\w\s]+?)\s*:\s*([\w\s]+?)\s*(?:\||$)" will eliminate all whitespace at the beginning and the end of each key and value:

>>> s
'  full name: jones hardy | city and dialling code :london 0044|age:23 years'
>>> r=r"\s*([\w\s]+?)\s*:\s*([\w\s]+?)\s*(?:\||$)"
>>> re.findall(r, s)
[('full name', 'jones hardy'), ('city and dialling code', 'london 0044'), ('age', '23 years')]
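Note that [\w\s] only matches letters, digits, underscores, and whitespace, so a key or value containing other characters (a hyphen or a dot, say) would not match. A possible variant, sketched here as an assumption rather than as part of the original answer, keeps the same trimming structure but uses the negated class [^|:] instead. The sample string is made up for illustration.

```python
import re

# Lazy groups plus the trailing \s*(?:\||$) trim whitespace on both sides,
# while [^|:] accepts any character except the two delimiters.
pattern = r"\s*([^|:]+?)\s*:\s*([^|:]+?)\s*(?:\||$)"

# Hypothetical input containing punctuation inside keys and values.
sample = "  full name: jones-hardy | e-mail :j.hardy@example.com|age:23 years"
pairs = re.findall(pattern, sample)
print(pairs)
```

The trade-off is that a value can no longer contain a literal ':' or '|', which is the same restriction the other patterns in this thread have.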

Simplify your regex to capture everything except the delimiters, which in your case are the colon : and the pipe |. Note that the greedy first group keeps any space before the colon, as in 'city and dialling code ' below.

>>> r = r"([^:|]+)\s*:\s*([^:|]+)"
>>> st = "full name:jones hardy|city and dialling code :london 0044"
>>> re.findall(r, st)
[('full name', 'jones hardy'), ('city and dialling code ', 'london 0044')]
>>> st="name:jones|location:london|age:23"
>>> re.findall(r, st)
[('name', 'jones'), ('location', 'london'), ('age', '23')]
