Split this string using a regular expression - Python
Input string
---------------
South Africa 109/0
Australia 100
Sri Lanka 111
Sri Lanka 331/4
Expected Output
---------------
['South Africa', '109', '0']
['Australia', '100']
['Sri Lanka', '111']
['Sri Lanka', '331', '4']
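I've tried several regular expressions, but I can't figure out how to write the right one. Splitting on whitespace doesn't help here, because a country name may or may not contain a space (South Africa, India). Thanks in advance.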
We can use the regular expression:
r'(\D+)\s(\d+)(?:/(\d+))?'
("One or more non-digits, followed by a whitespace character, then one or more digits, then, optionally, a slash followed by one or more digits.")
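For readability, the same pattern can also be written with re.VERBOSE, which lets each piece carry its own comment. This is only a sketch; the name PATTERN is an illustrative choice, not part of the original answer.

```python
import re

# The same pattern in verbose mode: whitespace inside the pattern is ignored
# and "#" starts a comment, so each piece can be documented in place.
PATTERN = re.compile(r"""
    (\D+)        # country: one or more non-digit characters
    \s           # a single whitespace character
    (\d+)        # first number: one or more digits
    (?:/(\d+))?  # optionally, a slash followed by a second number
""", re.VERBOSE)

print(PATTERN.match("South Africa 109/0").groups())  # ('South Africa', '109', '0')
print(PATTERN.match("Australia 100").groups())       # ('Australia', '100', None)
```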
Applied to the sample inputs, the pattern returns:
>>> import re
>>> [re.match(r'(\D+)\s(\d+)(?:/(\d+))?', x).groups()
... for x in ['South Africa 109/0',
... 'Australia 100',
... 'Sri Lanka 111',
... 'Sri Lanka 331/4']]
[('South Africa', '109', '0'),
('Australia', '100', None),
('Sri Lanka', '111', None),
('Sri Lanka', '331', '4')]
Note the None values; you may need to filter them out manually.
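One way to do that filtering, sketched with a list comprehension that keeps only the groups that actually took part in the match (the names pattern, lines and result are illustrative):

```python
import re

pattern = re.compile(r'(\D+)\s(\d+)(?:/(\d+))?')
lines = ['South Africa 109/0', 'Australia 100', 'Sri Lanka 111', 'Sri Lanka 331/4']

# The optional third group is None when a line has no "/" part; drop it.
result = [[g for g in pattern.match(line).groups() if g is not None]
          for line in lines]
print(result)
# [['South Africa', '109', '0'], ['Australia', '100'], ['Sri Lanka', '111'], ['Sri Lanka', '331', '4']]
```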
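Try:
import re
# Split on whitespace between a letter and a digit, on whitespace between a digit and a letter, or on "/"
re.split(r"(?<=[a-zA-Z])\s+(?=\d)|(?<=\d)\s+(?=[a-zA-Z])|/", "South Africa 109/0")
# ['South Africa', '109', '0']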
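The split can be checked against all four sample lines. This sketch writes the second alternative as a lookbehind, (?<=\d), which is presumably what was intended, since (?=\d)\s+ can never match (a position cannot be followed by both a digit and whitespace). SPLIT_RE is an illustrative name.

```python
import re

# Whitespace between a letter and a digit, whitespace between a digit and a
# letter, or a "/". Spaces inside a country name (letter-space-letter) are kept.
SPLIT_RE = r"(?<=[a-zA-Z])\s+(?=\d)|(?<=\d)\s+(?=[a-zA-Z])|/"

for line in ['South Africa 109/0', 'Australia 100', 'Sri Lanka 111', 'Sri Lanka 331/4']:
    print(re.split(SPLIT_RE, line))
```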
import re
pattern = re.compile(r"^([\w\s]+)\s(\d+)\/?(\d+)?")  # raw string, so "\w", "\s", "\d" reach the regex engine unchanged
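This gives you three groups. Breaking it down:
^ : from the start of the string,
([\w\s]+) : a group of word characters and whitespace (the country name; note that \w also matches digits and underscores),
\s : a single whitespace character,
(\d+) : a group of one or more digits,
\/? : an optional slash,
(\d+)? : an optional group of digits, which is None when there is no slash part.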
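A small usage sketch for this compiled pattern, wrapped in a hypothetical helper split_line that drops the None third group so the result matches the expected output:

```python
import re

# Raw string; \w matches letters, digits and "_", so digits could in principle
# end up in the first group, but backtracking stops at the last space here.
LINE_RE = re.compile(r"^([\w\s]+)\s(\d+)\/?(\d+)?")

def split_line(line):
    """Return [country, number] or [country, number, number], or None if no match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # The third group is None when there is no "/" part; drop it.
    return [g for g in match.groups() if g is not None]

print(split_line("Sri Lanka 331/4"))  # ['Sri Lanka', '331', '4']
print(split_line("Australia 100"))    # ['Australia', '100']
```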
Here is the regular expression you need:
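import re
# Sample input from the question, one record per line
inputText = "South Africa 109/0\nAustralia 100\nSri Lanka 111\nSri Lanka 331/4"
for match in re.finditer(r"(?m)^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$", inputText):
    country = match.group("Country")
    number1 = match.group("Number1")
    number2 = match.group("Number2")  # empty string when there is no "/" part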
You can see the results here.
Here is an explanation of the pattern:
# ^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$
#
# Options: ^ and $ match at line breaks
#
# Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
# Match the regular expression below and capture its match into backreference with name “Country” «(?P<Country>.*?)»
# Match any single character that is not a line break character «.*?»
# Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regular expression below and capture its match into backreference with name “Number1” «(?P<Number1>\d+)»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
# Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match the character “/” literally «/?»
# Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
# Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match the regular expression below and capture its match into backreference with name “Number2” «(?P<Number2>\d*?)»
# Match a single digit 0..9 «\d*?»
# Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
# Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert position at the end of a line (at the end of the string or before a line break character) «$»
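To turn those matches into the lists from the question, a sketch along these lines collects each match and drops Number2 when it is empty (with this pattern the optional part comes back as an empty string rather than None). The names input_text, PATTERN and rows are illustrative only.

```python
import re

input_text = """South Africa 109/0
Australia 100
Sri Lanka 111
Sri Lanka 331/4"""

PATTERN = re.compile(
    r"(?m)^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$")

rows = []
for match in PATTERN.finditer(input_text):
    row = [match.group("Country"), match.group("Number1")]
    # Number2 is "" (not None) when the line has no "/" part.
    if match.group("Number2"):
        row.append(match.group("Number2"))
    rows.append(row)

print(rows)
```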
You've already got answers using regular expressions, but I'd suggest also considering the built-in str methods (for this use case, at least):
s = 'South Africa 109/0'
country, numbers = s.rsplit(' ', 1)
# ('South Africa', '109/0')
new_list = [country] + numbers.split('/')
# ['South Africa', '109', '0']
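Wrapped in a function (split_score is a hypothetical name) and run over every sample line, the same idea looks like this. It assumes exactly one space separates the country name from the numbers, as in the question's input.

```python
def split_score(s):
    """Split 'Country 123' or 'Country 123/4' using only str methods."""
    # maxsplit=1 splits on the last space only, so spaces inside the
    # country name are preserved.
    country, numbers = s.rsplit(' ', 1)
    return [country] + numbers.split('/')

for s in ['South Africa 109/0', 'Australia 100', 'Sri Lanka 111', 'Sri Lanka 331/4']:
    print(split_score(s))
```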