Split this string using a regular expression - Python

Question

Input string
---------------
South Africa 109/0 
Australia 100
Sri Lanka 111
Sri Lanka 331/4

Expected Output
---------------
['South Africa', '109', '0']
['Australia', '100']
['Sri Lanka', '111']
['Sri Lanka', '331', '4']

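I tried several regular expressions but could not work out the right one. A whitespace delimiter does not help here, because country names may or may not contain spaces (South Africa, India). Thanks in advance.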

Answer 1

We can use the regular expression:

r'(\D+)\s(\d+)(?:/(\d+))?'

("One or more non-digits, followed by a whitespace character, then one or more digits, optionally followed by a slash and one or more digits.")

This returns, for example:

>>> import re
>>> [re.match(r'(\D+)\s(\d+)(?:/(\d+))?', x).groups() 
...  for x in ['South Africa 109/0', 
...            'Australia 100',
...            'Sri Lanka 111',
...            'Sri Lanka 331/4']]
[('South Africa', '109', '0'), 
 ('Australia', '100', None), 
 ('Sri Lanka', '111', None), 
 ('Sri Lanka', '331', '4')]

Note the None values, which you may need to filter out manually.
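
One way to do that filtering is sketched below. The helper name split_score is hypothetical (it is not part of the answer), and returning an empty list for a line that does not match is an assumption made for illustration.

```python
import re

PATTERN = re.compile(r'(\D+)\s(\d+)(?:/(\d+))?')

def split_score(line):
    """Return the captured groups as a list, skipping groups that did not match."""
    match = PATTERN.match(line)
    if match is None:
        return []  # assumption: a non-matching line yields an empty list
    # groups() holds None for the optional second number when it is absent
    return [g for g in match.groups() if g is not None]

lines = ['South Africa 109/0', 'Australia 100', 'Sri Lanka 111', 'Sri Lanka 331/4']
results = [split_score(line) for line in lines]
for row in results:
    print(row)
```

Each printed row is a list in the shape shown under Expected Output in the question.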

Answer 2

Try:

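import re
# Split on whitespace between letters and digits, or on "/". The second branch uses a lookbehind (?<=\d); a lookahead followed by \s+ could never match.
re.split(r"(?<=[a-zA-Z])\s+(?=\d)|(?<=\d)\s+(?=[a-zA-Z])|/", "South Africa 109/0")
# ['South Africa', '109', '0']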

Answer 3

import re
re.compile(r"^([\w\s]+)\s(\d+)/?(\d+)?")

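This gives you three groups. We can break it down:

At the start of the line (^), a group of word characters and whitespace ([\w\s]+); note that \w also matches digits and underscores, not only letters
A whitespace character
A group of one or more digits (\d+)
An optional /
A group of digits that may be absent (in which case the group is None)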
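
The breakdown above can be checked against the four inputs from the question. This is a minimal sketch; the names pattern, samples and groups are chosen here for illustration, and the pattern is written as a raw string so the backslashes reach the regex engine unchanged.

```python
import re

# Start of line, name group, one whitespace, digits, optional "/", optional digits
pattern = re.compile(r"^([\w\s]+)\s(\d+)/?(\d+)?")

samples = ["South Africa 109/0", "Australia 100", "Sri Lanka 111", "Sri Lanka 331/4"]
groups = [pattern.match(s).groups() for s in samples]
for g in groups:
    print(g)
# ('South Africa', '109', '0')
# ('Australia', '100', None)
# ('Sri Lanka', '111', None)
# ('Sri Lanka', '331', '4')
```

The greedy first group initially swallows the digits too, then backtracks until a whitespace character followed by digits can match, which is why the name ends up without the score.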

Answer 4

Here is the regex you need:

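import re
inputText = "South Africa 109/0\nAustralia 100\nSri Lanka 111\nSri Lanka 331/4"
for match in re.finditer(r"(?m)^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$", inputText):
    country = match.group("Country")
    number1 = match.group("Number1")
    number2 = match.group("Number2")  # '' (empty string) when there is no second number
    print(country, number1, number2)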

You can see the results here.

Here is an explanation of the pattern:

# ^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$
# 
# Options: ^ and $ match at line breaks
# 
# Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
# Match the regular expression below and capture its match into backreference with name “Country” «(?P<Country>.*?)»
#    Match any single character that is not a line break character «.*?»
#       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regular expression below and capture its match into backreference with name “Number1” «(?P<Number1>\d+)»
#    Match a single digit 0..9 «\d+»
#       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
#    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match the character “/” literally «/?»
#    Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
#    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match the regular expression below and capture its match into backreference with name “Number2” «(?P<Number2>\d*?)»
#    Match a single digit 0..9 «\d*?»
#       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
#    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert position at the end of a line (at the end of the string or before a line break character) «$»
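
As a rough illustration of how the named groups described above could be turned into the lists the question asks for, here is a sketch. The names PATTERN, input_text and rows are assumptions introduced for this example. Because Number2 uses \d*?, it comes back as an empty string rather than None when the second number is missing.

```python
import re

PATTERN = re.compile(
    r"(?m)^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$"
)

input_text = "South Africa 109/0\nAustralia 100\nSri Lanka 111\nSri Lanka 331/4"

rows = []
for match in PATTERN.finditer(input_text):
    row = [match.group("Country"), match.group("Number1")]
    number2 = match.group("Number2")
    if number2:  # '' means the line had no second number
        row.append(number2)
    rows.append(row)

for row in rows:
    print(row)
```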

Answer 5

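You have already got answers that use regular expressions, but I would suggest also considering the built-in str methods (for this use case, anyway):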

s = 'South Africa 109/0'
country, numbers = s.rsplit(' ', 1)
# ('South Africa', '109/0')
new_list = [country] + numbers.split('/')
# ['South Africa', '109', '0']
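
The same two steps can be wrapped in a small function and applied to every input line. This is a sketch only: the function name split_line is hypothetical, and the strip() call is an added assumption to cope with trailing whitespace such as the one after the first input line.

```python
def split_line(line):
    """Split 'Country 109/0' into ['Country', '109', '0'] using only str methods."""
    # rsplit with maxsplit=1 cuts at the last space, so spaces inside the name survive
    country, numbers = line.strip().rsplit(' ', 1)
    return [country] + numbers.split('/')

lines = ['South Africa 109/0 ', 'Australia 100', 'Sri Lanka 111', 'Sri Lanka 331/4']
result = [split_line(line) for line in lines]
for row in result:
    print(row)
```

A line with no slash simply yields a two-item list, since split('/') returns the whole string as a single element.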

5 solutions

Solution 1
2 2012-09-13 09:23:41

Solution 2
1 2012-09-13 09:26:19

Solution 3
0 2012-09-13 09:18:52

Solution 4
0 2012-09-13 09:33:19

Solution 5
0 2012-09-13 09:52:34