Regular expression to extract number with hyphen

The text is like "1-2years. 3years. 10years."

I want to get the result [(1,2),(3),(10)].

I use Python.

I first tried r"([0-9]?)[-]?([0-9])years". It works well except for the case of 10. I also tried r"([0-9]?)[-]?([0-9]|10)years", but the result is still [(1,2),(3),(1,0)].

This should work:

import re

st = '1-2years. 3years. 10years.'
result = [tuple(e for e in tup if e) 
          for tup in re.findall(r'(?:(\d+)-(\d+)|(\d+))years', st)]
# [('1', '2'), ('3',), ('10',)]

The regex will look for either one number, or two separated by a hyphen, immediately prior to the word years. If we give this to re.findall(), it will give us the output [('1', '2', ''), ('', '', '3'), ('', '', '10')], so we also use a quick list comprehension to filter out the empty strings.

Alternatively, we could use r'(\d+)(?:-(\d+))?years' to basically the same effect, which is closer to what you've already tried.

Your attempt r"([0-9]?)[-]?([0-9])years" doesn't work for the case of 10 because you ask it to match one (or zero) digit per group, so the two digits of 10 end up in separate groups.

You also don't need to put the hyphen in brackets: - matches the same thing as [-].
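To make the failure concrete, here is a minimal sketch that runs both patterns from the question on just the "10years." part of the input. The variable names are only illustrative.

```python
import re

text = "10years."

# Both patterns are copied from the question.
first_attempt = re.findall(r"([0-9]?)[-]?([0-9])years", text)
second_attempt = re.findall(r"([0-9]?)[-]?([0-9]|10)years", text)

print(first_attempt)   # [('1', '0')]
print(second_attempt)  # [('1', '0')]
```

In both cases the first group greedily takes the 1, which leaves only the 0 for the second group, so the 10 alternative in the second pattern never gets a chance to match.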

This should work:

(\d+)(?:-(\d+))?years

Explanation:

  • (\d+) : Capturing group for one or more digits
  • (?: ) : Non-capturing group
  • - : Hyphen
  • (\d+) : Capturing group for one or more digits
  • (?: )? : Makes the preceding non-capturing group optional
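The same breakdown can be written directly into the pattern using Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. This is only a sketch of an equivalent way to write the pattern; the name YEARS_RE is made up for illustration.

```python
import re

# Same pattern as above, spread over several lines with comments.
YEARS_RE = re.compile(
    r"""
    (\d+)        # group 1: one or more digits
    (?:          # start of a non-capturing group
        -        #   a literal hyphen
        (\d+)    #   group 2: one or more digits
    )?           # the whole non-capturing group is optional
    years        # the literal word "years"
    """,
    re.VERBOSE,
)

verbose_result = YEARS_RE.findall("1-2years. 3years. 10years.")
print(verbose_result)  # [('1', '2'), ('3', ''), ('10', '')]
```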

In Python:

import re

result = re.findall(r"(\d+)(?:-(\d+))?years", "1-2years. 3years. 10years.")

# Gives: [('1', '2'), ('3', ''), ('10', '')]

Each tuple in the list contains two elements: the number on the left side of the hyphen, and the number on the right side of the hyphen (an empty string when there is no hyphen). Removing the blank elements is quite easy: loop over each item in result, then loop over each match in that item, and keep it (converted to int) only if it is not empty.

final_result = [tuple(int(match) for match in item if match) for item in result]

# gives: [(1, 2), (3,), (10,)]
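If the nested comprehension is hard to read, the same filtering can be spelled out with explicit loops. This is just an expanded sketch of the one-liner above; the helper list numbers is an illustrative name.

```python
import re

result = re.findall(r"(\d+)(?:-(\d+))?years", "1-2years. 3years. 10years.")

final_result = []
for item in result:          # item is a tuple such as ('3', '')
    numbers = []
    for match in item:       # match is one captured string, possibly ''
        if match:            # skip the empty string of an unmatched group
            numbers.append(int(match))
    final_result.append(tuple(numbers))

print(final_result)  # [(1, 2), (3,), (10,)]
```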

You can use this pattern: (?:(\d+)-)?(\d+)years

Code:

import re

pattern = r"(?:(\d+)-)?(\d+)years"
text = "1-2years. 3years. 10years."
print([tuple(int(z) for z in x if z) for x in re.findall(pattern, text)])

Output:

[(1, 2), (3,), (10,)]

You only match a single digit because the character class [0-9] is not repeated.

Another option is to capture the first digits together with an optional part for - and more digits in a single group.

Then you can split the matches on -

\b(\d+(?:-\d+)?)years\.
  • \b A word boundary
  • ( Capture group 1 (which will be returned by re.findall)
    • \d+(?:-\d+)? Match 1+ digits and optionally match - and again 1+ digits
  • ) Close group 1
  • years\. Match years literally, followed by an escaped . (a literal period)
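Note that this pattern is stricter than the others on this page. The sketch below shows what the \b and the trailing \. change; the two extra input strings are made up for illustration.

```python
import re

pattern = r"\b(\d+(?:-\d+)?)years\."

# The input from the question: every entry ends with a period.
matched = re.findall(pattern, "1-2years. 3years. 10years.")

# No trailing period, so the escaped . has nothing to match.
no_period = re.findall(pattern, "3years")

# The digits directly follow a letter, so there is no word boundary before them.
no_boundary = re.findall(pattern, "x10years.")

print(matched)      # ['1-2', '3', '10']
print(no_period)    # []
print(no_boundary)  # []
```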

Example

import re

pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."

res = [tuple(v.split('-')) for v in re.findall(pattern, s)]
print(res)

Output

[('1', '2'), ('3',), ('10',)]

Or, if a list of lists is also OK instead of tuples:

res = [v.split('-') for v in re.findall(pattern, s)]

Output

[['1', '2'], ['3'], ['10']]
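Both variants above return strings, while the question asks for numbers. A small sketch of the same split-based approach with an int conversion added (the name res_int is illustrative):

```python
import re

pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."

# Split each match on the hyphen and convert every piece to int.
res_int = [tuple(int(n) for n in v.split('-')) for v in re.findall(pattern, s)]
print(res_int)  # [(1, 2), (3,), (10,)]
```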
