Regular expression to extract numbers with a hyphen
The text is like "1-2years. 3years. 10years."
I want to get the result [(1,2),(3),(10)].
I use Python.
I first tried r"([0-9]?)[-]?([0-9])years". It works well except for the case of 10. I also tried r"([0-9]?)[-]?([0-9]|10)years", but the result is still [(1,2),(3),(1,0)].
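This should work:
import re
st = '1-2years. 3years. 10years.'
# Match either two hyphen-separated numbers or a single number right before "years",
# then drop the empty strings left by the groups that did not take part in the match.
result = [tuple(e for e in tup if e)
          for tup in re.findall(r'(?:(\d+)-(\d+)|(\d+))years', st)]
# [('1', '2'), ('3',), ('10',)]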
The regex looks for either one number, or two numbers separated by a hyphen, immediately before the word years. If we give this to re.findall(), it returns [('1', '2', ''), ('', '', '3'), ('', '', '10')], so we also use a quick list comprehension to filter out the empty strings.
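A quick way to check the intermediate step described above is to print the raw re.findall() result before filtering. This is a minimal sketch reusing the same pattern and sample string; the variable names raw and filtered are only illustrative.

```python
import re

st = '1-2years. 3years. 10years.'
pattern = r'(?:(\d+)-(\d+)|(\d+))years'

# With three capture groups, findall returns a 3-tuple per match;
# groups that did not take part in a match come back as ''.
raw = re.findall(pattern, st)
print(raw)  # [('1', '2', ''), ('', '', '3'), ('', '', '10')]

# Keep only the non-empty strings in each tuple.
filtered = [tuple(e for e in tup if e) for tup in raw]
print(filtered)  # [('1', '2'), ('3',), ('10',)]
```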
Alternatively, we could use r'(\d+)(?:-(\d+))?years' to essentially the same effect; this is closer to what you've already tried.
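Here is a short sketch of that alternative pattern plugged into the same filtering comprehension. Because this pattern has only two groups, the single-number matches come back as ('3', '') and ('10', '') before filtering.

```python
import re

st = '1-2years. 3years. 10years.'

# Group 1: the first (or only) number; group 2: the optional number after the hyphen.
raw = re.findall(r'(\d+)(?:-(\d+))?years', st)
# raw == [('1', '2'), ('3', ''), ('10', '')]

result = [tuple(e for e in tup if e) for tup in raw]
print(result)  # [('1', '2'), ('3',), ('10',)]
```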
Your attempt r"([0-9]?)[-]?([0-9])years" doesn't work for the case of 10 because you ask it to match only one (or zero) digit per group.
You also don't need to put the hyphen in brackets: a plain - matches it just as well.
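To see the failure concretely, the sketch below runs both patterns from the question against the sample text. In each case the optional first group [0-9]? takes the 1 of 10, so the second group is left with only the 0. Adding |10 to the second group does not help: by the time that group is tried, the 1 has already been consumed, and the rest of the pattern still succeeds with 0.

```python
import re

text = "1-2years. 3years. 10years."

# The two patterns tried in the question.
first_try = re.findall(r"([0-9]?)[-]?([0-9])years", text)
second_try = re.findall(r"([0-9]?)[-]?([0-9]|10)years", text)

print(first_try)   # [('1', '2'), ('', '3'), ('1', '0')]
print(second_try)  # [('1', '2'), ('', '3'), ('1', '0')]
```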
This should work (see the Regex101 demo):
(\d+)(?:-(\d+))?years
Explanation:
(\d+): Capturing group for one or more digits
(?: ): Non-capturing group
-: Hyphen
(\d+): Capturing group for one or more digits
(?: )?: Make the previous non-capturing group optional
In Python:
import re
result = re.findall(r"(\d+)(?:-(\d+))?years", "1-2years. 3years. 10years.")
# Gives: [('1', '2'), ('3', ''), ('10', '')]
Each tuple in the list contains two elements: the number on the left side of the hyphen and the number on the right side of the hyphen (an empty string when there is no hyphen). Removing the blank elements is quite easy: you loop over each item in result, then you loop over each match in this item and only select it (and convert it to int) if it is not empty.
final_result = [tuple(int(match) for match in item if match) for item in result]
# gives: [(1, 2), (3,), (10,)]
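The loops described in words above can also be written out explicitly. This sketch should be equivalent to the one-line comprehension; result and final_result follow the answer, while numbers is just an illustrative helper variable.

```python
import re

result = re.findall(r"(\d+)(?:-(\d+))?years", "1-2years. 3years. 10years.")

final_result = []
for item in result:            # e.g. ('1', '2') or ('3', '')
    numbers = []
    for match in item:         # each captured string in the tuple
        if match:              # skip the '' left by an unused group
            numbers.append(int(match))
    final_result.append(tuple(numbers))

print(final_result)  # [(1, 2), (3,), (10,)]
```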
You can use this pattern:
(?:(\d+)-)?(\d+)years
Code:
import re
pattern = r"(?:(\d+)-)?(\d+)years"
text = "1-2years. 3years. 10years."
print([tuple(int(z) for z in x if z) for x in re.findall(pattern, text)])
Output:
[(1, 2), (3,), (10,)]
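With this pattern the optional group comes first, so when there is no hyphen the single number lands in group 2 and group 1 is the empty string. Printing the unfiltered matches makes that visible; this sketch only adds a print of the intermediate result.

```python
import re

pattern = r"(?:(\d+)-)?(\d+)years"
text = "1-2years. 3years. 10years."

raw = re.findall(pattern, text)
print(raw)  # [('1', '2'), ('', '3'), ('', '10')]

# The 'if z' filter drops the empty first group before converting to int.
converted = [tuple(int(z) for z in x if z) for x in raw]
print(converted)  # [(1, 2), (3,), (10,)]
```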
You only match a single digit, as the character class [0-9] is not repeated.
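A small sketch of the difference: [0-9] consumes exactly one digit per match, while [0-9]+ consumes a whole run of digits. The third pattern is a hypothetical variant of the question's regex with quantifiers added (not from any answer); it suggests that repetition alone is not quite enough here, because with the hyphen optional the engine can still split 10 into 1 and 0. Tying the hyphen to the first number avoids that.

```python
import re

text = "1-2years. 3years. 10years."

# One digit per match vs. a run of digits.
one_digit = re.findall(r"[0-9]", "10")   # ['1', '0']
digit_run = re.findall(r"[0-9]+", "10")  # ['10']

# Hypothetical variant of the question's pattern with quantifiers added:
# the hyphen is still optional, so "10" can still be split into "1" and "0".
quantified = re.findall(r"([0-9]*)[-]?([0-9]+)years", text)
# [('1', '2'), ('', '3'), ('1', '0')]

# Making the hyphen part of the optional first group fixes that.
grouped = re.findall(r"(?:([0-9]+)-)?([0-9]+)years", text)
# [('1', '2'), ('', '3'), ('', '10')]

print(one_digit, digit_run)
print(quantified)
print(grouped)
```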
Another option is to match the first digits followed by an optional part consisting of - and more digits.
Then you can split the matches on -.
\b(\d+(?:-\d+)?)years\.
\b : A word boundary
( : Capture group 1 (which will be returned by re.findall)
\d+(?:-\d+)? : Match 1+ digits and optionally match - and again 1+ digits
) : Close group 1
years\. : Match years literally, followed by an escaped (literal) .
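To illustrate what \b and the escaped . contribute, this sketch runs the pattern with and without them on a made-up string (a10years. 5years is purely hypothetical input). Without the boundary and the trailing period, digits glued to a letter and a years with no period after it are still matched.

```python
import re

sample = "a10years. 5years"  # hypothetical input, not from the question

strict = re.findall(r"\b(\d+(?:-\d+)?)years\.", sample)
loose = re.findall(r"(\d+(?:-\d+)?)years", sample)

print(strict)  # [] : "10" follows a letter (no \b) and "5years" has no "."
print(loose)   # ['10', '5']
```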
See a regex demo and a Python demo.
Example
import re
pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."
res = [tuple(v.split('-')) for v in re.findall(pattern, s)]
print(res)
Output
[('1', '2'), ('3',), ('10',)]
Or, if a list of lists is also OK instead of tuples:
res = [v.split('-') for v in re.findall(pattern, s)]
Output
[['1', '2'], ['3'], ['10']]
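If you need integers rather than strings, as in the result asked for in the question, you can map int over the split pieces. A minimal sketch reusing the same pattern and sample string:

```python
import re

pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."

# Each match is a string like '1-2' or '10'; split on '-' and convert each part.
res = [tuple(map(int, v.split('-'))) for v in re.findall(pattern, s)]
print(res)  # [(1, 2), (3,), (10,)]
```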