The text is like "1-2years. 3years. 10years."
I want to get the result [(1,2),(3),(10)].
I use Python.
I first tried r"([0-9]?)[-]?([0-9])years". It works well except for the case of 10. I also tried r"([0-9]?)[-]?([0-9]|10)years", but the result is still [(1,2),(3),(1,0)].
This should work:
import re
st = '1-2years. 3years. 10years.'
result = [tuple(e for e in tup if e)
          for tup in re.findall(r'(?:(\d+)-(\d+)|(\d+))years', st)]
# [('1', '2'), ('3',), ('10',)]
The regex will look for either one number, or two separated by a hyphen, immediately prior to the word years. If we give this to re.findall(), it will give us the output [('1', '2', ''), ('', '', '3'), ('', '', '10')], so we also use a quick list comprehension to filter out the empty strings.
Alternately we could use r'(\d+)(?:-(\d+))?years' to basically the same effect, which is closer to what you've already tried.
Your attempt r"([0-9]?)[-]?([0-9])years" doesn't work for the case of 10 because you ask it to match one (or zero) digit per group.
You also don't need to put the hyphen in brackets: -? means the same as [-]?.
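To see the difference concretely, here is a minimal sketch that runs the original attempt and a repeated-digit pattern side by side on the sample text. The variable names are only illustrative.

```python
import re

text = "1-2years. 3years. 10years."

# Original attempt: each group matches at most one digit,
# so "10" is split across the two groups.
attempt = re.findall(r"([0-9]?)[-]?([0-9])years", text)
print(attempt)  # [('1', '2'), ('', '3'), ('1', '0')]

# With \d+ a group can hold a multi-digit number.
fixed = re.findall(r"(\d+)(?:-(\d+))?years", text)
print(fixed)  # [('1', '2'), ('3', ''), ('10', '')]
```

The empty strings in both results come from groups that did not take part in a match, which is why the answers here filter them out afterwards.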
This should work (Regex101):
(\d+)(?:-(\d+))?years
Explanation:
(\d+): Capturing group for one or more digits
(?: ): Non-capturing group
-: hyphen
(\d+): Capturing group for one or more digits
(?: )?: Make the previous non-capturing group optional
In Python:
import re
result = re.findall(r"(\d+)(?:-(\d+))?years", "1-2years. 3years. 10years.")
# Gives: [('1', '2'), ('3', ''), ('10', '')]
Each tuple in the list contains two elements: the number on the left side of the hyphen, and the number on the right side of the hyphen. Removing the blank elements is quite easy: you loop over each item in result, then you loop over each match in this item and only select it (and convert it to int) if it is not empty.
final_result = [tuple(int(match) for match in item if match) for item in result]
# gives: [(1, 2), (3,), (10,)]
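If you need this in more than one place, the two steps can be wrapped in a small helper. This is only a sketch; the function name year_ranges is made up for illustration.

```python
import re

def year_ranges(text):
    """Return a tuple of ints for each '<n>years' or '<n>-<m>years' in text."""
    matches = re.findall(r"(\d+)(?:-(\d+))?years", text)
    # Drop the empty string left by the optional group, then convert to int.
    return [tuple(int(g) for g in groups if g) for groups in matches]

print(year_ranges("1-2years. 3years. 10years."))  # [(1, 2), (3,), (10,)]
print(year_ranges("no durations here"))           # []
```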
You can use this pattern: (?:(\d+)-)?(\d+)years
See Regex Demo
Code:
import re
pattern = r"(?:(\d+)-)?(\d+)years"
text = "1-2years. 3years. 10years."
print([tuple(int(z) for z in x if z) for x in re.findall(pattern, text)])
Output:
[(1, 2), (3,), (10,)]
You only match a single digit because the character class [0-9] is not repeated.
Another option is to match the first digits with an optional part for - and digits. Then you can split the matches on -.
\b(\d+(?:-\d+)?)years\.
\b A word boundary
( Capture group 1 (which will be returned by re.findall)
\d+(?:-\d+)? Match 1+ digits and optionally match - and again 1+ digits
) Close group 1
years\. Match years literally, followed by the escaped .
See a regex demo and a Python demo.
Example
import re
pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."
res = [tuple(v.split('-')) for v in re.findall(pattern, s)]
print(res)
Output
[('1', '2'), ('3',), ('10',)]
Or, if a list of lists is also OK instead of tuples:
res = [v.split('-') for v in re.findall(pattern, s)]
Output
[['1', '2'], ['3'], ['10']]
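The split-based approach returns strings. If integers are wanted, as in the question, each piece can be converted after splitting. A minimal sketch using the same pattern and sample text:

```python
import re

pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."

# Split each captured "n" or "n-m" on the hyphen and convert the parts to int.
res = [tuple(int(n) for n in v.split("-")) for v in re.findall(pattern, s)]
print(res)  # [(1, 2), (3,), (10,)]
```

Note that this pattern requires a literal . after years, so text without the trailing period would not match.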