Regular expression to extract number with hyphen

Question

The text looks like "1-2years. 3years. 10years."

I want to get the result [(1,2),(3),(10)].

I am using Python.

I first tried r"([0-9]?)[-]?([0-9])years". It works well except for the case of 10. I also tried r"([0-9]?)[-]?([0-9]|10)years", but the result is still [(1,2),(3),(1,0)].
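Running both attempts through re.findall shows where the 10 gets split. This is a minimal sketch; the variable names are only illustrative. Note that the single 3 actually comes back with an empty first group.

```python
import re

text = "1-2years. 3years. 10years."

# First attempt: group 1 is [0-9]? (zero or one digit), group 2 is exactly one digit
first_try = re.findall(r"([0-9]?)[-]?([0-9])years", text)

# Second attempt: the alternation tries [0-9] before 10, and since that
# already succeeds, "10" is still split into group 1 = "1" and group 2 = "0"
second_try = re.findall(r"([0-9]?)[-]?([0-9]|10)years", text)

print(first_try)   # [('1', '2'), ('', '3'), ('1', '0')]
print(second_try)  # [('1', '2'), ('', '3'), ('1', '0')]
```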

Answer 1

This should work:

import re

st = '1-2years. 3years. 10years.'
result = [tuple(e for e in tup if e) 
          for tup in re.findall(r'(?:(\d+)-(\d+)|(\d+))years', st)]
# [('1', '2'), ('3',), ('10',)]

The regex looks for either one number, or two numbers separated by a hyphen, immediately before the word years. Passed to re.findall(), it gives the output [('1', '2', ''), ('', '', '3'), ('', '', '10')], so we also use a quick list comprehension to filter out the empty strings.
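To see that intermediate step, you can print the raw findall result before filtering. A minimal sketch reusing the pattern from this answer; the compiled pattern's groups attribute confirms why every tuple has three slots:

```python
import re

st = '1-2years. 3years. 10years.'
pattern = r'(?:(\d+)-(\d+)|(\d+))years'

# findall returns one tuple per match with one slot per capturing group;
# groups that did not take part in the match come back as ''
raw = re.findall(pattern, st)
print(raw)  # [('1', '2', ''), ('', '', '3'), ('', '', '10')]

# The pattern defines three capturing groups, hence three slots per tuple
compiled = re.compile(pattern)
print(compiled.groups)  # 3
```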

Alternatively, we could use r'(\d+)(?:-(\d+))?years' to the same effect (after the same filtering step); it is closer to what you have already tried.
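A quick check that both patterns give the same result once the empty strings are dropped. This is a sketch; extract is a hypothetical helper name, not part of the answer:

```python
import re

st = '1-2years. 3years. 10years.'

def extract(pattern, text):
    """Run findall and drop the empty strings left by unmatched groups."""
    return [tuple(e for e in tup if e) for tup in re.findall(pattern, text)]

alternation = extract(r'(?:(\d+)-(\d+)|(\d+))years', st)
optional_part = extract(r'(\d+)(?:-(\d+))?years', st)

print(alternation)                   # [('1', '2'), ('3',), ('10',)]
print(alternation == optional_part)  # True
```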

Answer 2

Your attempt r"([0-9]?)[-]?([0-9])years" doesn't work for 10 because each group matches at most one digit: the first group is [0-9]? (zero or one digit) and the second is [0-9] (exactly one digit).

You also don't need to put the hyphen in brackets: outside a character class, - already matches a literal hyphen.
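A small sketch showing that, in your own pattern, [-]? and -? behave identically. The variable names are illustrative; inside a class the hyphen only matters when it sits between two characters, as in the [0-9] range.

```python
import re

text = "1-2years. 3years. 10years."

# "[-]" is a class containing only the hyphen; a bare "-" outside a class
# is already a literal, so both patterns match exactly the same text
bracketed = re.findall(r"([0-9]?)[-]?([0-9])years", text)
plain = re.findall(r"([0-9]?)-?([0-9])years", text)

print(bracketed == plain)  # True
```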

This should work (see the demo on Regex101):

(\d+)(?:-(\d+))?years

Explanation:

(\d+) : Capturing group for one or more digits
(?: ) : Non-capturing group
- : hyphen
(\d+) : Capturing group for one or more digits
(?: )? : Make the previous non-capturing group optional
years : The literal text years
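The same explanation can live inside the pattern itself using re.VERBOSE, where unescaped whitespace is ignored and # starts a comment. A minimal sketch; PATTERN is just an illustrative name:

```python
import re

# Same regex as (\d+)(?:-(\d+))?years, laid out one piece per line
PATTERN = re.compile(r"""
    (\d+)        # group 1: one or more digits
    (?:          # open a non-capturing group
        -        # a literal hyphen
        (\d+)    # group 2: one or more digits
    )?           # the whole hyphen-plus-digits part is optional
    years        # the literal text 'years'
""", re.VERBOSE)

print(PATTERN.findall("1-2years. 3years. 10years."))
# [('1', '2'), ('3', ''), ('10', '')]
```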

In python:

import re

result = re.findall(r"(\d+)(?:-(\d+))?years", "1-2years. 3years. 10years.")

# Gives: [('1', '2'), ('3', ''), ('10', '')]

Each tuple in the list contains two elements: the number on the left side of the hyphen and the number on the right side of it (an empty string when there is no hyphen). Removing the blank elements is easy: loop over each item in result, then over each match in that item, and keep it (converted to int) only if it is not empty.

final_result = [tuple(int(match) for match in item if match) for item in result]

# gives: [(1, 2), (3,), (10,)]
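Written out as the nested loops described above, the comprehension looks roughly like this. It is a sketch; extract_years is a hypothetical helper name. The second call shows why the emptiness test is done on the string before conversion: "0" is a non-empty string, whereas int("0") would be falsy and get dropped.

```python
import re

def extract_years(text):
    """Explicit-loop version of the list comprehension (hypothetical helper)."""
    final_result = []
    for item in re.findall(r"(\d+)(?:-(\d+))?years", text):
        numbers = []
        for match in item:
            # Skip '' (group did not match) but keep real digits such as "0"
            if match:
                numbers.append(int(match))
        final_result.append(tuple(numbers))
    return final_result

print(extract_years("1-2years. 3years. 10years."))  # [(1, 2), (3,), (10,)]
print(extract_years("0-1years. 0years."))          # [(0, 1), (0,)]
```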

Answer 3

You can use this pattern: (?:(\d+)-)?(\d+)years

See the regex demo.

Code:

import re

pattern = r"(?:(\d+)-)?(\d+)years"
text = "1-2years. 3years. 10years."
print([tuple(int(z) for z in x if z) for x in re.findall(pattern, text)])

Output:

[(1, 2), (3,), (10,)]
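Because the optional group comes first in this pattern, the raw findall tuples put the empty string in the first slot, so the last element is always the final number. A minimal sketch; upper is an illustrative name:

```python
import re

pattern = r"(?:(\d+)-)?(\d+)years"
text = "1-2years. 3years. 10years."

# Group 1 (the number before the hyphen) is optional, so for a single
# number the '' ends up in the FIRST position of the tuple
raw = re.findall(pattern, text)
print(raw)  # [('1', '2'), ('', '3'), ('', '10')]

# Group 2 is never empty here, so it can be read directly
upper = [int(hi) for _, hi in raw]
print(upper)  # [2, 3, 10]
```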

Answer 4

Your pattern only matches a single digit because the character class [0-9] is not repeated.

Another option is to match the digits followed by an optional part consisting of - and more digits.

Then you can split each match on -.

\b(\d+(?:-\d+)?)years\.

\b A word boundary
( Capture group 1 (which will be returned by re.findall)
\d+(?:-\d+)? Match 1+ digits and optionally match - and again 1+ digits
) Close group 1
years\. Match years literally, followed by a period (escaped as \. so it is not a wildcard)

See a regex demo and a Python demo.
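The \b and years\. parts make this pattern stricter than the others, which may or may not be what you want. A small sketch of the two edge cases; the sample strings are made up for illustration:

```python
import re

pattern = r"\b(\d+(?:-\d+)?)years\."

# "years\." requires a period right after "years"; the last item has none
print(re.findall(pattern, "1-2years. 5years"))  # ['1-2']

# "\b" requires a word boundary before the digits; "a" and "1" are both
# word characters, so there is no boundary and nothing matches
print(re.findall(pattern, "a10years."))         # []
```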

Example

import re

pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."

# Split each captured string ("1-2", "3", "10") on the hyphen
res = [tuple(v.split('-')) for v in re.findall(pattern, s)]
print(res)

Output

[('1', '2'), ('3',), ('10',)]

Or, if a list of lists is acceptable instead of tuples:

res = [v.split('-') for v in re.findall(pattern, s)]

Output

[['1', '2'], ['3'], ['10']]
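The question asked for integers rather than strings; mapping int over each split result gets there. A minimal sketch reusing the same pattern and input:

```python
import re

pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."

# Split on the hyphen, then convert every piece to int
res = [tuple(map(int, v.split('-'))) for v in re.findall(pattern, s)]
print(res)  # [(1, 2), (3,), (10,)]
```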

4 answers

solution1 1 2021-09-28 16:07:17
solution2 1 ACCEPTED 2021-09-28 16:11:39
solution3 1 2021-09-28 16:12:57
solution4 1 2021-09-28 17:09:14
