I am trying to extract all numbers from a text, both those spelled out as words and those written in digits.
text = 'one two three 10 number'
numbers = "(^a(?=\s)|one|two|three|four|five|six|seven|eight|nine|ten| \
eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen| \
eighteen|nineteen|twenty|thirty|forty|fifty|sixty|seventy|eighty| \
ninety|hundred|thousand)"
print(re.search(numbers, text).group(0))
This gives me only the first number word.
My expected result is ['one', 'two', 'three', '10'].
How can I modify it so that I get all number words, as well as digit numbers, in a list?
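There are several issues here:
- The pattern is split across several lines, so it should be used in free-spacing (verbose) mode (add (?x) at the start); otherwise the spaces left by the line continuations become part of the alternatives.
- nine will match nine in ninety, so you should either put the longer values first, or use word boundaries (\b).
- The pattern should be declared as a raw string literal, otherwise Python parses \b as a backspace and not a word boundary.
- To match digits, add a |\d+ branch to your number matching group.
- To get all matches, use re.findall (or re.finditer), not re.search.
Here is my suggestion: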
import re
text = 'one two three 10 number eleven eighteen ninety \n '
numbers = r"""(?x) # Turn on free spacing mode
(
^a(?=\s)| # Here we match a at the start of string before whitespace
\d+| # HERE we match one or more digits
\b # Initial word boundary
(?:
one|two|three|four|five|six|seven|eight|nine|ten|
eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|
eighteen|nineteen|twenty|thirty|forty|fifty|sixty|seventy|eighty|
ninety|hundred|thousand
) # A list of alternatives
\b # Trailing word boundary
)"""
print(re.findall(numbers, text))
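The issues listed above can be checked in isolation. The following is a minimal sketch, not part of the original answer; the sample strings and variable names are made up for illustration.

```python
import re

# A backslash line continuation keeps the space typed before it,
# so the "eleven" alternative silently becomes " eleven".
broken = "ten| \
eleven"
print(repr(broken))

# In a normal string literal "\b" is a backspace character;
# in a raw string it stays a backslash followed by "b".
print("\b" == "\x08", r"\b" == "\\b")

sample = "ninety nine"
# Without word boundaries the shorter alternative wins inside "ninety".
without_boundaries = re.findall(r"nine|ninety", sample)
# With word boundaries the regex has to consume the whole word.
with_boundaries = re.findall(r"\b(?:nine|ninety)\b", sample)
print(without_boundaries)
print(with_boundaries)
```

Here the version without boundaries reports nine twice, while the version with boundaries reports ninety and nine.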
Using re.findall and adding a [0-9]+ branch works well for your list. Unfortunately, if you try to match something like seventythree, you will get seven and three instead, so you would need something better than the pattern below :-)
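import re

text = 'one two three 10 number'
# Adjacent raw string literals are concatenated, so no stray spaces enter the pattern
numbers = (r"(^a(?=\s)|one|two|three|four|five|six|seven|eight|nine|ten|"
           r"eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|"
           r"eighteen|nineteen|twenty|thirty|forty|fifty|sixty|seventy|eighty|"
           r"ninety|hundred|thousand|[0-9]+)")
x = re.findall(numbers, text)
print(x)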
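One possible way to handle run-together words such as seventythree is to try the longer alternatives first, so that seventy is tested before seven. This is only a sketch under that assumption: NUMBER_WORDS and find_numbers are hypothetical names, not from the original answers. Because it drops the word boundaries, it will also find number words inside unrelated words (for example ten in often).

```python
import re

NUMBER_WORDS = (
    "one two three four five six seven eight nine ten "
    "eleven twelve thirteen fourteen fifteen sixteen seventeen "
    "eighteen nineteen twenty thirty forty fifty sixty seventy eighty "
    "ninety hundred thousand"
).split()

# Sort by length, longest first, so "seventy" is tried before "seven".
alternatives = "|".join(sorted(NUMBER_WORDS, key=len, reverse=True))
pattern = re.compile(r"\d+|" + alternatives)


def find_numbers(text):
    """Return all digit runs and number words found in text, in order."""
    return pattern.findall(text)


print(find_numbers("seventythree and 10"))
```

For the input above this yields seventy, three and 10 rather than seven and three.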