
How to extract a number from a sentence in Python

Here is a sentence: "a building is 100 m tall and 20 m wide". I want to extract the number for the height, which is 100, so I use

import re
question = input("  ")
height = re.findall(r'(\d+) m tall', question)

However, sometimes the sentence says "100 m high" instead of "100 m tall". In that case my program can no longer extract the number I want. Is there a way to improve my program so that it works whether the sentence contains "tall" or "high"?

You can match either "tall" or "high" with the alternation operator | :

(\d+) m (tall|high)

Demo:

>>> import re
>>> re.findall(r'(\d+) m (tall|high)', 'a building is 100 m tall and 20 m wide')
[('100', 'tall')]
>>> re.findall(r'(\d+) m (tall|high)', 'a building is 100 m high and 20 m wide')
[('100', 'high')]
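
Because this pattern has two capturing groups, re.findall returns a list of tuples rather than a list of numbers. As a minimal sketch (the variable names sentence, matches and heights are only illustrative), the number can be picked out of each tuple like this:

```python
import re

sentence = "a building is 100 m high and 20 m wide"

# Each match is a (number, word) tuple because the pattern has two groups.
matches = re.findall(r'(\d+) m (tall|high)', sentence)

# Keep only the first element of each tuple and convert it to an int.
heights = [int(number) for number, _word in matches]
print(heights)
```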

If you don't want the word to be captured, use a non-capturing group:

(\d+) m (?:tall|high)

Demo:

>>> import re
>>> re.findall(r'(\d+) m (?:tall|high)', "a building is 100 m tall and 20 m wide")
['100']
>>> re.findall(r'(\d+) m (?:tall|high)', "a building is 100 m high and 20 m wide")
['100']
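
If only the first height in the sentence is needed, the same pattern could be wrapped in a small helper. This is a sketch, not part of the original answer: the names HEIGHT_PATTERN and extract_height are hypothetical, and it assumes the number should come back as an int, or None when nothing matches.

```python
import re

# Hypothetical helper names; the pattern itself is the one shown above.
HEIGHT_PATTERN = re.compile(r'(\d+) m (?:tall|high)')

def extract_height(sentence):
    """Return the first height in the sentence as an int, or None if absent."""
    match = HEIGHT_PATTERN.search(sentence)
    return int(match.group(1)) if match else None

print(extract_height("a building is 100 m tall and 20 m wide"))
print(extract_height("a building is 100 m high and 20 m wide"))
```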

To meet your requirement, the regular expression should match either of the terms 'tall' or 'high'.

         i.e.,  (?:tall|high)
        where,  (?: ... ) is a non-capturing group: it groups the alternatives without adding them to the result
                and,     | means 'or'

So the solution can look like this:

>>> re.findall(r'(\d+) m (?:tall|high)', question)
['100']
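
To see why the group is needed at all, it may help to compare the pattern with and without it. In the sketch below (variable names are illustrative), the ungrouped pattern is read as "(\d+) m tall" OR "high", so on a "high" sentence it matches only the word and the number group stays empty:

```python
import re

sentence = "a building is 100 m high and 20 m wide"

# Without a group, | splits the whole pattern into two alternatives:
# r'(\d+) m tall' and r'high'.
ungrouped = re.findall(r'(\d+) m tall|high', sentence)

# With (?: ... ), | only chooses between 'tall' and 'high'.
grouped = re.findall(r'(\d+) m (?:tall|high)', sentence)

print(ungrouped)  # the number group did not take part in the match
print(grouped)
```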
