
How do I use NLTK to extract numbers from a text string in Python

I have been working on a program and want to add a feature that takes user speech such as "Show me my schedule for the next five (or 5) days", extracts the number ("five" or "5") as an integer, and uses it elsewhere in the code to request data from Google Calendar. The Google part is mostly done, but how do I extract the numbers, including spelled-out ones like "five"? I found the code below while searching, but it only returns True or False and I'm not sure how to make it return the actual number. Your help would be greatly appreciated!

import nltk

# Requires NLTK's tokenizer and POS-tagger data (fetch once with nltk.download()).
text = "Is there a one two three in there?"

def existence_of_numeric_data(text):
    tokens = nltk.word_tokenize(text)
    # 'CD' is the Penn Treebank tag for cardinal numbers
    for word, pos_tag in nltk.pos_tag(tokens):
        if pos_tag == 'CD':
            return True
    return False

print(existence_of_numeric_data(text))
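To get the numbers back rather than True/False, one option is to collect every CD-tagged token instead of returning on the first one, then convert it. A minimal sketch, assuming you already have the (word, tag) pairs that nltk.pos_tag() returns; the WORD_TO_INT table and the function name numbers_from_tagged are hypothetical, and the hand-written tagged list stands in for real tagger output so the example runs without NLTK's data files:

```python
# Hypothetical lookup table for spelled-out numbers; extend as needed.
WORD_TO_INT = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
               "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def numbers_from_tagged(tagged):
    """Return the integer value of every CD-tagged token that can be converted.

    `tagged` is a list of (word, tag) pairs such as nltk.pos_tag() returns.
    """
    numbers = []
    for word, tag in tagged:
        if tag != "CD":
            continue  # only cardinal-number tokens are of interest
        if word.isdigit():
            numbers.append(int(word))
        elif word.lower() in WORD_TO_INT:
            numbers.append(WORD_TO_INT[word.lower()])
        # Other CD tokens (e.g. "1.5", "twenty") are skipped in this sketch.
    return numbers


# Hand-written stand-in for nltk.pos_tag(nltk.word_tokenize(...)) output.
tagged = [("Show", "VB"), ("my", "PRP$"), ("schedule", "NN"), ("for", "IN"),
          ("the", "DT"), ("next", "JJ"), ("five", "CD"), ("days", "NNS")]
print(numbers_from_tagged(tagged))  # [5]
```

In your program you would call numbers_from_tagged(nltk.pos_tag(nltk.word_tokenize(text))). Spelled-out numbers still need a lookup table (or a library), because the tagger only labels them as numbers; it does not convert them.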

Is there a way to make this return the numbers as integers? For example:

If the string says "Show my schedule for the next five days", it should return 5 as a separate int.

If your text is like "Contains 1 2 3", you can simply do the following:

numbers = []
for word in text.split():
    if word.isdigit():
        numbers.append(int(word))
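Because text.split() keeps punctuation attached to words, a token like "5," or "5?" fails isdigit(). A regular expression sidesteps that. This is a swapped-in alternative using the standard re module, and extract_digit_numbers is a hypothetical name:

```python
import re


def extract_digit_numbers(text):
    """Return every run of digits in text as an int, in order of appearance."""
    return [int(match) for match in re.findall(r"\d+", text)]


print(extract_digit_numbers("Show my schedule for the next 5 days, not 10."))  # [5, 10]
```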

For text like "Contains one two three", you can instead build a dictionary that maps the words to their values:

dt = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

and then look up every word of the given text in this dictionary:

numbers = []
for w in text.split():
    if w in dt:
        numbers.append(dt[w])

However, this only works for a limited vocabulary. For example, if the text contains "twenty" and your dictionary does not have "twenty", it will not be found.
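One way around the vocabulary limit for small numbers is to combine a tens word with a following unit word, so "twenty five" and "twenty-five" both become 25. A minimal sketch, assuming English number words below 100 and plain digits are all you need; UNITS, TENS and extract_numbers are hypothetical names, not part of any library:

```python
import re

# Hypothetical vocabulary: "zero".."nineteen" map to 0..19, tens words to 20..90.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def extract_numbers(text):
    """Return all numbers (digits, or words below 100) found in text, in order."""
    numbers = []
    current = None  # value of a number phrase still being built, e.g. "twenty"
    # Lowercase, then split into runs of letters or digits (drops "-" and punctuation).
    for token in re.findall(r"\d+|[a-z]+", text.lower()):
        if token in TENS:
            if current is not None:
                numbers.append(current)
            current = TENS[token]
        elif token in UNITS:
            value = UNITS[token]
            if current is not None and current >= 20 and current % 10 == 0 and 0 < value < 10:
                current += value  # "twenty" followed by "five" -> 25
            else:
                if current is not None:
                    numbers.append(current)
                current = value
        else:
            # Any other token ends a pending word number.
            if current is not None:
                numbers.append(current)
                current = None
            if token.isdigit():
                numbers.append(int(token))
    if current is not None:
        numbers.append(current)
    return numbers


print(extract_numbers("Show me my schedule for the next twenty-five days"))  # [25]
```

Larger values ("one hundred", "two thousand") are not handled here; that is where a dedicated library becomes worthwhile.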

I have found a library called word2number (install it with pip install word2number) and it does the job just fine. This is how you use it:

from word2number import w2n

text = "There are five days in a week"

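print(w2n.word_to_num(text))

Output:
5

Note that word_to_num raises a ValueError when the string contains no number words (for example when the user says "5" instead of "five"), so wrap the call in try/except and fall back to a digit check if the input may contain either form.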
