
How to split specific words in string?

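I have a university project. I want to split the words and convert them to a number, for example "five hundred three" to 503. I read the string from a text file, but I don't know how to split it.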

The sentence I want to convert, as a test:

there is five hundred three people

I want to split it like this:

there, is, five hundred three, people

and then, using a dictionary on the list, convert it to:

there is 503 people

I searched many websites but couldn't find anything about this. I tried .split(), but it splits every single word, so I can't use it for the project.
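The grouping described above (keep "five hundred three" together, split everything else) can be sketched with itertools.groupby. This is only a rough illustration: NUMBER_WORDS and split_keeping_numbers are hypothetical names, and the vocabulary is deliberately incomplete.

```python
from itertools import groupby

# Hypothetical, deliberately small vocabulary; extend it as needed.
NUMBER_WORDS = set(
    "zero one two three four five six seven eight nine ten "
    "twenty thirty forty fifty sixty seventy eighty ninety "
    "hundred thousand".split()
)


def split_keeping_numbers(sentence):
    """Split on whitespace, but keep runs of consecutive number words together."""
    chunks = []
    for is_number, words in groupby(
        sentence.split(), key=lambda w: w.lower() in NUMBER_WORDS
    ):
        if is_number:
            chunks.append(" ".join(words))  # one chunk per run of number words
        else:
            chunks.extend(words)            # ordinary words stay separate
    return chunks


print(split_keeping_numbers("there is five hundred three people"))
# ['there', 'is', 'five hundred three', 'people']
```

Each number chunk can then be looked up or converted separately, while the ordinary words pass through unchanged.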

It's Python, so there is a library for it: https://github.com/careless25/text2digits

However, if you'd rather not use a library, this function (from the library) does exactly what you want:

def text2int(textnum, numwords={}):
    # The mutable default is deliberate: the lookup table is built on the
    # first call and reused by later calls.
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        # Every number word maps to a (scale, increment) pair.
        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):
            numwords[word] = (1, idx)
        for idx, word in enumerate(tens):
            if word:  # skip the empty placeholders for 0 and 10
                numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales):
            # hundred -> 10**2, thousand -> 10**3, million -> 10**6, ...
            numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first': 1, 'second': 2, 'third': 3, 'fifth': 5, 'eighth': 8, 'ninth': 9, 'twelfth': 12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ""
    onnumber = False
    for word in textnum.split():
        if word in ordinal_words:
            # Irregular ordinals always have scale 1, so just add them.
            current += ordinal_words[word]
            onnumber = True
        else:
            # Regular ordinals: "fourth" -> "four", "twentieth" -> "twenty".
            # Strip the ending only if the result is a number word, so that
            # ordinary words such as "with" are left untouched.
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    candidate = word[:-len(ending)] + replacement
                    if candidate in numwords:
                        word = candidate

            # An "and" that does not follow a number is ordinary text.
            if word not in numwords or (word == "and" and not onnumber):
                if onnumber:
                    # A number just ended: emit it before the plain word.
                    curstring += str(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
            else:
                scale, increment = numwords[word]

                current = current * scale + increment
                if scale > 100:
                    # thousand and above close the current group
                    result += current
                    current = 0
                onnumber = True

    if onnumber:
        curstring += str(result + current)

    return curstring

You can use it like this:

>>> print(text2int("I want fifty five hot dogs for two hundred dollars."))
I want 55 hot dogs for 200 dollars.

You can install the text2digits package with the following command:

pip install text2digits

Then use the package as follows to handle your example:

from text2digits import text2digits
t2d = text2digits.Text2Digits()
print(t2d.convert("there is five hundred three people"))

The output is:

there is 503 people

You have to use a list of written-out numbers, then search the string for each of them and replace it.

I.e. something like this:

import re

strings = ["one", "two", "three"]      # number words as strings (extend as needed)
numbers = [1, 2, 3]                    # corresponding numbers

def replaceNumbers(string):            # function to replace number words
    for word, number in zip(strings, numbers):
        # \b matches whole words only, so "someone" is not turned into "some1"
        string = re.sub(r"\b%s\b" % re.escape(word), str(number), string)
    return string

Then you need to figure out how to handle hundreds, thousands, and so on.
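One possible way to handle hundreds and thousands, as a minimal sketch: keep a running value for the current group, multiply it when "hundred" appears, and move it into a total when "thousand" or "million" appears. UNITS, TENS, SCALES and words_to_number are hypothetical names chosen for this illustration. The function expects a phrase made only of number words.

```python
# Hypothetical helper tables; the names are illustrative, not from any library.
UNITS = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: value * 10 for value, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(phrase):
    """Convert a phrase made only of number words, e.g. 'five hundred three'."""
    total = 0    # finished thousand/million groups
    current = 0  # group currently being built
    for word in phrase.lower().split():
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            # a bare "hundred" is treated as "one hundred"
            current = max(current, 1) * 100
        elif word in SCALES:
            # close the current group and start a new one
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError("not a number word: %r" % word)
    return total + current


print(words_to_number("five hundred three"))        # 503
print(words_to_number("two thousand twenty four"))  # 2024
```

To convert a whole sentence, this would be applied only to the runs of number words found in it, with every other word left as it is.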
