How to split specific words in a string?
I have a university project. I want to split a string and convert number words to digits, for example "five hundred three" to 503. I read the string from a text file, but I don't know how to split it.
The sentence I want to convert as a test:
there is five hundred three people
I want to split it like this:
there, is, five hundred three, people
and put the pieces in a list, so that I can use a dictionary to convert the sentence to:
there is 503 people
I searched many sites but couldn't find anything about this. I tried .split(), but it splits on every word, so I can't use it as-is for the project.
This is Python, so there is a library for it: https://github.com/careless25/text2digits
If you prefer not to use the library, this function (taken from the library) does what you want:
def text2int(textnum, numwords={}):
    # The mutable default is deliberate: the lookup table is built on the
    # first call and reused on later calls.
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]
        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        # Each entry maps a word to (scale, increment).
        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):
            numwords[word] = (1, idx)
        for idx, word in enumerate(tens):
            if word:  # skip the two empty placeholders
                numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales):
            # "hundred" -> 10**2, "thousand" -> 10**3, "million" -> 10**6, ...
            numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first': 1, 'second': 2, 'third': 3, 'fifth': 5, 'eighth': 8, 'ninth': 9, 'twelfth': 12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ""
    onnumber = False
    for word in textnum.split():
        if word in ordinal_words:
            # Irregular ordinals map straight to their value.
            current += ordinal_words[word]
            onnumber = True
        else:
            # Regular ordinals: "twentieth" -> "twenty", "fourth" -> "four".
            # Only rewrite when the result is a known number word, so that
            # ordinary words such as "with" are left untouched.
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    candidate = word[:-len(ending)] + replacement
                    if candidate in numwords:
                        word = candidate

            if word not in numwords:
                # An ordinary word: flush any pending number, then copy the word.
                if onnumber:
                    curstring += str(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
            else:
                scale, increment = numwords[word]
                current = current * scale + increment
                if scale > 100:
                    # "thousand" and above close the current group.
                    result += current
                    current = 0
                onnumber = True

    if onnumber:
        curstring += str(result + current)

    return curstring
You can use it like this:
>>> print(text2int("I want fifty five hot dogs for two hundred dollars."))
I want 55 hot dogs for 200 dollars.
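If you also want the intermediate list from the question (there, is, five hundred three, people), one option is to group consecutive number words with itertools.groupby. This is only a sketch: NUMBER_WORDS and split_keep_numbers are names made up for this example, not part of text2digits, and the vocabulary is deliberately small.

```python
from itertools import groupby

# Hypothetical vocabulary for this sketch; extend it to cover the words you need.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
    "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
    "hundred", "thousand", "million",
}


def split_keep_numbers(sentence):
    """Split on whitespace, but keep each run of number words as one chunk."""
    chunks = []
    words = sentence.split()
    for is_number, group in groupby(words, key=lambda w: w.lower() in NUMBER_WORDS):
        run = list(group)
        if is_number:
            chunks.append(" ".join(run))  # one chunk per run of number words
        else:
            chunks.extend(run)            # ordinary words stay separate
    return chunks


print(split_keep_numbers("there is five hundred three people"))
# ['there', 'is', 'five hundred three', 'people']
```

Each chunk that consists of number words can then be converted on its own, and the list joined back into a sentence.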
You can install the text2digits package with:
pip install text2digits
Then use the package as follows to handle your example:
from text2digits import text2digits

t2d = text2digits.Text2Digits()
print(t2d.convert("there is five hundred three people"))
And the output is:
there is 503 people
You would have to keep a list of the numbers written out as words, then search the string for each of them and replace it with the matching number.
That is, something like this:
import re

strings = ["one", "two", "three"]  # number words as strings; extend as needed
numbers = [1, 2, 3]                # corresponding numbers

def replaceNumbers(string):  # replace each number word with its digits
    for word, number in zip(strings, numbers):
        # \b keeps "one" from matching inside words such as "someone"
        string = re.sub(r"\b%s\b" % word, str(number), string)
    return string
You then need to figure out how to deal with hundreds, thousands, etc.
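One possible way to handle hundreds and thousands is to keep a running value for the current group and multiply it whenever a scale word appears. The following is a rough sketch under my own assumptions: words_to_int, UNITS, TENS and SCALES are names invented for this example, the input is expected to contain only number words, and "and" is not handled.

```python
# Hypothetical lookup tables for this sketch.
UNITS = {word: value for value, word in enumerate([
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
])}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1000, "million": 1000000}


def words_to_int(phrase):
    """Convert a phrase made only of number words into an int."""
    total = 0    # sum of the groups already closed by "thousand"/"million"
    current = 0  # value of the group being built (always below 1000)
    for word in phrase.lower().replace("-", " ").split():
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = (current or 1) * 100  # a bare "hundred" counts as 100
        elif word in SCALES:
            total += (current or 1) * SCALES[word]
            current = 0
        else:
            raise ValueError("not a number word: %r" % word)
    return total + current


print(words_to_int("five hundred three"))               # 503
print(words_to_int("two thousand five hundred three"))  # 2503
```

Keeping "hundred" separate from the larger scales matters: it multiplies only the current group, while "thousand" and "million" close the group and add it to the total.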