
Python - tokenizing, replacing words

I'm trying to create sentences with random words inserted into them. To be specific, I'd have something like:

"The weather today is [weather_state]."

and I'd like to find all tokens in [square brackets] and swap each one for a random counterpart from a dictionary or a list, leaving me with:

"The weather today is warm."
"The weather today is bad."

or

"The weather today is mildly suiting for my old bones."

Keep in mind that the [bracket] token isn't always in the same position, and there can be multiple bracketed tokens in one string, like:

"[person] is feeling really [how] today, so he's not going [where]."

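I really don't know where to start, or whether the tokenize or token modules are even the right tool for this. Any hints that point me in the right direction are greatly appreciated!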

EDIT: To clarify, I don't really need to use square brackets; any non-standard character will do.

You're looking for re.sub with a callback function:

import random
import re

# Candidate replacements for each bracketed token
words = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
    'where': ['away', 'out']
}

def random_str(m):
    # m.group(1) is the token name captured between the brackets
    return random.choice(words[m.group(1)])


text = "[person] is feeling really [how] today, so he's not going [where]."
print(re.sub(r'\[(.+?)\]', random_str, text))

# Example output (varies per run):
# me is feeling really stupid today, so he's not going away.

Note that, unlike the format method, this allows for more sophisticated processing of placeholders, e.g.

[person:upper] got $[amount if amount else 0] etc

Basically, you can build your own "template engine" on top of that.
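As a rough illustration of that idea, here is a minimal sketch of a callback that understands an optional `[name:filter]` suffix. The `FILTERS` table, the `render` function, and the choice to leave unknown tokens untouched are all assumptions made for this example, not part of the answer above; the expression-style placeholder (`[amount if amount else 0]`) is not covered.

```python
import random
import re

# Hypothetical word lists and filter names, chosen for illustration only.
WORDS = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
    'where': ['away', 'out'],
}
FILTERS = {
    'upper': str.upper,
    'title': str.title,
}

# Matches [name] or [name:filter]; both parts are plain word characters.
TOKEN = re.compile(r'\[(\w+)(?::(\w+))?\]')

def render(template, words=WORDS, filters=FILTERS, rng=random):
    def replace(match):
        name, filter_name = match.group(1), match.group(2)
        if name not in words:
            # Unknown token: leave the original text in place.
            return match.group(0)
        value = rng.choice(words[name])
        if filter_name is not None:
            # An unknown filter name raises KeyError here.
            value = filters[filter_name](value)
        return value
    return TOKEN.sub(replace, template)

print(render("[person:upper] is feeling really [how] today, [unknown]."))
```

Keeping the filters in a dictionary means new ones can be added without touching the regular expression.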

You can use the format method.

>>> a = 'The weather today is {weather_state}.'
>>> a.format(weather_state = 'awesome')
'The weather today is awesome.'
>>>

Also:

>>> b = '{person} is feeling really {how} today, so he\'s not going {where}.'
>>> b.format(person = 'Alegen', how = 'wacky', where = 'to work')
"Alegen is feeling really wacky today, so he's not going to work."
>>>

Of course, this approach only works if you can switch from square brackets to curly ones.
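When templates differ in which placeholders they contain, the field names can be discovered rather than hard-coded. The sketch below uses `string.Formatter().parse` from the standard library for that; the `fill_randomly` helper and the `choices` dictionary are hypothetical names introduced here, and the code assumes simple field names such as `{person}` (no attribute access, indexing, or positional fields).

```python
import random
from string import Formatter

def fill_randomly(template, choices, rng=random):
    # parse() yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so filter those out.
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    picks = {name: rng.choice(choices[name]) for name in names}
    return template.format(**picks)

# Illustrative word lists only.
choices = {
    'person': ['Buster', 'Arthur'],
    'how': ['hungry', 'sleepy'],
    'where': ['camping', 'biking'],
    'weather_state': ['warm', 'bad'],
}

print(fill_randomly('The weather today is {weather_state}.', choices))
print(fill_randomly(
    "{person} is feeling really {how} today, so he's not going {where}.",
    choices,
))
```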

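If you use curly braces instead of square brackets, your string can be used as a string format template. You can then fill it with every combination of substitutions using itertools.product: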

import itertools as IT

text = "{person} is feeling really {how} today, so he's not going {where}."
persons = ['Buster', 'Arthur']
hows = ['hungry', 'sleepy']
wheres = ['camping', 'biking']

for person, how, where in IT.product(persons, hows, wheres):
    print(text.format(person=person, how=how, where=where))

yields

Buster is feeling really hungry today, so he's not going camping.
Buster is feeling really hungry today, so he's not going biking.
Buster is feeling really sleepy today, so he's not going camping.
Buster is feeling really sleepy today, so he's not going biking.
Arthur is feeling really hungry today, so he's not going camping.
Arthur is feeling really hungry today, so he's not going biking.
Arthur is feeling really sleepy today, so he's not going camping.
Arthur is feeling really sleepy today, so he's not going biking.

To generate random sentences, you could use random.choice:

import random

for i in range(5):
    # Pick one random value for each placeholder
    person = random.choice(persons)
    how = random.choice(hows)
    where = random.choice(wheres)
    print(text.format(person=person, how=how, where=where))

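If you must use square brackets and your template contains no curly braces, you can replace the brackets with braces and then proceed as above: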

text = "[person] is feeling really [how] today, so he's not going [where]."
text = text.replace('[','{').replace(']','}')
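Put together, the conversion and the random fill might look like the following sketch. It assumes a `words` dictionary keyed by placeholder name (borrowed from the re.sub answer) and, as stated above, a template with no literal curly braces; the variable names are illustrative.

```python
import random

# Illustrative word lists keyed by placeholder name.
words = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
    'where': ['away', 'out'],
}

text = "[person] is feeling really [how] today, so he's not going [where]."
# Convert [token] to {token} so the string works with str.format.
template = text.replace('[', '{').replace(']', '}')

# Pick one random replacement per key, then fill the template.
picks = {key: random.choice(options) for key, options in words.items()}
sentence = template.format(**picks)
print(sentence)
```

Extra keys in `picks` that do not appear in the template are ignored by `str.format`, so one dictionary can serve several templates.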

