
Python - tokenizing, replacing words

I'm trying to generate sentences with random words inserted into them. Specifically, I'd have a template like:

"The weather today is [weather_state]."

and I'd like to find every token in [brackets] and then replace it with a randomly chosen counterpart from a dictionary or a list, leaving me with something like:

"The weather today is warm."
"The weather today is bad."

or

"The weather today is mildly suiting for my old bones."

Keep in mind that the [bracket] tokens won't always be in the same position, and a string can contain several of them, like:

"[person] is feeling really [how] today, so he's not going [where]."

I really don't know where to start, or whether the tokenize or token modules are even the right tool for this. Any hints that point me in the right direction would be greatly appreciated!

EDIT: Just to clarify, I don't need to use square brackets; any non-standard character will do.

You're looking for re.sub with a callback function:

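import random
import re

# Candidate replacements, keyed by the name inside the brackets.
words = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
    'where': ['away', 'out']
}

def random_str(m):
    # m.group(1) is the text captured between the brackets, e.g. 'person'.
    return random.choice(words[m.group(1)])


text = "[person] is feeling really [how] today, so he's not going [where]."
# .+? is non-greedy, so each [...] is matched separately.
print(re.sub(r'\[(.+?)\]', random_str, text))

#me is feeling really stupid today, so he's not going away.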
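Since the question says any non-standard delimiter will do, here is a minimal sketch generalizing the callback above, assuming you want the delimiters as parameters and want unknown placeholders left untouched instead of raising a KeyError. The function name `fill` and its `open_tok`/`close_tok` parameters are hypothetical, not part of any library.

```python
import random
import re

# Word lists keyed by placeholder name (same shape as the `words` dict above).
words = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
    'where': ['away', 'out'],
}


def fill(text, choices, open_tok='[', close_tok=']'):
    """Hypothetical helper: replace each open_tok NAME close_tok with a
    random entry from choices[NAME]; unknown names are left unchanged."""
    # re.escape lets any delimiter characters be matched literally.
    pattern = re.compile(re.escape(open_tok) + r'(.+?)' + re.escape(close_tok))

    def pick(match):
        name = match.group(1)
        if name in choices:
            return random.choice(choices[name])
        return match.group(0)  # unknown placeholder: keep the original text

    return pattern.sub(pick, text)


print(fill("[person] is feeling really [how] today.", words))
# Angle brackets as delimiters; the unknown [time] token survives untouched.
print(fill("<person> went <where> at [time].", words, '<', '>'))
```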

Note that, unlike the format method, this allows more sophisticated processing of placeholders, e.g.

[person:upper] got $[amount if amount else 0] etc

Basically, you can build your own "templating engine" on top of that.
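As a rough illustration of that idea, here is a minimal sketch that handles an optional `:modifier` suffix like the `[person:upper]` example. The `render` function and the `MODIFIERS` table are hypothetical names chosen for this sketch; expressions such as `[amount if amount else 0]` are not handled.

```python
import random
import re

words = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
}

# Hypothetical modifier table; add your own callables here.
MODIFIERS = {
    'upper': str.upper,
    'lower': str.lower,
    'title': str.title,
    'capitalize': str.capitalize,
}

# Matches [name] or [name:modifier]; group 2 is None when no modifier is given.
PLACEHOLDER = re.compile(r'\[(\w+)(?::(\w+))?\]')


def render(text, choices):
    def replace(match):
        name, modifier = match.group(1), match.group(2)
        value = random.choice(choices[name])
        if modifier is not None:
            value = MODIFIERS[modifier](value)
        return value

    return PLACEHOLDER.sub(replace, text)


print(render("[person:capitalize] is feeling really [how:upper] today.", words))
```

An unknown modifier raises a KeyError here; whether to fail loudly or silently ignore it is a design choice for your own engine.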

You can use the format method.

>>> a = 'The weather today is {weather_state}.'
>>> a.format(weather_state = 'awesome')
'The weather today is awesome.'
>>>

Also:

>>> b = '{person} is feeling really {how} today, so he\'s not going {where}.'
>>> b.format(person = 'Alegen', how = 'wacky', where = 'to work')
"Alegen is feeling really wacky today, so he's not going to work."
>>>

Of course, this method only works if you can switch from square brackets to curly braces.
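To get random sentences out of format, one option is sketched below, assuming the word lists live in a dict keyed by placeholder name (as in the re.sub answer): draw one value per name and pass the resulting dict to str.format_map. The names `words`, `template` and `picked` are illustrative.

```python
import random

words = {
    'person': ['Alegen', 'Buster'],
    'how': ['wacky', 'sleepy'],
    'where': ['to work', 'camping'],
}

template = "{person} is feeling really {how} today, so he's not going {where}."

# Pick one random value per placeholder name, then let format_map fill them in.
picked = {name: random.choice(options) for name, options in words.items()}
print(template.format_map(picked))
```

Unlike the re.sub callback, a placeholder name that appears twice in the template gets the same value in both spots here, because each name is drawn only once.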

If you use braces instead of brackets, then your string can be used as a string formatting template. You could fill it in with every combination of substitutions using itertools.product:

import itertools as IT

text = "{person} is feeling really {how} today, so he's not going {where}."
persons = ['Buster', 'Arthur']
hows = ['hungry', 'sleepy']
wheres = ['camping', 'biking']

for person, how, where in IT.product(persons, hows, wheres):
    print(text.format(person=person, how=how, where=where))

yields

Buster is feeling really hungry today, so he's not going camping.
Buster is feeling really hungry today, so he's not going biking.
Buster is feeling really sleepy today, so he's not going camping.
Buster is feeling really sleepy today, so he's not going biking.
Arthur is feeling really hungry today, so he's not going camping.
Arthur is feeling really hungry today, so he's not going biking.
Arthur is feeling really sleepy today, so he's not going camping.
Arthur is feeling really sleepy today, so he's not going biking.
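If you'd rather not spell out every keyword argument, here is a sketch assuming the options are kept in a dict keyed by field name (the `options` dict is an assumption for illustration): string.Formatter().parse pulls the field names out of the template, and itertools.product then covers every combination. It should print the same eight lines as above.

```python
import itertools as IT
import string

text = "{person} is feeling really {how} today, so he's not going {where}."
options = {
    'person': ['Buster', 'Arthur'],
    'how': ['hungry', 'sleepy'],
    'where': ['camping', 'biking'],
}

# Formatter().parse yields (literal_text, field_name, format_spec, conversion);
# field_name is None for a trailing chunk of plain text.
fields = []
for _, field_name, _, _ in string.Formatter().parse(text):
    if field_name is not None and field_name not in fields:
        fields.append(field_name)

# One sentence per element of the Cartesian product, in the fields' order.
for values in IT.product(*(options[f] for f in fields)):
    print(text.format(**dict(zip(fields, values))))
```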

To generate random sentences, you could use random.choice:

import random

for _ in range(5):
    # Draw each slot independently for every sentence.
    person = random.choice(persons)
    how = random.choice(hows)
    where = random.choice(wheres)
    print(text.format(person=person, how=how, where=where))

If you must use brackets, and your text contains no other braces, you could replace the brackets with braces and then proceed as above:

text = "[person] is feeling really [how] today, so he's not going [where]."
text = text.replace('[','{').replace(']','}')
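One caveat: if the text already contains literal braces, the replace-then-format approach would treat them as fields. A minimal sketch, assuming literal braces should survive as plain text, doubles them before swapping the brackets and then fills the fields with random choices (the `words` dict is the same assumption as in the re.sub answer):

```python
import random

words = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
    'where': ['away', 'out'],
}

text = "[person] is feeling really [how] today {not a field}, so he's not going [where]."

# Double any literal braces first so str.format treats them as text,
# then turn the bracket placeholders into format fields.
template = (
    text.replace('{', '{{').replace('}', '}}')
        .replace('[', '{').replace(']', '}')
)

picked = {name: random.choice(options) for name, options in words.items()}
print(template.format_map(picked))
```

Literal square brackets still can't appear in the text with this scheme, since every [ and ] becomes a field delimiter.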
