
How do I use regular expressions in Python with placeholder text?

I am doing a project in Python where I require a user to input text. If the text matches a format supported by the program, it will output a response that includes a user's key word (it is a simple chat bot). The format is stored in a text file as a user input format and an answer format.

For example, the text file looks like this, with user input on the left and output on the right:

my name is <-name> | Hi there, <-name>

So if the user writes my name is johnny , I want the program to know that johnny is the <-name> variable, and then to print the response Hi there, johnny .

A nudge in the right direction would be great! I have never used regular expressions before. I read an article on how to use them, but it didn't really help, since it mainly covered how to match specific words.

Here's an example:

import re

io = [
    (r'my name is (?P<name>\w+)', 'Hi there, {name}'),
]

string = input('> ')
for regex, output in io:
    match = re.match(regex, string)
    if match:
        print(output.format(**match.groupdict()))
        break

I'll take you through it:


'my name is (?P<name>\w+)'

(?P<name>...) stores whatever the enclosed pattern ( \w+ ) matches under the name name in the match object, which we're going to use later on.


match = re.match(regex, string)

This looks for the regex in the input given. Note that re.match only matches at the beginning of the input. If you don't want that restriction, use re.search here instead.
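To make the difference between the two calls concrete, here is a small sketch. The sample sentence is an assumption chosen for illustration, not something from the question.

```python
import re

pattern = r'my name is (?P<name>\w+)'
text = 'Hey, my name is johnny'  # hypothetical input with leading text

# re.match only tries position 0, so the leading "Hey, " prevents a match.
anchored = re.match(pattern, text)

# re.search scans the string and matches wherever the pattern first fits.
found = re.search(pattern, text)
found_name = found.group('name') if found else None

print(anchored)    # None
print(found_name)  # johnny
```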


If it matches:

output.format(**match.groupdict())

match.groupdict returns a dictionary whose keys are the names defined by (?P<name>...) and whose values are the matched text. ** passes those key/value pairs to .format as keyword arguments, so in this case the call is equivalent to output.format(name='matchedname') .
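The same step in isolation, as a minimal sketch with the input hard-coded instead of read from the user:

```python
import re

match = re.match(r'my name is (?P<name>\w+)', 'my name is johnny')

# groupdict() maps each named group to the text it captured.
groups = match.groupdict()

# ** unpacks the dictionary into keyword arguments for str.format.
reply = 'Hi there, {name}'.format(**groups)

print(groups)  # {'name': 'johnny'}
print(reply)   # Hi there, johnny
```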


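To construct the io list from a file, do something like this:

io = []
with open('input.txt') as file_:
    for line in file_:
        # Drop the trailing newline, then split on the last ' | '
        # so the pattern itself may contain a '|'.
        key, value = line.rstrip('\n').rsplit(' | ', 1)
        io.append((key, value))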
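The loop above expects each line to already hold a regex and a format string. If the file keeps the <-name> placeholders from the question, the lines need converting first. The following is one possible sketch, not the only way to do it. The helper names compile_rule and respond are hypothetical, and it assumes that each placeholder is a single word, that a placeholder appears at most once on the input side, and that replies contain no literal { or } characters.

```python
import re

# Matches placeholders such as <-name> and captures the bare name.
PLACEHOLDER = re.compile(r'<-(\w+)>')

def compile_rule(line):
    """Turn 'my name is <-name> | Hi there, <-name>' into (regex, template)."""
    user_part, reply_part = line.rstrip('\n').rsplit(' | ', 1)

    # split() with a capturing group alternates literal text and names:
    # ['my name is ', 'name', '']
    pieces = PLACEHOLDER.split(user_part)
    pattern = ''
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            pattern += re.escape(piece)          # literal text, escaped
        else:
            pattern += r'(?P<%s>\w+)' % piece    # placeholder -> named group

    # '<-name>' in the reply becomes '{name}' for str.format.
    template = PLACEHOLDER.sub(r'{\1}', reply_part)
    return re.compile(pattern), template

def respond(rules, text):
    """Return the reply for the first matching rule, or None."""
    for regex, template in rules:
        match = regex.match(text)
        if match:
            return template.format(**match.groupdict())
    return None

rules = [compile_rule('my name is <-name> | Hi there, <-name>')]
reply = respond(rules, 'my name is johnny')
print(reply)  # Hi there, johnny
```

Escaping the literal parts with re.escape means punctuation in the text file is matched as written rather than interpreted as regex syntax.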

You are going to want to do a group match and then pull out the search groups.

First you would want to import re, the Python regex module. Let's say that user_input is the variable holding the input string. You then want to use the re.sub function to match your string and substitute something for it.

output = re.sub(input_regex, output_regex, user_input)

Now for the regex. First, put in the literal text you want to match:

input_regex = 'my name is '

If you want it to match explicitly from the start of the line, you should precede it with a caret:

input_regex = '^my name is '

You then want to match any string with .+ (. is any character, + is one or more of the preceding item) up to the end of the line, $.

input_regex = '^my name is .+$'

Now you'll want to put that into a named group. Named groups take the form "(?P<name>regex)" - note that those angle brackets are literal.

input_regex = '^my name is (?P<name>.+)$'

You now have a regex that will match and give a match group named "name" with the user's name in it. The output string needs to reference the match group with "\g<name>":

output_regex = r'Hi there, \g<name>'

Putting it all together you can do it in a one liner (and the import):

import re
output = re.sub(r'^my name is (?P<name>.+)$', r'Hi there, \g<name>', user_input)
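One thing to watch with this approach: when the pattern does not match, re.sub returns the input unchanged, so the bot would echo the user's text back. A small sketch of how that can be detected with re.subn, which also returns the number of substitutions made (the sample inputs are assumptions):

```python
import re

pattern = r'^my name is (?P<name>.+)$'
template = r'Hi there, \g<name>'

matched = re.sub(pattern, template, 'my name is johnny')
unmatched = re.sub(pattern, template, 'what time is it')

# subn returns (new_string, count); a count of 0 means nothing matched.
reply, count = re.subn(pattern, template, 'what time is it')

print(matched)    # Hi there, johnny
print(unmatched)  # what time is it
print(count)      # 0
```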

Asking about regular expressions inevitably leads to answers like the ones you're getting right now: demonstrations of basic regex operations, such as how to split sentences or search for a combination of terms like 'my' + 'name' + 'is'.

In fact, you could learn all this from reading existing documentation and open source programs. Regular expressions are not exactly easy. Still, you'll need to understand a bit on your own to really know what's going on if you want to change and extend your program. Don't just copy from the recipes here.

But you may want something more comprehensive. Because you mentioned building a "chat bot", you may want to see how others are approaching that task, well beyond regular expressions. See:

So if the user writes 'my name is johnny', I want the program to know that 'johnny' is the '<-name>' variable, ...

From your question it's unclear how complex this program should become. What if the user types

'Johnny is my name.'

or

'Hey, my name is John X., but call me johnny.'

?
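To illustrate how far plain patterns get with those two inputs, here is a rough sketch that tries several phrasings in order. The pattern list and the helper name extract_name are assumptions made for this example. Note that the second input yields 'John' rather than 'johnny', which is exactly the kind of limitation described above.

```python
import re

# Hypothetical list of phrasings, tried in order.
NAME_PATTERNS = [
    r'my name is (?P<name>\w+)',
    r'(?P<name>\w+) is my name',
]

def extract_name(text):
    """Return the first name-like word found by any pattern, or None."""
    for pattern in NAME_PATTERNS:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return match.group('name')
    return None

first = extract_name('Johnny is my name.')
second = extract_name('Hey, my name is John X., but call me johnny.')

print(first)   # Johnny
print(second)  # John
```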

Take a look at the re module and pay attention to capturing groups.

For example, you can assume that the name will be a single word, so it matches \w+ . Then you have to construct a regular expression with a \w+ capturing group where the name should be (capturing groups are delimited by parentheses):

r'my name is (\w+)'

and then match it against the input (hint: look for match in the re module docs).

Once you get the match, you have to get the contents of capturing group (in this case at index 1, index 0 is reserved for the whole match) and use it to construct your response.
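Put together, those hints could look like the sketch below. The input is hard-coded here for illustration.

```python
import re

match = re.match(r'my name is (\w+)', 'my name is johnny')
if match:
    whole = match.group(0)  # the entire matched text
    name = match.group(1)   # the first (and only) capturing group
    response = 'Hi there, {}'.format(name)
    print(response)  # Hi there, johnny
```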
