How do I check if a string matches a set pattern in Python?

I want to match a string to a specific pattern or set of words, like below:

The query is "the apple is red" and the pattern to match is "the apple|orange|grape is red|orange|violet". The pipes represent words that can substitute for each other. The pattern could also be grouped, like [launch app]|[start program]. I would like the module to return True or False depending on whether the query matches the pattern.

What is the best way to accomplish this if there is not already a library that does it? If this can be done with a simple regex, great; however, I know next to nothing about regex. I am using Python 2.7.11.

import re

string = 'the apple is red'

re.search(r'^the (apple|orange|grape) is (red|orange|violet)', string)

Here's an example of it running:

In [20]: re.search(r'^the (apple|orange|grape) is (red|orange|violet)', string).groups()
Out[20]: ('apple', 'red')

If there is no match, re.search() returns None, so calling .groups() on the result would raise an AttributeError.
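Since the question asks for a plain True or False, the search result can be compared against None. The following is a minimal sketch, not part of the original answer: the function name `matches` is made up for illustration, and the trailing `$` is an added assumption so that extra text after the colour is rejected (the pattern above only anchors the start).

```python
import re

# Trailing '$' is an addition: without it, 'the apple is reddish' would match.
PATTERN = re.compile(r'^the (apple|orange|grape) is (red|orange|violet)$')

def matches(query):
    """Return True if the whole query fits the pattern, else False."""
    return PATTERN.search(query) is not None

print(matches('the apple is red'))      # True
print(matches('the banana is red'))     # False
print(matches('the apple is reddish'))  # False
```

This form only uses `re.compile` and `search`, so it should behave the same on Python 2.7 and Python 3.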

You may know "next to nothing about regex" but you nearly wrote the pattern.

The sections within the parentheses can have their own regex patterns, too. So you could match "apple" and "apples" with

r'the (apple[s]*|orange|grape)'
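Because the pattern syntax in the question is already close to a regex, it can be translated mechanically. The sketch below is one possible reading of that syntax, not an established library: `pattern_to_regex` is a hypothetical helper, and it assumes that whitespace separates slots, pipes separate alternatives within a slot, and square brackets wrap multi-word alternatives.

```python
import re

def pattern_to_regex(pattern):
    """Hypothetical helper: compile 'the apple|orange is red|green' style patterns."""
    # Split on whitespace, but keep [bracketed phrases] joined by pipes as one token.
    tokens = re.findall(r'(?:\[[^\]]*\]|[^\s\[\]])+', pattern)
    parts = []
    for token in tokens:
        # Each alternative is either a [bracketed phrase] or a bare word between pipes.
        alternatives = re.findall(r'\[([^\]]*)\]|([^|\[\]]+)', token)
        choices = [re.escape(phrase or word) for phrase, word in alternatives]
        parts.append('(?:' + '|'.join(choices) + ')')
    # Anchor both ends so the whole query has to fit the pattern.
    return re.compile('^' + r'\s+'.join(parts) + '$')

fruit = pattern_to_regex('the apple|orange|grape is red|orange|violet')
print(fruit.search('the apple is red') is not None)    # True
print(fruit.search('the banana is red') is not None)   # False

launch = pattern_to_regex('[launch app]|[start program] now')
print(launch.search('start program now') is not None)  # True
```

Every alternative is passed through `re.escape`, so words in the pattern are matched literally; this means regex extras such as the optional plural above would have to be handled separately.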

The re-based solutions for this kind of problem work great. But it would sure be nice if there were an easy way to pull data out of strings in Python without having to learn regex (or to learn it AGAIN, which is what I always end up having to do since my brain is broken).

Thankfully, someone took the time to write parse.

parse

parse is a nice package for this kind of thing. It uses regular expressions under the hood, but the API is based on the string format specification mini-language, which most Python users will already be familiar with.

For a format spec you will use over and over again, you'd use parse.compile. Here is an example:

>>> import parse
>>> theaisb_parser = parse.compile('the {} is {}')
>>> fruit, color = theaisb_parser.parse('the apple is red')
>>> print(fruit, color)
apple red
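To give a feel for what "regular expressions under the hood" can mean, here is a standard-library-only sketch that treats each `{}` in a template as a capture group. It is an illustration of the general idea under simplifying assumptions, not how parse is actually implemented; `simple_unformat` is a hypothetical name and it only understands bare `{}` fields.

```python
import re

def simple_unformat(template, text):
    """Hypothetical helper: extract the values that filled each '{}' in template."""
    # Escape the literal pieces and join them with a non-greedy capture group.
    literal_parts = [re.escape(piece) for piece in template.split('{}')]
    regex = '^' + '(.+?)'.join(literal_parts) + '$'
    match = re.match(regex, text)
    return match.groups() if match else None

print(simple_unformat('the {} is {}', 'the apple is red'))  # ('apple', 'red')
print(simple_unformat('the {} is {}', 'a lizard, maybe'))   # None
```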

parmatter

I have put a package I created for my own use on PyPI in case others find it useful. It makes things just a little bit nicer, and it makes heavy use of parse. The idea is to combine the functionality of a string.Formatter and a parse.Parser into a single object, which I have called a parmatter (also the package name).

The package contains a number of useful custom parmatter types. StaticParmatter has a precompiled parsing specification (similar to the object returned by parse.compile above). Use it like this:

>>> from parmatter import StaticParmatter
>>> theaisb = StaticParmatter('the {} is {}')
>>> print(theaisb.format('lizard', 'chartreuse'))
the lizard is chartreuse
>>> fruit, color = theaisb.unformat('the homynym is ogive')
>>> print(fruit, color)
homynym ogive

Note that for "unformatting", the parse package uses the method name parse. However, my package uses unformat. The reason for this is that parmatter classes are subclassed from string.Formatter, and string.Formatter already has a .parse() method (which provides different functionality). Besides, I think unformat is a more intuitive method name anyway.

EDIT: see also my previous answer to another question, which discusses these packages as well.
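For reference, the existing string.Formatter.parse() method from the standard library splits a format string into its pieces; it does not pull values out of formatted text. A short demonstration using only the standard library:

```python
import string

# string.Formatter.parse() yields (literal_text, field_name, format_spec, conversion)
# tuples describing the format string itself, not values extracted from a result.
pieces = list(string.Formatter().parse('the {} is {}'))
for literal_text, field_name, format_spec, conversion in pieces:
    print(repr(literal_text), repr(field_name))
```

Each tuple pairs a literal chunk ('the ' and ' is ') with the replacement field that follows it, which is a different job from unformatting.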
