Python regular expression to test whether a sentence is valid

Question

ACTIVE_LIST = ACTOR | ACTIVE_LIST and ACTOR
ACTOR = NOUN | ARTICLE NOUN
ARTICLE = a | the
NOUN = tom | jerry | goofy | mickey | jimmy | dog | cat | mouse

By applying the rules above I can generate

a tom 
tom and a jerry 
the tom and a jerry 
the tom and a jerry and tom and dog

but not

Tom 
the Tom and me

Can I check whether a sentence is correct using only Python's re module? I know how to match certain characters with [abc], but I don't know how to match words. I am actually trying to solve this ACM problem. If someone helps me with part of it, I can do the rest. This is my first question here. Any suggestions or improvements are highly appreciated.

Answer 1

Use re.compile:

re.compile('tom', re.IGNORECASE)

The following topic shows another way to do this without re.compile (using search / match):

Case insensitive Python regular expression without re.compile
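
As a minimal sketch of the two options mentioned above (the sample string and the variable names are made up for illustration), the flag can be given either to re.compile or directly to re.search. Keep in mind that the question treats "Tom" as invalid, so ignoring case may accept more than the grammar allows.

```python
import re

text = 'the Tom and a jerry'  # illustrative input, not from the problem set

# Option 1: compile once with the flag, then reuse the pattern object.
pattern = re.compile('tom', re.IGNORECASE)
compiled_hit = pattern.search(text) is not None

# Option 2: pass the flag to the module-level function, no compile step.
direct_hit = re.search('tom', text, re.IGNORECASE) is not None

# Without the flag, matching is case-sensitive and 'Tom' is not found.
strict_hit = re.search('tom', text) is not None

print(compiled_hit, direct_hit, strict_hit)
```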

Answer 2

This can be seen as an NLP (Natural Language Processing) problem. There is a Python module called NLTK (Natural Language Toolkit) that is well suited to this task, and using it is easier than using regular expressions.

1) First you need to install NLTK ( http://www.nltk.org/install.html )

2) Import NLTK:

import nltk

3) Create a small context-free grammar containing your four rules ( https://en.wikipedia.org/wiki/Context-free_grammar ). With NLTK's CFG class, you can do that in a single call:

acm_grammar = nltk.CFG.fromstring("""
ACTIVE_LIST -> ACTOR | ACTIVE_LIST 'and' ACTOR
ACTOR -> NOUN | ARTICLE NOUN
ARTICLE -> 'a' | 'the'
NOUN -> 'tom' | 'jerry' | 'goofy' | 'mickey' | 'jimmy' | 'dog' | 'cat' | 'mouse' """)

4) Create a parser that will use the acm_grammar:

parser = nltk.ChartParser(acm_grammar)

5) Test it on some input. Each input sentence must be passed to the parser as a list of words (strings). The split() method can be used for this:

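sentences = ["a tom", "tom and a jerry", "the tom and a jerry",
             "the tom and a jerry and tom and dog", "Tom", "the Tom and me"]

for sent in sentences:
    split_sent = sent.split()
    try:
        # parse() raises ValueError for words the grammar does not cover;
        # for known words in an invalid order it yields no trees at all.
        is_valid = any(True for _ in parser.parse(split_sent))
    except ValueError:
        is_valid = False
    print(sent, "-- YES I WILL" if is_valid else "-- NO I WON'T")

In this last step, we check whether the parser can parse a sentence according to the acm_grammar. If the sentence contains a word the grammar does not cover, the call to the parser raises a ValueError. If all the words are known but their order is invalid, the parser raises nothing and simply yields no parse trees, so the code also checks that at least one tree is produced. Here is the output of this code: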

a tom -- YES I WILL
tom and a jerry -- YES I WILL
the tom and a jerry -- YES I WILL
the tom and a jerry and tom and dog -- YES I WILL
Tom -- NO I WON'T
the Tom and me -- NO I WON'T

Answer 3

Yes, you can write that as a regex pattern, because the grammar is regular. The regular expression will be pretty long, but it can be generated in a fairly straightforward way; once you have the regex, you just compile it and apply it to each input.

The key is to turn regular rules into repetitions. For example,

STATEMENT = ACTION | STATEMENT , ACTION

can be turned into

ACTION (, ACTION)*

Of course, that's just a part of the problem, because you first have to transform ACTION into a regular expression in order to create the regex for STATEMENT.
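
The rule-to-repetition step can be sketched as a small helper. This is only an illustration: the function name left_recursive_to_regex is hypothetical, ACTION is replaced by a placeholder pattern because its real definition is not shown here, and the separator is assumed to be a comma followed by exactly one space.

```python
import re


def left_recursive_to_regex(item, separator):
    """Turn  X = item | X separator item  into  item(separator item)*."""
    return '(?:{0})(?:{1}(?:{0}))*'.format(item, separator)


# Placeholder for ACTION (assumption): any run of lower-case letters.
ACTION = r'[a-z]+'

# STATEMENT = ACTION | STATEMENT , ACTION
STATEMENT = left_recursive_to_regex(ACTION, r', ')
statement_re = re.compile(STATEMENT)

one_item = statement_re.fullmatch('go') is not None
three_items = statement_re.fullmatch('go, stop, wait') is not None
dangling_comma = statement_re.fullmatch('go, ') is not None

print(one_item, three_items, dangling_comma)
```

The non-capturing groups keep each sub-pattern self-contained, so the helper can be applied again to the result when one list rule is nested inside another.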

The problem description glosses over an important issue, which is that the input does not just consist of lower-case alphabetic characters and commas. It also contains spaces, and the regular expression needs to insist on spaces at appropriate points. For example, the , above probably must (and certainly might) be followed by one (or more) spaces. It might be ok if it were preceded by one or more spaces, too; the problem description isn't clear.

So the correct regular expression for ACTOR (an optional article followed by a noun) will actually turn out to be:

((a|the) +)?(tom|jerry|goofy|mickey|jimmy|dog|cat|mouse)
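
Putting the pieces together for the grammar in the question, one possible sketch looks like this. The helper name is_active_list is made up for illustration, and two things are assumptions rather than part of the problem statement: words are separated by one or more spaces, and leading or trailing whitespace is ignored.

```python
import re

ARTICLE = r'(?:a|the)'
NOUN = r'(?:tom|jerry|goofy|mickey|jimmy|dog|cat|mouse)'

# ACTOR = NOUN | ARTICLE NOUN  ->  optional article, then a noun
ACTOR = r'(?:{} +)?{}'.format(ARTICLE, NOUN)

# ACTIVE_LIST = ACTOR | ACTIVE_LIST and ACTOR  ->  ACTOR( and ACTOR)*
ACTIVE_LIST = r'{0}(?: +and +{0})*'.format(ACTOR)

active_list_re = re.compile(ACTIVE_LIST)


def is_active_list(sentence):
    """Return True if the whole sentence matches ACTIVE_LIST."""
    # fullmatch anchors at both ends, so partial matches are rejected.
    return active_list_re.fullmatch(sentence.strip()) is not None


for sentence in ['a tom', 'tom and a jerry', 'Tom', 'the Tom and me']:
    print(sentence, '--', is_active_list(sentence))
```

Because the pattern is case-sensitive and anchored with fullmatch, "Tom" and "the Tom and me" are rejected while the lower-case examples from the question are accepted.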

(I also found it interesting that the grammar as presented lets VERB match "hatesssssssss". I have no idea whether that was intentional.)

Answer 4

After thinking about it a lot, I solved it on my own:

ARTICLE = ('a', 'the')
NOUN = ('tom', 'jerry', 'goofy', 'mickey', 'jimmy', 'dog', 'cat', 'mouse')

# Every valid ACTOR: a bare noun, or an article followed by a noun.
all_a = NOUN + tuple(' '.join([x, y]) for x in ARTICLE for y in NOUN)


def aseKi(phrase):
    """Return True if phrase is a valid ACTOR."""
    return phrase in all_a

# Example inputs; only the last one, st, is checked below.
st0 = 'the tom and jerry'
st1 = 'tom and a jerry'
st2 = 'tom and jerry and the mouse'
st = 'tom and goofy and goofy and the goofy and a dog and cat'

# Split on ' and ' (with spaces) so that 'tomandjerry' is not accepted.
val = st.split(' and ')

nice_val = [x.strip() for x in val]

s = [aseKi(x) for x in nice_val]

if all(s):
    print('YES I WILL')
else:
    print("NO I WON'T")

4 answers

Answer 1
2 2015-12-31 08:45:50

Answer 2
1 2015-12-31 16:17:54

Answer 3
1 accepted 2015-12-31 22:49:28

Answer 4
0 2015-12-31 17:55:17
