Using regex to match expressions in Python

Question

I have many sentences, but I'd like to create a function that operates on each sentence individually, so the input is just a string. My main objective is to extract the words that follow prepositions: in "near blue meadows", for example, I'd want "blue meadows" to be extracted.
I have all my prepositions in a text file. It works fine, but I think there's a problem with the regex I'm using. Here's my code:

import re

with open("Input.txt") as f:
    words = "|".join(line.rstrip() for line in f)
    pattern = re.compile(r'({})\s(\d+\w+|\w+)\s\w+'.format(words))
    text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"
    print(pattern.search(text3).group())

This returns:

AttributeError                            Traceback (most recent call last)
<ipython-input-83-be0cdffb436b> in <module>()
      5     pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words))
      6     text3 = ""
----> 7     print(pattern.search(text3).group())

AttributeError: 'NoneType' object has no attribute 'group'

The main problem is with the regex. My expected output is "hennur police", i.e. the two words after "near". In my code I use ({}) to match one of the prepositions from the list, \s for the following space, (\d+\w+|\w+) for a word such as 19th or hennur, and \s\w+ for a space followed by one more word. My regex fails to match, hence the None error. Why is it not working?

The content of the Input.txt file:

['near','nr','opp','opposite','behind','towards','above','off']

Expected output:

hennur police
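For reference, here is a sketch of the pattern the question describes, written with re.VERBOSE so each piece carries its own comment. It assumes the prepositions are available as a real Python list (hard-coded here rather than read from Input.txt). Once the alternation really is near|nr|..., the regex does match, but group() returns the preposition as well, so extracting only "hennur police" needs a capturing group around just the two words.

```python
import re

# Assumption: the prepositions are hard-coded here instead of read from Input.txt.
prepositions = ['near', 'nr', 'opp', 'opposite', 'behind', 'towards', 'above', 'off']

# The question's pattern, one piece per line (whitespace and comments are ignored).
pattern = re.compile(r'''
    ({})            # group 1: one of the prepositions
    \s              # a whitespace character
    (\d+\w+|\w+)    # group 2: a word such as "19th" or "hennur"
    \s\w+           # a whitespace character and one more word
'''.format('|'.join(prepositions)), re.VERBOSE)

m = pattern.search("bangalore 43. near hennur police station")
print(m.group())   # near hennur police  (includes the preposition)
print(m.group(2))  # hennur  (the last word is not captured at all)
```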

Answer 1

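The file contains a Python list literal, not one preposition per line. Your code joins that single line into the pattern, so ({}) becomes (['near','nr','opp',...]), and the square brackets turn it into a character class that matches one character rather than a whole preposition. Use ast.literal_eval to parse the literal.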

>>> import ast
>>> ast.literal_eval("['near','nr','opp','opposite','behind','towards','above','off']")
['near', 'nr', 'opp', 'opposite', 'behind', 'towards', 'above', 'off']
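To see the problem directly, the sketch below rebuilds the question's pattern and prints it. It assumes Input.txt holds exactly the single line shown in the question (hard-coded here as the hypothetical variable file_line), and compares the result with the alternation built from ast.literal_eval.

```python
import ast
import re

# Assumption: Input.txt contains exactly this one line, as shown in the question.
file_line = "['near','nr','opp','opposite','behind','towards','above','off']"
text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"

# What the question's code builds: the whole line ends up inside ({}).
bad_words = "|".join(line.rstrip() for line in [file_line])
bad_pattern = re.compile(r'({})\s(\d+\w+|\w+)\s\w+'.format(bad_words))
print(bad_pattern.pattern)
# The "[" opens a character class, so group 1 matches ONE character
# (for example "n", "'" or ","), not a preposition.
print(bad_pattern.search(text3).group())  # , classic royale
print(bad_pattern.search(""))             # None -> .group() raises AttributeError

# Parsing the literal first gives a real alternation.
good_words = "|".join(ast.literal_eval(file_line))
print(good_words)  # near|nr|opp|opposite|behind|towards|above|off
```

Under that assumption, the question's code matches the wrong text (", classic royale") for this text3, and it raises the AttributeError whenever nothing matches at all, as with text3 = "" in the traceback.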

import ast
import re

with open("Input.txt") as f:
    words = '|'.join(ast.literal_eval(f.read()))
    pattern = re.compile(r'(?:{})\s(\d*\w+\s\w+)'.format(words))
    text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"

    # If there could be multiple matches, use `findall` or `finditer`.
    #   When the pattern has one capturing group, `findall` returns a list of
    #   that group's text instead of the entire matched string.
    for place in pattern.findall(text3):
        print(place)

    # If you want to get only the first match, use `search`.
    #   You need to use `group(1)` to get only group 1.
    print(pattern.search(text3).group(1))

Output (the first line is printed in the for loop, the second comes from search(..).group(1)):

hennur police
hennur police
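The difference between findall and finditer mentioned in the code comments shows up on a sentence with two prepositions. The sentence below is made up for illustration, and the preposition list is cut down to near and opp.

```python
import re

# Hypothetical sentence containing two prepositions.
sentence = "shop no 5, opp city mall, near hennur police station"

# One capturing group: findall returns just that group's text for each match.
one_group = re.compile(r'(?:near|opp)\s(\w+\s\w+)')
print(one_group.findall(sentence))   # ['city mall', 'hennur police']

# Two capturing groups: findall returns a tuple per match.
two_groups = re.compile(r'(near|opp)\s(\w+\s\w+)')
print(two_groups.findall(sentence))  # [('opp', 'city mall'), ('near', 'hennur police')]

# finditer yields match objects, so both the whole match and group 1 are available.
for m in one_group.finditer(sentence):
    print(m.group(0), '->', m.group(1))
```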

NOTE: you need to re.escape each word if any word contains a character that has a special meaning in regular expressions.
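A minimal sketch that applies the re.escape advice inside a per-sentence function, assuming the prepositions have already been parsed with ast.literal_eval. The helper names build_pattern and words_after_preposition are hypothetical. The leading \b word boundary is an addition not in the answer above: it keeps "near" from matching inside a word such as "linear". Returning None when nothing matches also avoids the AttributeError from the question.

```python
import ast
import re

def build_pattern(prepositions):
    # Escape each preposition so characters such as "." or "(" match literally.
    alternation = "|".join(re.escape(p) for p in prepositions)
    # Assumption: \b added so a preposition only matches as a whole word.
    return re.compile(r'\b(?:{})\s(\d*\w+\s\w+)'.format(alternation))

def words_after_preposition(pattern, sentence):
    # Return the two words after the first preposition, or None if there is none.
    match = pattern.search(sentence)
    return match.group(1) if match else None

prepositions = ast.literal_eval("['near','nr','opp','opposite','behind','towards','above','off']")
pattern = build_pattern(prepositions)
text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"

print(words_after_preposition(pattern, text3))               # hennur police
print(words_after_preposition(pattern, "linear park road"))  # None
print(words_after_preposition(pattern, ""))                  # None
```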

Solution 1
1 Accepted 2014-02-27 07:42:11