How can I use Python to flag words in a sentence string depending on whether they come after a specific word and before a full stop?

Question

I have a list of strings containing job descriptions like the following:

direct or coordinate an organization's financial or budget activities to fund operations, maximize investments, or increase efficiency. may serve as liaisons between organizations, shareholders, and outside organizations. may attend and participate in meetings of municipal councils or council committees. represent organizations or promote their objectives at official functions, or delegate representatives to do so.

I already have some Python code that splits each description into words and gives each word a number of attributes, for example how many times it appears in the description, its position (as a numerical rank) or its POS tag (whether it's a noun, verb, etc.). So, for example, if the job description were just "plan schedules", my program can already give me the following:

[('plan', 'plan', 'NN', 0, 2, 5, 'construction managers', '11-9021.00', 245), ('schedule', 'schedul', 'NN', 1, 1, 1, 'construction managers', '11-9021.00', 245)]

I want to add a flag/boolean that indicates, for each word in the definition, whether it comes after the word 'may' and before a full stop. Essentially, I'm looking for a list of booleans for each description, which I could zip onto the structure above as the 10th attribute, so that I know for each word whether it falls between 'may' and a full stop.

Any suggestions on how I could achieve this?

Answer 1

I'm assuming that you want to know whether a keyword appears anywhere between the word "may" and a full stop, i.e. whether someone is allowed to perform a certain task.

After compiling your list of keywords, you can use regular expressions and the re module to search for matching strings.

The re.search function returns a Match object if the regular expression is found in the string, and None otherwise. These two cases can be converted to a boolean:

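import re
def may_matcher(string, keyword):
    # whole-word keyword after "may", before the next full stop; [^.]* also allows commas
    pattern = r'\bmay\b[^.]*\b' + re.escape(keyword) + r'\b[^.]*\.'
    return bool(re.search(pattern, string))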

Applying this little function gives you the desired boolean:

string = "may attend to guests."
may_matcher(string, "attend")
may_matcher(string, "help")

The first call evaluates to True, whereas the second evaluates to False.

You can then use a list comprehension to go through all of your keywords:

keywords = ["attend", "help"]
may_list = [may_matcher(string,keyword) for keyword in keywords]
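The regex approach answers "does this keyword occur between 'may' and a full stop?", while the question asks for one boolean per word that can be zipped onto the existing tuples. A minimal sketch of that, assuming the description can be re-tokenized with a simple regex that matches the question's own tokenizer (`may_flags` and its tokenization are hypothetical, not part of the original answer): walk the words in order, switch a flag on after "may" and switch it off at each full stop.

```python
import re

def may_flags(description, trigger="may"):
    # Hypothetical helper: one boolean per word, True if the word comes after
    # `trigger` and before the next full stop.
    flags = []
    inside = False
    # Assumed tokenization: runs of word characters/apostrophes, plus full stops.
    # Other punctuation (commas etc.) is dropped and produces no flag.
    for token in re.findall(r"[\w']+|\.", description.lower()):
        if token == ".":
            inside = False  # a full stop closes the span and gets no flag
        else:
            flags.append(inside)  # "may" itself is only True if a span is already open
            if token == trigger:
                inside = True  # words after this "may" are flagged

    return flags

# Attaching the flags to the question's tuples as a 10th attribute.
tagged = [
    ('plan', 'plan', 'NN', 0, 2, 5, 'construction managers', '11-9021.00', 245),
    ('schedule', 'schedul', 'NN', 1, 1, 1, 'construction managers', '11-9021.00', 245),
]
flags = may_flags("plan schedules")
with_flag = [t + (f,) for t, f in zip(tagged, flags)]

print(may_flags("direct budgets. may attend meetings."))
# [False, False, False, True, True]
```

The flags only line up with the tuples if both lists come from the same tokenization in the same word order; abbreviations such as "e.g." or decimal numbers would also be treated as sentence ends by this simple tokenizer.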

Be careful with negative sentences: a sentence containing "may not" would also be matched by this function. If such sentences exist in your data, you would have to modify the regex.
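One possible modification, sketched under the assumption that negation always appears directly as "may not" (the name `may_matcher_positive` is hypothetical): a negative lookahead `(?!\s+not\b)` right after "may" rejects that occurrence, so a keyword in a "may not" sentence no longer counts. Variants such as "may also not" would still need their own handling.

```python
import re

def may_matcher_positive(string, keyword):
    # Same idea as may_matcher, but the negative lookahead (?!\s+not\b)
    # refuses to start a match at a "may" that is immediately followed by "not".
    pattern = r'\bmay\b(?!\s+not\b)[^.]*\b' + re.escape(keyword) + r'\b[^.]*\.'
    return bool(re.search(pattern, string))

print(may_matcher_positive("may attend meetings.", "attend"))      # True
print(may_matcher_positive("may not attend meetings.", "attend"))  # False
```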

Answer 1 (score 0, 2021-05-17 16:51:44)