

How can I use Python to mark words in a sentence string depending on whether they come after one specific word and before a full stop?

I have a list of strings containing job descriptions like the following:

direct or coordinate an organization's financial or budget activities to fund operations, maximize investments, or increase efficiency. may serve as liaisons between organizations, shareholders, and outside organizations. may attend and participate in meetings of municipal councils or council committees. represent organizations or promote their objectives at official functions, or delegate representatives to do so.

I already have some Python code that splits up each word in the description and gives it a number of attributes, for example how many times it appears in the description, its position (in terms of numerical rank) or its POS tag (whether it's a noun, verb, etc.). So, for example, if the job description was just "plan schedules", my program can already give me the following:

```python
[('plan', 'plan', 'NN', 0, 2, 5, 'construction managers', '11-9021.00', 245),
 ('schedule', 'schedul', 'NN', 1, 1, 1, 'construction managers', '11-9021.00', 245)]
```

I want to add a flag/boolean to this that indicates, for each word in the description, whether it comes after the word 'may' and before a full stop. Essentially, I am looking for a list of booleans for each description, which I could zip onto the above structure as the 10th attribute, so that I know for each word whether it comes between 'may' and a full stop.
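A minimal sketch of the zipping step, assuming the flag list has exactly one entry per tuple and follows the same word order (the flag values below are made up for illustration):

```python
# Per-word tuples in the shape shown above (values copied from the example).
word_attrs = [
    ('plan', 'plan', 'NN', 0, 2, 5, 'construction managers', '11-9021.00', 245),
    ('schedule', 'schedul', 'NN', 1, 1, 1, 'construction managers', '11-9021.00', 245),
]

# Illustrative flags: "plan schedules" contains no "may", so both are False.
flags = [False, False]

# zip pairs each tuple with its flag; "+ (flag,)" appends it as the 10th field.
combined = [attrs + (flag,) for attrs, flag in zip(word_attrs, flags)]

for row in combined:
    print(row)
```

Note that `zip` stops silently at the shorter input, so if the flags were computed from a differently tokenised string, words would be dropped rather than an error being raised.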

Any suggestions on how I could achieve this?

I'm assuming that you want to find out whether a keyword appears anywhere between the word "may" and a full stop, i.e. whether someone is allowed to perform a certain task.

Once you have compiled your list of keywords, you can use regular expressions and the re library to search for matching strings.

The re.search function returns a Match object if the regular expression is found in the string, and None otherwise. Either result can be converted to a boolean with bool():

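```python
import re

def may_matcher(string, keyword):
    # True if keyword appears as a whole word after "may", with no "." in between
    pattern = r'\bmay\b[^.]*?\b' + re.escape(keyword) + r'\b[^.]*\.'
    return bool(re.search(pattern, string))
```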

Applying this little function gives you the desired boolean:

```python
string = "may attend to guests."
may_matcher(string, "attend")
may_matcher(string, "help")
```

The first call returns True, whereas the second returns False.
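Under the hood, that True/False comes from `bool()` applied to whatever `re.search` returns. A small standalone illustration with the standard `re` module (the patterns are just examples):

```python
import re

sentence = "may attend to guests."

found = re.search(r"attend", sentence)   # pattern present: a re.Match object
missing = re.search(r"help", sentence)   # pattern absent: None

print(found is not None, missing is None)  # True True
print(bool(found), bool(missing))          # True False
print(found.group(), found.span())         # attend (4, 10)
```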

You can then use a list comprehension to go through all of your keywords:

```python
keywords = ["attend", "help"]
may_list = [may_matcher(string, keyword) for keyword in keywords]
```
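`may_matcher` answers "does this keyword occur after 'may' in the same sentence?". If what you actually need is the per-word list of booleans described in the question, one alternative (not part of the answer above) is to walk the tokens once and keep a running flag that switches on after "may" and off at a full stop. A minimal sketch, assuming words are split on whitespace and that a full stop is any token ending in "."; the function name `flag_may_clauses` is hypothetical. For `zip` to line up, this tokenisation must match the one used to build your attribute tuples, and abbreviations such as "e.g." would be treated as full stops:

```python
def flag_may_clauses(description):
    """Return one boolean per whitespace-separated token of `description`.

    A token is flagged True if it comes after the word "may" and no later
    than the next full stop in the same sentence.
    """
    flags = []
    inside = False  # becomes True once "may" has been seen in this sentence
    for token in description.split():
        # Flag the token using the state left by the *previous* tokens,
        # so "may" itself is False and the word carrying the "." is True.
        flags.append(inside)
        word = token.strip(".,;:!?\"'()").lower()
        if word == "may":
            inside = True   # words after "may" are flagged
        if token.endswith("."):
            inside = False  # the full stop closes the clause
    return flags


sample = "may attend to guests. plan schedules."
print(list(zip(sample.split(), flag_may_clauses(sample))))
# [('may', False), ('attend', True), ('to', True), ('guests.', True),
#  ('plan', False), ('schedules.', False)]
```

Like `may_matcher`, this treats "may not" as opening a clause.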

You need to be careful with negative sentences: may_matcher would also match a sentence containing "may not". If such sentences exist in your data, you would have to modify the regex.
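One possible modification, sketched here as an assumption to be checked against your data: add a negative lookahead `(?!\s+not\b)` right after "may", so that "may not" never starts a match. The function name `may_matcher_positive` is hypothetical, and the rest of the pattern only allows text without a full stop between "may" and the keyword. Other phrasings such as "may never" would need their own alternatives inside the lookahead.

```python
import re

def may_matcher_positive(string, keyword):
    # \bmay\b(?!\s+not\b): "may" as a whole word, not followed by "not"
    # [^.]*?: any text except a full stop, so the keyword stays in the same sentence
    # re.escape: treat the keyword literally, even if it contains regex symbols
    pattern = (r"\bmay\b(?!\s+not\b)[^.]*?\b"
               + re.escape(keyword)
               + r"\b[^.]*\.")
    return bool(re.search(pattern, string))

print(may_matcher_positive("may attend to guests.", "attend"))      # True
print(may_matcher_positive("may not attend to guests.", "attend"))  # False
```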


