
How can I use Python to mark words in a sentence string depending on whether they come after one specific word and before a full stop?

I have a list of strings containing job descriptions like the following:

direct or coordinate an organization's financial or budget activities to fund operations, maximize investments, or increase efficiency. may serve as liaisons between organizations, shareholders, and outside organizations. may attend and participate in meetings of municipal councils or council committees. represent organizations or promote their objectives at official functions, or delegate representatives to do so.

I already have some Python code that splits each description into words and gives each word a number of attributes, for example how many times it appears in the description, its position (in terms of numerical rank) or its POS tag (whether it's a noun, verb etc.). So for example, if the job description was just "plan schedules", my program can already give me the following:

[('plan', 'plan', 'NN', 0, 2, 5, 'construction managers', '11-9021.00', 245), ('schedule', 'schedul', 'NN', 1, 1, 1, 'construction managers', '11-9021.00', 245)]

I wanted to add to this a flag/boolean which would highlight, for each word in the definition, whether it comes after the word 'may' and before a full stop. Essentially, I would be looking for a list of booleans for each description, which I could zip to the above structure as the 10th attribute and know for each word whether it comes between 'may' and a full stop.

Any suggestions on how I could achieve this?

I'm assuming that you want to find out whether a keyword appears anywhere between the word "may" and a full stop, i.e. whether someone is allowed to perform a certain task.

After having compiled your list of keywords, you can use regular expressions and the re library to search for matching strings.

The re.search function returns a Match object if the regular expression is found in the string, and None otherwise. Either result can be converted to a boolean with bool():

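import re

def may_matcher(string, keyword):
    # True if keyword occurs as a whole word after "may" and before the next full stop.
    # [^.]* keeps the match inside one sentence; re.escape guards special characters.
    pattern = r'\bmay\b[^.]*\b' + re.escape(keyword) + r'\b[^.]*\.'
    return bool(re.search(pattern, string))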

Applying this little function gives you the desired boolean:

string = "may attend to guests."
may_matcher(string, "attend")
may_matcher(string, "help")

The first call evaluates to True, whereas the second evaluates to False.

You can then use a list comprehension to go through all of your keywords:

keywords = ["attend", "help"]
may_list = [may_matcher(string,keyword) for keyword in keywords]
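If you want one boolean per word of the description rather than one per keyword, a single pass over the tokens may be simpler than a regex. The following is only a sketch: may_flags is a hypothetical helper name, and it assumes words are split on whitespace and that a token ending in "." closes the sentence, so abbreviations such as "e.g." would end a clause early. If your own tokenizer drops or merges words, the flags will not line up with your tuples without adjustment.

```python
def may_flags(description):
    """Return one boolean per whitespace-separated token.

    A token is flagged True if it comes after the word 'may' and
    before (or at) the next full stop. The word 'may' itself is
    flagged False unless it is already inside such a clause.
    """
    flags = []
    inside = False  # True while we are between 'may' and the next full stop
    for token in description.split():
        word = token.strip(".,;:").lower()
        flags.append(inside)
        if word == "may":
            inside = True   # the following tokens are inside the clause
        if token.endswith("."):
            inside = False  # the full stop closes the clause
    return flags


example = "plan schedules. may attend meetings. direct staff."
print(may_flags(example))
# [False, False, False, True, True, False, False]
```

Provided the tokens match your existing word list one to one, the result can be attached with zip, for example [attrs + (flag,) for attrs, flag in zip(word_tuples, may_flags(description))], where word_tuples stands for your existing list of attribute tuples.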

Be careful with negated sentences: a sentence containing "may not" is also matched by may_matcher. If such sentences occur in your data, you will have to modify the regex.
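One possible modification is a negative lookahead that rejects "may" when it is directly followed by "not". This is a sketch under that assumption only: may_matcher_no_negation is a hypothetical name, and other negations (for example "may never") are not handled.

```python
import re


def may_matcher_no_negation(string, keyword):
    """True if keyword follows 'may' (but not 'may not') within one sentence."""
    pattern = (
        r"\bmay\b(?!\s+not\b)"             # 'may' not directly followed by 'not'
        r"[^.]*\b" + re.escape(keyword) + r"\b"  # keyword as a whole word, same sentence
        r"[^.]*\."                          # up to the closing full stop
    )
    return bool(re.search(pattern, string))


print(may_matcher_no_negation("may attend meetings.", "attend"))      # True
print(may_matcher_no_negation("may not attend meetings.", "attend"))  # False
```

The \b after "not" keeps words such as "notify" from being treated as a negation.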


 