
Find whole part of list item, not subparts, in a string?

I have a dictionary of keys and values (massively truncated for ease of reading):

responsePolarities = {'yes':0.95, 'hell yes':0.99, 'no':-0.95, 'hell no':-0.99, 'okay':0.70}

I am checking whether any key is in a string passed to my function:

for key, value in responsePolarities.items():
    if key in string:
        return value

The problem is that if the passed string contains a word such as "know", the function sees the 'no' in 'know' and returns -0.95.

I can't add spaces around the 'no' key because it could be the only response provided.

How can I make the function see 'no' as 'no' but not 'know'? Am I correct in thinking this is probably a job for a regular expression, or is there something simpler I'm missing?

I thought about splitting the passed string into individual words, but then I couldn't check for multi-word phrases that modify the response polarity (like 'no' vs. 'hell no').

If I understand this correctly, you want to match text that contains your keys, but only if the whole word matches. You can do this using the regex word boundary anchor \b. It matches when the word is set off by punctuation, as in :no, but not when it sits next to other word characters, as in know. Here you loop through some strings and, for each one, find the matching keys in the dictionary:

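import re

responsePolarities = {'yes':0.95, 'hell yes':0.99, 'no':-0.95, 'hell no':-0.99, 'okay':0.70}

strings = [
    'I know nothing',
    'I now think the answer is no',
    'hell, mayb yes',
    'or hell yes',
    'i thought:yes or maybe--hell yes--'
]

for s in strings:
    for k, v in responsePolarities.items():
        # \b anchors the key at word boundaries, so 'no' cannot match inside 'know'.
        # re.escape keeps keys containing regex metacharacters from being misread.
        if re.search(rf"\b{re.escape(k)}\b", s):
            print(f"'{s}' matches: {k} : {v}")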

'I know nothing' doesn't match anything. The matches look like this:

'I now think the answer is no' matches: no : -0.95
'hell, mayb yes' matches: yes : 0.95
'or hell yes' matches: yes : 0.95
'or hell yes' matches: hell yes : 0.99
'i thought:yes or maybe--hell yes--' matches: yes : 0.95
'i thought:yes or maybe--hell yes--' matches: hell yes : 0.99
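
Note that a string containing 'hell yes' matches both 'yes' and 'hell yes'. If the function should return a single value, as in the question, one possible approach is to try the longest keys first so the more specific phrase wins. The sketch below assumes that is the desired behavior; the name find_polarity and the choice to return None when nothing matches are illustrative, not part of the original code.

```python
import re

# Same dictionary as in the question.
responsePolarities = {'yes': 0.95, 'hell yes': 0.99, 'no': -0.95, 'hell no': -0.99, 'okay': 0.70}

def find_polarity(text, polarities=responsePolarities):
    """Return the value of the longest key found as a whole word or phrase, or None."""
    # Try longer keys first so 'hell no' is checked before 'no'.
    for key in sorted(polarities, key=len, reverse=True):
        if re.search(rf"\b{re.escape(key)}\b", text):
            return polarities[key]
    return None  # assumption: no match means no polarity

print(find_polarity('I know nothing'))  # None
print(find_polarity('no'))              # -0.95
print(find_polarity('oh hell no'))      # -0.99
```

Matching here is case-sensitive; passing flags=re.IGNORECASE to re.search would also accept 'No' or 'Hell yes', if that is wanted.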

If you are doing a lot of searches, you might consider precompiling the regexes before the loop.
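
A minimal sketch of that precompilation, assuming one compiled pattern per key is acceptable; the names compiled and matching_keys are made up for illustration. The re module already caches recently used patterns internally, so the gain may be modest, but compiling up front makes the cost explicit and keeps the loop body simple.

```python
import re

responsePolarities = {'yes': 0.95, 'hell yes': 0.99, 'no': -0.95, 'hell no': -0.99, 'okay': 0.70}

# Build each word-boundary pattern once, before any searching happens.
compiled = {key: re.compile(rf"\b{re.escape(key)}\b") for key in responsePolarities}

def matching_keys(text):
    """Return the keys that occur in text as whole words or phrases."""
    return [key for key, pattern in compiled.items() if pattern.search(text)]

for s in ['I know nothing', 'or hell yes']:
    print(s, '->', matching_keys(s))
```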


 