
Checking if a string contains a string value from a dictionary and returning the appropriate key

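I want to check whether a string in a Pandas column contains a word from a dictionary and, if there is a match, create a new column that holds the matching dictionary key as its value. For example: dict = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'], 'MotorCycle': ['Harley', 'Yamaha', 'Triump']}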

df

Person  Sentence
A       'He drives a Merc'
B       'He rides a Harley'

should return

Person  Sentence             Vehicle
A       'He drives a Merc'   'Car'
B       'He rides a Harley'  'MotorCycle'

One solution is to create an inverted dictionary from dct and use str.split to search each sentence for a matching word:

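dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}

# Invert the dictionary so that each word maps to its category key
dct_inv = {i: k for k, v in dct.items() for i in v}


def find_word(x):
    # Strip surrounding spaces and quotes, then check each whitespace-separated word
    for w in x.strip(" '").split():
        if w in dct_inv:
            return dct_inv[w]
    # No word from the dictionary was found
    return None


df["Vehicle"] = df["Sentence"].apply(find_word)
print(df)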

Prints:

  Person             Sentence     Vehicle
0      A   'He drives a Merc'         Car
1      B  'He rides a Harley'  MotorCycle
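
Note that the split-based lookup only matches whole whitespace-separated tokens, so a word followed by punctuation (for example `Merc.`) would be missed. A minimal sketch of one possible variation is shown here; it tokenizes with `re.findall(r"\w+", ...)` instead of `str.split`. The sample rows and the helper name `find_vehicle` are assumptions made for illustration, not part of the original answer.

```python
import re

import pandas as pd

# Assumed sample data modeled on the question; row C is an illustrative addition
df = pd.DataFrame(
    {
        "Person": ["A", "B", "C"],
        "Sentence": ["'He drives a Merc.'", "'He rides a Harley'", "'She walks'"],
    }
)

dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}

# Reverse lookup: word -> category key
dct_inv = {word: key for key, words in dct.items() for word in words}


def find_vehicle(sentence):  # hypothetical helper name
    # \w+ yields alphanumeric tokens, which drops quotes and punctuation
    for token in re.findall(r"\w+", sentence):
        if token in dct_inv:
            return dct_inv[token]
    # No dictionary word appears in the sentence
    return None


df["Vehicle"] = df["Sentence"].apply(find_vehicle)
print(df)
```

The matching is still case-sensitive, as in the original; lowercasing both the tokens and the dictionary words would be one way to relax that.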

You can invert the dictionary and use a regex + map:

import re

dic = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'],
       'MotorCycle': ['Harley', 'Yamaha', 'Triump']}

# invert dictionary
d = {k:v for v,l in dic.items()
     for k in l}

# craft regex 
regex = f'({"|".join(map(re.escape, d))})'

# map vehicle from match
df['Vehicle'] = df['Sentence'].str.extract(regex, expand=False).map(d)

Output:

  Person           Sentence     Vehicle
0      A   He drives a Merc         Car
1      B  He rides a Harley  MotorCycle
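
One caveat with the unanchored pattern is that it also matches inside longer words (for example `Ford` in `Fordham`). The following is a hedged sketch, not the answerer's code, that wraps the alternation in `\b` word boundaries; the extra sample rows are assumptions added to show the difference. Rows without any match end up as NaN, because `str.extract` returns NaN when nothing matches.

```python
import re

import pandas as pd

# Assumed sample data; rows C and D are illustrative additions
df = pd.DataFrame(
    {
        "Person": ["A", "B", "C", "D"],
        "Sentence": [
            "He drives a Merc",
            "He rides a Harley",
            "He studies at Fordham",
            "She walks",
        ],
    }
)

dic = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}

# Invert dictionary: word -> category key
d = {word: key for key, words in dic.items() for word in words}

# \b on both sides restricts matches to whole words
regex = rf"\b({'|'.join(map(re.escape, d))})\b"

# Extract the first matching word, then map it to its category key
df["Vehicle"] = df["Sentence"].str.extract(regex, expand=False).map(d)
print(df)
```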
