Checking if a string contains a string value from a dictionary and returning the appropriate key
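I want to check whether the strings in a Pandas column contain any of the words in a dictionary and, if there is a match, create a new column whose value is the corresponding dictionary key. For example: dict = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'], 'MotorCycle': ['Harley', 'Yamaha', 'Triump']}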
df
Person | Sentence |
---|---|
A | 'He drives a Merc' |
B | 'He rides a Harley' |
should return
Person | Sentence | Vehicle |
---|---|---|
A | 'He drives a Merc' | 'Car' |
B | 'He rides a Harley' | 'MotorCycle' |
One solution is to create an inverted dictionary from dct and use str.split to search for the matching word:
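dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}
# inverted dictionary: word -> key, e.g. "Merc" -> "Car"
dct_inv = {i: k for k, v in dct.items() for i in v}
def find_word(x):
    # strip surrounding spaces and quotes, then check each word
    for w in x.strip(" '").split():
        if w in dct_inv:
            return dct_inv[w]
    # no known word found in the sentence
    return None
df["Vehicle"] = df["Sentence"].apply(find_word)
print(df)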
Prints:
Person Sentence Vehicle
0 A 'He drives a Merc' Car
1 B 'He rides a Harley' MotorCycle
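One limitation of the whitespace split above is that a word followed by punctuation (for example `Merc.`) or written in a different case would not match. A minimal sketch of a more tolerant lookup follows, assuming that case-insensitive matching and stripping surrounding punctuation are acceptable for your data. The helper name `find_vehicle` and the sample sentences are hypothetical, not part of the original answer.

```python
import string

dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}

# Inverted, lowercased lookup: brand word -> category key.
dct_inv = {word.lower(): key for key, words in dct.items() for word in words}


def find_vehicle(sentence):
    """Return the category of the first known brand word in `sentence`, else None."""
    for token in sentence.split():
        # Strip punctuation and quotes from both ends of the token, then lowercase.
        word = token.strip(string.punctuation).lower()
        if word in dct_inv:
            return dct_inv[word]
    return None


print(find_vehicle("'He drives a Merc.'"))  # Car
print(find_vehicle("He rides a harley!"))   # MotorCycle
print(find_vehicle("He walks"))             # None
```

With a DataFrame like the one in the question, this could be applied the same way as before, via `df["Sentence"].apply(find_vehicle)`.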
You can invert the dictionary and use a regex + map:
import re
dic = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'],
       'MotorCycle': ['Harley', 'Yamaha', 'Triump']}
# invert dictionary
d = {k: v for v, l in dic.items()
     for k in l}
# craft regex
regex = f'({"|".join(map(re.escape, d))})'
# map vehicle from match
df['Vehicle'] = df['Sentence'].str.extract(regex, expand=False).map(d)
Output:
Person Sentence Vehicle
0 A He drives a Merc Car
1 B He rides a Harley MotorCycle
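Note that the pattern above matches substrings, so `Ford` would also be found inside a longer word. A hedged, self-contained sketch that adds word boundaries (`\b`) is shown below. The DataFrame construction and the third row (`C`) are assumptions added for illustration and are not part of the original question.

```python
import re

import pandas as pd

# Sample data; row "C" is a hypothetical sentence with no vehicle in it.
df = pd.DataFrame(
    {
        "Person": ["A", "B", "C"],
        "Sentence": [
            "He drives a Merc",
            "He rides a Harley",
            "He studies at Fordham",
        ],
    }
)

dic = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}

# Invert the dictionary: word -> key.
d = {word: key for key, words in dic.items() for word in words}

# \b anchors the alternation to whole words, so "Ford" does not
# match inside "Fordham".
regex = r"\b(" + "|".join(map(re.escape, d)) + r")\b"

# Extract the first matching word per row, then map it to its key.
# Rows without a match end up as NaN.
df["Vehicle"] = df["Sentence"].str.extract(regex, expand=False).map(d)
print(df)
```

If case-insensitive matching is wanted, `str.extract` also accepts `flags=re.IGNORECASE`, but the extracted word would then need to be normalized before the `map` lookup.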