Looking up multiple dictionary keys in a Pandas Dataframe & return multiple values for matches

First time posting, so apologies in advance if my formatting is off.

Here's my issue:

I've created a Pandas dataframe which contains multiple rows of text:

import numpy as np
import pandas as pd

d = {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
keywords = pd.DataFrame(d, columns=['keywords'])
In [7]: keywords
Out[7]:
             keywords
0         cheap shoes
1        luxury shoes
2  cheap hiking shoes

Now I have a dictionary that contains the following keys / values:

labels = {'cheap' : 'budget', 'luxury' : 'expensive', 'hiking' : 'sport'}

What I would like to do is find out whether a key in the dictionary exists in the dataframe and, if so, return the appropriate value.

I was able to somewhat get there using the following:

for k,v in labels.items():
   keywords['Labels'] = np.where(keywords['keywords'].str.contains(k),v,'No Match')

However, the output is missing the first two keys and only catches the last key, "hiking":

    keywords            Labels
0   cheap shoes         No Match
1   luxury shoes        No Match
2   cheap hiking shoes  sport

Additionally, I'd also like to know if there's a way to catch multiple values in the dictionary, separated by |, so the ideal output would look like this:

    keywords            Labels
0   cheap shoes         budget
1   luxury shoes        expensive
2   cheap hiking shoes  budget | sport

Any help or guidance is much appreciated.

Cheers

It's certainly possible. Here is one way.

import pandas as pd

d = {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes', 'nothing']}

labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

df = pd.DataFrame(d)

def matcher(k):
    # keys of labels that occur as substrings of the text k
    x = (i for i in labels if i in k)
    # look up the value for each matching key and join them
    return ' | '.join(map(labels.get, x))

df['values'] = df['keywords'].map(matcher)

#              keywords          values
# 0         cheap shoes          budget
# 1        luxury shoes       expensive
# 2  cheap hiking shoes  budget | sport
# 3             nothing                
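
Note that `i in k` is a substring test, so 'cheap' would also match 'cheaper', and rows without any match end up as an empty string. A minimal sketch of a variant, assuming whole-word matching on whitespace-separated words is what you want and that you prefer the 'No Match' placeholder from the question (the `default` parameter is my own addition, not part of the answer above):

```python
import pandas as pd

labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

def matcher(text, default='No Match'):
    # Split on whitespace so only whole words are compared against the keys.
    words = set(text.split())
    # Keep dictionary order so the joined labels come out in a stable order.
    found = [labels[key] for key in labels if key in words]
    return ' | '.join(found) if found else default

df = pd.DataFrame({'keywords': ['cheap shoes', 'luxury shoes',
                                'cheap hiking shoes', 'nothing']})
df['values'] = df['keywords'].map(matcher)
```

With this sketch the last row gets 'No Match' instead of an empty string.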

You can use "|".join(labels.keys()) to build a pattern for re.findall().

import pandas as pd
import re

d = {'keywords' :['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
keywords = pd.DataFrame(d,columns=['keywords'])
labels = {'cheap' : 'budget', 'luxury' : 'expensive', 'hiking' : 'sport'}
pattern = "|".join(labels.keys())

def f(s):
    return "|".join(labels[word] for word in re.findall(pattern, s))

keywords.keywords.map(f)
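
If the keys may contain regex metacharacters, or you only want whole-word matches, the same idea can be hardened a little. This is just a sketch under those assumptions; the name `label_words` is mine, not from the answer above:

```python
import re

labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

# Escape each key so it is matched literally, and wrap the alternation
# in \b so that only whole words match ('cheaper' will not match 'cheap').
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, labels)) + r')\b')

def label_words(text):
    """Return the labels for all whole-word key matches, joined by ' | '."""
    return ' | '.join(labels[word] for word in pattern.findall(text))
```

It can be applied the same way as f above, e.g. keywords.keywords.map(label_words).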

Sticking with your approach, you could do, e.g.:

arr = np.array([np.where(keywords['keywords'].str.contains(k), v, 'No Match') for k, v in labels.items()]).T
keywords["Labels"] = ["|".join(set(item[ind if ind.sum() == ind.shape[0] else ~ind])) for item, ind in zip(arr, (arr == "No Match"))]

Out[97]: 
             keywords        Labels
0         cheap shoes        budget
1        luxury shoes     expensive
2  cheap hiking shoes  sport|budget
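
For what it's worth, the loop in the question only keeps 'hiking' because keywords['Labels'] is reassigned on every iteration, so each key overwrites the result of the previous one. A simpler sketch that still tests one key at a time but accumulates the matches, assuming plain substring matching is fine (the `hits` dictionary is just an illustrative name):

```python
import pandas as pd

keywords = pd.DataFrame({'keywords': ['cheap shoes', 'luxury shoes',
                                      'cheap hiking shoes']})
labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

# One boolean Series per key: does the row contain the key as a literal substring?
hits = {k: keywords['keywords'].str.contains(k, regex=False) for k in labels}

# For each row, join the values of all keys that matched, in dictionary order;
# fall back to 'No Match' when nothing matched.
keywords['Labels'] = [
    ' | '.join(labels[k] for k in labels if hits[k].iloc[i]) or 'No Match'
    for i in range(len(keywords))
]
```

Because it iterates the dictionary rather than a set, the label order is stable (budget before sport).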

I like the idea of using replace first, then finding the values.

keywords.assign(
    values=
    keywords.keywords.replace(labels, regex=True)
            .str.findall(f'({"|".join(labels.values())})')
            .str.join(' | ')
)

             keywords          values
0         cheap shoes          budget
1        luxury shoes       expensive
2  cheap hiking shoes  budget | sport
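
One caveat, as far as I can tell: since this searches the replaced text for the dictionary values, a row that already contains a value word gets labelled even though no key matched. A small sketch with a made-up second row to show the effect:

```python
import pandas as pd

labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

# Hypothetical input: the second row contains a label value but no key.
s = pd.Series(['cheap shoes', 'budget shoes'])

values_pattern = '|'.join(labels.values())
result = (
    s.replace(labels, regex=True)   # 'cheap shoes' becomes 'budget shoes'
     .str.findall(values_pattern)   # list of label values found in each row
     .str.join(' | ')
)
```

Both rows come out as 'budget' here, so this approach is safest when the values cannot occur in the original text.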

You could split the strings into separate columns, then stack them into a multi-index so that you can map the labels dictionary onto the values. Then groupby the initial index and concatenate the strings that belong to each index:

keywords['Labels'] = keywords.keywords.str.split(expand=True).stack()\
                     .map(labels).groupby(level=0)\
                     .apply(lambda x: x.str.cat(sep=' | '))

             keywords          Labels
0         cheap shoes          budget
1        luxury shoes       expensive
2  cheap hiking shoes  budget | sport
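
The same steps can be spelled out one at a time. This is only a sketch of a variation: it uses explode instead of split(expand=True).stack(), and the intermediate names `words` and `mapped` are mine:

```python
import pandas as pd

keywords = pd.DataFrame({'keywords': ['cheap shoes', 'luxury shoes',
                                      'cheap hiking shoes']})
labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

# One word per row; the original row index is repeated for each word.
words = keywords['keywords'].str.split().explode()

# Words that are not dictionary keys map to NaN and are dropped.
mapped = words.map(labels).dropna()

# Join the labels that share an original row index.
keywords['Labels'] = mapped.groupby(level=0).agg(' | '.join)
```

A row with no matching word has no group at all, so it would end up as NaN in the Labels column after the index-aligned assignment.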

