Match column values to dict

I have a dict and a dataframe like the examples v and df below. I want to search through the items in df and return the item that has the maximum number of field values in common with the values in the dict. In this case it would be item 3. I was thinking of maybe using apply with a lambda function, or transposing the df. I just can't quite get my head around it. If anyone has a slick way to do this, or any tips, they're greatly appreciated.

Input:

v = {'size': 1, 'color': 'red'}

df:

item size color
2    2    red
3    1    red

Output:
3

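Create a one-row DataFrame from the dict and merge it with the original. The merge joins on all shared columns, so it returns only the rows that match every key in the dict: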

import pandas as pd

# Inner merge on the shared columns ('size' and 'color')
a = pd.DataFrame(v, index=[0]).merge(df)['item']
print(a)
0    3
Name: item, dtype: int64

Another solution uses query, but string values in the dict must first be wrapped in an extra pair of double quotes:

v1 = {k: '"{}"'.format(val) if isinstance(val, str) else val for k, val in v.items()}
print (v1)
{'size': 1, 'color': '"red"'}

# Store the result under a new name so that df itself is not overwritten
b = df.query(' & '.join(['{}=={}'.format(i, j) for i, j in v1.items()]))['item']
print(b)
1    3
Name: item, dtype: int64
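
If building the query string and quoting the values feels fragile, the same filter can be written as a boolean mask. This is only a sketch that assumes the sample df from the question, rebuilt here so the snippet runs on its own; the names mask and matches are just for illustration:

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained
df = pd.DataFrame({'item': [2, 3], 'size': [2, 1], 'color': ['red', 'red']})
v = {'size': 1, 'color': 'red'}

# Compare only the columns named in the dict; the Series built from v
# is aligned to those columns by label.
mask = df[list(v)].eq(pd.Series(v)).all(axis=1)

# Keep the items whose rows match on every key of v
matches = df.loc[mask, 'item']
print(matches.tolist())
```

Like merge and query, this keeps only rows that match on every key, and no string formatting is needed for the values.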

The result can take three forms: a Series with several values, a Series with one value, or an empty Series. A helper function handles each case:

def get_val(v):
    x = pd.DataFrame(v, index=[0]).merge(df)['item']
    if x.empty:
        return 'Not found'
    elif len(x) == 1:
        return x.values[0]
    else:
        return x.values.tolist()
print (get_val({'size':1,'color':'red'}))
3

print (get_val({'size':10,'color':'red'}))
Not found

print (get_val({'color':'red'}))
[2, 3]
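
The helper above reads df from the enclosing scope and returns a different type for each case. A possible variant, sketched here under the assumption that a plain list is acceptable in all three cases, takes the frame as an argument; find_items, frame and criteria are hypothetical names, not part of the original answer:

```python
import pandas as pd

def find_items(frame, criteria):
    """Return the 'item' values of rows that match every key in criteria."""
    # One-row frame built from the dict, inner-merged on the shared columns
    probe = pd.DataFrame([criteria])
    return probe.merge(frame)['item'].tolist()

# Sample data from the question
df = pd.DataFrame({'item': [2, 3], 'size': [2, 1], 'color': ['red', 'red']})

print(find_items(df, {'size': 1, 'color': 'red'}))
print(find_items(df, {'size': 10, 'color': 'red'}))
print(find_items(df, {'color': 'red'}))
```

An empty list then stands for "not found", so the caller can test the result with a simple truthiness check.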

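An alternative solution is to work with dictionaries instead of dataframes. Unlike the merge and query approaches, this counts partial matches, so it returns the item with the most fields in common even when no row matches every field: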

v = {'size': 1, 'color': 'red'}

match_count = {}

# Compare only the columns that also appear as keys in v
fields = [c for c in df.columns[1:] if c in v]

for row in df.to_dict(orient='records'):
    match_count[row['item']] = sum(row[i] == v[i] for i in fields)

Result:

print(match_count)
# {2: 1, 3: 2}

res = max(match_count.items(), key=lambda x: x[1])

print(res)
# (3, 2)
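
The same match count can probably be computed without a Python loop as well. This is a minimal sketch assuming the sample df from the question; common, counts and best_item are illustrative names:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'item': [2, 3], 'size': [2, 1], 'color': ['red', 'red']})
v = {'size': 1, 'color': 'red'}

# Only compare fields present in both the dict and the frame
common = [c for c in df.columns if c in v]

# Row-wise count of fields equal to the dict value, labelled by item
counts = df[common].eq(pd.Series(v)[common]).sum(axis=1)
counts.index = df['item']

# Item with the highest count; on a tie idxmax returns the first one
best_item = counts.idxmax()
print(counts.to_dict())
print(best_item)
```

If ties matter, counts[counts == counts.max()].index.tolist() should list every item that shares the top count.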
