Match column values to dict
I have a dict and a DataFrame like the examples v and df below. I want to search through the items in df and return the item that has the maximum number of field values in common with the values in the dict. In this case it would be item 3. I was thinking of maybe using apply with a lambda function, or transposing the df, but I just can't quite get my head around it. If anyone has a slick way to do this, or any tips, they're greatly appreciated.
Input:
v = {'size': 1, 'color': 'red'}
df:
item size color
2 2 red
3 1 red
Output:
3
Create a one-row DataFrame from the dict and merge it with the original:
a = pd.DataFrame(v, index=[0]).merge(df)['item']
print (a)
0 3
Name: item, dtype: int64
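For readers who want to run the merge approach end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and the names probe, matches and items are illustrative, not from the original answer. Note that an inner merge only keeps rows that agree with the dict on every key, so it finds exact matches rather than ranking partial ones.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"item": [2, 3], "size": [2, 1], "color": ["red", "red"]})
v = {"size": 1, "color": "red"}

# One-row frame holding the values to look for
probe = pd.DataFrame(v, index=[0])

# Inner merge on the dict's keys keeps only rows equal on all of them
matches = probe.merge(df, on=list(v), how="inner")
items = matches["item"].tolist()
print(items)
```

Passing on=list(v) makes the join columns explicit instead of relying on the default of joining on all shared column names.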
Another solution uses query, but if the dict contains string values they must be wrapped in an extra pair of double quotes first:
v1 = {k: '"{}"'.format(v) if isinstance(v, str) else v for k, v in v.items()}
print (v1)
{'size': 1, 'color': '"red"'}
# assign to a new name so the original df is not overwritten
s = df.query(' & '.join(['{}=={}'.format(i,j) for i, j in v1.items()]))['item']
print (s)
1 3
Name: item, dtype: int64
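If building the query string and quoting the values feels fragile, the same exact-match filter can probably be written as a boolean mask. This is a sketch of a swapped-in technique, not part of the original answer; keys, mask and exact are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"item": [2, 3], "size": [2, 1], "color": ["red", "red"]})
v = {"size": 1, "color": "red"}

keys = list(v)

# Compare each selected column with the dict value of the same name,
# then keep the rows where every comparison is True
mask = df[keys].eq(pd.Series(v)).all(axis=1)
exact = df.loc[mask, "item"]
print(exact.tolist())
```

Because the values are compared as Python objects, no extra quoting of strings is needed.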
The result can take three forms: a Series with several values, a Series with one value, or an empty Series. A helper function handles all three:
def get_val(v):
    x = pd.DataFrame(v, index=[0]).merge(df)['item']
    if x.empty:
        return 'Not found'
    elif len(x) == 1:
        return x.values[0]
    else:
        return x.values.tolist()
print (get_val({'size':1,'color':'red'}))
3
print (get_val({'size':10,'color':'red'}))
Not found
print (get_val({'color':'red'}))
[2, 3]
An alternative solution is to work with dictionaries instead of dataframes:
v = {'size': 1, 'color': 'red'}
match_count = {}
fields = set(df.columns[1:])  # plain set, so & below is a set intersection
for k, value in df.to_dict(orient='index').items():
    match_count[value['item']] = sum(value[i] == v[i] for i in v.keys() & fields)
Result
print(match_count)
# {2: 1, 3: 2}
res = max(match_count.items(), key=lambda x: x[1])
print(res)
# (3, 2)
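The per-row counting above can also be sketched without an explicit loop, which may be closer to the "maximum number of fields in common" the question asks for. This is an illustrative variant under the same sample data; scored, best_item and tied are hypothetical names, not from the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"item": [2, 3], "size": [2, 1], "color": ["red", "red"]})
v = {"size": 1, "color": "red"}

# Only compare on keys that actually exist as columns
keys = [k for k in v if k in df.columns]

# Count, per item, how many fields equal the dict's value
scored = df.set_index("item")[keys].eq(pd.Series(v)[keys]).sum(axis=1)

# idxmax returns the first item with the highest count
best_item = scored.idxmax()

# All items sharing the highest count, in case of ties
tied = scored[scored == scored.max()].index.tolist()
print(scored.to_dict(), best_item, tied)
```

idxmax silently picks the first row on a tie, so the tied list is worth checking when several items can match equally well.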