Create a dictionary from another dictionary in the fastest and most scalable way
I have a few requirements for creating a new dictionary from a list of dictionaries.
My code is:
# input dictionary
data = [
    {'name': 'foo', 'rank': 3, 'game': 'football', 'total': 1},
    {'name': 'bar', 'rank': 5, 'game': 'hockey', 'total': 0},
    {'name': 'foo', 'rank': 7, 'game': 'tennis', 'total': 0},
    {'name': 'foo', 'rank': 2, 'game': 'cricket', 'total': 2},
    {'name': 'bar', 'rank': 1, 'game': 'cricket', 'total': 8},
]
result_list = []
merged_data = {}
result_data = {}

# Get the list of dicts whose 'total' value is not zero
dict_without_total = [
    den for den in data if den.get('total')
]

for my_dict in dict_without_total:
    # delete the 'rank' and 'total' keys from the dict
    del my_dict['rank']
    del my_dict['total']
    result_data.update({
        my_dict.get('name'): (my_dict.get('game'))
    })
    result_list.append(result_data)

# store all values of the same key in a list and sort that list
for result in result_list:
    for keys, values in result.items():
        if keys not in merged_data:
            merged_data[keys] = []
        merged_data[keys].append(values)
        merged_data[keys].sort()

print(merged_data)
Output of my code:
{
'bar': ['cricket', 'cricket', 'cricket'],
'foo': ['cricket', 'cricket', 'cricket']
}
Expected result:
{
'foo': ['cricket', 'football'],
'bar': ['cricket']
}
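Is there a faster way to get this result, or can I use some Python built-in functions to handle this case?
You can really simplify this, because there is no need to modify the existing dictionaries. Leaving the original data structure alone and building a new one is usually much clearer.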
data = [
    {'name': 'foo', 'rank': 3, 'game': 'football', 'total': 1},
    {'name': 'bar', 'rank': 5, 'game': 'hockey', 'total': 0},
    {'name': 'foo', 'rank': 7, 'game': 'tennis', 'total': 0},
    {'name': 'foo', 'rank': 2, 'game': 'cricket', 'total': 2},
    {'name': 'bar', 'rank': 1, 'game': 'cricket', 'total': 8},
]
result = {}
for e in data:
    # skip records whose 'total' is zero
    if e["total"]:
        name = e["name"]
        if name not in result:
            result[name] = []
        result[name].append(e["game"])
print(result)
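The result is {'foo': ['football', 'cricket'], 'bar': ['cricket']}, which is what you are looking for.
The games appear in input order; call sort() on each list if you need them sorted as in the expected result.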
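If the sorted lists from the expected result are needed, the same single pass can be followed by a sort. The following is only a minimal sketch, assuming the `data` list from the question; the function name `games_by_name` is made up for illustration and does not come from any answer.

```python
def games_by_name(records):
    """Map each name to the sorted list of games whose 'total' is non-zero."""
    result = {}
    for record in records:
        if record["total"]:
            # setdefault creates the empty list the first time a name is seen
            result.setdefault(record["name"], []).append(record["game"])
    # sort each list in place once, after all games have been collected
    for games in result.values():
        games.sort()
    return result


data = [
    {'name': 'foo', 'rank': 3, 'game': 'football', 'total': 1},
    {'name': 'bar', 'rank': 5, 'game': 'hockey', 'total': 0},
    {'name': 'foo', 'rank': 7, 'game': 'tennis', 'total': 0},
    {'name': 'foo', 'rank': 2, 'game': 'cricket', 'total': 2},
    {'name': 'bar', 'rank': 1, 'game': 'cricket', 'total': 8},
]
print(games_by_name(data))
```

Sorting once at the end avoids re-sorting a list after every append, and the input dictionaries are left untouched.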
You can try:
data = [
    {'name': 'foo', 'rank': 3, 'game': 'football', 'total': 1},
    {'name': 'bar', 'rank': 5, 'game': 'hockey', 'total': 0},
    {'name': 'foo', 'rank': 7, 'game': 'tennis', 'total': 0},
    {'name': 'foo', 'rank': 2, 'game': 'cricket', 'total': 2},
    {'name': 'bar', 'rank': 1, 'game': 'cricket', 'total': 8},
]
final_dict = {}
for single_data in data:
    if single_data['total'] > 0:
        if single_data['name'] in final_dict:
            final_dict[single_data['name']].append(single_data['game'])
        else:
            final_dict[single_data['name']] = [single_data['game']]
print(final_dict)
Output:
{'foo': ['football', 'cricket'], 'bar': ['cricket']}
If I understand your requirements correctly, this should do it:
names = set(x['name'] for x in data)
{name: sorted(set(x['game'] for x in data if x['total'] > 0 and x['name'] == name)) for name in names}
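The comprehension above rescans `data` once per name. As an alternative built on the standard library, the records can be sorted by name and then grouped in one pass with `itertools.groupby`. This is only a sketch under the same assumptions about `data`; `grouped_games` is a hypothetical helper name, and it de-duplicates games with a set just as the comprehension does.

```python
from itertools import groupby
from operator import itemgetter


def grouped_games(records):
    """Group games by name, keeping only records with a positive 'total'."""
    kept = [r for r in records if r["total"] > 0]
    # groupby only merges adjacent items, so sort by the grouping key first
    kept.sort(key=itemgetter("name"))
    return {
        name: sorted({r["game"] for r in group})
        for name, group in groupby(kept, key=itemgetter("name"))
    }


data = [
    {'name': 'foo', 'rank': 3, 'game': 'football', 'total': 1},
    {'name': 'bar', 'rank': 5, 'game': 'hockey', 'total': 0},
    {'name': 'foo', 'rank': 7, 'game': 'tennis', 'total': 0},
    {'name': 'foo', 'rank': 2, 'game': 'cricket', 'total': 2},
    {'name': 'bar', 'rank': 1, 'game': 'cricket', 'total': 8},
]
print(grouped_games(data))
```

Unlike the comprehension, this version omits names that have no qualifying games instead of mapping them to an empty list.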
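In addition to the other answers: if you add result_data = {} inside the for my_dict in dict_without_total: loop, your code should work fine.
for my_dict in dict_without_total:
    result_data = {}
    # ...rest of the code...
The problem is that result_data is not re-initialized on each iteration.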
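The underlying cause is that `list.append` stores a reference to the dictionary, not a copy, so every element of the list ends up pointing at the same object. Here is a small illustration, using made-up variable names rather than the ones from the question:

```python
# One dict created outside the loop and appended repeatedly
shared = {}
aliased = []
for game in ["football", "cricket"]:
    shared.update({"foo": game})
    aliased.append(shared)  # appends the same dict object each time

# A new dict created inside the loop on every iteration
fresh_list = []
for game in ["football", "cricket"]:
    fresh = {"foo": game}
    fresh_list.append(fresh)

print(aliased)                   # both entries show the last value written
print(aliased[0] is aliased[1])  # the two entries are one and the same object
print(fresh_list)                # each entry keeps its own value
```

This is why the original output repeats 'cricket' for every entry: each later `update` overwrote the value seen through all earlier references.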
Another approach:
To create the desired dictionary:
from collections import defaultdict
d2 = defaultdict(set)
for d in data:
    if d["total"] > 0:
        d2[d["name"]].add(d["game"])
To sort the values for each key:
for key in d2:
    d2[key] = sorted(d2[key])
You can also go with pandas (as an alternative):
import pandas as pd
df = pd.DataFrame([i for i in data if i['total']])
{k: g['game'].tolist() for k,g in df.groupby('name')}
#Out[178]: {'bar': ['cricket'], 'foo': ['football', 'cricket']}