How do I create a nested dictionary from a pandas DataFrame while adding numbers?
I am trying to create a nested dictionary with the office as the key and the remaining columns summed within that office.
It should look something like this:
final_dict = {'YELLOW': {'Files Loaded': 21332, 'Files Assigned': 10613}, 'RED': ....}....
My current code is below, but I'm completely stuck on how to nest and add the values.
import pandas as pd

d = {'Office': ['Yellow', 'Yellow', 'Red', 'Red', 'Blue', 'Blue'],
     'Files Loaded': [1223, 3062, 10, 100, 1520, 75],
     'Files Assigned': [1223, 30, 1500, 10, 75, 12],
     'Files Analyzed': [1223, 15, 25, 34, 98, 1000],
     'Discrepancies Identified': [17, 30, 150, 1456, 186, 1896]}
df = pd.DataFrame(data=d)
fields = ['Files Loaded', 'Files Assigned', 'Files Analyzed', 'Discrepancies Identified']
# Attempt: group by office and collect each group with list()
final_dict = df.groupby('Office')[fields].apply(list).to_dict()
print(final_dict)

This prints the column names rather than the summed values:
{'Blue': ['Files Loaded', 'Files Assigned', 'Files Analyzed', 'Discrepancies Identified'], 'Red': ['Files Loaded', 'Files Assigned', 'Files Analyzed', 'Discrepancies Identified'], 'Yellow': ['Files Loaded', 'Files Assigned', 'Files Analyzed', 'Discrepancies Identified']}
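The attempt returns column names because GroupBy.apply hands each office's rows to list as a DataFrame, and iterating over a DataFrame yields its column labels rather than its values. A minimal sketch of that behaviour, using a hypothetical two-row frame standing in for the Yellow group:

```python
import pandas as pd

# Hypothetical stand-in for one office's group (the two Yellow rows)
group = pd.DataFrame({'Files Loaded': [1223, 3062],
                      'Files Assigned': [1223, 30]})

# Iterating over a DataFrame yields its column labels, so list(group)
# is what GroupBy.apply(list) returned for every office.
labels = list(group)
print(labels)  # ['Files Loaded', 'Files Assigned']

# Summing the group instead collapses each column to one total per field;
# int() converts the numpy integers to plain Python ints for display.
totals = {col: int(total) for col, total in group.sum().items()}
print(totals)  # {'Files Loaded': 4285, 'Files Assigned': 1253}
```

So the missing step is an aggregation (a sum per column), not a list of the group.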
With the following input:
import pandas as pd
from pprint import pprint
d = {'Office': ['Yellow', 'Yellow', 'Red', 'Red', 'Blue', 'Blue'],
'Files Loaded': [1223, 3062, 10, 100, 1520, 75],
'Files Assigned': [1223, 30, 1500, 10, 75, 12],
'Files Analyzed': [1223, 15, 25, 34, 98, 1000],
'Discrepancies Identified': [17, 30, 150, 1456, 186, 1896]}
df = pd.DataFrame(data=d)
We can use the pandas groupby and aggregation (agg) functions to sum up the totals per office. Then, by calling to_dict with the 'index' orientation, we get the data as a dictionary where each key is an Office and each value is a dictionary whose keys are the column names and whose values are the aggregated counts.
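The orientation passed to to_dict decides which level ends up on the outside. A small sketch comparing 'index' with the default 'dict' orientation, on a hypothetical frame holding two offices' already-summed totals:

```python
import pandas as pd

# Hypothetical already-aggregated frame: one row per office, indexed by Office
totals = pd.DataFrame(
    {'Files Loaded': [110, 4285], 'Files Assigned': [1510, 1253]},
    index=pd.Index(['Red', 'Yellow'], name='Office'),
)

# orient='index': outer keys are the row labels (offices),
# inner keys are the column names
by_office = totals.to_dict('index')

# Default orient='dict': the nesting is reversed,
# with column names outside and offices inside
by_column = totals.to_dict()

print(by_office['Red'])            # Red's totals keyed by column name
print(by_column['Files Loaded'])   # each office's 'Files Loaded' total
```

Because groupby('Office') puts the offices in the index, 'index' is the orientation that yields office-first nesting.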
data = df.groupby('Office').agg('sum')
answer = data.to_dict('index')
pprint(answer)
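If only some columns belong in the result (the expected output in the question shows Files Loaded and Files Assigned), a variant sketch selects them before summing. This is an assumption about what the asker wants, not part of the answer above:

```python
import pandas as pd

d = {'Office': ['Yellow', 'Yellow', 'Red', 'Red', 'Blue', 'Blue'],
     'Files Loaded': [1223, 3062, 10, 100, 1520, 75],
     'Files Assigned': [1223, 30, 1500, 10, 75, 12],
     'Files Analyzed': [1223, 15, 25, 34, 98, 1000],
     'Discrepancies Identified': [17, 30, 150, 1456, 186, 1896]}
df = pd.DataFrame(data=d)

# Keep only the columns wanted in the nested dictionary; selecting them
# before aggregating also keeps any other columns out of the sum.
fields = ['Files Loaded', 'Files Assigned']

# .sum() on the grouped columns does the same job as .agg('sum') here
answer = df.groupby('Office')[fields].sum().to_dict('index')
print(answer['Yellow'])
```

Selecting columns first is also a safeguard if the real DataFrame has text columns besides Office, which a plain agg('sum') would otherwise try to combine.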
Output:
{'Blue': {'Discrepancies Identified': 2082,
          'Files Analyzed': 1098,
          'Files Assigned': 87,
          'Files Loaded': 1595},
 'Red': {'Discrepancies Identified': 1606,
         'Files Analyzed': 59,
         'Files Assigned': 1510,
         'Files Loaded': 110},
 'Yellow': {'Discrepancies Identified': 47,
            'Files Analyzed': 1238,
            'Files Assigned': 1253,
            'Files Loaded': 4285}}
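For comparison, the same nest-and-add logic can be written without pandas. This is a plain-Python alternative, not the answer's method: it walks the question's d row by row and accumulates each field under its office using collections.defaultdict. Its totals should match the output above.

```python
from collections import defaultdict

d = {'Office': ['Yellow', 'Yellow', 'Red', 'Red', 'Blue', 'Blue'],
     'Files Loaded': [1223, 3062, 10, 100, 1520, 75],
     'Files Assigned': [1223, 30, 1500, 10, 75, 12],
     'Files Analyzed': [1223, 15, 25, 34, 98, 1000],
     'Discrepancies Identified': [17, 30, 150, 1456, 186, 1896]}
fields = ['Files Loaded', 'Files Assigned', 'Files Analyzed', 'Discrepancies Identified']

# Outer key: office; inner dict: running total per field (missing keys start at 0)
totals = defaultdict(lambda: defaultdict(int))
for i, office in enumerate(d['Office']):
    for field in fields:
        totals[office][field] += d[field][i]

# Convert the nested defaultdicts into plain dicts
final_dict = {office: dict(counts) for office, counts in totals.items()}
print(final_dict['Red'])
# {'Files Loaded': 110, 'Files Assigned': 1510, 'Files Analyzed': 59, 'Discrepancies Identified': 1606}
```

The pandas version is shorter and scales better; the loop mainly makes the "group, then add" steps explicit.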