Multi-index pandas DataFrame to dictionary

Question

I have a DataFrame as follows:

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}

df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])

If I group by two columns and count the size,

df.groupby(['regiment','company']).size()

I get the following:

regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64

What I want as output is a dictionary like the following:

{'Dragoons':{'1st':2,'2nd':2},
 'Nighthawks': {'1st':2,'2nd':2}, 
  ... }

I tried different methods, but to no avail. Is there a relatively clean way to achieve the above?

Thank you so much in advance!

Answer 1

You can combine Series.unstack with DataFrame.to_dict:

d = df.groupby(['regiment','company']).size().unstack().to_dict(orient='index')
print (d)
{'Dragoons': {'2nd': 2, '1st': 2}, 
 'Nighthawks': {'2nd': 2, '1st': 2}, 
 'Scouts': {'2nd': 2, '1st': 2}}

Another solution, very similar to another answer:

from collections import Counter

d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')}
print (d)
{'Dragoons': {'2nd': 2, '1st': 2}, 
'Nighthawks': {'2nd': 2, '1st': 2}, 
'Scouts': {'2nd': 2, '1st': 2}}

But if you use the first solution, there can be a problem with NaNs (it depends on the data): any regiment/company combination that does not occur becomes NaN after unstack, and the counts are converted to floats.

Sample:

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '3rd'],
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}

df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
print (df)
      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd       Ali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     3rd       Ali             3             70

df1 = df.groupby(['regiment','company']).size().unstack()
print (df1)
company     1st  2nd  3rd
regiment                 
Dragoons    2.0  2.0  NaN
Nighthawks  2.0  2.0  NaN
Scouts      2.0  1.0  1.0

d = df1.to_dict(orient='index')
print (d)
{'Dragoons': {'3rd': nan, '2nd': 2.0, '1st': 2.0}, 
'Nighthawks': {'3rd': nan, '2nd': 2.0, '1st': 2.0}, 
'Scouts': {'3rd': 1.0, '2nd': 1.0, '1st': 2.0}}

In that case it is necessary to use:

d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')}
print (d)
{'Dragoons': {'2nd': 2, '1st': 2}, 
'Nighthawks': {'2nd': 2, '1st': 2},
 'Scouts': {'3rd': 1, '2nd': 1, '1st': 2}}

Or use John Galt's answer.
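If you would rather keep the unstack approach, here is a minimal sketch of two ways around the NaN issue. The small DataFrame is a made-up sample (not the data from the question), and the variable names `with_zeros` and `without_missing` are only for illustration. `Series.unstack` accepts a `fill_value` argument, so missing combinations can be filled with 0; alternatively the NaN entries can be dropped per row.

```python
import pandas as pd

# Hypothetical sample: 'Scouts' has a '3rd' company that 'Dragoons' lacks.
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company':  ['1st', '2nd', '1st', '2nd', '3rd'],
})

sizes = df.groupby(['regiment', 'company']).size()

# Option 1: fill missing combinations with 0 instead of NaN.
with_zeros = sizes.unstack(fill_value=0).to_dict(orient='index')

# Option 2: unstack as usual, then drop the NaN entries row by row
# and convert the remaining float counts back to int.
wide = sizes.unstack()
without_missing = {
    regiment: {company: int(n) for company, n in row.dropna().items()}
    for regiment, row in wide.iterrows()
}

print(with_zeros)
print(without_missing)
```

Option 1 keeps every company key in every inner dict (with a 0 where nothing was counted), while option 2 keeps only the combinations that actually occur, which matches the Counter output above.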

Answer 2

You can reset the index after the groupby and pivot the data as needed. The code below gives the required output.

counts = df.groupby(['regiment','company']).size().reset_index()
print(pd.pivot_table(counts, values=0, index='regiment', columns='company').to_dict(orient='index'))

Output:

{'Nighthawks': {'2nd': 2, '1st': 2}, 'Scouts': {'2nd': 2, '1st': 2}, 'Dragoons': {'2nd': 2, '1st': 2}}
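As a possible shortcut for the same count-then-pivot idea, `pd.crosstab` builds the frequency table in one call. This is a sketch on a made-up sample rather than the data from the question; note that combinations that never occur show up as 0 rather than being left out.

```python
import pandas as pd

# Hypothetical sample data, smaller than the one in the question.
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company':  ['1st', '2nd', '1st', '1st', '2nd'],
})

# Rows are regiments, columns are companies, cells are occurrence counts.
table = pd.crosstab(df['regiment'], df['company'])

d = table.to_dict(orient='index')
print(d)
```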

Answer 3

How about building the dict with a comprehension over the groups?

In [409]: {g:v['company'].value_counts().to_dict() for g, v in df.groupby('regiment')}
Out[409]:
{'Dragoons': {'1st': 2, '2nd': 2},
 'Nighthawks': {'1st': 2, '2nd': 2},
 'Scouts': {'1st': 2, '2nd': 2}}
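A related sketch, in case you already have the MultiIndex Series from `groupby(...).size()` and want to convert it directly: iterating over `Series.items()` yields `((regiment, company), count)` pairs, which can be folded into a nested dict. The sample data and the name `nested` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data, smaller than the one in the question.
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company':  ['1st', '2nd', '1st', '1st', '3rd'],
})

sizes = df.groupby(['regiment', 'company']).size()

# Each index entry of the MultiIndex Series is a (regiment, company) tuple.
nested = {}
for (regiment, company), count in sizes.items():
    nested.setdefault(regiment, {})[company] = int(count)

print(nested)
```

Because only the combinations present in the Series are visited, no NaN values appear and the counts stay as plain ints.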


Answer 1
Score 4, accepted, 2017-06-29 11:22:04

Answer 2
Score 3, 2017-06-29 11:27:36

Answer 3
Score 1, 2017-06-29 11:17:32
