Pandas groupby sort on multiindex
import pandas as pd

values = {'C1': ['B', 'A'],
          'C2': ['B', 'A'],
          'C3': ['B', 'A'],
          }
df = pd.DataFrame(values)
df.set_index(keys=['C1', 'C2'], inplace=True)
grouped = df.groupby(level='C1', sort=False)
for name, group in grouped:
    print(name)
yields
A
B
However, I would expect
B
A
How do I get the second result?
Could be a known issue, as mentioned in my comment.
Maybe this is a valid workaround:
import pandas as pd

values = {'C1': ['B', 'A'],
          'C2': ['B', 'A'],
          'C3': ['B', 'A'],
          }
df = pd.DataFrame(values)
# Group on the columns directly instead of on the index levels
grouped = df.groupby(['C1', 'C2'], sort=False)['C3']
for name, group in grouped:
    print(group.iloc[0])
Result
B
A
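If the MultiIndex has to stay in place, another option is to skip groupby for the ordering and walk the level values in order of first appearance. This is only a minimal sketch, not something from the answer above: the names ordered_keys and groups_in_order are made up for illustration, and it assumes the desired order is the order in which each C1 value first appears in the index.

```python
import pandas as pd

values = {'C1': ['B', 'A'],
          'C2': ['B', 'A'],
          'C3': ['B', 'A'],
          }
df = pd.DataFrame(values).set_index(['C1', 'C2'])

# Index.unique() keeps values in order of first appearance
level_values = df.index.get_level_values('C1')
ordered_keys = list(level_values.unique())

# Select each group with a boolean mask on the level values
# (groups_in_order is a hypothetical helper name)
groups_in_order = [(key, df[level_values == key]) for key in ordered_keys]

for name, group in groups_in_order:
    print(name)
```

The mask-based selection does one pass over the index per key, so it is meant for small frames like the example rather than as a general replacement for groupby.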
Consider restructuring your data
Unless your real data requires you to reset to a MultiIndex, it seems unnecessary to reindex before doing a groupby().
If you group by C1 only, you get your desired example output:
import pandas as pd

values = {'C1': ['B', 'A'],
          'C2': ['B', 'A'],
          'C3': ['B', 'A'],
          }
df = pd.DataFrame(values)
print('Original DataFrame')
print(df)
print()
# set_index returns a new DataFrame by default, so df keeps its columns
df2 = df.set_index(keys=['C1', 'C2'])
print('Reindexed DataFrame')
print(df2)
print()
# Group by the column on the original frame, by the index level on the reindexed one
grouped = df.groupby('C1', sort=False)
grouped2 = df2.groupby(level='C1', sort=False)
print('Original Groups')
print(grouped.groups)
print()
print('Reindexed Groups')
print(grouped2.groups)
print()
print('Original Group for loop output')
for name, group in grouped:
    print(name)
print()
print('Reindexed Group for loop output')
for name, group in grouped2:
    print(name)
Original DataFrame
  C1 C2 C3
0  B  B  B
1  A  A  A

Reindexed DataFrame
      C3
C1 C2
B  B   B
A  A   A

Original Groups
{'A': [1], 'B': [0]}

Reindexed Groups
{'A': [('A', 'A')], 'B': [('B', 'B')]}

Original Group for loop output
B
A

Reindexed Group for loop output
A
B
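When the data already arrives with the MultiIndex set, the same idea can be applied by moving the levels back into columns first. The following is a rough sketch under that assumption; flat and names are illustrative names, not part of the original answer.

```python
import pandas as pd

values = {'C1': ['B', 'A'],
          'C2': ['B', 'A'],
          'C3': ['B', 'A'],
          }
df2 = pd.DataFrame(values).set_index(['C1', 'C2'])

# Move the index levels back into ordinary columns
flat = df2.reset_index()

# Grouping by a column with sort=False keeps order of first appearance
names = [name for name, group in flat.groupby('C1', sort=False)]
print(names)
```

If the MultiIndex is still needed afterwards, each group can be passed through set_index(['C1', 'C2']) again inside the loop.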