
Get first two rows from first two groups

Consider the data frame df:

import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccdddddd'),
    list('tuvwlmnopxyfghijk')
], names=['one', 'two'])

df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

df

         col
one two     
a   t      0
    u      1
    v      2
    w      3
b   l      4
    m      5
    n      6
    o      7
    p      8
c   x      9
    y     10
d   f     11
    g     12
    h     13
    i     14
    j     15
    k     16

How do I elegantly get the first two rows of the first two groups if I group by the first level of the index? Expected output:

         col
one two     
a   t      0
    u      1
b   l      4
    m      5

Option 1
You could use a list comprehension and pd.concat:

pd.concat([g.head(2) for _, g in df.groupby(level=0)][:2])

         col
one two     
a   t      0
    u      1
b   l      4
    m      5

Since building the full list comprehension over every group is unnecessary overhead, you could use itertools.takewhile to stop once the first two groups have been taken.

import itertools

it = itertools.takewhile(lambda x: x[0] < 2, enumerate(df.groupby(level=0)))
pd.concat([g.head(2) for _, (_, g) in it])

         col
one two     
a   t      0
    u      1
b   l      4
    m      5
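
As a variation on the takewhile idea above, itertools.islice can express "stop after the first two groups" without the enumerate counter. This is only a sketch of an alternative, not part of the original answer; the names first_groups and result are made up for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccdddddd'),
    list('tuvwlmnopxyfghijk')
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# islice yields at most two (key, group) pairs from the groupby iterator,
# so the remaining groups are never materialised.
first_groups = itertools.islice(df.groupby(level=0), 2)

# Keep the first two rows of each of those groups and stitch them together.
result = pd.concat([g.head(2) for _, g in first_groups])
print(result)
```

Unlike takewhile, which has to pull a third group before its predicate fails, islice should stop without requesting it.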

Option 2
Another possible solution I could think of is pre-filtering your df to retain rows for only the first two values of index level 0, and then doing the groupby.

# https://stackoverflow.com/a/46900625/4909087
df.loc[df.index.levels[0][:2].values].groupby(level=0).head(2)

         col
one two     
a   t      0
    u      1
b   l      4
    m      5
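
One caveat with the pre-filtering approach, offered as a sketch rather than a correction: df.index.levels is typically stored in sorted order and can still list labels that no longer occur after earlier filtering. If "first two groups" should mean the first two labels in order of appearance, a variant that reads the labels from get_level_values might look like this. The names first_two, mask and result are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccdddddd'),
    list('tuvwlmnopxyfghijk')
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# Labels of level 'one' that actually occur, in order of first appearance.
level_one = df.index.get_level_values('one')
first_two = level_one.unique()[:2]

# Keep only the rows belonging to those two labels, then take the
# first two rows of each group.
mask = level_one.isin(first_two)
result = df[mask].groupby(level='one', sort=False).head(2)
print(result)
```

For the example frame, the labels are already sorted, so this gives the same four rows as the version above.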

It looks hacky, but this is what I tried (note that slicing with [:4] relies on each of the first two groups having at least two rows):

df.groupby(level=['one']).head(2)[:4]


         col
one two     
a   t      0
    u      1
b   l      4
    m      5
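
If the fixed [:4] slice feels fragile, one possible alternative is to build a boolean mask from the groupby's ngroup and cumcount methods, so the result does not depend on how many rows each group has. This is a sketch under that assumption, not something from the answers above; grouped, mask and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccdddddd'),
    list('tuvwlmnopxyfghijk')
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

grouped = df.groupby(level='one')

# ngroup() numbers the groups 0, 1, 2, ... in groupby (sorted key) order;
# cumcount() numbers the rows 0, 1, 2, ... within each group.
mask = (grouped.ngroup() < 2) & (grouped.cumcount() < 2)

# Rows in the first two groups that are also among the first two of their group.
result = df[mask]
print(result)
```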


 