Extracting group observations from pandas dataframe
I have a pandas dataframe. I want to extract a certain number of observations from each subgroup of the dataframe and put them into a new dataframe. For example, let's assume we have the following dataframe:
Var1 Var2
0 1 1.2
1 2 1.3
2 2 1.4
3 1 1.5
4 1 1.6
5 2 1.7
6 1 1.8
7 1 1.9
8 2 2.0
9 1 2.1
10 2 2.2
11 1 2.3
I want to sort it by Var1 first:
Var1 Var2
0 1 1.2
1 1 1.5
2 1 1.6
3 1 1.8
4 1 1.9
5 1 2.1
6 1 2.3
7 2 1.3
8 2 1.4
9 2 1.7
10 2 2.0
11 2 2.2
and then keep the first two observations of each group and put them into a new dataframe:
Var1 Var2
0 1 1.2
1 1 1.5
2 2 1.3
3 2 1.4
I know how to use groupby, but it is not clear to me how to perform the second step. Thank you very much for the help.
Use sort_values with groupby and head:
df = df.sort_values('Var1').groupby('Var1').head(2).reset_index(drop=True)
print (df)
Var1 Var2
0 1 1.2
1 1 1.5
2 2 1.3
3 2 1.4
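Alternatively, starting again from the original dataframe, take the head of each group first and sort afterwards:
df = df.groupby('Var1').head(2).sort_values('Var1').reset_index(drop=True)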
print (df)
Var1 Var2
0 1 1.2
1 1 1.5
2 2 1.3
3 2 1.4
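To try both variants end to end, here is a minimal self-contained sketch that rebuilds the example frame from the question. The name first_two and the kind='mergesort' argument are additions for illustration, not part of the original answer. The default sort algorithm is not guaranteed to be stable, so a stable sort is requested to make sure rows with the same Var1 keep their original relative order.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Var1": [1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1],
    "Var2": [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3],
})

# Variant 1: sort first, then keep the first two rows of every group.
# kind="mergesort" (an addition here) is a stable sort, so ties on Var1
# stay in their original order.
first_two = (
    df.sort_values("Var1", kind="mergesort")
      .groupby("Var1")
      .head(2)
      .reset_index(drop=True)
)

# Variant 2: keep the first two rows of every group, then sort.
first_two_alt = (
    df.groupby("Var1")
      .head(2)
      .sort_values("Var1", kind="mergesort")
      .reset_index(drop=True)
)

print(first_two)
print(first_two.equals(first_two_alt))
```

Assigning to a new name instead of overwriting df keeps the original frame available, which makes it easier to compare the two variants.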
Another solution with iloc:
df = (df.groupby('Var1')['Var2']
        .apply(lambda x: x.iloc[:2])
        .reset_index(level=1, drop=True)
        .reset_index())
print (df)
Var1 Var2
0 1 1.2
1 1 1.5
2 2 1.3
3 2 1.4
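The two reset_index calls in the iloc version are easier to follow with the intermediate result in view. The sketch below rebuilds the example frame and splits the chain into steps. The names picked and first_two are hypothetical and used only for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Var1": [1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1],
    "Var2": [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3],
})

# Step 1: slice the first two values of Var2 inside each group.
# The result is a Series with a two-level index:
# level 0 is the group key (Var1), level 1 is the original row label.
picked = df.groupby("Var1")["Var2"].apply(lambda x: x.iloc[:2])
print(picked)

# Step 2: drop the original row labels (level 1), then move Var1
# from the index back into an ordinary column.
first_two = picked.reset_index(level=1, drop=True).reset_index()
print(first_two)
```

Because only the Var2 column is selected before apply, this version returns just Var1 and Var2. If the frame had further columns, the head approach would keep them without extra work.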
Note:
For older versions of pandas, change sort_values to sort, but it is better to upgrade to the latest version.