Why don't first and last in a groupby give me the first and last rows?
I'm posting this because the topic just came up in another question/answer and the behavior isn't well documented.
Consider the dataframe df:
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))
   A    B
0  x  NaN
1  x  1.0
2  x  2.0
3  y  3.0
4  y  4.0
5  y  NaN
I wanted to get the first and last rows of each group defined by column 'A'.
I tried:
df.groupby('A').B.agg(['first', 'last'])
   first  last
A
x    1.0   2.0
y    3.0   4.0
However, this doesn't give me the np.nan values that I expected.
How do I get the actual first and last values in each group?
As noted here by @unutbu:
The groupby.first and groupby.last methods return the first and last non-null values respectively.
To get the actual first and last values, do:
def h(x):
    return x.values[0]   # first element of the group, NaN included

def t(x):
    return x.values[-1]  # last element of the group, NaN included
df.groupby('A').B.agg([h, t])
     h    t
A
x  NaN  2.0
y  3.0  NaN
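The same idea can be written with named aggregation so the result columns are called first and last directly. This is only a sketch: the helper names first_raw and last_raw are made up for illustration, and it assumes a pandas version that accepts keyword arguments in SeriesGroupBy.agg (0.25 or later, to my knowledge).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("xxxyyy"), "B": [np.nan, 1, 2, 3, 4, np.nan]})

def first_raw(s):
    # hypothetical helper: positional first element, NaN is not skipped
    return s.iloc[0]

def last_raw(s):
    # hypothetical helper: positional last element, NaN is not skipped
    return s.iloc[-1]

# keyword names become the output column names
result = df.groupby("A")["B"].agg(first=first_raw, last=last_raw)
print(result)
```

Using .iloc instead of .values avoids converting each group to a NumPy array, but for this purpose the two are interchangeable.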
One option is to use the .nth method:
>>> gb = df.groupby('A')
>>> gb.nth(0)
     B
A
x  NaN
y  3.0
>>> gb.nth(-1)
     B
A
x  2.0
y  NaN
However, I haven't found a way to aggregate them neatly. Of course, one can always use the pd.DataFrame constructor:
>>> pd.DataFrame({'first':gb.B.nth(0), 'last':gb.B.nth(-1)})
   first  last
A
x    NaN   2.0
y    3.0   NaN
Note: I explicitly used the gb.B attribute; otherwise you have to use .squeeze.
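One caveat worth checking against your own installation: the .nth output shown here, indexed by the group key, reflects the pandas version this answer was written for. In recent releases (2.0 and later, as far as I can tell) .nth acts as a filter and keeps the original row index, so the two Series passed to the constructor would no longer line up on 'A'. A sketch that should not depend on that behavior uses head(1) and tail(1), which keep the original rows, and then re-indexes by the group key explicitly. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("xxxyyy"), "B": [np.nan, 1, 2, 3, 4, np.nan]})
gb = df.groupby("A")

# head(1)/tail(1) return the first/last row of each group, NaN included,
# with the original row labels; set_index moves the group key into the index
first = gb.head(1).set_index("A")["B"]
last = gb.tail(1).set_index("A")["B"]

# both Series are now indexed by 'x' and 'y', so they align in the constructor
summary = pd.DataFrame({"first": first, "last": last})
print(summary)
```

This assumes every group has at least one row, which groupby guarantees for the groups it produces.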