Why doesn't first and last in a groupby give me first and last?

Question

I'm posting this because the topic just came up in another question/answer and the behavior isn't very well documented.

Consider the dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))

   A    B
0  x  NaN
1  x  1.0
2  x  2.0
3  y  3.0
4  y  4.0
5  y  NaN

I wanted to get the first and last values of column 'B' within each group defined by column 'A'.

I tried

df.groupby('A').B.agg(['first', 'last'])

   first  last
A             
x    1.0   2.0
y    3.0   4.0

However, this doesn't give me the np.nan values that I expected.

How do I get the actual first and last values in each group?

Answer 1

As noted here by @unutbu:

The groupby.first and groupby.last methods return the first and last non-null values respectively.

To get the actual first and last values, do:

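def h(x):
    # First value of the group by position, NaN included
    return x.values[0]

def t(x):
    # Last value of the group by position, NaN included
    return x.values[-1]

# The function names become the column labels of the result
df.groupby('A').B.agg([h, t])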

     h    t
A          
x  NaN  2.0
y  3.0  NaN
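
To see the two behaviors side by side, here is a minimal sketch on the same df. It swaps x.values[0] for .iloc[0], which should be equivalent for this purpose. The names grouped, non_null, positional and dropna_first are mine, not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("xxxyyy"), "B": [np.nan, 1, 2, 3, 4, np.nan]})
grouped = df.groupby("A")["B"]

# Built-in reducers: by default they skip nulls within each group.
non_null = grouped.agg(["first", "last"])

# Positional lookups: keep whatever sits in the first/last slot, NaN included.
positional = pd.DataFrame({
    "first": grouped.apply(lambda s: s.iloc[0]),
    "last": grouped.apply(lambda s: s.iloc[-1]),
})

# 'first' behaves like dropping the nulls and then taking the first element.
dropna_first = grouped.apply(lambda s: s.dropna().iloc[0])

print(non_null)
print(positional)
```

On this data, non_null and positional differ exactly where a group starts or ends with NaN.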

Answer 2

One option is to use the .nth method:

>>> gb = df.groupby('A')
>>> gb.nth(0)
     B
A
x  NaN
y  3.0
>>> gb.nth(-1)
     B
A
x  2.0
y  NaN

However, I haven't found a way to aggregate them neatly. Of course, one can always use the pd.DataFrame constructor:

>>> pd.DataFrame({'first': gb.B.nth(0), 'last': gb.B.nth(-1)})
   first  last
A
x    NaN   2.0
y    3.0   NaN

Note: I explicitly used the gb.B attribute; otherwise you would have to use .squeeze.
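
One caveat, as far as I know: since pandas 2.0, nth behaves as a filter that keeps the original row index instead of the group keys, so the output above may look different on newer versions. A rough sketch that should not depend on that uses head(1) and tail(1), which return the original rows, and re-keys them by 'A' before combining. The names first_rows, last_rows and result are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("xxxyyy"), "B": [np.nan, 1, 2, 3, 4, np.nan]})
gb = df.groupby("A")

# head(1)/tail(1) return the original rows with their original index,
# so index them by the group key before lining them up.
first_rows = gb.head(1).set_index("A")["B"]
last_rows = gb.tail(1).set_index("A")["B"]

# The constructor aligns both Series on the group key 'A'.
result = pd.DataFrame({"first": first_rows, "last": last_rows})
print(result)
```

Because no reduction skips nulls here, the NaN at the start of group x and at the end of group y is preserved.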

Answer 1
6 2017-08-17 20:55:07

Answer 2
6 Accepted 2017-08-17 21:18:55
