Returning the last valid (non-null) value from a pandas DataFrame
Suppose I have a dataframe that looks like:
     a  b
0   11  A
1   -2  A
2    3  A
3   NA  A
4  0.5  B
5   NA  B
6   -9  B
I can group by 'b'. Is there a fast way to get the last non-NA value in 'a' for each group? In this case it would be 3 for group A and -9 for group B.
(In this case the row order of 'a' is the order that defines "last", but that might not always be so. There could be another column 'c' according to which "last" is defined.)
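A minimal sketch of the case where "last" is defined by another column, assuming a hypothetical ordering column 'c' with made-up values: sort by 'c' first, then take the last non-null 'a' in each group with `GroupBy.last()`, which skips missing values.

```python
import numpy as np
import pandas as pd

# 'c' is a hypothetical ordering column (e.g. a timestamp); its values are made up
# and the rows are deliberately not in 'c' order
df = pd.DataFrame({
    "a": [11, -2, 3, np.nan, 0.5, np.nan, -9],
    "b": ["A", "A", "A", "A", "B", "B", "B"],
    "c": [1, 4, 2, 5, 3, 1, 2],
})

# Sort by 'c' so that "last" means "largest c", then take the last non-null 'a' per group
result = df.sort_values("c").groupby("b")["a"].last()
print(result)
```

With these values, group A resolves to -2 (its row with the largest 'c' has NA in 'a', so it is skipped) and group B resolves to 0.5.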
I wrote my own loop code by looking into the grouped.groups dict, but that is apparently very inefficient given my huge dataset. I think this could be done very straightforwardly; maybe I am just too much of a novice with pandas :-)
I added a GitHub issue for this recently: https://github.com/pydata/pandas/issues/1043
In the meantime, you could do:
def get_last_valid(series):
    # Drop NAs, then take the last remaining value by position (Series.iget was removed; use .iloc)
    return series.dropna().iloc[-1]
df.groupby('b')['a'].apply(get_last_valid)
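In current pandas versions the built-in `GroupBy.last()` covers this directly: by default it returns the last non-null value in each group, so a custom helper is usually unnecessary. A minimal sketch that rebuilds the example frame from the question (NA written as `np.nan`):

```python
import numpy as np
import pandas as pd

# The example frame from the question; NA is represented as np.nan
df = pd.DataFrame({
    "a": [11, -2, 3, np.nan, 0.5, np.nan, -9],
    "b": ["A", "A", "A", "A", "B", "B", "B"],
})

# GroupBy.last() skips missing values and returns the last non-null 'a' per group
last_valid = df.groupby("b")["a"].last()
print(last_valid)
```

This gives 3.0 for group A and -9.0 for group B, matching the expected result. Unlike the `dropna().iloc[-1]` helper, it does not raise on a group that is entirely NA; that group simply gets NaN.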