Pandas: Forward Fill without Filling trailing NaNs
I have a dataframe where each column is a time series of a different length. As such, there are missing values both between values in the time series and at the end of every time series except one. I would like to fill the missing values between values, but not fill the "trailing" NaNs.
Using
df = df.fillna(method='ffill')
gets me most of the way there, but it also fills in the trailing NaNs, which I don't want, because where the data ends is actually important to my analysis.
Edit:
I would like to turn this:
            ERICB SS Equity  DCI US Equity  FLEX US Equity
date
2008-02-14            8.026            NaN             NaN
2008-02-18              NaN            NaN           1.472
2008-02-19            8.074            NaN             NaN
2008-02-22              NaN            NaN           1.532
2008-02-25            8.062            NaN             NaN
2008-03-03            8.100            NaN             NaN
2008-03-06            8.100            NaN           1.955
2008-03-07            8.100            NaN             NaN
2010-12-30            5.431            NaN             NaN
2010-12-31            5.422            NaN             NaN
2011-01-03            5.422            NaN             NaN
2011-01-04            5.373            NaN             NaN
Into this:
            ERICB SS Equity  DCI US Equity  FLEX US Equity
date
2008-02-14            8.026            NaN             NaN
2008-02-18            8.026            NaN           1.472
2008-02-19            8.074            NaN           1.472
2008-02-22            8.074            NaN           1.532
2008-02-25            8.062            NaN           1.532
2008-03-03            8.100            NaN           1.532
2008-03-06            8.100            NaN           1.955
2008-03-07            8.100            NaN             NaN
2010-12-30            5.431            NaN             NaN
2010-12-31            5.422            NaN             NaN
2011-01-03            5.422            NaN             NaN
2011-01-04            5.373            NaN             NaN
So it's forward filled, but only where there is some non-null value later in the column to fill up to, leaving the trailing nulls untouched.
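To make the problem concrete, here is a minimal sketch on made-up data. The column names `a` and `b` and their values are hypothetical, not taken from my real dataframe. It uses `df.ffill()`, which I believe gives the same result as `fillna(method='ffill')`.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): column "a" ends two rows before column "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 2.0, np.nan, np.nan],
    "b": [np.nan, 5.0, np.nan, np.nan, 6.0],
})

# Plain forward fill: the gap inside "a" is filled (wanted),
# but so are the two trailing NaNs at the end of "a" (not wanted).
filled = df.ffill()
print(filled)
```

The last two rows of `a` come back as 2.0, so the point where that series ends is lost.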
One way would be to bfill, which makes everything before the last non-NaN value non-NaN, and then use where to select the ffill() results:
In [45]: df.ffill().where(df.bfill().notnull())
Out[45]:
          date  ERICB SS Equity  DCI US Equity  FLEX US Equity
 0  2008-02-14            8.026            NaN             NaN
 1  2008-02-18            8.026            NaN           1.472
 2  2008-02-19            8.074            NaN           1.472
 3  2008-02-22            8.074            NaN           1.532
 4  2008-02-25            8.062            NaN           1.532
 5  2008-03-03            8.100            NaN           1.532
 6  2008-03-06            8.100            NaN           1.955
 7  2008-03-07            8.100            NaN             NaN
 8  2010-12-30            5.431            NaN             NaN
 9  2010-12-31            5.422            NaN             NaN
10  2011-01-03            5.422            NaN             NaN
11  2011-01-04            5.373            NaN             NaN
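If you want something you can paste and run, here is a small self-contained sketch of the same idea on toy data. The columns `a`, `b`, `c` and the name `keep` are made up for illustration, and `notna()` is used as the equivalent spelling of `notnull()`.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): "a" has trailing NaNs, "b" has a leading NaN,
# "c" is entirely NaN (like the DCI column in the question).
df = pd.DataFrame({
    "a": [1.0, np.nan, 2.0, np.nan, np.nan],
    "b": [np.nan, 5.0, np.nan, np.nan, 6.0],
    "c": [np.nan] * 5,
})

# After bfill(), the only NaNs left in a column are the ones that come
# after its last valid value, so this mask is False exactly there.
keep = df.bfill().notna()

# Forward fill everything, then blank out the trailing positions again.
result = df.ffill().where(keep)
print(result)
```

Leading NaNs survive too, simply because `ffill()` has nothing to carry forward into them.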
Another would be to directly make a mask containing True for all values up to and including the last valid value:
df.ffill().where(df.notnull().iloc[::-1].cummax().iloc[::-1])
where the .iloc[::-1] stuff is required because I can't find a better way to take a cumulative operation in the bottom-to-top direction.
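Spelled out step by step, the reversed cumulative max could look like the sketch below. The helper name `ffill_inside` and the toy columns are my own, purely for illustration. On this data it gives the same frame as the bfill-based mask.

```python
import numpy as np
import pandas as pd


def ffill_inside(df):
    """Forward fill, leaving each column's trailing NaNs untouched."""
    valid = df.notna()
    # Reverse the rows, take a running "seen a valid value yet?",
    # then reverse back: True from the top of each column down to
    # and including its last valid value, False after it.
    up_to_last = valid.iloc[::-1].cummax().iloc[::-1]
    return df.ffill().where(up_to_last)


# Toy data (hypothetical values).
df = pd.DataFrame({
    "a": [1.0, np.nan, 2.0, np.nan, np.nan],
    "b": [np.nan, 5.0, np.nan, np.nan, 6.0],
    "c": [np.nan] * 5,
})

result = ffill_inside(df)
same_as_bfill_mask = result.equals(df.ffill().where(df.bfill().notna()))
print(result)
print(same_as_bfill_mask)
```

The mask here is also True for leading NaNs, but that does no harm since `ffill()` leaves those positions as NaN anyway.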