Pandas: forward fill without filling trailing NaNs

Question

I have a dataframe where each column is a time series of a different length. As a result, there are missing values both between values within a time series and at the end of every time series except one. I would like to fill the missing values between values, but not fill the "trailing" NaNs.

Using df = df.fillna(method='ffill') gets me most of the way there, but it also fills in the trailing NaNs, which I don't want, because where the data ends is actually important to my analysis.
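To make the problem concrete, here is a minimal sketch on made-up data (the column names `a` and `b` are hypothetical and not part of the real frame). It uses `df.ffill()`, the shorthand for `fillna(method='ffill')`, and shows that the values after the last valid observation get filled as well:

```python
import numpy as np
import pandas as pd

# Toy frame: both columns end with NaNs that should stay NaN.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 2.0, np.nan],
        "b": [np.nan, 3.0, np.nan, np.nan],
    }
)

# A plain forward fill propagates the last valid value to the end of the column.
filled = df.ffill()
print(filled)
```

Leading NaNs (the first row of `b`) are left alone because there is nothing earlier to carry forward, but the trailing NaNs in both columns are overwritten.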

Edit:

I would like to turn this:

            ERICB SS Equity  DCI US Equity  FLEX US Equity
date

2008-02-14            8.026            NaN             NaN
2008-02-18              NaN            NaN           1.472
2008-02-19            8.074            NaN             NaN
2008-02-22              NaN            NaN           1.532
2008-02-25            8.062            NaN             NaN
2008-03-03            8.100            NaN             NaN
2008-03-06            8.100            NaN           1.955
2008-03-07            8.100            NaN             NaN
2010-12-30            5.431            NaN             NaN
2010-12-31            5.422            NaN             NaN
2011-01-03            5.422            NaN             NaN
2011-01-04            5.373            NaN             NaN

Into this:

            ERICB SS Equity  DCI US Equity  FLEX US Equity
date

2008-02-14            8.026            NaN             NaN
2008-02-18            8.026            NaN           1.472
2008-02-19            8.074            NaN           1.472
2008-02-22            8.074            NaN           1.532
2008-02-25            8.062            NaN           1.532
2008-03-03            8.100            NaN           1.532
2008-03-06            8.100            NaN           1.955
2008-03-07            8.100            NaN             NaN
2010-12-30            5.431            NaN             NaN
2010-12-31            5.422            NaN             NaN
2011-01-03            5.422            NaN             NaN
2011-01-04            5.373            NaN             NaN

So it's forward filled, but only where there is some non-null value later in the column to fill up to, leaving the trailing nulls untouched.

Answer 1

One way would be to use bfill, which makes everything before the last non-NaN value non-NaN, and then use where to select the ffill() results:

In [45]: df.ffill().where(df.bfill().notnull())
Out[45]: 
          date  ERICB SS Equity  DCI US Equity  FLEX US Equity
0   2008-02-14            8.026            NaN             NaN
1   2008-02-18            8.026            NaN           1.472
2   2008-02-19            8.074            NaN           1.472
3   2008-02-22            8.074            NaN           1.532
4   2008-02-25            8.062            NaN           1.532
5   2008-03-03            8.100            NaN           1.532
6   2008-03-06            8.100            NaN           1.955
7   2008-03-07            8.100            NaN             NaN
8   2010-12-30            5.431            NaN             NaN
9   2010-12-31            5.422            NaN             NaN
10  2011-01-03            5.422            NaN             NaN
11  2011-01-04            5.373            NaN             NaN
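The same idea can be sketched end to end on a small made-up frame (the column names `a`, `b`, `c` and the intermediate variable names are illustrative assumptions, not from the question's data):

```python
import numpy as np
import pandas as pd

# Toy frame: "a" and "b" have interior and trailing gaps, "c" is entirely NaN.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 2.0, np.nan, np.nan],
        "b": [np.nan, 3.0, np.nan, 4.0, np.nan],
        "c": [np.nan] * 5,
    }
)

# Forward fill everything, trailing NaNs included.
filled = df.ffill()

# After a backward fill, a cell is non-null only if some valid value
# exists at or below it, i.e. it is not part of the trailing run of NaNs.
has_later_value = df.bfill().notna()

# Keep the forward-filled values only where a later value exists.
result = filled.where(has_later_value)
print(result)
```

Cells where the mask is False are set back to NaN by `where`, so the end of each series stays where it was in the original data.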

Another would be to directly build a mask containing True for all values up to and including the last valid value:

df.ffill().where(df.notnull().iloc[::-1].cummax().iloc[::-1])

where the .iloc[::-1] calls are required because I can't find a better way to apply a cumulative operation in the bottom-to-top direction: the first reverses the rows so that cummax runs upward from the end, and the second restores the original order.
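As a rough illustration of how that mask is built, the following sketch runs the steps on made-up data (the column names and the variable `mask` are hypothetical) and checks the result against the bfill-based version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 2.0, np.nan, np.nan],
        "b": [np.nan, 3.0, np.nan, 4.0, np.nan],
    }
)

# Reverse the rows, take a running maximum of "is valid" so that every row
# at or above the last valid value becomes True, then reverse back.
mask = df.notna().iloc[::-1].cummax().iloc[::-1]

# Forward fill, then blank out everything after the last valid value.
result = df.ffill().where(mask)
print(result)

# On this data both approaches agree.
same = result.equals(df.ffill().where(df.bfill().notna()))
print(same)
```

Because the second reversal restores the original index order, `where` lines the mask up with the forward-filled frame row by row.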
