How to apply numpy.where() or fillna() row by row to return elements from newly-filled rows
I am trying to fill NaN rows based on previous rows AND different columns. I have the following code:
import pandas as pd
import numpy as np
data = {'value':[55,58,60,62,64,np.nan,np.nan],
'growth_rate': [np.nan,1.0545,1.034483,1.033333,1.032258,1.02,1.03]}
df = pd.DataFrame(data)
print(df)
Which gives the following dataframe:
value growth_rate
0 55.0 NaN
1 58.0 1.054500
2 60.0 1.034483
3 62.0 1.033333
4 64.0 1.032258
5 NaN 1.020000
6 NaN 1.030000
I have the growth rates needed to fill the gaps in rows 5 and 6. I've tried the following code:
df['value'] = np.where(df['value'].isnull(), df['value'].shift(1) * df['growth_rate'], df['value'])
print(df)
Which gives me the following output:
value growth_rate
0 55.00 NaN
1 58.00 1.054500
2 60.00 1.034483
3 62.00 1.033333
4 64.00 1.032258
5 65.28 1.020000
6 NaN 1.030000
As you can see, only row 5 was filled using np.where(). I have to rerun this line to get the expected result:
value growth_rate
0 55.0000 NaN
1 58.0000 1.054500
2 60.0000 1.034483
3 62.0000 1.033333
4 64.0000 1.032258
5 65.2800 1.020000
6 67.2384 1.030000
However, this approach is not efficient. There must be a way to do this operation in one line! I've tried fillna() as well, but I get the same result:
df['value'] = df['value'].fillna(df['value'].shift(1) * df['growth_rate'])
print(df)
value growth_rate
0 55.00 NaN
1 58.00 1.054500
2 60.00 1.034483
3 62.00 1.033333
4 64.00 1.032258
5 65.28 1.020000
6 NaN 1.030000
I wish I could find some sort of ffill() or np.where() that fills gaps based on newly-filled rows and another column (growth_rate) at the same time, all in one step.
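A short sketch of why a single pass fills only one row: the expression passed to np.where() or fillna() is evaluated once, on the column as it exists before the assignment, so the shifted value for row 6 is still NaN at that point. The name `candidate` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55, 58, 60, 62, 64, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})

# The fill values are computed from the column *before* anything is assigned.
candidate = df['value'].shift(1) * df['growth_rate']

# Row 5 sees row 4 (64.0), so it gets a number.
print(candidate.iloc[5])
# Row 6 sees row 5, which is still NaN at evaluation time, so it stays NaN.
print(candidate.iloc[6])
```

Each rerun of the line therefore extends the filled region by exactly one row.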
Assuming all missing values form a single consecutive group, we can ffill value to carry the last valid value down, then multiply it by the cumulative product (cumprod) of growth_rate over the rows where value is NaN:
m = df['value'].isna()
df.loc[m, 'value'] = df['value'].ffill() * df.loc[m, 'growth_rate'].cumprod()
df
value growth_rate
0 55.0000 NaN
1 58.0000 1.054500
2 60.0000 1.034483
3 62.0000 1.033333
4 64.0000 1.032258
5 65.2800 1.020000
6 67.2384 1.030000
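As a sanity check, the cumprod approach can be compared against an explicit row-by-row loop, which is the recurrence the question describes. This is a minimal sketch for the single-group case; the names `values`, `rates`, and `out` are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55.0, 58.0, 60.0, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})

# Reference: walk the rows in order so each fill can use the
# previous value, including one that was filled a step earlier.
values = df['value'].to_numpy(dtype=float).copy()
rates = df['growth_rate'].to_numpy(dtype=float)
for i in range(1, len(values)):
    if np.isnan(values[i]):
        values[i] = values[i - 1] * rates[i]

# Vectorized: last valid value times the running product of the rates.
m = df['value'].isna()
out = df.copy()
out.loc[m, 'value'] = df['value'].ffill() * df.loc[m, 'growth_rate'].cumprod()

# Both routes should agree up to floating-point rounding.
print(np.allclose(out['value'].to_numpy(), values))
```

The two agree because multiplying by each rate in turn is the same as multiplying the starting value by the product of all rates so far.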
Setup and imports:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'value': [55.0, 58.0, 60.0, 62.0, 64.0, np.nan, np.nan],
'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03]
})
Assuming we want separate, interspersed NaN groups to be calculated independently, we can build group keys with cumsum on the non-missing mask and use groupby cumprod instead:
m = df['value'].isna()
df.loc[m, 'value'] = (
df['value'].ffill() *
df.loc[m, 'growth_rate'].groupby((~m).cumsum()).cumprod()
)
df
value growth_rate
0 55.000000 NaN
1 58.000000 1.054500
2 60.000014 1.034483 # (group 1) cumprod
3 62.000000 1.033333
4 64.000000 1.032258
5 65.280000 1.020000 # (group 2) values same as without groupby
6 67.238400 1.030000 # since these are in a group together
Modified setup and imports:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'value': [55.0, 58.0, np.nan, 62.0, 64.0, np.nan, np.nan],
'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03]
})
Modified df:
value growth_rate
0 55.0 NaN
1 58.0 1.054500
2 NaN 1.034483
3 62.0 1.033333
4 64.0 1.032258
5 NaN 1.020000
6 NaN 1.030000
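The groupby variant can be wrapped in a small helper so it is easy to reuse and test. This is only a sketch: the function name `fill_by_growth` and its parameters are hypothetical, not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd

def fill_by_growth(df, value_col='value', rate_col='growth_rate'):
    """Fill NaNs in value_col by compounding rate_col from the last valid value.

    Hypothetical helper. Each consecutive run of NaNs is treated as its own group.
    """
    m = df[value_col].isna()
    out = df.copy()
    # (~m).cumsum() increases at every valid row, so each NaN run
    # shares the key of the valid row that precedes it.
    factor = df.loc[m, rate_col].groupby((~m).cumsum()).cumprod()
    out.loc[m, value_col] = df[value_col].ffill() * factor
    return out

df = pd.DataFrame({
    'value': [55.0, 58.0, np.nan, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})
result = fill_by_growth(df)
print(result)
```

Note that a NaN in the very first row would stay NaN, since ffill has no earlier value to bring down.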