How to apply numpy.where() or fillna() row by row to return elements from newly-filled rows

I am trying to fill NaN rows based on previous rows AND different columns. I have the following code:

import pandas as pd
import numpy as np

data = {'value':[55,58,60,62,64,np.nan,np.nan],
        'growth_rate': [np.nan,1.0545,1.034483,1.033333,1.032258,1.02,1.03]}

df = pd.DataFrame(data)  

print(df) 

Which gives the following dataframe:

   value  growth_rate
0   55.0          NaN
1   58.0     1.054500
2   60.0     1.034483
3   62.0     1.033333
4   64.0     1.032258
5    NaN     1.020000
6    NaN     1.030000

I do have the growth rates to fill the gaps in rows 5 and 6. I've tried the following code:

df['value'] = np.where(df['value'].isnull(), df['value'].shift(1) * df['growth_rate'], df['value'])
print(df) 

Which gives me the following output:

   value  growth_rate
0  55.00          NaN
1  58.00     1.054500
2  60.00     1.034483
3  62.00     1.033333
4  64.00     1.032258
5  65.28     1.020000
6    NaN     1.030000

As you can see, only row 5 was filled using np.where(). I have to rerun this line to get the expected result:

     value  growth_rate
0  55.0000          NaN
1  58.0000     1.054500
2  60.0000     1.034483
3  62.0000     1.033333
4  64.0000     1.032258
5  65.2800     1.020000
6  67.2384     1.030000

However, this approach is not efficient. There must be a way to do this operation in one line! I've tried fillna() as well, but I get the same result:

df['value'] = df['value'].fillna(df['value'].shift(1) * df['growth_rate'])
print(df) 
   value  growth_rate
0  55.00          NaN
1  58.00     1.054500
2  60.00     1.034483
3  62.00     1.033333
4  64.00     1.032258
5  65.28     1.020000
6    NaN     1.030000

I wish I could find some sort of ffill() or np.where() that fills gaps based on newly filled rows and another column (growth_rate) at the same time, all in one step.

Assuming all missing values are in a single group, we can ffill the missing values in value to bring down the last valid value, then multiply by the cumulative product (cumprod) of growth_rate over the rows where value is NaN (isna):

m = df['value'].isna()
df.loc[m, 'value'] = df['value'].ffill() * df.loc[m, 'growth_rate'].cumprod()

Resulting df:

     value  growth_rate
0  55.0000          NaN
1  58.0000     1.054500
2  60.0000     1.034483
3  62.0000     1.033333
4  64.0000     1.032258
5  65.2800     1.020000
6  67.2384     1.030000
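
As a sanity check (not part of the original answer), the sketch below compares the ffill-times-cumprod result with an explicit row-by-row loop on the same data. A single np.where() or fillna() pass fills only row 5 because shift(1) is evaluated once against the original column, where row 5 is still NaN. The cumulative product sidesteps this, since row 6 is simply 64 * 1.02 * 1.03. The names looped, rates, filled and vectorized are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55.0, 58.0, 60.0, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})

# Reference: fill each gap from the previous row, which may itself
# have been filled in the previous iteration.
looped = df['value'].tolist()
rates = df['growth_rate'].tolist()
for i in range(1, len(looped)):
    if np.isnan(looped[i]):
        looped[i] = looped[i - 1] * rates[i]

# Vectorized: last valid value times the running product of the rates
# over the missing rows. Rows outside the mask come out as NaN here
# (index alignment), so fillna() only touches the gaps.
m = df['value'].isna()
filled = df['value'].ffill() * df.loc[m, 'growth_rate'].cumprod()
vectorized = df['value'].fillna(filled)

# Both approaches should agree up to floating-point rounding.
print(np.allclose(looped, vectorized.to_numpy()))
```

The loop is the easiest version to reason about; the vectorized form is the one to prefer on large frames, assuming the missing values form one contiguous block.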

Setup and imports:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55.0, 58.0, 60.0, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03]
})

Assuming we want separate, interspersed NaN groups to be calculated independently, we can create groups with cumsum and use groupby cumprod instead:

m = df['value'].isna()
df.loc[m, 'value'] = (
        df['value'].ffill() *
        df.loc[m, 'growth_rate'].groupby((~m).cumsum()).cumprod()
)

Resulting df (for the modified setup shown below):

       value  growth_rate
0  55.000000          NaN
1  58.000000     1.054500
2  60.000014     1.034483  # (group 1) cumprod 
3  62.000000     1.033333
4  64.000000     1.032258
5  65.280000     1.020000  # (group 2) values same as without groupby
6  67.238400     1.030000  # since these are in a group together
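
To make the grouping step more concrete, here is a small sketch (an illustration, not part of the original answer) that prints the keys produced by (~m).cumsum() for the NaN rows, using the same modified data as the output above. The counter only advances on non-missing rows, so every run of consecutive NaN rows shares the key of the valid row just before it. The variable names keys and group_rates are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55.0, 58.0, np.nan, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})

m = df['value'].isna()

# The running count of valid rows stays flat across each NaN run,
# so consecutive NaN rows end up with the same key.
keys = (~m).cumsum()
print(keys[m].tolist())  # [2, 4, 4]: row 2 alone, rows 5 and 6 together

# Cumulative product of the rates, restarted for every NaN run.
group_rates = df.loc[m, 'growth_rate'].groupby(keys[m]).cumprod()
print(group_rates)
```

The key values themselves (2 and 4 here) are arbitrary; only the fact that each NaN run gets its own key matters.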

Modified setup and imports:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55.0, 58.0, np.nan, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03]
})

Modified df:

   value  growth_rate
0   55.0          NaN
1   58.0     1.054500
2    NaN     1.034483
3   62.0     1.033333
4   64.0     1.032258
5    NaN     1.020000
6    NaN     1.030000
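
With this modified frame, the plain cumprod from the single-group version would carry the rate from row 2 into rows 5 and 6. The following sketch (an illustration under that assumption, with hypothetical names ungrouped and grouped) puts the two variants side by side so the difference is visible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [55.0, 58.0, np.nan, 62.0, 64.0, np.nan, np.nan],
    'growth_rate': [np.nan, 1.0545, 1.034483, 1.033333, 1.032258, 1.02, 1.03],
})

m = df['value'].isna()
base = df['value'].ffill()           # last valid value for every row
rates = df.loc[m, 'growth_rate']     # rates on the missing rows only

# Single running product over ALL missing rows: the rate from row 2
# leaks into the later, unrelated gap at rows 5 and 6.
ungrouped = df['value'].fillna(base * rates.cumprod())

# Running product restarted for each run of NaN rows.
keys = (~m).cumsum()[m]
grouped = df['value'].fillna(base * rates.groupby(keys).cumprod())

print(pd.DataFrame({'ungrouped': ungrouped, 'grouped': grouped}))
```

Row 2 is identical in both columns, while rows 5 and 6 differ: only the grouped column starts again from 64 * 1.02 for row 5.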
