

Fill forwards or backwards depending on another column

I have a dataframe like the following:

   loc status   ID
0   LA    NaN  NaN
1  CHC    NaN  NaN
2  NYC    ARR   32
3  CHC    DEP   45
4  SEA    NaN  NaN

I am trying to fill the missing values in the ID column depending on the status column. If the status is "ARR", I want to fill backwards, and if it is "DEP", I want to fill forwards, so my final dataframe would look like this:

   loc status  ID
0   LA    NaN  32
1  CHC    NaN  32
2  NYC    ARR  32
3  CHC    DEP  45
4  SEA    NaN  45

I have been trying to accomplish this with two for loops over both columns, but is there a more efficient way to do this in pandas?
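The sketch below makes it easier to test candidate answers against the question's data. It rebuilds the example frame and the expected ID column. The names df, expected and check are illustrative and do not come from the original post. It also assumes that missing values are NaN, so ID becomes a float column as pandas produces.

```python
import numpy as np
import pandas as pd

# Input from the question: NaN marks a missing status or ID.
df = pd.DataFrame({
    "loc": ["LA", "CHC", "NYC", "CHC", "SEA"],
    "status": [np.nan, np.nan, "ARR", "DEP", np.nan],
    "ID": [np.nan, np.nan, 32, 45, np.nan],
})

# Expected ID column: "ARR" back-fills rows 0-1, "DEP" forward-fills row 4.
expected = pd.Series([32.0, 32.0, 32.0, 45.0, 45.0], name="ID")


def check(result):
    """Raise AssertionError if a candidate ID column differs from `expected`."""
    pd.testing.assert_series_equal(result, expected)
```

Calling check(candidate) on a computed ID column raises an AssertionError, via pandas.testing.assert_series_equal, whenever the result differs from the expected output.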

This should work

dt.ID.bfill().ffill()

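It fills each NA value with the next non-NA value (backward fill), then fills whatever is still missing at the end with the preceding non-NA value (forward fill). It does not look at the status column, so it only gives the expected result for layouts like this example.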
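Here is a quick sketch of what this chain does on the example, with dt rebuilt from the question's table. It is followed by a hypothetical two-row frame showing that the status column plays no part in the result.

```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({
    "loc": ["LA", "CHC", "NYC", "CHC", "SEA"],
    "status": [np.nan, np.nan, "ARR", "DEP", np.nan],
    "ID": [np.nan, np.nan, 32, 45, np.nan],
})

# bfill: rows 0-1 take 32 from row 2; row 4 stays NaN (nothing after it).
# ffill: row 4 then takes 45 from row 3.
filled = dt.ID.bfill().ffill()
print(filled.tolist())  # [32.0, 32.0, 32.0, 45.0, 45.0]

# Hypothetical frame: "ARR" should only fill backwards, yet the row after it
# is filled anyway because the chain never looks at status.
dt2 = pd.DataFrame({"status": ["ARR", np.nan], "ID": [7, np.nan]})
ignored = dt2.ID.bfill().ffill()
print(ignored.tolist())  # [7.0, 7.0]
```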

Edit:

Maybe this is what you're looking for (based on the comments):

dt.ID.ffill().where(dt.ID.notnull() | (dt.status.ffill() == 'DEP'), dt.ID.bfill().where(dt.ID.notnull() | (dt.status.bfill() == 'ARR')))

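It's not very readable, but it should give the general idea: take the forward-filled ID where the status calls for forward filling ("DEP"), and otherwise fall back to the backward-filled ID where the status calls for backward filling ("ARR").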
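The same where-based idea can be written step by step, as in the sketch below. It assumes, as in the question, that a row has a status exactly when it has an ID. It checks the nearest known status in each direction by forward- and backward-filling the status column, so runs of several missing rows are filled as well. The intermediate names (last_status, from_previous and so on) are illustrative. As in the one-liner, "DEP" wins when a gap could be filled from both sides.

```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({
    "loc": ["LA", "CHC", "NYC", "CHC", "SEA"],
    "status": [np.nan, np.nan, "ARR", "DEP", np.nan],
    "ID": [np.nan, np.nan, 32, 45, np.nan],
})

# Nearest known status looking backwards / forwards from each row.
last_status = dt["status"].ffill()
next_status = dt["status"].bfill()

# Candidate IDs from each direction.
from_previous = dt["ID"].ffill()
from_next = dt["ID"].bfill()

# Forward fill after a "DEP" row first, then backward fill before an "ARR" row.
# fillna only touches rows that are still missing, so "DEP" wins on overlap.
result = (
    dt["ID"]
    .fillna(from_previous.where(last_status == "DEP"))
    .fillna(from_next.where(next_status == "ARR"))
)
print(result.tolist())  # [32.0, 32.0, 32.0, 45.0, 45.0]
```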

You can approach this by splitting your dataframe df according to whether its rows should be forward filled or backward filled:

Create two filled copies of the status column, one forward filled and the other backward filled:

fill_forward = df.status.ffill()
fill_backward = df.status.bfill()

Get the indices of the rows that forward filling filled with 'DEP', and of the rows that backward filling filled with 'ARR' (i.e. your two conditions):

forward_index = df.index[(df.status != fill_forward) & (fill_forward == 'DEP')]
backward_index = df.index[(df.status != fill_backward) & (fill_backward == 'ARR')]

Extend these indices to include the row directly before each forward-filled row and the row directly after each backward-filled row, so that every run also contains the 'DEP' or 'ARR' row that supplies its value. This assumes the default integer index, where f - 1 and b + 1 are the neighbouring rows:

forward_rows = sorted({ind for f in forward_index for ind in (f, f - 1)})
backward_rows = sorted({ind for b in backward_index for ind in (b, b + 1)})
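To see what the first three steps produce, the sketch below runs them on the question's example and records the intermediate values. It rebuilds the data as df and assumes the default RangeIndex.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "loc": ["LA", "CHC", "NYC", "CHC", "SEA"],
    "status": [np.nan, np.nan, "ARR", "DEP", np.nan],
    "ID": [np.nan, np.nan, 32, 45, np.nan],
})

fill_forward = df.status.ffill()   # [NaN, NaN, ARR, DEP, DEP]
fill_backward = df.status.bfill()  # [ARR, ARR, ARR, DEP, NaN]

# NaN != 'DEP' (and NaN != NaN) is True, so these pick rows whose status was missing.
forward_index = df.index[(df.status != fill_forward) & (fill_forward == 'DEP')]
backward_index = df.index[(df.status != fill_backward) & (fill_backward == 'ARR')]

# Add the neighbouring source rows to each list.
forward_rows = sorted({ind for f in forward_index for ind in (f, f - 1)})
backward_rows = sorted({ind for b in backward_index for ind in (b, b + 1)})

print(forward_index.tolist(), backward_index.tolist())  # [4] [0, 1]
# forward_rows == [3, 4], backward_rows == [0, 1, 2]
```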

Fill each list of rows with the appropriate method and assign the updated values back to the original df. Note that by doing the forward fill first, you give preference to forward filling when the two sets of rows overlap.

df.loc[forward_rows, 'ID'] = df.loc[forward_rows, 'ID'].ffill()
df.loc[backward_rows, 'ID'] = df.loc[backward_rows, 'ID'].bfill()

print(df)

   loc status    ID
0   LA    NaN  32.0
1  CHC    NaN  32.0
2  NYC    ARR  32.0
3  CHC    DEP  45.0
4  SEA    NaN  45.0
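The same steps can be packaged as a function. This sketch makes the same assumptions as above: the default RangeIndex, and a status present exactly on the rows that have an ID. The name fill_ids_by_status is hypothetical. The function works on a copy rather than modifying df in place. It tests status.isna() directly, which is equivalent here to comparing status with its filled copy.

```python
import numpy as np
import pandas as pd


def fill_ids_by_status(df):
    """Split-and-fill sketch: return a copy of df with ID filled by status."""
    out = df.copy()
    fill_forward = out["status"].ffill()
    fill_backward = out["status"].bfill()

    # Rows with a missing status that ffill/bfill turned into DEP/ARR.
    forward_index = out.index[out["status"].isna() & (fill_forward == "DEP")]
    backward_index = out.index[out["status"].isna() & (fill_backward == "ARR")]

    # Include the neighbouring source row (assumes integer RangeIndex).
    forward_rows = sorted({i for f in forward_index for i in (f, f - 1)})
    backward_rows = sorted({i for b in backward_index for i in (b, b + 1)})

    # Forward fill first so it wins where the two row sets overlap.
    out.loc[forward_rows, "ID"] = out.loc[forward_rows, "ID"].ffill()
    out.loc[backward_rows, "ID"] = out.loc[backward_rows, "ID"].bfill()
    return out


df = pd.DataFrame({
    "loc": ["LA", "CHC", "NYC", "CHC", "SEA"],
    "status": [np.nan, np.nan, "ARR", "DEP", np.nan],
    "ID": [np.nan, np.nan, 32, 45, np.nan],
})
result = fill_ids_by_status(df)
print(result.ID.tolist())  # [32.0, 32.0, 32.0, 45.0, 45.0]
```

Returning a copy keeps the caller's frame unchanged. It also sidesteps the chained-assignment pitfalls of writing through df.ID.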

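Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.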

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM