简体   繁体   中英

Filling NaN cells based on a value in the same row

I've asked a similar question to this and it got no reply, so I thought I'd take a different approach and see if anyone knows how to do this;

First I'll tell you my goal and what I already know:

I am currently cleaning a dataset and need to backward fill the dataset to get rid of some of the NaN values.

From the image below

数据

I would like to backward fill the Na columns of the same X column value, and fill the Na cell with a Y value that has a row value of 1

This image shows what outcome I would like

数据

I already know I can use

 df.loc[df['Y'] == 1] = df.loc[:,].bfill(limit=1)

to get it to only fill cells that are matching with a Y value row of 1 (hence the bottom Na cell is not filled).

Here is my question: Using the code above, it fills the middle Na because the Y value to the left is 1, this is fine for the top cell because the source cell and Na cell both have the X value of 1, although for the middle Na there is an X value of 2 and 3. So, is there a way to fill cells that share the same X value down the row? (the X values need to be the same between the source and the Na, if not, nothing happens.)

Thanks!

We can try with loc + groupby bfill :

df.loc[df['Y'] == 1, 'Z'] = df.groupby('X')['Z'].bfill()

groupby will ensure that each group of X values is treated independently, bfill will backfill per group. df['Y'] == 1 ensures that only rows with Y value of 1 will be updated.

df :

   X  Y    Z
0  1  1  2.0
1  1  2  2.0
2  2  1  NaN
3  3  1  3.0
4  3  2  NaN
5  4  1  4.0

Initial Frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [1, 1, 2, 3, 3, 4],
                   'Y': [1, 2, 1, 1, 2, 1],
                   'Z': [np.nan, 2, np.nan, 3, np.nan, 4]})

df :

   X  Y    Z
0  1  1  NaN
1  1  2  2.0
2  2  1  NaN
3  3  1  3.0
4  3  2  NaN
5  4  1  4.0

Edit to bfill all columns except X and Y use:

df.loc[df['Y'] == 1, df.columns.difference(['X', 'Y'])] = df.groupby('X').bfill()

尝试使用shift

df.loc[df['Y'].eq(1) & df['X'].shift(-1).eq(df['X']), 'Z'] = df['Z'].bfill(limit=1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM