Replace missing value in a row if there's a match in two columns from another row using Pandas

Question

I'm working on a data analysis project and I have a dataframe that looks like this:

id	store	long	lat
1	A	1	-4
2	NaN	2	3
3	C	4	5
4	D	2	3

I want to fill the missing value (NaN) in the 'store' column of the row with id 2 using the value from the row with id 4, because both rows have the same values in the 'long' and 'lat' columns. The output should look like this:

id	store	long	lat
1	A	1	-4
2	D	2	3
3	C	4	5
4	D	2	3

I want to do this for a large dataframe (almost a million rows), so I don't know in advance which rows share the same 'long' and 'lat' values.

I'm working in Python with Pandas. The only solution I've come up with uses nested for loops and iterrows(), which is very slow:

df_missing_names = df[df['store'].isna()]  # rows with a missing store name
df_with_names = df[df['store'].notna()]  # rows that have a store name

for indx, row in df_missing_names.iterrows():  # loop over the rows without a name
    for indx_j, row_j in df_with_names.iterrows():  # loop over the rows that have a name
        if (row.lat == row_j.lat) and (row.long == row_j.long):  # both lat and long match
            df.loc[indx, 'store'] = row_j.store  # update the name in the original dataframe

Is there a faster way to do this using built-in Pandas functions? Thanks for the help.

Answer 1

You can group by 'long' and 'lat' and back-fill 'store' within each group:

df['store'] = df.groupby(['long', 'lat'], sort=False)['store'].bfill()

Output:

   id store  long  lat
0   1     A     1   -4
1   2     D     2    3
2   3     C     4    5
3   4     D     2    3
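
One caveat with a back-fill: it only copies a name from a later row in the same group, so a missing value that comes after the named row would stay NaN. Below is a minimal sketch of an order-independent variant, assuming each ('long', 'lat') pair maps to a single store name. The extra row with id 5 is made up for illustration and is not part of the question's data.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus a hypothetical row (id 5) whose
# missing name comes AFTER the matching named row (id 4).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "store": ["A", np.nan, "C", "D", np.nan],
    "long": [1, 2, 4, 2, 2],
    "lat": [-4, 3, 5, 3, 3],
})

# For every row, look up the first non-null store name in its (long, lat) group.
# GroupBy "first" skips missing values, and transform aligns the result
# with the original index.
first_name = df.groupby(["long", "lat"], sort=False)["store"].transform("first")

# Fill only the missing names; existing names are left untouched.
df["store"] = df["store"].fillna(first_name)

print(df)
```

Groups where every name is missing stay NaN, since there is nothing to copy from. If a coordinate pair could carry more than one distinct name, this picks the first one it sees, which may or may not be what you want.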
