
Replace missing value in a row if there's a match in two columns from another row using Pandas

I'm working on a data analysis project and I have the following dataframe:

id  store  long  lat
 1      A     1   -4
 2    NaN     2    3
 3      C     4    5
 4      D     2    3
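
For anyone who wants to reproduce this, the sample above could be built roughly as follows. This is only a minimal sketch; the name `df` is chosen to match the code later in the question, and the dtypes are whatever Pandas infers.

```python
import numpy as np
import pandas as pd

# Sample frame from the table above; np.nan marks the missing store name.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "store": ["A", np.nan, "C", "D"],
    "long": [1, 2, 4, 2],
    "lat": [-4, 3, 5, 3],
})
print(df)
```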

I want to fill the missing value (NaN) in the 'store' column of the row with id 2 using the value from the row with id 4, because those two rows have the same values in the 'long' and 'lat' columns. The output should look like this:

id  store  long  lat
 1      A     1   -4
 2      D     2    3
 3      C     4    5
 4      D     2    3

I want to do this for a long dataframe (almost a million rows), so I don't know in advance which row ids share the same 'long' and 'lat' values.
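
As a side note, rows that share coordinates can be located without knowing their ids. The sketch below uses `DataFrame.duplicated` on the two coordinate columns; the frame is the four-row sample from the question, and `shares_coords` is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "store": ["A", np.nan, "C", "D"],
    "long": [1, 2, 4, 2],
    "lat": [-4, 3, 5, 3],
})

# keep=False flags every row whose (long, lat) pair occurs more than once,
# not only the second and later occurrences.
shares_coords = df.duplicated(subset=["long", "lat"], keep=False)
print(df[shares_coords])
```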

I'm working in Python with Pandas. So far I've only come up with this solution using for loops and iterrows(), which is super slow:

df_missing_names = df[df['store'].isna()]  # rows that have missing names
df_with_names = df[df['store'].notna()]  # rows that don't have missing names

for indx, row in df_missing_names.iterrows():  # run through all the rows that don't have names
    for indx_j, row_j in df_with_names.iterrows():  # run through all the rows that have names
        if (row.lat == row_j.lat) and (row.long == row_j.long):  # if both lat and long values match
            df.loc[indx, 'store'] = row_j.store  # then update the name of the row in the original dataframe

Is there a faster way to do this using built-in functions in Pandas? Thanks for the help.

You can use:

df['store'] = df.groupby(['long', 'lat'], sort=False).bfill()['store']

Output:

   id store  long  lat
0   1     A     1   -4
1   2     D     2    3
2   3     C     4    5
3   4     D     2    3
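
One caveat worth noting: `bfill` only copies values from later rows in the same group, so a missing name whose match appears earlier in the frame would stay NaN. A minimal sketch of a direction-independent variant is shown below. It assumes each (long, lat) pair corresponds to at most one store name, and it uses `transform('first')`, which takes the first non-null value in each group. The sample rows are rearranged here so that the named row comes before the missing one; `first_name` is an illustrative variable name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "store": ["A", "D", "C", np.nan],  # the gap now sits after its match
    "long": [1, 2, 4, 2],
    "lat": [-4, 3, 5, 3],
})

# First non-null store name per (long, lat) group, aligned to the original rows.
first_name = df.groupby(["long", "lat"], sort=False)["store"].transform("first")

# Fill only the gaps; rows that already have a name are left unchanged.
df["store"] = df["store"].fillna(first_name)
print(df)
```

If several different names can share one coordinate pair, this picks whichever appears first, so that case would need its own rule.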
