Fill N/A data based on value in another column
I have a CSV file with two columns, store_name and store_location, and some store_location values are missing. I want to fill the missing data with values from the same column, based on the value in the other column (store_name).
Below is my CSV file:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/hoatranobita/app_to_cloud_4/main/store_location.csv')
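The file above is loaded from a URL. For quick experiments, a tiny in-memory stand-in can be easier to work with. The sketch below uses hypothetical store names and POINT strings (not taken from the actual file). It counts the missing store_location values per store and lists any store that has no location at all, since such a store cannot be filled from its own group.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: store names and coordinates are made up.
df = pd.DataFrame({
    'store_name': ['Store A', 'Store A', 'Store B', 'Store B', 'Store C'],
    'store_location': ['POINT (1 2)', np.nan, np.nan, 'POINT (3 4)', np.nan],
})

# Number of missing locations per store.
missing_per_store = df['store_location'].isna().groupby(df['store_name']).sum()
print(missing_per_store)

# count() only counts non-NA values, so 0 means the store has no location at all.
all_missing = df.groupby('store_name')['store_location'].count() == 0
print(all_missing[all_missing].index.tolist())
```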
Here is my expected output:
I tried to find a solution but haven't found one yet.
Thanks.
TL;DR: here are 3 different approaches, depending on whether you want to:
ensure a unique value per group
fill the NaN with the first available value
fill the NaN with the previous/next non-NA row
It looks like you might need a unique value per group. Use groupby.transform('first') to get the first non-NA value of each group:
df['store_location'] = df.groupby('store_name')['store_location'].transform('first')
Output:
store_name store_location
0 AJ's Liquor III POINT (-93.648959 42.021456)
1 AJ's Liquor III POINT (-93.648959 42.021456)
2 Ambysure Inc / Clinton POINT (-90.225022 41.833351)
3 Ambysure Inc / Clinton POINT (-90.225022 41.833351)
4 Bancroft Liquor Store POINT (-94.218 43.29355)
5 Bancroft Liquor Store POINT (-94.218 43.29355)
6 Bani's POINT (-92.455801 42.518018000000005)
7 Bani's / Cedar Falls POINT (-92.455801 42.518018000000005)
8 Bani's / Cedar Falls POINT (-92.455801 42.518018000000005)
9 Barrys Mini Mart POINT (-91.38553 43.050183)
10 Baxter Family Market POINT (-93.151465 41.826715)
11 Beecher Liquor / Dubuque POINT (-90.696886 42.500775000000004)
12 Beer on Floyd / Sioux City POINT (-96.372185 42.531448000000005)
13 Beer Thirty Denison POINT (-95.360162 42.012412)
14 Beer Thirty Storm Lake / Storm Lake POINT (-95.198584 42.646794)
15 Beer Thirty Storm Lake / Storm Lake POINT (-95.198584 42.646794)
16 Beer Thirty Storm Lake / Storm Lake POINT (-95.198584 42.646794)
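A minimal sketch on made-up data (the store names and POINT strings below are hypothetical, not from the question's CSV) shows the main trade-off of transform('first'). Every row in a group is overwritten, so a store with two different non-NA locations ends up with only its first one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'Store B' has two *different* non-NA locations.
df = pd.DataFrame({
    'store_name': ['Store A', 'Store A', 'Store B', 'Store B', 'Store B'],
    'store_location': [np.nan, 'POINT (1 2)', 'POINT (3 4)', 'POINT (5 6)', np.nan],
})

# transform('first') takes the first non-NA value of each group and
# broadcasts it back to every row of that group, replacing all of them.
df['store_location'] = df.groupby('store_name')['store_location'].transform('first')
print(df)
```

Here 'POINT (5 6)' is lost. That is fine when each store really has a single location, but not otherwise.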
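If you only want to fill the NaN values and keep the existing non-NA ones, fill them with the first available value per group:
df['store_location'] = df['store_location'].fillna(df.groupby('store_name')['store_location'].transform('first'))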
Output:
store_name store_location
0 AJ's Liquor III POINT (-93.648959 42.021456)
1 AJ's Liquor III POINT (-93.648959 42.021456)
2 Ambysure Inc / Clinton POINT (-90.225022 41.833351)
3 Ambysure Inc / Clinton POINT (-90.225022 41.833351)
4 Bancroft Liquor Store POINT (-94.218 43.29355)
5 Bancroft Liquor Store POINT (-94.218 43.29355)
6 Bani's POINT (-92.455801 42.518018000000005)
7 Bani's / Cedar Falls POINT (-92.455801 42.518018000000005)
8 Bani's / Cedar Falls POINT (-92.455801 42.518018000000005)
9 Barrys Mini Mart POINT (-91.38553 43.050183)
10 Baxter Family Market POINT (-93.151465 41.826715)
11 Beecher Liquor / Dubuque POINT (-90.696886 42.500775000000004)
12 Beer on Floyd / Sioux City POINT (-96.372185 42.531448000000005)
13 Beer Thirty Denison POINT (-95.360162 42.012412)
14 Beer Thirty Storm Lake / Storm Lake POINT (-95.198584 42.646794)
15 Beer Thirty Storm Lake / Storm Lake POINT (-95.19941700000001 42.647498)
16 Beer Thirty Storm Lake / Storm Lake POINT (-95.198584 42.646794)
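The difference from transform('first') alone is easiest to see on a tiny hypothetical frame. With fillna, only the NaN cells are replaced, so a second, different location for the same store survives. A store whose locations are all missing ('Store C', made up for illustration) stays NaN because there is nothing in its group to copy.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'Store B' has two different locations, 'Store C' has none.
df = pd.DataFrame({
    'store_name': ['Store A', 'Store A', 'Store B', 'Store B', 'Store B', 'Store C'],
    'store_location': [np.nan, 'POINT (1 2)', 'POINT (3 4)', 'POINT (5 6)', np.nan, np.nan],
})

# First non-NA location per store, broadcast back to every row of that store.
first_per_store = df.groupby('store_name')['store_location'].transform('first')

# Only the NaN cells are replaced; existing values are left untouched.
df['store_location'] = df['store_location'].fillna(first_per_store)
print(df)
```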
ffill + bfill: fill the NaN with the previous/next non-NA value within each group:
df['store_location'] = df.groupby('store_name')['store_location'].transform(lambda g: g.ffill().bfill())
Output:
store_name store_location
0 AJ's Liquor III POINT (-93.648959 42.021456)
1 AJ's Liquor III POINT (-93.648959 42.021456)
2 Ambysure Inc / Clinton POINT (-90.225022 41.833351)
3 Ambysure Inc / Clinton POINT (-90.225022 41.833351)
4 Bancroft Liquor Store POINT (-94.218 43.29355)
5 Bancroft Liquor Store POINT (-94.218 43.29355)
6 Bani's POINT (-92.455801 42.518018000000005)
7 Bani's / Cedar Falls POINT (-92.455801 42.518018000000005)
8 Bani's / Cedar Falls POINT (-92.455801 42.518018000000005)
9 Barrys Mini Mart POINT (-91.38553 43.050183)
10 Baxter Family Market POINT (-93.151465 41.826715)
11 Beecher Liquor / Dubuque POINT (-90.696886 42.500775000000004)
12 Beer on Floyd / Sioux City POINT (-96.372185 42.531448000000005)
13 Beer Thirty Denison POINT (-95.360162 42.012412)
14 Beer Thirty Storm Lake / Storm Lake POINT (-95.198584 42.646794)
15 Beer Thirty Storm Lake / Storm Lake POINT (-95.19941700000001 42.647498)
16 Beer Thirty Storm Lake / Storm Lake POINT (-95.19941700000001 42.647498)
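As a sketch on hypothetical data, the same ffill + bfill fill can also be written without a Python lambda by calling the groupby ffill and bfill methods directly. This assumes a reasonably recent pandas, where SeriesGroupBy.ffill and SeriesGroupBy.bfill return a Series aligned with the original index. The two variants are computed side by side below.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'Store A' changes location mid-way, 'Store B' has no location at all.
df = pd.DataFrame({
    'store_name': ['Store A'] * 5 + ['Store B'],
    'store_location': [np.nan, 'POINT (1 2)', np.nan, 'POINT (3 4)', np.nan, np.nan],
})

# Variant 1 (as in the answer): forward-fill, then back-fill, inside each group.
lambda_result = df.groupby('store_name')['store_location'].transform(
    lambda g: g.ffill().bfill()
)

# Variant 2: the same fill using the groupby ffill/bfill methods.
# ffill covers every NaN after the first non-NA value of a group;
# the remaining leading NaNs are taken from the group-wise bfill.
grouped = df.groupby('store_name')['store_location']
vectorized_result = grouped.ffill().fillna(grouped.bfill())

print(pd.DataFrame({'lambda': lambda_result, 'vectorized': vectorized_result}))
```

Avoiding the lambda lets pandas run the per-group fill without calling Python once per group, which is typically faster on large frames. The results should be the same as with the lambda version.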