Filling in missing values in a column based on values from another column
I have a `df` that has 3 columns: `recnum`, `state`, and `zip_code`. The `state` column is missing values and I want to fill them in based on matching `zip_code`. I have tried using `.ffill` with no luck. Below is a sample of how the `df` currently looks and what I want the output to look like. Any assistance would be greatly appreciated.
Column dtypes: recnum: int64, state: string, zip_code: float64
Current df:

recnum | state | zip_code
---|---|---
1 | AL | 11111
2 | CO | 22222
3 | TX | 33333
4 | NaN | 11111
5 | AL | 11111
6 | CO | 22222
7 | TX | 33333
8 | NaN | 22222
Desired Output:

recnum | state | zip_code
---|---|---
1 | AL | 11111
2 | CO | 22222
3 | TX | 33333
4 | AL | 11111
5 | AL | 11111
6 | CO | 22222
7 | TX | 33333
8 | CO | 22222
Try grouping by `zip_code`, then using `ffill` and `bfill` to fill in the NaNs:
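import numpy as np
import pandas as pd

# Sample data matching the question
df = pd.DataFrame({'recnum': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8},
                   'state': {0: 'AL', 1: 'CO', 2: 'TX', 3: np.nan, 4: 'AL',
                             5: 'CO', 6: 'TX', 7: np.nan},
                   'zip_code': {0: 11111, 1: 22222, 2: 33333, 3: 11111,
                                4: 11111, 5: 22222, 6: 33333, 7: 22222}})

# Forward-fill, then back-fill, inside each zip_code group. Doing both steps
# within transform keeps the bfill from pulling a state from another zip_code.
df['state'] = df.groupby('zip_code')['state'].transform(lambda s: s.ffill().bfill())
print(df)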
df:

   recnum state  zip_code
0       1    AL     11111
1       2    CO     22222
2       3    TX     33333
3       4    AL     11111
4       5    AL     11111
5       6    CO     22222
6       7    TX     33333
7       8    CO     22222
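
As an alternative sketch (not part of the answer above), the same fill can be done without relying on row order: build a `zip_code`-to-`state` lookup from the rows where `state` is known, then map it onto the rows where it is missing. This assumes each `zip_code` belongs to exactly one state; the name `zip_to_state` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'recnum': [1, 2, 3, 4, 5, 6, 7, 8],
    'state': ['AL', 'CO', 'TX', np.nan, 'AL', 'CO', 'TX', np.nan],
    'zip_code': [11111, 22222, 33333, 11111, 11111, 22222, 33333, 22222],
})

# Lookup Series indexed by zip_code, holding the first known state for each
# (assumption: a zip_code never maps to more than one state).
zip_to_state = (
    df.dropna(subset=['state'])
      .drop_duplicates('zip_code')
      .set_index('zip_code')['state']
)

# Fill only the missing states; a zip_code with no known state stays NaN.
df['state'] = df['state'].fillna(df['zip_code'].map(zip_to_state))
print(df)
```

If a `zip_code` never appears with a known state, its rows are left as NaN, so it may be worth checking `df['state'].isna().sum()` afterwards.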