Fill missing values in one column based on the value of another column

Question

I have a df that has 3 columns: recnum, state, and zip_code. The state column is missing values, and I want to fill them in based on the matching zip_code. I have tried using .ffill with no luck. Below is a sample of how the df currently looks and what I want the output to look like. Any assistance would be greatly appreciated.

recnum: int64, state: string, zip: float64

Current df:

recnum  state  zip_code
1       AL     11111
2       CO     22222
3       TX     33333
4       NaN    11111
5       AL     11111
6       CO     22222
7       TX     33333
8       NaN    22222

Desired output:

recnum  state  zip_code
1       AL     11111
2       CO     22222
3       TX     33333
4       AL     11111
5       AL     11111
6       CO     22222
7       TX     33333
8       CO     22222
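For context on why a plain .ffill has "no luck" here: it copies the value from the previous row regardless of which zip_code that row belongs to. The minimal sketch below rebuilds the question's sample data and shows that rows 4 and 8 both pick up TX, the state of the row directly above them, instead of AL and CO.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "recnum": [1, 2, 3, 4, 5, 6, 7, 8],
    "state": ["AL", "CO", "TX", np.nan, "AL", "CO", "TX", np.nan],
    "zip_code": [11111, 22222, 33333, 11111, 11111, 22222, 33333, 22222],
})

# An ungrouped ffill just copies the previous row's state, ignoring zip_code
naive = df["state"].ffill()
print(naive.tolist())
# ['AL', 'CO', 'TX', 'TX', 'AL', 'CO', 'TX', 'TX']
```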

Answer 1

Try grouping by zip_code, then forward- and back-filling within each group to fill in the NaNs:

import numpy as np
import pandas as pd

df = pd.DataFrame({'recnum': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8},
                   'state': {0: 'AL', 1: 'CO', 2: 'TX', 3: np.nan, 4: 'AL',
                             5: 'CO', 6: 'TX', 7: np.nan},
                   'zip_code': {0: 11111, 1: 22222, 2: 33333, 3: 11111,
                                4: 11111, 5: 22222, 6: 33333, 7: 22222}})

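# Fill forward, then backward, inside each zip_code group so a state never
# leaks in from a different zip
df['state'] = df.groupby('zip_code')['state'].transform(lambda s: s.ffill().bfill())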
print(df)

df:

   recnum state  zip_code
0       1    AL     11111
1       2    CO     22222
2       3    TX     33333
3       4    AL     11111
4       5    AL     11111
5       6    CO     22222
6       7    TX     33333
7       8    CO     22222
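Why both fills should run per group: in a chain like `df.groupby('zip_code')['state'].ffill().bfill()`, only the ffill is grouped; the trailing .bfill() runs on the whole column and can pull a state in from a different zip when a zip's first row is missing. The sketch below uses small hypothetical data (not from the question) where the first row for zip 22222 has no state. It also shows one alternative that is not part of the original answer: broadcasting each group's first non-null state with `transform("first")` and using it in `fillna`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row for zip 22222 has no state yet
df = pd.DataFrame({
    "state": ["AL", np.nan, "TX", "CO"],
    "zip_code": [11111, 22222, 33333, 22222],
})

# Chained version: only ffill is grouped; the bfill runs across the whole column
chained = df.groupby("zip_code")["state"].ffill().bfill()

# Grouped version: both fills stay inside each zip_code group
grouped = df.groupby("zip_code")["state"].transform(lambda s: s.ffill().bfill())

# Alternative: fill each NaN with the first non-null state of its zip_code group
alt = df["state"].fillna(df.groupby("zip_code")["state"].transform("first"))

print(chained.tolist())  # ['AL', 'TX', 'TX', 'CO']  (TX leaked into zip 22222)
print(grouped.tolist())  # ['AL', 'CO', 'TX', 'CO']
print(alt.tolist())      # ['AL', 'CO', 'TX', 'CO']
```

The `transform("first")` variant takes the first known state anywhere in the group rather than the nearest one, which only matters if a single zip_code can map to more than one state.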

Answer 1 (accepted, 2021-05-11 23:11:53)
