Simultaneously fill missing values in related columns in pandas dataframe
I have a dataframe with two columns, State and Code, each of which has missing values (stored as empty strings).
import pandas as pd
df = pd.DataFrame([['Alabama', 'AL'], ['Alaska', 'AK'], ['Arizona', 'AZ'], ['Arkansas', 'AR'], ['Iowa','IA'],['Hawaii','HI'], ['Idaho', 'ID'], ['Alabama', ''], ['', 'IA'], ['Alaska',''], ['', 'AZ']], columns=['State', 'Code'])
Rows with missing values:
    State     Code
7   Alabama
8             IA
9   Alaska
10            AZ
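The incomplete rows listed above can be picked out with a boolean mask. This is a minimal sketch, assuming (as in the question's constructor) that blanks are stored as empty strings rather than NaN; the name `missing` is only illustrative.

```python
import pandas as pd

# The question's data: blanks are empty strings, not NaN
df = pd.DataFrame([['Alabama', 'AL'], ['Alaska', 'AK'], ['Arizona', 'AZ'],
                   ['Arkansas', 'AR'], ['Iowa', 'IA'], ['Hawaii', 'HI'],
                   ['Idaho', 'ID'], ['Alabama', ''], ['', 'IA'],
                   ['Alaska', ''], ['', 'AZ']], columns=['State', 'Code'])

# (df == '') is an element-wise boolean frame; any(axis=1) flags rows
# where at least one cell is blank
missing = df[(df == '').any(axis=1)]
print(missing)
```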
What I've tried:
state_code_dict = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'Iowa': 'IA',
    'Hawaii': 'HI',
    'Idaho': 'ID',
}

def state_code(x):
    # Fill a blank Code by looking up the row's State
    if x['Code'] == '':
        return state_code_dict[x['State']]
    else:
        return x['Code']

df['Code'] = df.apply(state_code, axis=1)
This fills in the missing values in Code. I need to update this function to fill in State as well, and I'm looking for a simpler way to do it.
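One way to extend the row-wise approach so it fills State too is to have the function return both columns for each row. This is a minimal sketch, assuming blanks are empty strings; `fill_row` and `code_state_dict` are hypothetical names, and `dict.get` is swapped in for plain indexing so an unknown or doubly blank row stays blank instead of raising `KeyError`.

```python
import pandas as pd

# Lookup table from the question, plus its inverse (code -> state)
state_code_dict = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR',
    'Iowa': 'IA', 'Hawaii': 'HI', 'Idaho': 'ID',
}
code_state_dict = {v: k for k, v in state_code_dict.items()}

df = pd.DataFrame([['Alabama', 'AL'], ['Alaska', 'AK'], ['Arizona', 'AZ'],
                   ['Arkansas', 'AR'], ['Iowa', 'IA'], ['Hawaii', 'HI'],
                   ['Idaho', 'ID'], ['Alabama', ''], ['', 'IA'],
                   ['Alaska', ''], ['', 'AZ']], columns=['State', 'Code'])


def fill_row(row):
    """Hypothetical helper: return the row with a blank State or Code filled in."""
    state, code = row['State'], row['Code']
    if code == '':
        # dict.get leaves the cell blank instead of raising KeyError
        # when the state is unknown or also blank
        code = state_code_dict.get(state, '')
    if state == '':
        state = code_state_dict.get(code, '')
    return pd.Series({'State': state, 'Code': code})


# Returning a Series from each row makes apply build a new DataFrame
df = df.apply(fill_row, axis=1)
print(df)
```

Row-wise `apply` is easy to read but runs a Python function per row; the vectorized `map` answers below scale better on large frames.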
Required output:
State Code
7 Alabama AL
8 Iowa IA
9 Alaska AK
10 Arizona AZ
IIUC, you can use map to fill the codes first and then the states, using a boolean mask so that values are assigned only where the cell is empty:
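# Fill blank codes from the state name
mask = df.Code == ''
df.loc[mask, 'Code'] = df[mask].State.map(state_code_dict)
# Fill blank states from the code, using the inverted dictionary
mask = df.State == ''
df.loc[mask, 'State'] = df[mask].Code.map({v: k for k, v in state_code_dict.items()})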
State Code
0 Alabama AL
1 Alaska AK
2 Arizona AZ
3 Arkansas AR
4 Iowa IA
5 Hawaii HI
6 Idaho ID
7 Alabama AL
8 Iowa IA
9 Alaska AK
10 Arizona AZ
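The two masking steps above can be wrapped in a small reusable function that works for any pair of related columns and does not modify its input. This is a sketch under the same assumptions (blanks are empty strings, the mapping is one-to-one); `fill_pair` and its parameters are hypothetical names.

```python
import pandas as pd


def fill_pair(df, left, right, mapping):
    """Hypothetical helper: fill blank cells in `right` from `left` via `mapping`,
    then blank cells in `left` from `right` via the inverted mapping.
    Returns a new DataFrame and leaves `df` unchanged."""
    out = df.copy()
    inverse = {v: k for k, v in mapping.items()}

    # Step 1: look up blank right-hand cells from the left column
    mask = out[right] == ''
    out.loc[mask, right] = out.loc[mask, left].map(mapping)

    # Step 2: look up blank left-hand cells from the (already updated) right column
    mask = out[left] == ''
    out.loc[mask, left] = out.loc[mask, right].map(inverse)
    return out


state_code_dict = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR',
    'Iowa': 'IA', 'Hawaii': 'HI', 'Idaho': 'ID',
}
df = pd.DataFrame([['Alabama', 'AL'], ['Alaska', 'AK'], ['Arizona', 'AZ'],
                   ['Arkansas', 'AR'], ['Iowa', 'IA'], ['Hawaii', 'HI'],
                   ['Idaho', 'ID'], ['Alabama', ''], ['', 'IA'],
                   ['Alaska', ''], ['', 'AZ']], columns=['State', 'Code'])

filled = fill_pair(df, 'State', 'Code', state_code_dict)
print(filled)
```

A value that is missing from the mapping comes back as NaN rather than raising, which makes unmatched rows easy to find afterwards with `isna()`.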
You can replace blank strings with np.nan and then use fillna with pd.Series.map. This is a similar idea to @RafaelC's answer, but implemented differently.
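import numpy as np

code_state_dict = {v: k for k, v in state_code_dict.items()}  # invert: code -> state
df = df.replace('', np.nan)  # turn blank strings into real missing values
# Assign the result back rather than calling fillna(inplace=True) on a column,
# which is chained assignment and is warned about in recent pandas versions
df['Code'] = df['Code'].fillna(df['State'].map(state_code_dict))
df['State'] = df['State'].fillna(df['Code'].map(code_state_dict))
print(df)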
State Code
0 Alabama AL
1 Alaska AK
2 Arizona AZ
3 Arkansas AR
4 Iowa IA
5 Hawaii HI
6 Idaho ID
7 Alabama AL
8 Iowa IA
9 Alaska AK
10 Arizona AZ
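If you would rather keep the blanks as empty strings instead of converting them to NaN, `Series.where` can do the same fill directly. This is a sketch that swaps `where` in for `replace` plus `fillna`, assuming blanks are exactly `''`; `code_state_dict` is the inverted dictionary, as in the answer.

```python
import pandas as pd

state_code_dict = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR',
    'Iowa': 'IA', 'Hawaii': 'HI', 'Idaho': 'ID',
}
code_state_dict = {v: k for k, v in state_code_dict.items()}

df = pd.DataFrame([['Alabama', 'AL'], ['Alaska', 'AK'], ['Arizona', 'AZ'],
                   ['Arkansas', 'AR'], ['Iowa', 'IA'], ['Hawaii', 'HI'],
                   ['Idaho', 'ID'], ['Alabama', ''], ['', 'IA'],
                   ['Alaska', ''], ['', 'AZ']], columns=['State', 'Code'])

# where() keeps a value when the condition is True and otherwise takes the
# index-aligned value from the second argument, so only blank cells change
df['Code'] = df['Code'].where(df['Code'] != '', df['State'].map(state_code_dict))
df['State'] = df['State'].where(df['State'] != '', df['Code'].map(code_state_dict))
print(df)
```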
To fill in the codes:
df['Code'] = df.apply(lambda x: x['Code'] if x['Code']!='' else state_code_dict[x['State']],axis=1)
To fill in the states:
state_code_dict2 = {v: k for k, v in state_code_dict.items()}
df['State'] = df.apply(lambda x: x['State'] if x['State']!='' else state_code_dict2[x['Code']],axis=1)
This is similar to the question "Filling a series based on key value pairs".
Using your data:
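import numpy as np

# Sorting only groups related rows together for display
out = df.replace('', np.nan).sort_values(by=['State', 'Code'], ascending=False)
# Fill each blank Code with the first known Code of the same State;
# rows whose State is blank belong to no group, so fillna keeps their Code
out['Code'] = out['Code'].fillna(out.groupby('State')['Code'].transform('first'))
# Then fill each blank State with the first known State of the same Code
out['State'] = out['State'].fillna(out.groupby('Code')['State'].transform('first'))
print(out)
Output:
State Code
4 Iowa IA
6 Idaho ID
5 Hawaii HI
3 Arkansas AR
2 Arizona AZ
1 Alaska AK
9 Alaska AK
0 Alabama AL
7 Alabama AL
8 Iowa IA
10 Arizona AZ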
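Building on the idea of using the data itself instead of the hand-written dictionary, the lookup tables can also be learned from the rows where both columns are filled and then applied with plain `map`. This is a sketch, assuming each State pairs with exactly one Code among the complete rows (if there are conflicts, `dict(zip(...))` silently keeps the last pair seen); `state_to_code` and `code_to_state` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame([['Alabama', 'AL'], ['Alaska', 'AK'], ['Arizona', 'AZ'],
                   ['Arkansas', 'AR'], ['Iowa', 'IA'], ['Hawaii', 'HI'],
                   ['Idaho', 'ID'], ['Alabama', ''], ['', 'IA'],
                   ['Alaska', ''], ['', 'AZ']], columns=['State', 'Code'])

# Rows with both columns filled in define the State <-> Code pairs
complete = df[(df['State'] != '') & (df['Code'] != '')]
state_to_code = dict(zip(complete['State'], complete['Code']))
code_to_state = dict(zip(complete['Code'], complete['State']))

# Assigning a full Series through .loc aligns on the index,
# so only the masked (blank) rows are overwritten
df.loc[df['Code'] == '', 'Code'] = df['State'].map(state_to_code)
df.loc[df['State'] == '', 'State'] = df['Code'].map(code_to_state)
print(df)
```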
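Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.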