
Simultaneously fill missing values in related columns in pandas dataframe

I have a dataframe with two columns, State and Code, each with missing values.

import pandas as pd

df = pd.DataFrame([['Alabama', 'AL'], ['Alaska', 'AK'], ['Arizona', 'AZ'], ['Arkansas', 'AR'], ['Iowa','IA'],['Hawaii','HI'], ['Idaho', 'ID'], ['Alabama', ''], ['', 'IA'], ['Alaska',''], ['', 'AZ']], columns=['State', 'Code'])

Missing values

    State   Code
7   Alabama     
8             IA
9   Alaska  
10            AZ

What I've tried

state_code_dict = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'Iowa':'IA',
    'Hawaii':'HI',
    'Idaho': 'ID',    
}

def state_code(x):
    if (x['Code'] == ''):
        return state_code_dict[x['State']]
    else:
        return x['Code']

df['Code'] = df.apply(lambda x: state_code(x), axis=1)

This sets the missing values in Code. I need to update this function to set State as well, and I'm looking to simplify it.

Required output

    State   Code
7   Alabama   AL
8   Iowa      IA
9   Alaska    AK
10  Arizona   AZ

IIUC, you can use map to fill in the codes first and then the states, with a boolean mask so that values are assigned only where the column is empty.

mask = df.Code == ''
df.loc[mask, 'Code'] = df[mask].State.map(state_code_dict)

mask = df.State == ''
df.loc[mask, 'State'] = df[mask].Code.map({v:k for k,v in state_code_dict.items()})

       State Code
0    Alabama   AL
1     Alaska   AK
2    Arizona   AZ
3   Arkansas   AR
4       Iowa   IA
5     Hawaii   HI
6      Idaho   ID
7    Alabama   AL
8       Iowa   IA
9     Alaska   AK
10   Arizona   AZ
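
The two masked assignments above follow the same pattern, so they can be wrapped in one small helper. This is only a sketch: the function name fill_pair and its parameters are made up for illustration and are not part of pandas or of the answer.

```python
import pandas as pd


def fill_pair(df, key_col, value_col, mapping):
    """Return a copy of df where blank strings in value_col are filled
    by looking up the same row's key_col in mapping (hypothetical helper)."""
    out = df.copy()
    mask = out[value_col] == ''  # rows where the value is blank
    out.loc[mask, value_col] = out.loc[mask, key_col].map(mapping)
    return out


state_code_dict = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR',
    'Iowa': 'IA', 'Hawaii': 'HI', 'Idaho': 'ID',
}
# Reverse lookup: code -> state
code_state_dict = {code: state for state, code in state_code_dict.items()}

df = pd.DataFrame(
    [['Alabama', 'AL'], ['Alaska', 'AK'], ['Arizona', 'AZ'], ['Arkansas', 'AR'],
     ['Iowa', 'IA'], ['Hawaii', 'HI'], ['Idaho', 'ID'],
     ['Alabama', ''], ['', 'IA'], ['Alaska', ''], ['', 'AZ']],
    columns=['State', 'Code'],
)

filled = fill_pair(df, 'State', 'Code', state_code_dict)      # fill codes
filled = fill_pair(filled, 'Code', 'State', code_state_dict)  # fill states
print(filled.loc[7:10])
```

Note that a key missing from the dictionary makes map return NaN for that row, so such rows end up as NaN rather than staying blank.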

You can replace blank strings with np.nan and then use fillna with pd.Series.map. This is a similar idea to @RafaelC's answer, but implemented differently.

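import numpy as np

code_state_dict = {v: k for k, v in state_code_dict.items()}

# Treat blank strings as missing so that fillna can see them
df = df.replace('', np.nan)
# Assign back instead of fillna(inplace=True) on a column, which does not update df under Copy-on-Write
df['Code'] = df['Code'].fillna(df['State'].map(state_code_dict))
df['State'] = df['State'].fillna(df['Code'].map(code_state_dict))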

print(df)

       State Code
0    Alabama   AL
1     Alaska   AK
2    Arizona   AZ
3   Arkansas   AR
4       Iowa   IA
5     Hawaii   HI
6      Idaho   ID
7    Alabama   AL
8       Iowa   IA
9     Alaska   AK
10   Arizona   AZ

To fill in the codes:

df['Code'] = df.apply(lambda x: x['Code'] if x['Code']!='' else state_code_dict[x['State']],axis=1)

To fill in the states:

state_code_dict2 = {v: k for k, v in state_code_dict.items()}
df['State'] = df.apply(lambda x: x['State'] if x['State']!='' else state_code_dict2[x['Code']],axis=1)
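
One thing to watch with the two apply calls above: indexing the dictionary directly raises KeyError when the lookup key is absent, for example when both columns are blank in the same row. A minimal sketch of a more forgiving variant, assuming it is acceptable to leave such rows blank, uses dict.get with a default. The small frame and the 'Texas' row below are invented for illustration.

```python
import pandas as pd

state_code_dict = {'Alabama': 'AL', 'Iowa': 'IA'}
code_state_dict = {v: k for k, v in state_code_dict.items()}

# Illustrative data: row 2 is blank in both columns, and row 3 has a
# state ('Texas') that is deliberately missing from the dictionary.
df = pd.DataFrame(
    [['Alabama', ''], ['', 'IA'], ['', ''], ['Texas', '']],
    columns=['State', 'Code'],
)

# dict.get returns '' instead of raising KeyError for unknown keys
df['Code'] = df.apply(
    lambda row: row['Code'] if row['Code'] != ''
    else state_code_dict.get(row['State'], ''),
    axis=1,
)
df['State'] = df.apply(
    lambda row: row['State'] if row['State'] != ''
    else code_state_dict.get(row['Code'], ''),
    axis=1,
)
print(df)
```

Rows that cannot be resolved keep their blank value, so they can still be found afterwards with a check such as df['Code'] == ''.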

This is similar to the question "Filling a series based on key value pairs".

Using your data:

(df.replace('', np.nan)
  .sort_values(by=['State', 'Code'], ascending=False)
  .groupby('State').ffill().bfill()
  .groupby('Code').ffill().bfill())

Output:

    Code    State
4   IA  Iowa
6   ID  Idaho
5   HI  Hawaii
3   AR  Arkansas
2   AZ  Arizona
1   AK  Alaska
9   AK  Alaska
0   AL  Alabama
7   AL  Alabama
8   IA  Iowa
10  AZ  Arizona
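
The answer above fills the gaps from other rows of the dataframe rather than from a hand-written dictionary. The sketch below is a different way to get the same effect, not the answer's method: it builds both lookups from the rows where State and Code are both present, then reuses the mask-and-map pattern. It assumes every state or code with a blank partner also appears in at least one complete row, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Alabama', 'AL'], ['Alaska', 'AK'], ['Arizona', 'AZ'], ['Arkansas', 'AR'],
     ['Iowa', 'IA'], ['Hawaii', 'HI'], ['Idaho', 'ID'],
     ['Alabama', ''], ['', 'IA'], ['Alaska', ''], ['', 'AZ']],
    columns=['State', 'Code'],
)

# Rows with both values present define the State <-> Code pairs
complete = df[(df['State'] != '') & (df['Code'] != '')].drop_duplicates()
state_to_code = dict(zip(complete['State'], complete['Code']))
code_to_state = dict(zip(complete['Code'], complete['State']))

# Fill blank codes from the state, then blank states from the code
need_code = df['Code'] == ''
df.loc[need_code, 'Code'] = df.loc[need_code, 'State'].map(state_to_code)

need_state = df['State'] == ''
df.loc[need_state, 'State'] = df.loc[need_state, 'Code'].map(code_to_state)

print(df.loc[7:10])
```

Unlike the sorted groupby chain, this keeps the original row order.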

