
Create multiple columns in pandas dataframe in single update

I have a dataframe as below:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Group': ['Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Vegetable'],
                   'NId': ['Banana', 'Onion', 'Grapes', 'Potato', 'Apple', np.nan, np.nan],
                   'BName': [np.nan, 'GTwo', np.nan, 'GSix', np.nan, 'GOne', 'GNine'],
                   'BId': [np.nan, '5252', np.nan, '5678', np.nan, '5125', '5923']})
df['BId'] = df['BId'].astype(str)
df = df[['Group', 'NId', 'BName', 'BId']]

This gives the following dataframe:

       Group     NId  BName   BId
0      Fruit  Banana    NaN   nan
1  Vegetable   Onion   GTwo  5252
2      Fruit  Grapes    NaN   nan
3  Vegetable  Potato   GSix  5678
4      Fruit   Apple    NaN   nan
5  Vegetable     NaN   GOne  5125
6  Vegetable     NaN  GNine  5923

I then create three new columns with the operations below:

df.loc[df['NId'].notna(), 'Cat'] = df[df['NId'].notna()].apply(lambda x: 'NId', axis=1)
df.loc[df['NId'].isna(), 'Cat'] = df[df['NId'].isna()].apply(lambda x: 'GId', axis=1)

df.loc[df['NId'].notna(), 'Id'] = df[df['NId'].notna()].apply(lambda x: str(x['NId']), axis=1)
df.loc[df['NId'].isna(), 'Id'] = df[df['NId'].isna()].apply(lambda x: x['BName'], axis=1)

df.loc[df['NId'].notna(), 'IdQ'] = df[df['NId'].notna()].apply(lambda x: 'NId:' + str(x['NId']), axis=1)
df.loc[df['NId'].isna(), 'IdQ'] = df[df['NId'].isna()].apply(lambda x: 'BId:' + x['BId'], axis=1)

This produces the following dataframe:

       Group     NId  BName   BId  Cat      Id         IdQ
0      Fruit  Banana    NaN   nan  NId  Banana  NId:Banana
1  Vegetable   Onion   GTwo  5252  NId   Onion   NId:Onion
2      Fruit  Grapes    NaN   nan  NId  Grapes  NId:Grapes
3  Vegetable  Potato   GSix  5678  NId  Potato  NId:Potato
4      Fruit   Apple    NaN   nan  NId   Apple   NId:Apple
5  Vegetable     NaN   GOne  5125  GId    GOne    BId:5125
6  Vegetable     NaN  GNine  5923  GId   GNine    BId:5923

Is there a way to combine these operations, or a better approach altogether? The logic is: Id is NId when NId is not NaN, otherwise BName. Cat records which source was used: 'NId' when Id came from NId, otherwise 'GId' (the label used in the code above). IdQ is 'NId:' + NId or 'BId:' + BId, following the same condition.

Use numpy.where:

mask = df['NId'].notna()
df['Cat'] = np.where(mask, 'NId', 'GId')
df['Id'] = np.where(mask, df['NId'].astype(str), df['BName'])
df['IdQ'] = np.where(mask, 'NId:' + df['NId'].astype(str), 'BId:' + df['BId'])
print(df)
       Group     NId  BName   BId  Cat      Id         IdQ
0      Fruit  Banana    NaN   nan  NId  Banana  NId:Banana
1  Vegetable   Onion   GTwo  5252  NId   Onion   NId:Onion
2      Fruit  Grapes    NaN   nan  NId  Grapes  NId:Grapes
3  Vegetable  Potato   GSix  5678  NId  Potato  NId:Potato
4      Fruit   Apple    NaN   nan  NId   Apple   NId:Apple
5  Vegetable     NaN   GOne  5125  GId    GOne    BId:5125
6  Vegetable     NaN  GNine  5923  GId   GNine    BId:5923
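
Since the question asks for a single update, the three np.where calls can also be passed to one DataFrame.assign call. The following is only a sketch under that reading of the question; the name out is an assumption introduced here, and the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'Group': ['Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Vegetable'],
                   'NId': ['Banana', 'Onion', 'Grapes', 'Potato', 'Apple', np.nan, np.nan],
                   'BName': [np.nan, 'GTwo', np.nan, 'GSix', np.nan, 'GOne', 'GNine'],
                   'BId': [np.nan, '5252', np.nan, '5678', np.nan, '5125', '5923']})
df['BId'] = df['BId'].astype(str)

# True where NId is present, False where it is missing
mask = df['NId'].notna()

# One assign call creates all three columns; `out` is a hypothetical name
out = df.assign(
    Cat=np.where(mask, 'NId', 'GId'),
    Id=np.where(mask, df['NId'].astype(str), df['BName']),
    IdQ=np.where(mask, 'NId:' + df['NId'].astype(str), 'BId:' + df['BId']),
)
print(out)
```

assign returns a new dataframe and leaves df unchanged, so reassign the result to df if the columns should be added to the original.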

You can use the pandas assign method to create multiple columns in one call:

df1 = df[df['NId'].notna()].assign(Cat='NId', Id=lambda x: x['NId'], IdQ=lambda x: 'NId:' + x['NId'])
df2 = df[df['NId'].isna()].assign(Cat='GId', Id=lambda x: x['BName'], IdQ=lambda x: 'BId:' + x['BId'])
# DataFrame.append was removed in pandas 2.0, so combine the two parts with pd.concat
pd.concat([df1, df2])

       Group     NId  BName   BId  Cat      Id         IdQ
0      Fruit  Banana    NaN   nan  NId  Banana  NId:Banana
1  Vegetable   Onion   GTwo  5252  NId   Onion   NId:Onion
2      Fruit  Grapes    NaN   nan  NId  Grapes  NId:Grapes
3  Vegetable  Potato   GSix  5678  NId  Potato  NId:Potato
4      Fruit   Apple    NaN   nan  NId   Apple   NId:Apple
5  Vegetable     NaN   GOne  5125  GId    GOne    BId:5125
6  Vegetable     NaN  GNine  5923  GId   GNine    BId:5923
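
One caveat with splitting and recombining: the combined frame lists all rows that have an NId first, then the rows that do not. With the question's data the missing values happen to sit in the last two rows, so the order looks unchanged. The sketch below uses a hypothetical three-row frame (small, with the missing NId in the middle) to show the effect, and restores the original order with sort_index; all variable names are assumptions introduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with the missing NId in the middle row
small = pd.DataFrame({'NId': ['Banana', np.nan, 'Apple'],
                      'BName': [np.nan, 'GOne', np.nan],
                      'BId': ['nan', '5125', 'nan']})

has_nid = small['NId'].notna()

# Build each part separately, using the lambda argument x (the filtered frame)
part_nid = small[has_nid].assign(Cat='NId', Id=lambda x: x['NId'],
                                 IdQ=lambda x: 'NId:' + x['NId'])
part_bid = small[~has_nid].assign(Cat='GId', Id=lambda x: x['BName'],
                                  IdQ=lambda x: 'BId:' + x['BId'])

stacked = pd.concat([part_nid, part_bid])  # rows grouped by part, not by original position
restored = stacked.sort_index()            # back to the original row order
print(restored)
```

If the original row order matters, the numpy.where approach avoids this step entirely because it never splits the frame.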
