
Split comma-separated values into multiple columns, with an empty column next to each to map the name - Pandas

I have a column of comma-separated numbers, and the values should be split into new columns.

Site    UserID
ABC     '456,567,67,96'
DEF     '67,987'
POC     '4321,96,912'
 

The new Dataframe should look like:

Site    UserID             UserId1  UserId2  UserId3  UserId4
ABC     '456,567,67,96'    456      567      67       96
DEF     '67,987'           67       987
POC     '4321,96,912'      4321     96       912

Each new column should also be followed by an empty column, so that every number can be mapped to its name from this user table:

UserId   UserName       Phone No
4321     EB_Meter       9980688666
987      EB_Meter987    9255488721
912      DG_Meter912    8897634219
567      Ups_Meter567   7263193155
456      Ups_Meter456   8987222112
96       DG_Meter96
67       DGB_Meter

So the final DataFrame is:

Values             Value1  Name1         Phone1      Value2  Name2         Value3  Name3        Value4  Name4
'456,567,67,96'    456     Ups_Meter456  8987222112  567     Ups_Meter567  67      DGB_Meter    96      DG_Meter96
'67,987'           67      DGB_Meter                 987     EB_Meter987
'4321,96,912'      4321    EB_Meter      9980688666  96      DG_Meter96    912     DG_Meter912
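
To try the steps in the answer, the two tables from the question can be rebuilt as DataFrames. This is only a minimal sketch: the names df1 and df2 are taken from the answer's code, while storing the phone numbers as strings (so the blank entries do not turn the column into floats) is an assumption.

```python
import pandas as pd

# Input table from the question; the quotes are kept as part of the string.
df1 = pd.DataFrame({
    "Site": ["ABC", "DEF", "POC"],
    "UserID": ["'456,567,67,96'", "'67,987'", "'4321,96,912'"],
})

# Lookup table from the question. Phone numbers are stored as strings
# (an assumption), with None for the two users that have no number.
df2 = pd.DataFrame({
    "UserId": [4321, 987, 912, 567, 456, 96, 67],
    "UserName": ["EB_Meter", "EB_Meter987", "DG_Meter912", "Ups_Meter567",
                 "Ups_Meter456", "DG_Meter96", "DGB_Meter"],
    "Phone No": ["9980688666", "9255488721", "8897634219", "7263193155",
                 "8987222112", None, None],
})
```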

Use Series.str.strip to remove the quotes and Series.str.split with expand=True to build a new DataFrame:

df = df1['UserID'].str.strip("'").str.split(',',expand=True)
print (df)
      0    1     2     3
0   456  567    67    96
1    67  987  None  None
2  4321   96   912  None

Then convert df2['UserId'] to strings so that it matches the split values, reshape df with DataFrame.stack, map the values to names with Series.map, and reshape back to a DataFrame with Series.unstack:

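# the split values are strings, so the lookup keys must be strings too
df2['UserId'] = df2['UserId'].astype(str)
# lookup Series: UserId -> UserName
s = df2.set_index('UserId')['UserName']
# dropna=False keeps the None cells, so unstack restores every column
df3 = df.stack(dropna=False).map(s).unstack()
print (df3)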
              0             1            2           3
0  Ups_Meter456  Ups_Meter567    DGB_Meter  DG_Meter96
1     DGB_Meter   EB_Meter987          NaN         NaN
2      EB_Meter    DG_Meter96  DG_Meter912         NaN
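
As an alternative to the stack/map/unstack round trip, the same lookup can be applied one column at a time with Series.map. This is a different technique from the one in the answer, shown here as a sketch on shortened toy data; the variable names values, names, and mapped are illustrative. It may be convenient where the behaviour of stack's dropna argument differs between pandas versions.

```python
import pandas as pd

# Split values as strings; missing positions are None (toy data).
values = pd.DataFrame({
    0: ["456", "67"],
    1: ["567", "987"],
    2: ["67", None],
})

# Lookup Series: index is the UserId as a string, value is the UserName.
names = pd.Series(
    ["Ups_Meter456", "Ups_Meter567", "DGB_Meter", "EB_Meter987"],
    index=["456", "567", "67", "987"],
)

# Map each column against the lookup; missing or unmatched cells become NaN.
mapped = values.apply(lambda col: col.map(names))
print(mapped)
```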

Join both with concat, reorder the columns of the resulting MultiIndex with DataFrame.sort_index, flatten the MultiIndex in a list comprehension with f-strings, and finally attach the result to df1 with DataFrame.join:

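import pandas as pd

# sort by position ascending, then by key descending so 'Value' precedes 'Name'
df = (pd.concat([df, df3], axis=1, keys=('Value','Name'))
        .sort_index(axis=1, level=[1,0], ascending=[True, False]))
# flatten ('Value', 0) to 'Value1', ('Name', 0) to 'Name1', ...
df.columns = [f'{x}{y+1}' for x, y in df.columns]
df = df1.join(df)
print (df)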
          UserID Value1         Name1 Value2         Name2 Value3  \
0  456,567,67,96    456  Ups_Meter456    567  Ups_Meter567     67   
1         67,987     67     DGB_Meter    987   EB_Meter987   None   
2    4321,96,912   4321      EB_Meter     96    DG_Meter96    912   

         Name3 Value4       Name4  
0    DGB_Meter     96  DG_Meter96  
1          NaN   None         NaN  
2  DG_Meter912   None         NaN  
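
The steps above map only the names, while the expected output in the question also shows a Phone column. One possible way to cover both is a plain loop over the split positions, sketched below. The helper name expand_ids is hypothetical, the toy frames are shortened from the question's data, and the phone numbers are assumed to be stored as strings.

```python
import pandas as pd

def expand_ids(df1, df2, id_col="UserID"):
    """Add Value/Name/Phone columns for each comma-separated position."""
    # One column per position; shorter rows are padded with missing values.
    parts = df1[id_col].str.strip("'").str.split(",", expand=True)
    # Index the lookup table by UserId as a string to match the split values.
    lookup = df2.assign(UserId=df2["UserId"].astype(str)).set_index("UserId")
    out = df1.copy()
    for i in parts.columns:
        n = i + 1  # column suffixes start at 1
        out[f"Value{n}"] = parts[i]
        out[f"Name{n}"] = parts[i].map(lookup["UserName"])
        out[f"Phone{n}"] = parts[i].map(lookup["Phone No"])
    return out

df1 = pd.DataFrame({"Site": ["ABC", "DEF"], "UserID": ["'456,567'", "'67'"]})
df2 = pd.DataFrame({
    "UserId": [456, 567, 67],
    "UserName": ["Ups_Meter456", "Ups_Meter567", "DGB_Meter"],
    "Phone No": ["8987222112", "7263193155", None],
})
result = expand_ids(df1, df2)
print(result)
```

Because the columns are created in loop order, no MultiIndex sorting is needed. The mapping relies on UserId being unique in the lookup table; duplicated ids would have to be resolved before calling map.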

If necessary, replace the None/NaN values with empty strings using DataFrame.fillna:

df = df.fillna('')
print (df)

          UserID Value1         Name1 Value2         Name2 Value3  \
0  456,567,67,96    456  Ups_Meter456    567  Ups_Meter567     67   
1         67,987     67     DGB_Meter    987   EB_Meter987          
2    4321,96,912   4321      EB_Meter     96    DG_Meter96    912   

         Name3 Value4       Name4  
0    DGB_Meter     96  DG_Meter96  
1                                  
2  DG_Meter912                     
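
Calling fillna('') on the whole frame also fills any other column that holds missing values. If that is not wanted, the fill can be limited to the generated columns, as in the sketch below. The Reading column is hypothetical and only stands in for an unrelated numeric column; the regular expression assumes the Value/Name naming used above.

```python
import pandas as pd

df = pd.DataFrame({
    "UserID": ["456,567", "67"],
    "Value1": ["456", "67"],
    "Name1": ["Ups_Meter456", "DGB_Meter"],
    "Value2": ["567", None],
    "Name2": ["Ups_Meter567", None],
    "Reading": [1.5, None],  # hypothetical numeric column that should keep NaN
})

# Select only the columns named Value<n> or Name<n> and fill those.
generated = df.filter(regex=r"^(Value|Name)\d+$").columns
df[generated] = df[generated].fillna("")
print(df)
```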
