Inserting new columns by splitting each column, iterating over many columns in a Python pandas DataFrame

Question

Here is an example DataFrame:

import pandas as pd

dfx = pd.DataFrame({
    'name': ['alex', 'bob', 'jack'],
    'age': ["0,26,4", "1,25,4", "5,30,2"],
    'job': ["x,abc,0", "y,xyz,1", "z,pqr,2"],
    'gender': ["0,1", "0,1", "0,1"],
})

First, I want to split the column dfx['age'] and insert three separate columns, one for each comma-separated substring in the age value, named dfx['age1'], dfx['age2'] and dfx['age3']. I used the following code:

# split once; expand=True returns one column per substring (labelled 0, 1, 2)
parts = dfx['age'].str.split(',', expand=True)
dfx = dfx.assign(age1=parts[0], age2=parts[1], age3=parts[2])
dfx = dfx[['name', 'age', 'age1', 'age2', 'age3', 'job', 'gender']]
dfx
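A minimal sketch of the same single-column step that names the new columns automatically and places them right after age, so it does not require knowing the number of parts in advance or writing out the column list by hand. It assumes every value in the column is a comma-separated string; pos and parts are illustrative names only.

```python
import pandas as pd

dfx = pd.DataFrame({
    'name': ['alex', 'bob', 'jack'],
    'age': ["0,26,4", "1,25,4", "5,30,2"],
    'job': ["x,abc,0", "y,xyz,1", "z,pqr,2"],
    'gender': ["0,1", "0,1", "0,1"],
})

# expand=True returns a DataFrame whose columns are labelled 0, 1, 2, ...
parts = dfx['age'].str.split(',', expand=True)
# rename 0, 1, 2 -> age1, age2, age3 (works for any number of parts)
parts = parts.rename(columns=lambda i: f'age{i + 1}')

# slice the original frame just after 'age' and put the new columns in between
pos = dfx.columns.get_loc('age') + 1
dfx = pd.concat([dfx.iloc[:, :pos], parts, dfx.iloc[:, pos:]], axis=1)
print(dfx)
```

The resulting column order is name, age, age1, age2, age3, job, gender, the same as the hand-written reordering above.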

So far so good!

Now I want to repeat the same operation for the other columns, job and gender.

Desired Output

   name     age age1 age2 age3      job job1 job2 job3 gender gender1 gender2
0  alex  0,26,4    0   26    4  x,abc,0    x  abc    0    0,1       0       1
1   bob  1,25,4    1   25    4  y,xyz,1    y  xyz    1    0,1       0       1
2  jack  5,30,2    5   30    2  z,pqr,2    z  pqr    2    0,1       0       1

Doing this one column at a time is fine for a small DataFrame like this one, but the actual data file has many such columns, so I need to iterate over them.

I had difficulty iterating over the columns and naming the new individual columns.

I would be glad to see a better solution.

Thanks!

Answer 1

Use a list comprehension to split each column named in cols into its own DataFrame, renaming the new columns with an f-string. Concatenate those frames with the original split columns using concat and sort the column names, then prepend the remaining (unsplit) columns with DataFrame.join:

cols = ['age','job','gender']

L = [dfx[x].str.split(',',expand=True).rename(columns=lambda y: f'{x}{y+1}') for x in cols]

df1 = dfx[dfx.columns.difference(cols)]
df = df1.join(pd.concat([dfx[cols]] + L, axis=1).sort_index(axis=1))
print(df)
   name     age age1 age2 age3 gender gender1 gender2      job job1 job2 job3
0  alex  0,26,4    0   26    4    0,1       0       1  x,abc,0    x  abc    0
1   bob  1,25,4    1   25    4    0,1       0       1  y,xyz,1    y  xyz    1
2  jack  5,30,2    5   30    2    0,1       0       1  z,pqr,2    z  pqr    2
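Note that sort_index(axis=1) orders the columns alphabetically, which is why gender comes before job in the output above, unlike the desired output in the question. A lexicographic sort would also put a column such as age10 before age2 if a value ever had ten or more parts. If the original order matters, a minimal sketch of a variant that walks dfx.columns in order and appends each split frame right after its source column could look like this (it redefines dfx and cols so it runs on its own, and assumes every column in cols holds comma-separated strings):

```python
import pandas as pd

dfx = pd.DataFrame({
    'name': ['alex', 'bob', 'jack'],
    'age': ["0,26,4", "1,25,4", "5,30,2"],
    'job': ["x,abc,0", "y,xyz,1", "z,pqr,2"],
    'gender': ["0,1", "0,1", "0,1"],
})
cols = ['age', 'job', 'gender']

pieces = []
for c in dfx.columns:
    # keep every original column, in its original position
    pieces.append(dfx[[c]])
    if c in cols:
        # follow it immediately with its parts, named c1, c2, ...
        split = dfx[c].str.split(',', expand=True)
        pieces.append(split.rename(columns=lambda y: f'{c}{y + 1}'))

df = pd.concat(pieces, axis=1)
print(df)
```

Because the pieces are collected in visiting order, pd.concat already yields name, age, age1, age2, age3, job, job1, ... with no sorting step.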

Answer 2

Thanks again, @jezrael, for your answer. Inspired by your use of an f-string, I solved the problem with a loop as follows:

for col in dfx.columns[1:]:
    # split once per column; the number of parts is taken from the first row
    parts = dfx[col].str.split(',', expand=True)
    for i in range(len(dfx[col][0].split(','))):
        dfx[f'{col}{i+1}'] = parts[i]

dfx = dfx[['name', 'age', 'age1', 'age2', 'age3', 'job', 'job1', 'job2', 'job3',
           'gender', 'gender1', 'gender2']]

dfx
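The column list at the end of this loop is written out by hand, which does not scale to a file with many columns. One possible sketch avoids the reordering step by placing each new column directly after its source column with DataFrame.insert. Like the loop above, it assumes every column after name holds comma-separated strings; unlike it, it takes the number of parts from the split result rather than from the first row.

```python
import pandas as pd

dfx = pd.DataFrame({
    'name': ['alex', 'bob', 'jack'],
    'age': ["0,26,4", "1,25,4", "5,30,2"],
    'job': ["x,abc,0", "y,xyz,1", "z,pqr,2"],
    'gender': ["0,1", "0,1", "0,1"],
})

# snapshot the columns to split before any new columns are inserted
for col in list(dfx.columns[1:]):
    parts = dfx[col].str.split(',', expand=True)
    # look up the position again, since earlier inserts shift later columns right
    loc = dfx.columns.get_loc(col) + 1
    for i in parts.columns:
        # parts.columns is 0, 1, 2, ... so part i lands at loc + i
        dfx.insert(loc + i, f'{col}{i + 1}', parts[i])

print(dfx)
```

Each insert call modifies dfx in place; with a very large number of new columns, collecting the pieces and concatenating them once, as in Answer 1, is likely to be faster.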

2 answers

Answer 1 (solution1): 2 votes, accepted, 2021-05-25 07:03:05
Answer 2 (solution2): 0 votes, 2021-05-25 08:51:00