
Inserting new columns by splitting each column and iterating over many columns in a pandas DataFrame

Here is an example DataFrame:

import pandas as pd

dfx = pd.DataFrame({
        'name': ['alex', 'bob', 'jack'],
        'age': ["0,26,4", "1,25,4", "5,30,2"],
        'job': ["x,abc,0", "y,xyz,1", "z,pqr,2"],
        'gender': ["0,1", "0,1", "0,1"]
    })

First, I want to split the column dfx['age'] and insert three separate columns for it, one for each substring of the age value, named dfx['age1'], dfx['age2'] and dfx['age3']. I used the following code for this:

# Split 'age' once; expand=True returns one column per substring, labelled 0, 1, 2
age_parts = dfx['age'].str.split(',', expand=True)
dfx = dfx.assign(age1=age_parts[0], age2=age_parts[1], age3=age_parts[2])
# Reorder so the new columns come right after 'age'
dfx = dfx[['name', 'age', 'age1', 'age2', 'age3', 'job', 'gender']]
dfx
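The integer indexing ([0], [1], [2]) works because, with expand=True, Series.str.split returns a DataFrame whose columns are labelled 0, 1, 2, … rather than a Series of lists. The following is a minimal check on a cut-down version of the example frame. Note that the pieces stay strings.

```python
import pandas as pd

dfx = pd.DataFrame({
    'name': ['alex', 'bob', 'jack'],
    'age': ["0,26,4", "1,25,4", "5,30,2"],
})

# expand=True turns each substring into its own column, labelled by position
parts = dfx['age'].str.split(',', expand=True)
print(parts.columns.tolist())  # [0, 1, 2]
print(parts)
```

Without expand=True, the same call returns one Python list per row. That is why the positional column labels only exist in the expanded form.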

So far so good!

Now I want to repeat the same operation for the other columns, job and gender.

Desired output:

   name     age age1 age2 age3      job job1 job2 job3 gender gender1 gender2
0  alex  0,26,4    0   26    4  x,abc,0    x  abc    0    0,1       0       1
1   bob  1,25,4    1   25    4  y,xyz,1    y  xyz    1    0,1       0       1
2  jack  5,30,2    5   30    2  z,pqr,2    z  pqr    2    0,1       0       1

   

I have no problem doing this column by column for a small DataFrame like this one. However, the actual data file has many such columns, so I need to iterate.

I am having difficulty iterating over the columns and naming the new individual columns.

I would be glad to hear of a better solution.

Thanks!

Use a list comprehension to split each column named in a list, which gives a list of DataFrames. Concatenate the original columns with those DataFrames using concat, sort the column names, and then prepend the columns that were not split using DataFrame.join:

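cols = ['age', 'job', 'gender']  # columns to split

# One DataFrame per column; the split parts are renamed e.g. age1, age2, age3
L = [dfx[x].str.split(',', expand=True).rename(columns=lambda y: f'{x}{y+1}') for x in cols]

# Columns that are not split (here only 'name')
df1 = dfx[dfx.columns.difference(cols)]
# Original + split columns, sorted by name, joined after the unsplit columns
df = df1.join(pd.concat([dfx[cols]] + L, axis=1).sort_index(axis=1))
print(df)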
   name     age age1 age2 age3 gender gender1 gender2      job job1 job2 job3
0  alex  0,26,4    0   26    4    0,1       0       1  x,abc,0    x  abc    0
1   bob  1,25,4    1   25    4    0,1       0       1  y,xyz,1    y  xyz    1
2  jack  5,30,2    5   30    2    0,1       0       1  z,pqr,2    z  pqr    2
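sort_index(axis=1) orders the columns alphabetically, so gender comes before job, which differs from the desired output. If the original order matters, the sketch below is one possible variation using only concat and rename. It places each split block directly after its source column. The name pieces is illustrative only.

```python
import pandas as pd

dfx = pd.DataFrame({
    'name': ['alex', 'bob', 'jack'],
    'age': ["0,26,4", "1,25,4", "5,30,2"],
    'job': ["x,abc,0", "y,xyz,1", "z,pqr,2"],
    'gender': ["0,1", "0,1", "0,1"],
})

cols = ['age', 'job', 'gender']  # columns to split

pieces = []
for col in dfx.columns:
    pieces.append(dfx[[col]])  # always keep the original column
    if col in cols:
        # Split into col1, col2, ... and put the block right after its source column
        parts = dfx[col].str.split(',', expand=True)
        pieces.append(parts.rename(columns=lambda i: f'{col}{i + 1}'))

df = pd.concat(pieces, axis=1)
print(df)
```

Because the loop walks dfx.columns in their existing order, no hard-coded column list is needed to get name, age, age1..age3, job, job1..job3, gender, gender1, gender2.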

Thanks again, @jezrael, for your answer. Inspired by your use of f-strings, I solved the problem with a loop as follows:

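# Assumes dfx is the original frame, with 'name' as its first column
for col in dfx.columns[1:]:
    # The number of parts is taken from the first row of each column
    for i in range(len(dfx[col][0].split(','))):
        dfx[f'{col}{i+1}'] = dfx[col].str.split(',', expand=True)[i]

dfx = dfx[['name', 'age', 'age1', 'age2', 'age3', 'job', 'job1', 'job2', 'job3',
           'gender', 'gender1', 'gender2']]
dfx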
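The loop reads the number of parts from the first row only (dfx[col][0]). If rows can contain different numbers of comma-separated values, that count can be too small. The quick check below uses hypothetical ragged data, not the question's frame. It shows that expand=True pads short rows with missing values. Taking the count from the expanded result's shape would therefore cover every row.

```python
import pandas as pd

# Hypothetical data: the second row has one more part than the first
s = pd.Series(["a,b", "c,d,e"])

parts = s.str.split(',', expand=True)
print(parts.shape)              # (2, 3): width comes from the longest row
print(len(s[0].split(',')))     # 2: counting from the first row misses 'e'
print(pd.isna(parts.loc[0, 2])) # True: the short row is padded with a missing value
```

With the question's data every row has the same number of parts, so the first-row count works there. Using parts.shape[1] is only a safeguard for messier files.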

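Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.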

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM