
Efficient way to transform a pandas DataFrame into a new format

I am building a new DataFrame from an existing one.

Assume that we have a DataFrame that looks like this:

import pandas as pd

tt2 = pd.DataFrame(columns=['test','class'])

test = [1,2,3,4,1,2,3,4,4]
test_class = ['a','b','c','d','b','c','a','d','a']

tt2['test'] = test
tt2['class'] = test_class


    test class
0     1     a
1     2     b
2     3     c
3     4     d
4     1     b
5     2     c
6     3     a
7     4     d
8     4     a

I want to transform this into the following structure:

test class1 class2 class3
   1      a      b
   2      b      c
   3      c      a
   4      d      d      a

The number of new columns is the largest number of rows that share a single test value. Here 4 appears three times (classes d, d and a), so we need three class columns.

Then fill each row with that key's classes in the order they appear, leaving any remaining cells empty.
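The rule described above can be made concrete with two groupby calls: the number of class columns is the size of the largest group, and each row's position within its group decides which column it lands in. A small sketch, rebuilding the same frame from a dict, that computes both with groupby().size() and groupby().cumcount() (the names n_cols and pos are just illustrative):

```python
import pandas as pd

# Same data as tt2 above, built from a dict of columns
tt2 = pd.DataFrame({'test': [1, 2, 3, 4, 1, 2, 3, 4, 4],
                    'class': ['a', 'b', 'c', 'd', 'b', 'c', 'a', 'd', 'a']})

# Number of class columns needed = size of the largest 'test' group
n_cols = tt2.groupby('test').size().max()
print(n_cols)  # 3

# 0-based position of each row within its 'test' group, in order of appearance
tt2['pos'] = tt2.groupby('test').cumcount()
print(tt2['pos'].tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 2]
```

Position 0 maps to class1, position 1 to class2, and so on, which is exactly the "fill in order" behaviour asked for.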

I have tried using the groupby method, but I still haven't figured out how to do the transformation properly.

Will this work for you?

Use groupby with apply to join each group's classes into one string, then split that string back out with the Series string method str.split and expand=True:

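import pandas as pd

tt2 = pd.DataFrame(columns=['test','class'])

test = [1,2,3,4,1,2,3,4,4]
test_class = ['a','b','c','d','b','c','a','d','a']

tt2['test'] = test
tt2['class'] = test_class

# Join each group's classes into one string (e.g. 4 -> "d-d-a"),
# then split the strings back out into one column per position
result_df = tt2.groupby('test')['class'].apply('-'.join).str.split('-', expand=True)
# Rename the positional columns 0, 1, 2 to class1, class2, class3
result_df.columns = ['class' + str(col + 1) for col in result_df.columns]
print(result_df)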

which gives

     class1 class2 class3
test                     
1         a      b   None
2         b      c   None
3         c      a   None
4         d      d      a
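The join/split round trip assumes that no class label contains "-"; a label such as "x-y" would be split across two columns. A separator-free sketch of the same reshape (an alternative technique, not the one used in the answer above) numbers each row within its group with cumcount() and then pivots on that number. Column naming mirrors the answer; missing cells come out as NaN rather than None.

```python
import pandas as pd

tt2 = pd.DataFrame({'test': [1, 2, 3, 4, 1, 2, 3, 4, 4],
                    'class': ['a', 'b', 'c', 'd', 'b', 'c', 'a', 'd', 'a']})

# Position of each row inside its 'test' group: 0, 1, 2, ...
tt2['pos'] = tt2.groupby('test').cumcount()

# One row per 'test' value, one column per position; gaps become NaN
result_df = tt2.pivot(index='test', columns='pos', values='class')

# Rename positions 0, 1, 2 to class1, class2, class3
result_df.columns = ['class' + str(col + 1) for col in result_df.columns]
print(result_df)
```

Because each (test, pos) pair is unique by construction, pivot never has to aggregate duplicates, and the class values are never converted to or from strings.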
