简体   繁体   中英

Transform rows to columns by the values of two rows in pandas

I have a large dataset which has two columns Name, Value and it looks like this:

import pandas as pd
data = [['code',10],['classe',12],['series','B'], ['code',12],['classe',1],
['series','C'],['code',16],['classe',18],['series','A']]
df1 = pd.DataFrame(data,columns=['Name','Value'])
df1

Output

    Name    Value
0   code    10
1   classe  12
2   series  B
3   code    12
4   classe  1
5   series  C
6   code    16
7   classe  18
8   series  A

And I want some thing like that:

    code  classe series
0   10      10    B
1   12      1     C
2   16      18    A

In my dataset it reapts N time and i want to transform it to three columns code, classe, series .

Thanks for your help in advance!

You can accomplish this using .pivot

df2 = df1.pivot(columns='Name', values='Value')
pd.concat([df2[series].dropna().reset_index(drop=True) for series in df2], axis=1)

Output

  classe    code    series
0   12       10     B
1   1        12     C
2   18       16     A

More so, if you changed the ordered data, you still get the desired output:

import pandas as pd
data = [['code',10],['classe',12],['classe', 14], ['series','B'], ['series', 'C'], ['code',12],['classe',1],
['series','C'],['code',16],['classe',18],['series','A']]
df1 = pd.DataFrame(data,columns=['Name','Value'])
df1

    Name    Value
0   code    10
1   classe  12
2   classe  14 #Added classe
3   series  B
4   series  C  #Added Series
5   code    12
6   classe  1
7   series  C
8   code    16
9   classe  18
10  series  A

The output will be:

   classe   code    series
0   12       10      B
1   14       12      C
2   1        16      C
3   18      NaN      A

Option 1
pd.concat with a groupby should do it.

pd.concat([
     pd.Series(v.values, name=k) for k, v in df1.groupby('Name')['Value']
   ], 
   axis=1
)

  classe code series
0     12   10      B
1      1   12      C
2     18   16      A

Option 2
pivot
Flaky pivot hack, don't count on it! This solution assumes values inside Name alternate regularly - code, classe, series, code, classe, series, ... and so on. Won't work otherwise.

df1.assign(Index=df1.index // 3).pivot('Index', 'Name', 'Value')

Name  classe code series
Index                   
0         12   10      B
1          1   12      C
2         18   16      A

create a new key by using cumsum , then unstack

df1['new']=(df1.Name=='code').cumsum()
df1.set_index(['new','Name']).Value.unstack()
Out[80]: 
Name classe code series
new                    
1        12   10      B
2         1   12      C
3        18   16      A

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM