Create new columns according to row values in pandas

I have a pandas dataframe that looks like this:

     id             name  total  cubierto  no_cubierto  escuela_id  nivel_id 
0   1        direccion      1         1            0   420000707         1   
1   2  frente_a_alunos      4         4            0   420000707         1   
2   3            apoyo      2         2            0   420000707         1   
3   4        direccion      2         2            0   840477414         2   
4   5  frente_a_alunos      8         8            0   840477414         2   
5   6            apoyo      4         3            1   840477414         2   
6   7        direccion      7         7            0   918751515         3   
7   8            apoyo     37        37            0   918751515         3   
8   9        direccion      1         1            0   993683216         1   
9  10  frente_a_alunos      7         7            0   993683216         1   

The column "name" has 3 unique values:

 - direccion
 - frente_a_alunos
 - apoyo

and I need to get a new dataframe, grouped by "escuela_id" and "nivel_id" that has the columns:

 - direccion_total
 - direccion_cubierto
 - frente_a_alunos_total
 - frente_a_alunos_cubierto
 - apoyo_total
 - apoyo_cubierto
 - escuela_id
 - nivel_id

The values should come from the "total" and "cubierto" columns respectively. I don't need the "no_cubierto" column. Is it possible to do this with pandas functions? I am stuck on it and couldn't find a solution.

The output for the example should look like this:

   escuela_id  nivel_id  apoyo_cubierto  apoyo_total  direccion_total  direccion_cubierto  frente_a_alunos_total  frente_a_alunos_cubierto
0   420000707         1               2            2                1                   1                      4                         4
1   840477414         2               3            4                2                   2                      8                         8
2   918751515         3              37           37                7                   7                     ..                        ..
3   993683216         1              ..           ..                1                   1                      7                         7

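You can use pivot_table here, then flatten the two-level column index it produces:

# Spread each "name" into its own total/cubierto column; the default aggfunc is 'mean', so values come back as floats
df = df.pivot_table(index=['escuela_id', 'nivel_id'], columns='name', values=['total', 'cubierto']).reset_index()
# Join each (value, name) column pair into one string; the key columns get a trailing '_' because their second level is empty
df.columns = ['_'.join(col).strip() for col in df.columns.values]
print(df)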

Output:

   escuela_id_  nivel_id_  cubierto_apoyo  cubierto_direccion  cubierto_frente_a_alunos  total_apoyo  total_direccion  total_frente_a_alunos
0    420000707          1             2.0                 1.0                       4.0          2.0              1.0                    4.0
1    840477414          2             3.0                 2.0                       8.0          4.0              2.0                    8.0
2    918751515          3            37.0                 7.0                       NaN         37.0              7.0                    NaN
3    993683216          1             NaN                 1.0                       7.0          NaN              1.0                    7.0
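
If you want the exact column names from the question (name first, e.g. direccion_total) and integer values rather than floats, here is a minimal sketch of one way to do it. It rebuilds the sample frame from the question so it runs on its own. The variable names wide and value_cols are only illustrative, and aggfunc='sum' assumes each name appears at most once per escuela_id/nivel_id pair, otherwise duplicates would be added together.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": list(range(1, 11)),
    "name": ["direccion", "frente_a_alunos", "apoyo",
             "direccion", "frente_a_alunos", "apoyo",
             "direccion", "apoyo",
             "direccion", "frente_a_alunos"],
    "total": [1, 4, 2, 2, 8, 4, 7, 37, 1, 7],
    "cubierto": [1, 4, 2, 2, 8, 3, 7, 37, 1, 7],
    "no_cubierto": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "escuela_id": [420000707] * 3 + [840477414] * 3
                  + [918751515] * 2 + [993683216] * 2,
    "nivel_id": [1, 1, 1, 2, 2, 2, 3, 3, 1, 1],
})

# Pivot: one row per (escuela_id, nivel_id), one column per (value, name) pair.
# Assumption: at most one row per name within each group, so 'sum' just
# passes the single value through.
wide = df.pivot_table(
    index=["escuela_id", "nivel_id"],
    columns="name",
    values=["total", "cubierto"],
    aggfunc="sum",
)

# The columns are (value, name) tuples; put the name first when flattening.
wide.columns = [f"{name}_{value}" for value, name in wide.columns]
value_cols = list(wide.columns)

# Move the grouping keys back into ordinary columns.
wide = wide.reset_index()

# Nullable integers keep whole numbers while still allowing missing entries.
wide = wide.astype({col: "Int64" for col in value_cols})

print(wide)
```

Flattening before reset_index avoids the trailing underscore on escuela_id and nivel_id, and the nullable Int64 dtype shows missing combinations as <NA> instead of turning every value into a float.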
