简体   繁体   中英

Wide to long multiple rows and only two variables

I've searching but haven't find the answer. I have the next dataframe

                       Pais  Anio Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad
0                    NaN   NaN           No           No           No           No           No           Si           Si           Si           Si           Si        Total
1                    NaN   NaN        Rural        Rural       Urbana       Urbana     Total No        Rural        Rural       Urbana       Urbana     Total Si     Total Si
2                    NaN   NaN       Hombre        Mujer       Hombre        Mujer          NaN       Hombre        Mujer       Hombre        Mujer          NaN          NaN
3              Argentina  2015          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
4                Bolivia  2014       513160       462745        24457        25959      1026321      1187340      1243921      3554853      3686894      9673008     10699329
5                 Brasil  2015       287373       216447        28718        15895       548433     15898153     14545231     81355185     88432517    200231086    200779519
6                  Chile  2011        20192        16702         8752         7090        52736      1054604      1053353      6936960      7749581     16794498     16847234

And this is the desired output:

在此处输入图像描述

I put the desired output in an image because it was a lot of data, i managed to get only one melt, but I need to also "melt" the Yes/No, Zone and Gender Rows...

my code is this:

df1 = df.iloc[0:3] # select three first rows
df1 = df1.ffill(axis='columns') #Rellenando los grupos 
df = df.iloc[3:]
# my_dataframe = my_dataframe[my_dataframe.employee_name != 'chad']
df1 = df1.append(df) 
# moviendo primera fila a titulo de columna
df1.columns = df1.iloc[0]
df1 = df1.reindex(df1.index.drop(0)).reset_index(drop=True)
df1.columns.name = None
df1 = pd.melt(df1, id_vars=["Pais", "Anio"], var_name="Pregunta")

Suggestions and advice on this is appreciated!!!

Instead your solution is possible use:

df.iloc[:3] = df.iloc[:3].ffill(axis='columns') #Rellenando los grupos 

#MultiIndex by columns and first 3 rows
df.columns = [df.columns, 
              df.iloc[0], 
              df.iloc[1], 
              df.iloc[2]]


df = (df.iloc[3:] #remove first 3 rows
        .set_index(df.columns.tolist()[:2]) #first 2 cols to MultiIndex
        .rename_axis(['Pais','Anio']) #removed tuples names
        .unstack([0,1]) #reshape
        .rename_axis(['Pregunta','Respuesta','Zona','Sexo','Pais','Anio']) #levels names
        .sort_index(level=['Pais','Anio']) #sorting levels
        .reset_index(name='Total') #Series to DataFrame
        .dropna(subset=['Anio']) #removed NaNs if in Anio column
        .assign(Anio = lambda x: x['Anio'].astype(int)) #Convert Anio to int
        .reindex(['Pais','Anio','Pregunta','Zona','Sexo','Respuesta','Total'],1) #order
    .dropna(subset=['Total']) #removed NaNs by Total column
    .assign(Total = lambda x: x['Total'].astype(int)) #convert Total to ints
    )
print (df.head(10))
       Pais  Anio      Pregunta      Zona    Sexo Respuesta    Total
44  Bolivia  2014  Electricidad     Rural  Hombre        No   513160
45  Bolivia  2014  Electricidad     Rural   Mujer        No   462745
46  Bolivia  2014  Electricidad  Total No   Mujer        No  1026321
47  Bolivia  2014  Electricidad    Urbana  Hombre        No    24457
48  Bolivia  2014  Electricidad    Urbana   Mujer        No    25959
49  Bolivia  2014  Electricidad     Rural  Hombre        Si  1187340
50  Bolivia  2014  Electricidad     Rural   Mujer        Si  1243921
51  Bolivia  2014  Electricidad  Total Si   Mujer        Si  9673008
52  Bolivia  2014  Electricidad    Urbana  Hombre        Si  3554853
53  Bolivia  2014  Electricidad    Urbana   Mujer        Si  3686894

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM