Handling multiple column headers and same column names in csv - pandas/python

I have a CSV file that looks like this:

        PROD1   PROD1   PROD2   PROD2
        X       Y       X       Y
AA  A   1       2       9       10
BB  B   3       4       11      12
CC  C   5       6       13      14
DD  D   7       8       15      16
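To experiment without the original file, the sketch below rebuilds the sample as an in-memory CSV. It assumes the file is comma-separated and that both header rows start with two empty cells above the AA/A columns; the layout shown above does not reveal the actual delimiter. Reading it with `header=[0, 1]` alone shows how pandas labels those blank header cells.

```python
import io

import pandas as pd

# Assumed on-disk layout of transposedata.csv: comma-separated,
# with two empty cells at the start of each header row.
CSV = """\
,,PROD1,PROD1,PROD2,PROD2
,,X,Y,X,Y
AA,A,1,2,9,10
BB,B,3,4,11,12
CC,C,5,6,13,14
DD,D,7,8,15,16
"""

# Two header rows -> a two-level column MultiIndex.
df = pd.read_csv(io.StringIO(CSV), header=[0, 1])

# The blank header cells receive generated "Unnamed: ..." labels,
# and AA/A stay ordinary data columns.
print(df.columns.tolist())
```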

The output I am trying to get should look like this:

                X   Y
AA  A   PROD1   1   2
BB  B   PROD1   3   4
CC  C   PROD1   5   6
DD  D   PROD1   7   8
AA  A   PROD2   9   10
BB  B   PROD2   11  12
CC  C   PROD2   13  14
DD  D   PROD2   15  16

I tried reading the CSV and transposing it:

data=pd.read_csv('transposedata.csv', header=None).T
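For reference, this is roughly what that transposed frame contains. With `header=None` both header lines become ordinary data rows, and because every original column then mixes labels with numbers, pandas keeps the numbers as strings. The sketch assumes a comma-separated file with two empty cells at the start of each header row.

```python
import io

import pandas as pd

# Assumed comma-separated version of the sample file.
CSV = """\
,,PROD1,PROD1,PROD2,PROD2
,,X,Y,X,Y
AA,A,1,2,9,10
BB,B,3,4,11,12
CC,C,5,6,13,14
DD,D,7,8,15,16
"""

# header=None: every line, including both header lines, is a data row.
raw = pd.read_csv(io.StringIO(CSV), header=None)
t = raw.T

# The new columns are just the old row numbers 0..5 ...
print(t.columns.tolist())
# ... and each row mixes labels with numbers that were read as strings.
print(t.loc[2].tolist())
```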

But then I lose the column information. I also tried this approach from another answer here on Stack Overflow:

df = pd.read_csv('transposedata.csv', header=[0,1])
a = df.columns.get_level_values(0).to_series()
b = a.mask(a.str.startswith('Unnamed')).ffill().fillna('')
df.columns = [b, df.columns.get_level_values(1)]

I end up with

                                           PROD1    PROD2    
  Unnamed: 0_level_1 Unnamed: 1_level_1     X  Y     X   Y
0                 AA                  A     1  2     9  10
1                 BB                  B     3  4    11  12
2                 CC                  C     5  6    13  14
3                 DD                  D     7  8    15  16
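The `Unnamed: …` labels survive because the mask/ffill step only rewrites the top column level: there is nothing to the left of the first two columns to forward-fill from, so they become empty strings, while the bottom level keeps its generated labels and AA/A remain data columns. The sketch reruns that snippet on an in-memory copy of the sample (assumed comma-separated, with two empty leading header cells) to expose the intermediate `b`.

```python
import io

import pandas as pd

# Assumed comma-separated version of the sample file.
CSV = """\
,,PROD1,PROD1,PROD2,PROD2
,,X,Y,X,Y
AA,A,1,2,9,10
BB,B,3,4,11,12
CC,C,5,6,13,14
DD,D,7,8,15,16
"""

df = pd.read_csv(io.StringIO(CSV), header=[0, 1])

# Top-level labels as a Series, so .str and .ffill are available.
a = df.columns.get_level_values(0).to_series()
# Hide the generated "Unnamed" labels, forward-fill, then blank leading gaps.
b = a.mask(a.str.startswith('Unnamed')).ffill().fillna('')
df.columns = [b, df.columns.get_level_values(1)]

print(b.tolist())  # the two leading columns end up with '' on top
print(df.columns.get_level_values(1)[:2].tolist())  # bottom level unchanged
```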

Any help?

Update: when I run the suggested solution,

data=pd.read_csv('transposedata1.csv', header=[0,1]).stack(level=0).sort_index(level=1)

I get this:

                       Unnamed: 0_level_1  Unnamed: 1_level_1    X    Y
0  PROD1                              NaN                 NaN    1    2
1  PROD1                              NaN                 NaN    3    4
2  PROD1                              NaN                 NaN    5    6
3  PROD1                              NaN                 NaN    7    8
0  PROD2                              NaN                 NaN    9   10
1  PROD2                              NaN                 NaN   11   12
2  PROD2                              NaN                 NaN   13   14
3  PROD2                              NaN                 NaN   15   16
0  Unnamed: 0_level_0                  AA                 NaN  NaN  NaN
1  Unnamed: 0_level_0                  BB                 NaN  NaN  NaN
2  Unnamed: 0_level_0                  CC                 NaN  NaN  NaN
3  Unnamed: 0_level_0                  DD                 NaN  NaN  NaN
0  Unnamed: 1_level_0                 NaN                   A  NaN  NaN
1  Unnamed: 1_level_0                 NaN                   B  NaN  NaN
2  Unnamed: 1_level_0                 NaN                   C  NaN  NaN
3  Unnamed: 1_level_0                 NaN                   D  NaN  NaN

Thanks
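The NaN-filled rows in that output come from the AA and A columns: read without `index_col`, they are ordinary data columns whose generated top-level labels (`Unnamed: 0_level_0` and `Unnamed: 1_level_0`) get stacked into the row index next to PROD1 and PROD2. The sketch reproduces this on an in-memory copy of the sample, assuming the file is comma-separated with two empty cells at the start of each header row.

```python
import io

import pandas as pd

# Assumed comma-separated version of the sample file.
CSV = """\
,,PROD1,PROD1,PROD2,PROD2
,,X,Y,X,Y
AA,A,1,2,9,10
BB,B,3,4,11,12
CC,C,5,6,13,14
DD,D,7,8,15,16
"""

# No index_col: AA/A are data columns, so their labels are stacked too.
stacked = pd.read_csv(io.StringIO(CSV), header=[0, 1]).stack(level=0)

# Every top-level label, including the two generated "Unnamed" ones,
# is now a value in the second row-index level.
print(sorted(stacked.index.get_level_values(1).unique()))
print(stacked.shape)  # 4 original rows x 4 top-level labels, 4 columns
```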

You do not want to transpose the DataFrame; you want to stack one level of the column index. Declare to pandas that the CSV file has a two-row header and that its first two columns are the row index:

import pandas as pd

data = pd.read_csv('transposedata.csv', header=[0, 1], index_col=[0, 1]).stack(level=0).sort_index(level=2)
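A self-contained sketch of the stack-and-sort approach on an in-memory copy of the sample data, assuming a comma-separated file with two empty cells at the start of each header row. Passing `index_col=[0, 1]` turns the AA/A columns into the row index, so only the PROD level gets stacked and the result has three index levels; that is why `sort_index(level=2)` groups the rows by product.

```python
import io

import pandas as pd

# Assumed comma-separated version of transposedata.csv.
CSV = """\
,,PROD1,PROD1,PROD2,PROD2
,,X,Y,X,Y
AA,A,1,2,9,10
BB,B,3,4,11,12
CC,C,5,6,13,14
DD,D,7,8,15,16
"""

# header=[0, 1]: two header rows -> (product, measure) column MultiIndex.
# index_col=[0, 1]: the first two columns become the row index.
df = pd.read_csv(io.StringIO(CSV), header=[0, 1], index_col=[0, 1])

# Move the product level (column level 0) into the row index, leaving X and Y
# as columns, then order the rows by product (row-index level 2).
out = df.stack(level=0).sort_index(level=2)
print(out)
```

On pandas 2.1 and later, `stack` may emit a `FutureWarning` about its upcoming implementation; the reshaped result for this data is the same.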

It should give:

             X   Y
AA A PROD1   1   2
BB B PROD1   3   4
CC C PROD1   5   6
DD D PROD1   7   8
AA A PROD2   9  10
BB B PROD2  11  12
CC C PROD2  13  14
DD D PROD2  15  16
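If the reshaped table has to be written back to a CSV, one option is to name the three index levels and move them into ordinary columns with `reset_index`. The level names `code`, `letter` and `product` are hypothetical (the sample file leaves those header cells blank), and the in-memory CSV assumes a comma-separated layout with two empty leading header cells.

```python
import io

import pandas as pd

# Assumed comma-separated version of the sample file.
CSV = """\
,,PROD1,PROD1,PROD2,PROD2
,,X,Y,X,Y
AA,A,1,2,9,10
BB,B,3,4,11,12
CC,C,5,6,13,14
DD,D,7,8,15,16
"""

out = (
    pd.read_csv(io.StringIO(CSV), header=[0, 1], index_col=[0, 1])
    .stack(level=0)
    .sort_index(level=2)
)

# Hypothetical names for the three index levels, then flatten them into columns.
flat = out.rename_axis(['code', 'letter', 'product']).reset_index()
print(flat.to_csv(index=False))
```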
