
Pandas: converting repeated columns into rows

I have a DataFrame like this with repeating column names (ID is loaded as the index):

          JANUARY         FEBRUARY        MARCH 
  ID    Sales   Revenue Sales   Revenue Sales   Revenue
  03    10.00   5.00    0.00    0.00    10.00   19.00
  05    20.00   20.00   20.00   20.00   20.00   20.00
  06    30.00   30.00   30.00   30.00   30.00   30.00
  07    30.00   30.00   30.00   30.00   30.00   30.00

I want to show it like this:

  ID    Sales   Revenue
  03    10.00   5.00
  05    20.00   20.00
  06    30.00   30.00
  07    30.00   30.00
  03    0.00    0.00
  05    20.00   20.00
  06    30.00   30.00
  07    30.00   30.00
  03    10.00   19.00
  05    20.00   20.00
  06    30.00   30.00
  07    30.00   30.00

This is what I'm currently using, but I'm hoping for a better way. I have tried melt, but it only seems to handle one value column:

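import pandas as pd

cols = df.columns.to_list()  # this code expects ID as the first column, then Sales/Revenue pairs
frames = []
for i in range(1, len(cols), 2):  # load each month's data
    # select by position (ID + this month's Sales and Revenue), so repeated names are safe
    sub_df = df.iloc[:, [0, i, i + 1]].copy()
    sub_df.columns = ['ID', 'Sales', 'Revenue']  # same order as the source columns
    frames.append(sub_df)
final_df = pd.concat(frames, ignore_index=True)  # DataFrame.append was removed in pandas 2.0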
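If the month/metric headers are loaded as a two-level column MultiIndex with ID as the index (an assumption: the table above suggests this layout, but the question does not say how the file is read), the loop can be replaced by selecting each month's block and concatenating the blocks. A minimal sketch, with the sample data rebuilt by hand:

```python
import pandas as pd

# Rebuild the question's frame: (month, metric) column headers, ID as the index.
columns = pd.MultiIndex.from_product(
    [['JANUARY', 'FEBRUARY', 'MARCH'], ['Sales', 'Revenue']]
)
df = pd.DataFrame(
    [
        [10.0, 5.0, 0.0, 0.0, 10.0, 19.0],
        [20.0, 20.0, 20.0, 20.0, 20.0, 20.0],
        [30.0, 30.0, 30.0, 30.0, 30.0, 30.0],
        [30.0, 30.0, 30.0, 30.0, 30.0, 30.0],
    ],
    index=pd.Index(['03', '05', '06', '07'], name='ID'),
    columns=columns,
)

# Top-level month labels in their left-to-right order: JANUARY, FEBRUARY, MARCH.
months = df.columns.get_level_values(0).unique()

# df[m] selects one month's (Sales, Revenue) block indexed by ID; concatenating
# the blocks row-wise stacks January, then February, then March.
long_df = pd.concat([df[m] for m in months]).reset_index()
print(long_df.to_string(index=False))
```

Passing keys=months and names=['Month', 'ID'] to pd.concat would additionally keep the month as an index level, which helps if the repeated IDs need to be told apart later.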

Here's another way to stack the columns. I'm not sure whether it's more efficient, but it takes less code.

#        JANUARY         FEBRUARY        MARCH 
#  ID    Sales   Revenue Sales   Revenue Sales   Revenue
#  03    10.00   5.00    0.00    0.00    10.00   19.00
#  05    20.00   20.00   20.00   20.00   20.00   20.00
#  06    30.00   30.00   30.00   30.00   30.00   30.00
#  07    30.00   30.00   30.00   30.00   30.00   30.00

import pandas as pd
dd = {
    'ID': ['03', '05', '06', '07'],
    'Sales1': [10, 20, 30, 30],
    'Rev1': [5, 20, 30, 30],
    'Sales2': [0, 20, 30, 30],
    'Rev2': [0, 20, 30, 30],
    'Sales3': [10, 20, 30, 30],
    'Rev3': [19, 20, 30, 30],
}

df = pd.DataFrame(dd)
print(df.to_string(index=False),'\n') # source dataframe

####################

frames = []  # one DataFrame per month
for c in range(1, len(df.columns), 2):
    dftmp = df[['ID', df.columns[c], df.columns[c + 1]]].copy()  # create df for each month
    dftmp.columns = ['ID', 'Sales', 'Revenue']  # same names for every month so the frames stack
    frames.append(dftmp)
dfnew = pd.concat(frames)  # stacked df with all data; DataFrame.append was removed in pandas 2.0

print(dfnew.to_string(index=False))

Output

 ID  Sales1  Rev1  Sales2  Rev2  Sales3  Rev3
 03      10     5       0     0      10    19
 05      20    20      20    20      20    20
 06      30    30      30    30      30    30
 07      30    30      30    30      30    30

 ID Sales Revenue
 03    10       5
 05    20      20
 06    30      30
 07    30      30
 03     0       0
 05    20      20
 06    30      30
 07    30      30
 03    10      19
 05    20      20
 06    30      30
 07    30      30

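The question mentions that melt only handles one column. For the flat SalesN/RevN naming used in this answer, pandas.wide_to_long can melt several column groups ("stubs") in one call. A sketch, assuming the numeric suffix is the month number; the column name month is my own choice:

```python
import pandas as pd

# Same flat layout as the answer above: one SalesN/RevN pair per month.
df = pd.DataFrame({
    'ID': ['03', '05', '06', '07'],
    'Sales1': [10, 20, 30, 30],
    'Rev1': [5, 20, 30, 30],
    'Sales2': [0, 20, 30, 30],
    'Rev2': [0, 20, 30, 30],
    'Sales3': [10, 20, 30, 30],
    'Rev3': [19, 20, 30, 30],
})

# Sales1..3 become one Sales column and Rev1..3 one Rev column; the numeric
# suffix goes into a new 'month' index level alongside ID.
long_df = (
    pd.wide_to_long(df, stubnames=['Sales', 'Rev'], i='ID', j='month')
    .reset_index()
    .sort_values(['month', 'ID'])  # month-major order, as in the desired output
    .rename(columns={'Rev': 'Revenue'})
    .reset_index(drop=True)
)
print(long_df.to_string(index=False))
```

The explicit sort_values makes the row order independent of how wide_to_long happens to order its result.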
Pandas lreshape did the trick for me.

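# lreshape repeats the non-grouped columns (here ID) but builds a fresh index, so
# run file_df = file_df.reset_index() first if ID was loaded as the index.
df = pd.lreshape(file_df, {
    'Sales': file_df.columns[file_df.columns.str.match(r'^Sales\.?\d?')],
    'Revenue': file_df.columns[file_df.columns.str.match(r'^Revenue\.?\d?')],
})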
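A self-contained run of the lreshape approach, assuming the file had repeated headers so that read_csv renamed the duplicates Sales.1, Revenue.1, and so on (that naming is inferred from the regex above, not confirmed by the answer), and that ID was loaded as the index:

```python
import pandas as pd

# Hypothetical stand-in for file_df, using the names read_csv gives repeated headers.
file_df = pd.DataFrame(
    {
        'Sales': [10.0, 20.0, 30.0, 30.0],
        'Revenue': [5.0, 20.0, 30.0, 30.0],
        'Sales.1': [0.0, 20.0, 30.0, 30.0],
        'Revenue.1': [0.0, 20.0, 30.0, 30.0],
        'Sales.2': [10.0, 20.0, 30.0, 30.0],
        'Revenue.2': [19.0, 20.0, 30.0, 30.0],
    },
    index=pd.Index(['03', '05', '06', '07'], name='ID'),
)

# lreshape discards the index, so move ID into an ordinary column first.
flat = file_df.reset_index()

# Each group lists the source columns, in month order, that feed one output column.
groups = {
    'Sales': flat.columns[flat.columns.str.match(r'^Sales\.?\d?')],
    'Revenue': flat.columns[flat.columns.str.match(r'^Revenue\.?\d?')],
}
long_df = pd.lreshape(flat, groups)  # ID is repeated once per month
print(long_df.to_string(index=False))
```

By default lreshape also drops rows where any grouped value is NaN (dropna=True), which may matter if some months are missing.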
