Python Pandas: melt groups of initial columns into multiple target columns

Question

I need to melt groups of initial columns into multiple target columns in a dataset that is not well normalized. Here is an example (from the question "pandas dataframe reshaping/stacking of multiple value variables into seperate columns"):

         des1 des2 des3 interval1 interval2 interval3
value   
aaa       a    b    c     ##1         ##2       ##3
bbb       d    e    f     ##4         ##5       ##6
ccc       g    h    i     ##7         ##8       ##9

I am trying to melt this into something like the following layout:

         des      interval
value   
aaa       a         ##1
aaa       b         ##2
aaa       c         ##3
bbb       d         ##4
bbb       e         ##5
bbb       f         ##6
ccc       g         ##7
ccc       h         ##8
ccc       i         ##9

I was hoping to use melt instead of stack to avoid manually subsetting a lot of data. Here is what I have so far:

import pandas as pd
import numpy as np
import fnmatch

column_list = list(df_initial.columns.values)

question_sources = [c for c in fnmatch.filter(column_list, "measure*question*source")]     
question_ranks = [c for c in fnmatch.filter(column_list, "measure*rank")]
question_targets = [c for c in fnmatch.filter(column_list, "measure*targeted")]
question_statuses = [c for c in fnmatch.filter(column_list, "measure*status")]

place = [c for c in fnmatch.filter(column_list, "place")]
measure_statuses = [c for c in fnmatch.filter(column_list, "measureInfo_status")]

starter_list = place + measure_statuses

df_gpro_melt_1 = (pd.melt(df_initial, id_vars=starter_list,      
                    value_vars=question_sources, var_name="question_sources", 
                    value_name="question_sources_values"))

Is it possible to melt groups of initial columns into multiple target columns? Any advice is much appreciated.
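For reference, here is one way the melt-based idea from the question could be sketched on the small des/interval example: melt each column group on its own, then join the pieces on the row key and the numeric suffix. The helper name melt_group and the "id" column are made up for illustration, and the sketch assumes every column in a group is named stub + number.

```python
import pandas as pd

# Example frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)


def melt_group(frame, stub):
    """Hypothetical helper: melt columns named <stub><number> into one <stub> column."""
    cols = [c for c in frame.columns if c.startswith(stub)]
    long = frame.reset_index().melt(
        id_vars="value", value_vars=cols, var_name="id", value_name=stub
    )
    # Keep only the numeric suffix so the groups can be joined on it.
    long["id"] = long["id"].str[len(stub):].astype(int)
    return long


result = (
    melt_group(df, "des")
    .merge(melt_group(df, "interval"), on=["value", "id"])
    .sort_values(["value", "id"])
    .set_index("value")
    .drop(columns="id")
)
print(result)
```

With more groups (sources, ranks, targets, statuses), the same helper would be called once per stub and the results merged in turn.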

Answer 1

This should work for your example, provided your columns follow the pattern in your example data frame:

pd.concat((pd.DataFrame({'des':df.iloc[:,i], 
                         'interval':df.iloc[:,i+3]}) 
             for i in range(3)))

If the pairs are different, you can use the same pattern but iterate through a list of column-position tuples:

tuples = [(0,3),(1,4),(2,5)]

pd.concat((pd.DataFrame({'des':df.iloc[:,i], 
                          'interval':df.iloc[:,j]}) 
             for i,j in tuples))
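As a runnable variation on the pattern above, here is a minimal sketch that pairs the columns by name instead of by position and then restores the row order shown in the question. The pairing rule (des1 with interval1, and so on) is an assumption taken from the example frame.

```python
import pandas as pd

# Example frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# Pair columns by name rather than by integer position.
pairs = [("des%d" % n, "interval%d" % n) for n in range(1, 4)]

pieces = [
    df[[des_col, interval_col]].rename(
        columns={des_col: "des", interval_col: "interval"}
    )
    for des_col, interval_col in pairs
]

# concat stacks the three pieces one after another; a stable sort on the
# index groups the rows by "value" while keeping des1, des2, des3 order.
result = pd.concat(pieces).sort_index(kind="mergesort")
print(result)
```

Pairing by name avoids silently mixing up columns if their positions change.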

Answer 2

I know this has been answered already, but:

>>> df
      des1 des2 des3 interval1 interval2 interval3
value                                             
aaa      a    b    c       ##1       ##2       ##3
bbb      d    e    f       ##4       ##5       ##6
ccc      g    h    i       ##7       ##8       ##9

>>> pd.wide_to_long(df.reset_index(), ['des', 'interval'], i='value', j='id')
         des interval
value id             
aaa   1    a      ##1
bbb   1    d      ##4
ccc   1    g      ##7
aaa   2    b      ##2
bbb   2    e      ##5
ccc   2    h      ##8
aaa   3    c      ##3
bbb   3    f      ##6
ccc   3    i      ##9

Then just use .reset_index(level=1, drop=True) if you want to get rid of the id level.
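Putting those two steps together as a self-contained sketch: the sort_index call is an addition not in the answer above, included only to get the aaa, aaa, aaa, bbb, ... row order the question asks for.

```python
import pandas as pd

# Example frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# wide_to_long needs the row key as a regular column, hence reset_index().
long_df = pd.wide_to_long(
    df.reset_index(), stubnames=["des", "interval"], i="value", j="id"
)

# Sort by (value, id), then drop the helper "id" level from the index.
result = long_df.sort_index().reset_index(level="id", drop=True)
print(result)
```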

Answer 3

I guess I found an ugly way to do that!

In [12]: pd.DataFrame(
             data={'desc': df.values[..., 0:3].ravel(),
                   'interval': df.values[..., 3:6].ravel()},
             # repeat each index label 3 times (pd.np no longer exists in current pandas)
             index=df.index.values.repeat(3))
Out[12]: 
    desc interval
aaa    a      ##1
aaa    b      ##2
aaa    c      ##3
bbb    d      ##4
bbb    e      ##5
bbb    f      ##6
ccc    g      ##7
ccc    h      ##8
ccc    i      ##9

But I'm pretty sure there is a more elegant way using other pandas tools, such as pandas.MultiIndex (to group your interval1, interval2 and interval3 columns under an "interval" level) and/or pandas.melt (or maybe the stack method).
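A rough sketch of that MultiIndex-plus-stack idea, in case it helps: split each column name into a (field, id) pair, then stack the id level into the rows. The level names "field" and "id" and the name-splitting regex are assumptions made for this example, not something from the original data.

```python
import re

import pandas as pd

# Example frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# Split names like "des1" into ("des", 1) and use them as two column levels.
matches = [re.fullmatch(r"(\D+)(\d+)", name) for name in df.columns]
wide = df.copy()
wide.columns = pd.MultiIndex.from_tuples(
    [(m.group(1), int(m.group(2))) for m in matches],
    names=["field", "id"],
)

# Move the "id" level from the columns into the row index.
stacked = wide.stack(level="id")

# Sort by (value, id), drop the helper level, and fix the column order.
result = stacked.sort_index().reset_index(level="id", drop=True)[["des", "interval"]]
print(result)
```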

Answer 1
2 accepted 2016-02-03 21:58:06

Answer 2
2 2018-03-14 11:28:56

Answer 3
0 2016-02-03 22:38:42
