Python Pandas Melt Groups of Initial Columns Into Multiple Target Columns

I need to melt groups of initial columns into multiple target columns in a dataset that is not well normalized. Here is an example (from the question "pandas dataframe reshaping/stacking of multiple value variables into seperate columns"):

         des1 des2 des3 interval1 interval2 interval3
value   
aaa       a    b    c     ##1         ##2       ##3
bbb       d    e    f     ##4         ##5       ##6
ccc       g    h    i     ##7         ##8       ##9

I am trying to melt this into something like the following orientation:

         des      interval
value   
aaa       a         ##1
aaa       b         ##2
aaa       c         ##3
bbb       d         ##4
bbb       e         ##5
bbb       f         ##6
ccc       g         ##7
ccc       h         ##8
ccc       i         ##9

I was hoping to use melt instead of stack to avoid manually subsetting a lot of data. Here is what I have started with so far:

import fnmatch

import pandas as pd

# df_initial is the wide source frame (not shown here).
column_list = list(df_initial.columns)

# fnmatch.filter already returns a list, so no extra comprehension is needed.
question_sources = fnmatch.filter(column_list, "measure*question*source")
question_ranks = fnmatch.filter(column_list, "measure*rank")
question_targets = fnmatch.filter(column_list, "measure*targeted")
question_statuses = fnmatch.filter(column_list, "measure*status")

place = fnmatch.filter(column_list, "place")
measure_statuses = fnmatch.filter(column_list, "measureInfo_status")

# Identifier columns to keep on every melted row.
starter_list = place + measure_statuses

# Melts only the first group (question_sources) so far.
df_gpro_melt_1 = pd.melt(
    df_initial,
    id_vars=starter_list,
    value_vars=question_sources,
    var_name="question_sources",
    value_name="question_sources_values",
)

Is it possible to melt groups of initial columns into multiple target columns? Any advice is much appreciated.

This should work if your columns follow the pattern in your example data frame:

pd.concat((pd.DataFrame({'des':df.iloc[:,i], 
                         'interval':df.iloc[:,i+3]}) 
             for i in range(3)))

If the pairs are different, you can use the same pattern but iterate through a list of index pairs:

tuples = [(0,3),(1,4),(2,5)]

pd.concat((pd.DataFrame({'des':df.iloc[:,i], 
                          'interval':df.iloc[:,j]}) 
             for i,j in tuples))
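
Note that the concat pattern emits all rows for the first pair, then all rows for the second pair, and so on, so the result is grouped by pair rather than by index label. A minimal runnable sketch follows, assuming the example frame from the question (rebuilt by hand here) and pairing columns by name instead of position; the `pairs` list and the final stable sort are illustrative additions, not part of the answer above.

```python
import pandas as pd

# Example frame rebuilt by hand from the question (an assumption about its exact contents).
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# Hypothetical explicit pairing by column name rather than by position.
pairs = [("des1", "interval1"), ("des2", "interval2"), ("des3", "interval3")]

# One two-column frame per pair, stacked vertically.
stacked = pd.concat(
    pd.DataFrame({"des": df[d], "interval": df[i]}) for d, i in pairs
)

# A stable sort groups rows by index label while keeping the pair order within each label.
result = stacked.sort_index(kind="mergesort")
print(result)
```

Pairing by name tends to be less fragile than `iloc` offsets if the column order of the source frame ever changes.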

I know this has been answered already, but:

>>> df
      des1 des2 des3 interval1 interval2 interval3
value                                             
aaa      a    b    c       ##1       ##2       ##3
bbb      d    e    f       ##4       ##5       ##6
ccc      g    h    i       ##7       ##8       ##9

>>> pd.wide_to_long(df.reset_index(), ['des', 'interval'], i='value', j='id')
         des interval
value id             
aaa   1    a      ##1
bbb   1    d      ##4
ccc   1    g      ##7
aaa   2    b      ##2
bbb   2    e      ##5
ccc   2    h      ##8
aaa   3    c      ##3
bbb   3    f      ##6
ccc   3    i      ##9

Then just use .reset_index(level=1, drop=True) if you want to get rid of the id index level.
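
Putting those two steps together, here is a small self-contained sketch. It assumes the example frame from the question (rebuilt by hand here); the `sort_index()` call is an addition, used only so that the rows come out grouped by `value` as in the desired orientation.

```python
import pandas as pd

# Example frame rebuilt by hand from the question (an assumption about its exact contents).
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# wide_to_long needs the identifier as a regular column, hence reset_index().
long = pd.wide_to_long(
    df.reset_index(), stubnames=["des", "interval"], i="value", j="id"
)

# Sort by (value, id) so rows are grouped per label, then drop the helper id level.
result = long.sort_index().reset_index(level="id", drop=True)
print(result)
```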

I guess I found an ugly way to do that!

In [12]: pd.DataFrame(
             data={'desc': df.values[..., 0:3].ravel(),
                   'interval': df.values[..., 3:6].ravel()},
             # pd.np was removed in pandas 2.0; use NumPy directly (import numpy as np)
             index=np.repeat(df.index.values, 3))
Out[12]: 
    desc interval
aaa    a      ##1
aaa    b      ##2
aaa    c      ##3
bbb    d      ##4
bbb    e      ##5
bbb    f      ##6
ccc    g      ##7
ccc    h      ##8
ccc    i      ##9

But I'm pretty sure there is a more elegant way using other tools such as pandas.MultiIndex (to group your interval1, interval2 and interval3 columns under an "interval" level) and/or pandas.melt (or maybe the stack method).
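
One possible way to follow up on the pandas.melt idea is sketched below: melt every column into a single long column, split each original column name into a stub and a numeric suffix, then pivot the stubs back out as target columns. This is only a sketch under stated assumptions: the example frame is rebuilt by hand, column names are assumed to follow a strict `<stub><number>` pattern, and the helper names `column`, `cell`, `stub` and `id` are made up for illustration.

```python
import pandas as pd

# Example frame rebuilt by hand from the question (an assumption about its exact contents).
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# Melt all columns into one long column. value_name is set explicitly because
# melt's default name ("value") would clash with the identifier column here.
melted = df.reset_index().melt(id_vars="value", var_name="column", value_name="cell")

# Split e.g. "des1" into the stub "des" and the numeric id 1.
parts = melted["column"].str.extract(r"^(?P<stub>\D+)(?P<id>\d+)$")
melted["stub"] = parts["stub"]
melted["id"] = parts["id"].astype(int)

# Pivot the stubs back out as columns; each (value, id) pair becomes one row.
result = (
    melted.pivot(index=["value", "id"], columns="stub", values="cell")
    .rename_axis(columns=None)
    .reset_index(level="id", drop=True)
)
print(result)
```

Passing a list to `index` in `pivot` requires a reasonably recent pandas (1.1 or later). For real data, the regular expression would need adjusting to the actual naming scheme.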
