简体   繁体   English

用多列组重塑 Pandas Dataframe

[英]Reshape Pandas Dataframe with multiple column groups

I currently have a wide dataframe that looks like this:我目前有一个宽 dataframe 看起来像这样:

Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
   0   r  1   10    15    17    100   105   107
   1   r  2   20    25    27    110   115   117
   2   s  1   30    35    37    120   125   127
   3   s  2   40    45    47    130   135   137

Each entry has multiple identifier columns (ID1 and ID2).每个条目都有多个标识符列(ID1 和 ID2)。 I then have two separate categories of measurements (Foc and Sat) each of which contains multiple identifiers (A, B, C)(the category identifiers are shared between categories).然后,我有两个单独的测量类别(Foc 和 Sat),每个类别都包含多个标识符(A、B、C)(类别标识符在类别之间共享)。 I'm eventually needing to plot it in a facet_grid with x and y as each category of measurement and separated by category identifier, so I'm trying to reshape it so that it looks like this:我最终需要 plot 它在 facet_grid 中,x 和 y 作为每个测量类别并由类别标识符分隔,所以我试图重塑它,使其看起来像这样:

Index ID1 ID2 Ch  Foc  Sat 
   0   r  1   A   10   100 
   1   r  1   B   15    105
   2   r  1   C   17    107
   3   r  2   A   20    110 
   4   r  2   B   25    115 
   5   r  2   C   27    117 
   6   s  1   A   30    120 
   7   s  1   B   35    125
   8   s  1   C   37    127

I've been trying.melt, .pivot, and.stack but not understanding what I'm doing well enough to make headway.我一直在尝试.melt、.pivot 和.stack,但不了解我做得很好以取得进展。

You are thinking in the right way.你正在以正确的方式思考。 You can do:你可以做:

# melt the dataframe
d1 = df.set_index(['Index', 'ID1', 'ID2']).stack().reset_index()

# create separate column
d1[['flag', 'Ch']] = d1['level_3'].str.split('_', expand=True)
d1 = d1.drop('level_3', 1)
d1 = d1.rename(columns = {0: 'Foc'})

# expand the dataframe to wide
d2 = pd.pivot_table(d1, index=['Index', 'ID1', 'ID2', 'Ch'], columns=['flag']).reset_index()

# fix column names
d2.columns = ['Index', 'ID1', 'ID2', 'Ch', 'Foc', 'Sat']

print(d2.head())

   Index ID1  ID2 Ch  Foc  Sat
0      0   r    1  A   10  100
1      0   r    1  B   15  105
2      0   r    1  C   17  107
3      1   r    2  A   20  110
4      1   r    2  B   25  115

I'd set ID columns to the index, split and expand the columns on the '_' character, then stack the dataframe:我将 ID 列设置为索引,拆分并展开“_”字符上的列,然后堆叠 dataframe:

from io import StringIO
import pandas

datafile = StringIO("""\
Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
   0   r  1   10    15    17    100   105   107
   1   r  2   20    25    27    110   115   117
   2   s  1   30    35    37    120   125   127
   3   s  2   40    45    47    130   135   137
""")

(
    pandas.read_csv(datafile, sep='\s+')
        .set_index(['ID1', 'ID2'])
        .drop(columns='Index')
        .pipe(lambda df:
             df.set_axis(
                 df.columns.str.split('_', expand=True),
                 axis="columns"
            )
        )
        .rename_axis([None, 'Ch'], axis='columns')
        .stack(level='Ch')
        .reset_index()
)

And that give me:这给了我:

   ID1  ID2 Ch  Foc  Sat
0    r    1  A   10  100
1    r    1  B   15  105
2    r    1  C   17  107
3    r    2  A   20  110
4    r    2  B   25  115
5    r    2  C   27  117
6    s    1  A   30  120
7    s    1  B   35  125
8    s    1  C   37  127
9    s    2  A   40  130
10   s    2  B   45  135
11   s    2  C   47  137

Let us do wide_to_long让我们做wide_to_long

df = pd.wide_to_long(df,['Foc','Sat'],i=['ID1','ID2'],j='Ch',sep='_',suffix='\w+').reset_index()
Out[168]: 
   ID1  ID2 Ch  Foc  Sat
0    r    1  A   10  100
1    r    1  B   15  105
2    r    1  C   17  107
3    r    2  A   20  110
4    r    2  B   25  115
5    r    2  C   27  117
6    s    1  A   30  120
7    s    1  B   35  125
8    s    1  C   37  127
9    s    2  A   40  130
10   s    2  B   45  135
11   s    2  C   47  137

you could also achieve this with .melt , .groupby and np.where :你也可以使用.melt.groupbynp.where实现:

df = pd.melt(df, id_vars=['ID1','ID2','Foc_A', 'Foc_B', 'Foc_C'], var_name='Ch', value_name='Sat') \
    .groupby(['ID1','ID2','Ch']).agg({'Foc_A':'max','Foc_B':'max', 'Foc_C':'max','Sat':'max'}).reset_index()
df['Foc'] = np.where((df['Ch'] == 'Sat_A'), df['Foc_A'], '')
df['Foc'] = np.where((df['Ch'] == 'Sat_B'), df['Foc_B'], df['Foc'])
df['Foc'] = np.where((df['Ch'] == 'Sat_C'), df['Foc_C'], df['Foc'])
df['Ch'] = df['Ch'].str.replace('Sat_', '')
df = df.drop(['Foc_A', 'Foc_B', 'Foc_C'], axis=1)
df

output: output:

    ID1 ID2 Ch      Sat Foc
0   r   1   A       100 10
1   r   1   B       105 15
2   r   1   C       107 17
3   r   2   A       110 20
4   r   2   B       115 25
5   r   2   C       117 27
6   s   1   A       120 30
7   s   1   B       125 35
8   s   1   C       127 37
9   s   2   A       130 40
10  s   2   B       135 45
11  s   2   C       137 47

Creating a multi_index then using stack创建一个multi_index然后使用stack

df = df.set_index(['ID1','ID2'])

df.columns = df.columns.str.split('_',expand=True)

df1 = df.stack(1).reset_index().rename(columns={'level_2' : 'Ch'})

   ID1  ID2 Ch  Foc  Sat
0    r    1  A   10  100
1    r    1  B   15  105
2    r    1  C   17  107
3    r    2  A   20  110
4    r    2  B   25  115
5    r    2  C   27  117
6    s    1  A   30  120
7    s    1  B   35  125
8    s    1  C   37  127
9    s    2  A   40  130
10   s    2  B   45  135
11   s    2  C   47  137

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM