Reshape Pandas Dataframe with multiple column groups
I currently have a wide dataframe that looks like this:
Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
0 r 1 10 15 17 100 105 107
1 r 2 20 25 27 110 115 117
2 s 1 30 35 37 120 125 127
3 s 2 40 45 47 130 135 137
Each entry has multiple identifier columns (ID1 and ID2). I then have two separate categories of measurements (Foc and Sat), each of which contains multiple identifiers (A, B, C); the category identifiers are shared between categories. I eventually need to plot it in a facet_grid with x and y as each category of measurement, separated by category identifier, so I'm trying to reshape it so that it looks like this:
Index ID1 ID2 Ch Foc Sat
0 r 1 A 10 100
1 r 1 B 15 105
2 r 1 C 17 107
3 r 2 A 20 110
4 r 2 B 25 115
5 r 2 C 27 117
6 s 1 A 30 120
7 s 1 B 35 125
8 s 1 C 37 127
I've been trying .melt, .pivot, and .stack, but I don't understand what I'm doing well enough to make headway.
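For anyone who wants to reproduce this, here is a minimal sketch of the sample frame. It assumes `Index` is the frame's index rather than a regular column, which the listing above does not make clear. Code that treats `Index` as a column would need `df.reset_index()` first.

```python
import pandas as pd

# Sample data copied from the wide table above.
df = pd.DataFrame({
    "ID1": ["r", "r", "s", "s"],
    "ID2": [1, 2, 1, 2],
    "Foc_A": [10, 20, 30, 40],
    "Foc_B": [15, 25, 35, 45],
    "Foc_C": [17, 27, 37, 47],
    "Sat_A": [100, 110, 120, 130],
    "Sat_B": [105, 115, 125, 135],
    "Sat_C": [107, 117, 127, 137],
})
# Assumption: "Index" is only the label of the row index.
df.index.name = "Index"
print(df)
```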
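You are thinking in the right way. You can do:
# assumes `import pandas as pd` and that `df` is the wide frame with 'Index' as a regular column
# stack the measurement columns into long form
d1 = df.set_index(['Index', 'ID1', 'ID2']).stack().reset_index()
# split names such as 'Foc_A' into the category ('flag') and the identifier ('Ch')
d1[['flag', 'Ch']] = d1['level_3'].str.split('_', expand=True)
d1 = d1.drop(columns='level_3')
d1 = d1.rename(columns={0: 'value'})
# pivot the category back out to columns; each cell holds a single value, so 'first' just passes it through
d2 = pd.pivot_table(d1, index=['Index', 'ID1', 'ID2', 'Ch'], columns='flag', values='value', aggfunc='first').reset_index()
# fix column names
d2.columns = ['Index', 'ID1', 'ID2', 'Ch', 'Foc', 'Sat']
print(d2.head())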
Index ID1 ID2 Ch Foc Sat
0 0 r 1 A 10 100
1 0 r 1 B 15 105
2 0 r 1 C 17 107
3 1 r 2 A 20 110
4 1 r 2 B 25 115
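The same split-then-pivot idea can also be sketched with `DataFrame.melt` in place of `stack`. This is only an illustration under a few assumptions: the sample frame is rebuilt inline, the helper column names `name` and `value` are made up for the example, and `pivot` with a list for `index` needs pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "ID1": ["r", "r", "s", "s"],
    "ID2": [1, 2, 1, 2],
    "Foc_A": [10, 20, 30, 40],
    "Foc_B": [15, 25, 35, 45],
    "Foc_C": [17, 27, 37, 47],
    "Sat_A": [100, 110, 120, 130],
    "Sat_B": [105, 115, 125, 135],
    "Sat_C": [107, 117, 127, 137],
})

# Wide -> long: one row per (ID1, ID2, original column name).
long_df = df.melt(id_vars=["ID1", "ID2"], var_name="name", value_name="value")

# Split 'Foc_A' into the category ('flag') and the identifier ('Ch').
long_df[["flag", "Ch"]] = long_df["name"].str.split("_", expand=True)

# Long -> semi-wide: one column per category, one row per (ID1, ID2, Ch).
out = (
    long_df.pivot(index=["ID1", "ID2", "Ch"], columns="flag", values="value")
    .rename_axis(columns=None)  # drop the leftover 'flag' label on the columns
    .reset_index()
)
print(out)
```

`pivot` is enough here because every (ID1, ID2, Ch, flag) combination occurs exactly once; with duplicates it would raise, and `pivot_table` with an explicit `aggfunc` would be needed instead.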
I'd set the ID columns as the index, split and expand the column names on the '_' character, then stack the dataframe:
from io import StringIO
import pandas
datafile = StringIO("""\
Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
0 r 1 10 15 17 100 105 107
1 r 2 20 25 27 110 115 117
2 s 1 30 35 37 120 125 127
3 s 2 40 45 47 130 135 137
""")
(
    pandas.read_csv(datafile, sep=r'\s+')
    .set_index(['ID1', 'ID2'])
    .drop(columns='Index')
    .pipe(lambda df:
        df.set_axis(
            df.columns.str.split('_', expand=True),
            axis="columns"
        )
    )
    .rename_axis([None, 'Ch'], axis='columns')
    .stack(level='Ch')
    .reset_index()
)
And that gives me:
ID1 ID2 Ch Foc Sat
0 r 1 A 10 100
1 r 1 B 15 105
2 r 1 C 17 107
3 r 2 A 20 110
4 r 2 B 25 115
5 r 2 C 27 117
6 s 1 A 30 120
7 s 1 B 35 125
8 s 1 C 37 127
9 s 2 A 40 130
10 s 2 B 45 135
11 s 2 C 47 137
Let us use wide_to_long:
df = pd.wide_to_long(df, ['Foc', 'Sat'], i=['ID1', 'ID2'], j='Ch', sep='_', suffix=r'\w+').reset_index()
Out[168]:
ID1 ID2 Ch Foc Sat
0 r 1 A 10 100
1 r 1 B 15 105
2 r 1 C 17 107
3 r 2 A 20 110
4 r 2 B 25 115
5 r 2 C 27 117
6 s 1 A 30 120
7 s 1 B 35 125
8 s 1 C 37 127
9 s 2 A 40 130
10 s 2 B 45 135
11 s 2 C 47 137
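To spell out what each argument does, here is a self-contained sketch of the same `wide_to_long` call. The sample frame is rebuilt inline, and the final sort is an addition of mine: the row order that `wide_to_long` returns may not match the listing above, so the sketch sorts explicitly rather than relying on it.

```python
import pandas as pd

df = pd.DataFrame({
    "ID1": ["r", "r", "s", "s"],
    "ID2": [1, 2, 1, 2],
    "Foc_A": [10, 20, 30, 40],
    "Foc_B": [15, 25, 35, 45],
    "Foc_C": [17, 27, 37, 47],
    "Sat_A": [100, 110, 120, 130],
    "Sat_B": [105, 115, 125, 135],
    "Sat_C": [107, 117, 127, 137],
})

out = (
    pd.wide_to_long(
        df,
        stubnames=["Foc", "Sat"],  # column prefixes that become value columns
        i=["ID1", "ID2"],          # columns that together identify a wide row
        j="Ch",                    # name of the new column holding the suffix
        sep="_",                   # separator between prefix and suffix
        suffix=r"\w+",             # the default r"\d+" matches numeric suffixes only
    )
    .reset_index()
    .sort_values(["ID1", "ID2", "Ch"])
    .reset_index(drop=True)
)
print(out)
```

Note that the columns passed as `i` must uniquely identify each row of the wide frame, which holds for (ID1, ID2) in this sample.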
You could also achieve this with .melt, .groupby and np.where:
# assumes `import pandas as pd` and `import numpy as np`
df = pd.melt(df, id_vars=['ID1', 'ID2', 'Foc_A', 'Foc_B', 'Foc_C'], var_name='Ch', value_name='Sat') \
    .groupby(['ID1', 'ID2', 'Ch']).agg({'Foc_A': 'max', 'Foc_B': 'max', 'Foc_C': 'max', 'Sat': 'max'}).reset_index()
# pick the Foc value that matches each row's Sat identifier; nesting keeps the column numeric
df['Foc'] = np.where(df['Ch'] == 'Sat_A', df['Foc_A'],
            np.where(df['Ch'] == 'Sat_B', df['Foc_B'], df['Foc_C']))
df['Ch'] = df['Ch'].str.replace('Sat_', '', regex=False)
df = df.drop(['Foc_A', 'Foc_B', 'Foc_C'], axis=1)
df
Output:
ID1 ID2 Ch Sat Foc
0 r 1 A 100 10
1 r 1 B 105 15
2 r 1 C 107 17
3 r 2 A 110 20
4 r 2 B 115 25
5 r 2 C 117 27
6 s 1 A 120 30
7 s 1 B 125 35
8 s 1 C 127 37
9 s 2 A 130 40
10 s 2 B 135 45
11 s 2 C 137 47
Creating a MultiIndex, then using stack:
df = df.set_index(['ID1','ID2'])
df.columns = df.columns.str.split('_',expand=True)
df1 = df.stack(1).reset_index().rename(columns={'level_2' : 'Ch'})
ID1 ID2 Ch Foc Sat
0 r 1 A 10 100
1 r 1 B 15 105
2 r 1 C 17 107
3 r 2 A 20 110
4 r 2 B 25 115
5 r 2 C 27 117
6 s 1 A 30 120
7 s 1 B 35 125
8 s 1 C 37 127
9 s 2 A 40 130
10 s 2 B 45 135
11 s 2 C 47 137
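The key step in this approach is the column split, so it may help to look at it in isolation. A minimal sketch, using a shortened, hypothetical set of column names:

```python
import pandas as pd

# Hypothetical subset of the measurement columns.
cols = pd.Index(["Foc_A", "Foc_B", "Sat_A", "Sat_B"])

# expand=True turns each 'prefix_suffix' name into a (prefix, suffix) tuple,
# so the result is a two-level MultiIndex instead of an Index of lists.
multi = cols.str.split("_", expand=True)
print(multi)
```

Once the columns are a two-level MultiIndex, stacking the second level moves the A/B/C identifiers into the rows and leaves Foc and Sat as the remaining columns.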