Reshape Pandas dataframe columns by block of N columns
I have a dataframe where blocks of columns need to be reshaped into rows. I tried stack() and melt() but could not find the right way.
Here is an example of what I expect:
import pandas as pd

data = {'id': ['a1', 'a2', 'a3', 'a4'],
        'year': [20, 20, 19, 18],
        'b_A': [1, 2, 3, 4],
        'b_B': [5, 6, 7, 8],
        'b_C': [9, 10, 11, 12],
        'c_A': [13, 14, 15, 16],
        'c_B': [17, 18, 19, 20],
        'c_C': [21, 22, 23, 24],
        'd_A': [25, 26, 27, 28],
        'd_B': [29, 30, 31, 32],
        'd_C': [33, 34, 35, 36],
        }
df = pd.DataFrame(data)
   id  year  b_A  b_B  b_C  c_A  c_B  c_C  d_A  d_B  d_C
0  a1    20    1    5    9   13   17   21   25   29   33
1  a2    20    2    6   10   14   18   22   26   30   34
2  a3    19    3    7   11   15   19   23   27   31   35
3  a4    18    4    8   12   16   20   24   28   32   36
The expected result should be:
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36
Thanks for your time and help.
You can move the columns whose names do not contain _ into the index with DataFrame.set_index, then split the remaining column names on _ with Series.str.split (expand=True turns them into a MultiIndex) and reshape with DataFrame.stack:
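# Keep the identifier columns out of the reshape by moving them to the index
df1 = df.set_index(['id','year'])
# 'b_A' becomes ('b', 'A'): a two-level column MultiIndex
df1.columns = df1.columns.str.split('_', expand=True)
# Move the first column level (b, c, d) into the rows
df1 = df1.stack(level=0).reset_index()
print(df1)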
    id  year level_2   A   B   C
0   a1    20       b   1   5   9
1   a1    20       c  13  17  21
2   a1    20       d  25  29  33
3   a2    20       b   2   6  10
4   a2    20       c  14  18  22
5   a2    20       d  26  30  34
6   a3    19       b   3   7  11
7   a3    19       c  15  19  23
8   a3    19       d  27  31  35
9   a4    18       b   4   8  12
10  a4    18       c  16  20  24
11  a4    18       d  28  32  36
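To make the intermediate step easier to follow, here is a minimal sketch on a smaller frame (two ids and only the b and c blocks, chosen just for illustration). It prints the two-level column labels produced by the split, which is what stack(level=0) then moves into the rows. The variable names wide, stacked and out are not from the answer above.

```python
import pandas as pd

# Reduced version of the question's data, for illustration only
df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 20],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

wide = df.set_index(["id", "year"])
# Each 'x_Y' label becomes the tuple ('x', 'Y') in a column MultiIndex
wide.columns = wide.columns.str.split("_", expand=True)
print(wide.columns.tolist())

# Stacking level 0 moves the 'b'/'c' part into a third index level,
# leaving 'A' and 'B' as the remaining column headers
stacked = wide.stack(level=0)
out = stacked.reset_index()
print(out)
```

Depending on the pandas version, stack may emit a FutureWarning about its newer implementation; the shape of the result for this data should be the same either way.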
If you also need the new column to be named origin, you can use DataFrame.rename_axis:
df1 = df.set_index(['id','year'])
df1.columns = df1.columns.str.split('_', expand=True)
# Name the first column level 'origin' so the stacked level carries that name
df1 = df1.rename_axis(['origin', None], axis=1).stack(0).reset_index()
print(df1)
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36
Or use wide_to_long after swapping the parts of each column name around _, so that b_A becomes A_b and A, B, C can serve as the stub names:
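# Swap the parts around '_' (b_A -> A_b); 'id' and 'year' are unchanged.
# Note: this renames the columns of df in place.
df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]
df1 = pd.wide_to_long(df,
                      stubnames=['A', 'B', 'C'],
                      i=['id', 'year'],
                      j='origin',
                      sep='_',
                      suffix=r'\w+').reset_index()
print(df1)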
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36
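The renaming step above assigns to df.columns, so df itself is changed and later code that expects names like b_A would no longer find them. If that matters, one possible variant is to rename on a copy with DataFrame.rename. This is only a sketch on a reduced frame; swap_parts and renamed are hypothetical names introduced here, not part of the answer.

```python
import pandas as pd

# Reduced version of the question's data, for illustration only
df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 19],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

def swap_parts(name):
    """Hypothetical helper: 'b_A' -> 'A_b'; names without '_' stay as they are."""
    return "_".join(name.split("_")[::-1])

# rename returns a new frame, so the original df keeps its column names
renamed = df.rename(columns=swap_parts)

out = pd.wide_to_long(
    renamed,
    stubnames=["A", "B"],   # the parts that should remain column headers
    i=["id", "year"],       # identifier columns
    j="origin",             # name for the column holding the suffix (b, c)
    sep="_",
    suffix=r"\w+",          # the suffix is a letter here, not the default digits
).reset_index()
print(out)
```

The suffix argument is needed because wide_to_long expects numeric suffixes by default.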
You could also use the pivot_longer function from pyjanitor; at the moment you have to install the latest development version from GitHub:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
df.pivot_longer(index=["id", "year"],
                names_to=("origin", ".value"),
                names_sep="_")
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a2    20      b   2   6  10
2   a3    19      b   3   7  11
3   a4    18      b   4   8  12
4   a1    20      c  13  17  21
5   a2    20      c  14  18  22
6   a3    19      c  15  19  23
7   a4    18      c  16  20  24
8   a1    20      d  25  29  33
9   a2    20      d  26  30  34
10  a3    19      d  27  31  35
11  a4    18      d  28  32  36
The names_sep value splits the column names; the split parts that pair with .value remain as column headers, while the other parts are collected under the origin column.
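For readers without pyjanitor installed, the splitting described above can be roughly emulated in plain pandas with melt, str.split and pivot. This is only an illustrative sketch on a reduced frame, not how pivot_longer is implemented; the intermediate column names name, value and header are made up for the example, and the rows come out sorted by id rather than in pivot_longer's default order.

```python
import pandas as pd

# Reduced version of the question's data, for illustration only
df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 19],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

# Step 1: one row per (id, year, original column name)
long = df.melt(id_vars=["id", "year"], var_name="name", value_name="value")

# Step 2: split each name on '_' (the role of names_sep)
parts = long["name"].str.split("_", expand=True)
long["origin"] = parts[0]   # the part that goes under 'origin'
long["header"] = parts[1]   # the part paired with '.value'

# Step 3: spread the '.value' part back out into column headers
out = (
    long.pivot(index=["id", "year", "origin"], columns="header", values="value")
    .reset_index()
    .rename_axis(None, axis=1)  # drop the leftover 'header' label on the columns
)
print(out)
```

Passing a list to the index argument of pivot assumes a reasonably recent pandas (1.1 or later).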
If you want the data in order of appearance, you can use the sort_by_appearance parameter:
df.pivot_longer(
    index=["id", "year"],
    names_to=("origin", ".value"),
    names_sep="_",
    sort_by_appearance=True,
)
    id  year origin   A   B   C
0   a1    20      b   1   5   9
1   a1    20      c  13  17  21
2   a1    20      d  25  29  33
3   a2    20      b   2   6  10
4   a2    20      c  14  18  22
5   a2    20      d  26  30  34
6   a3    19      b   3   7  11
7   a3    19      c  15  19  23
8   a3    19      d  27  31  35
9   a4    18      b   4   8  12
10  a4    18      c  16  20  24
11  a4    18      d  28  32  36