I have a DataFrame in which blocks of columns need to be reshaped into rows. I tried stack() and melt() but could not find the right way to combine them.
Here is an example of the input:
import pandas as pd
data = {'id':['a1', 'a2', 'a3', 'a4'],
'year':[20, 20, 19, 18],
'b_A': [1, 2, 3, 4],
'b_B': [5, 6, 7, 8],
'b_C': [9, 10, 11, 12],
'c_A': [13, 14, 15, 16],
'c_B': [17, 18, 19, 20],
'c_C': [21, 22, 23, 24],
'd_A': [25, 26, 27, 28],
'd_B': [29, 30, 31, 32],
'd_C': [33, 34, 35, 36],
}
df = pd.DataFrame(data)
id year b_A b_B b_C c_A c_B c_C d_A d_B d_C
0 a1 20 1 5 9 13 17 21 25 29 33
1 a2 20 2 6 10 14 18 22 26 30 34
2 a3 19 3 7 11 15 19 23 27 31 35
3 a4 18 4 8 12 16 20 24 28 32 36
The expected result should be:
id year origin A B C
0 a1 20 b 1 5 9
1 a1 20 c 13 17 21
2 a1 20 d 25 29 33
3 a2 20 b 2 6 10
4 a2 20 c 14 18 22
5 a2 20 d 26 30 34
6 a3 19 b 3 7 11
7 a3 19 c 15 19 23
8 a3 19 d 27 31 35
9 a4 18 b 4 8 12
10 a4 18 c 16 20 24
11 a4 18 d 28 32 36
Thanks for your time and help.
You can move the columns whose names do not contain _ into the index with DataFrame.set_index, then split the remaining column names with Series.str.split (expand=True turns them into a MultiIndex) and reshape with DataFrame.stack:
df1 = df.set_index(['id','year'])
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.stack(level=0).reset_index()
print (df1)
id year level_2 A B C
0 a1 20 b 1 5 9
1 a1 20 c 13 17 21
2 a1 20 d 25 29 33
3 a2 20 b 2 6 10
4 a2 20 c 14 18 22
5 a2 20 d 26 30 34
6 a3 19 b 3 7 11
7 a3 19 c 15 19 23
8 a3 19 d 27 31 35
9 a4 18 b 4 8 12
10 a4 18 c 16 20 24
11 a4 18 d 28 32 36
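To see why stack(level=0) works here, it can help to look at the intermediate column index. The following is a minimal sketch on a cut-down frame (two ids, two blocks; the variable names wide and long_df are only for illustration):

```python
import pandas as pd

# Cut-down version of the question's data, for illustration only
df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 20],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

wide = df.set_index(["id", "year"])
# 'b_A' becomes the tuple ('b', 'A'), so the columns form a two-level MultiIndex
wide.columns = wide.columns.str.split("_", expand=True)

print(wide.columns.nlevels)
print(wide.columns.tolist())

# Stacking level 0 moves the b/c part into the row index and keeps A/B as columns
long_df = wide.stack(level=0)
print(long_df.shape)
```

Level 0 holds the block prefix (b, c) and level 1 holds the per-block name (A, B), which is why stacking level 0 produces one row per id, year and prefix.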
If you also need the new column to be named origin, you can use DataFrame.rename_axis:
df1 = df.set_index(['id','year'])
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.rename_axis(['origin',None], axis=1).stack(0).reset_index()
print (df1)
id year origin A B C
0 a1 20 b 1 5 9
1 a1 20 c 13 17 21
2 a1 20 d 25 29 33
3 a2 20 b 2 6 10
4 a2 20 c 14 18 22
5 a2 20 d 26 30 34
6 a3 19 b 3 7 11
7 a3 19 c 15 19 23
8 a3 19 d 27 31 35
9 a4 18 b 4 8 12
10 a4 18 c 16 20 24
11 a4 18 d 28 32 36
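The effect of rename_axis can be checked in isolation. This is a small sketch, assuming a one-row frame built by hand with the same two-level column layout as above:

```python
import pandas as pd

# Hand-built frame whose columns are already split into (prefix, name) tuples
wide = pd.DataFrame(
    [[1, 5, 13, 17]],
    index=pd.MultiIndex.from_tuples([("a1", 20)], names=["id", "year"]),
    columns=pd.MultiIndex.from_tuples(
        [("b", "A"), ("b", "B"), ("c", "A"), ("c", "B")]
    ),
)

# Without a level name, reset_index falls back to an auto-generated label
unnamed = wide.stack(0).reset_index()

# Naming the first column level 'origin' carries that name into the result
named = wide.rename_axis(["origin", None], axis=1).stack(0).reset_index()

print(unnamed.columns.tolist())
print(named.columns.tolist())
```

The second name in the list is None so that the remaining column level (A, B) stays unnamed and no extra label appears above the header row.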
Or use wide_to_long, after swapping the order of the parts around _ so that names like b_A become A_b (wide_to_long expects the stub name first):
df.columns = [f'{"_".join(x[::-1])}' for x in df.columns.str.split('_')]
df1 = pd.wide_to_long(df,
stubnames=['A','B','C'],
i=['id','year'],
j='origin',
sep='_',
suffix=r'\w+').reset_index()
print (df1)
id year origin A B C
0 a1 20 b 1 5 9
1 a1 20 c 13 17 21
2 a1 20 d 25 29 33
3 a2 20 b 2 6 10
4 a2 20 c 14 18 22
5 a2 20 d 26 30 34
6 a3 19 b 3 7 11
7 a3 19 c 15 19 23
8 a3 19 d 27 31 35
9 a4 18 b 4 8 12
10 a4 18 c 16 20 24
11 a4 18 d 28 32 36
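Note that assigning to df.columns changes the original frame, so later code that still expects names like b_A will not find them. A minimal sketch of the same idea on a renamed copy follows; the helper swap_parts is a made-up name for illustration, and the data is a cut-down version of the question's frame:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 20],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

def swap_parts(name):
    # Hypothetical helper: 'b_A' -> 'A_b'; names without '_' are returned unchanged
    return "_".join(name.split("_")[::-1])

# rename returns a new frame, so df keeps its original column names
renamed = df.rename(columns=swap_parts)

long_df = pd.wide_to_long(
    renamed,
    stubnames=["A", "B"],
    i=["id", "year"],
    j="origin",
    sep="_",
    suffix=r"\w+",   # the suffixes are letters, not the default digits
).reset_index()

print(long_df)
```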
You could also use the pivot_longer function from pyjanitor; at the time of writing, this required installing the latest development version from GitHub:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
df.pivot_longer(index=["id", "year"],
names_to=("origin", ".value"),
names_sep="_")
id year origin A B C
0 a1 20 b 1 5 9
1 a2 20 b 2 6 10
2 a3 19 b 3 7 11
3 a4 18 b 4 8 12
4 a1 20 c 13 17 21
5 a2 20 c 14 18 22
6 a3 19 c 15 19 23
7 a4 18 c 16 20 24
8 a1 20 d 25 29 33
9 a2 20 d 26 30 34
10 a3 19 d 27 31 35
11 a4 18 d 28 32 36
The names_sep value splits the column names; the parts that pair with .value remain as column headers, while the other parts are collected into the origin column.
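To make that split concrete, here is a rough pandas-only sketch of the same reshaping. It is not how pyjanitor implements pivot_longer, only an illustration of the idea; the column name header is an assumption introduced for this example, and the data is a cut-down version of the question's frame:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a1", "a2"],
    "year": [20, 20],
    "b_A": [1, 2], "b_B": [5, 6],
    "c_A": [13, 14], "c_B": [17, 18],
})

# 1. Melt everything except the identifier columns into (variable, value) pairs
melted = df.melt(id_vars=["id", "year"])

# 2. Split 'b_A' into the part that becomes a row label ('b')
#    and the part that stays a column header ('A')
melted[["origin", "header"]] = melted["variable"].str.split("_", expand=True)

# 3. Spread the header part back out into columns
result = (
    melted.pivot(index=["id", "year", "origin"], columns="header", values="value")
          .rename_axis(columns=None)
          .reset_index()
)

print(result)
```

In this sketch the header column plays the role that .value plays in names_to: its entries end up as column names rather than as cell values.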
If you want the rows in order of appearance, you can use the sort_by_appearance parameter:
df.pivot_longer(
index=["id", "year"],
names_to=("origin", ".value"),
names_sep="_",
sort_by_appearance=True,
)
id year origin A B C
0 a1 20 b 1 5 9
1 a1 20 c 13 17 21
2 a1 20 d 25 29 33
3 a2 20 b 2 6 10
4 a2 20 c 14 18 22
5 a2 20 d 26 30 34
6 a3 19 b 3 7 11
7 a3 19 c 15 19 23
8 a3 19 d 27 31 35
9 a4 18 b 4 8 12
10 a4 18 c 16 20 24
11 a4 18 d 28 32 36