I have the following dataframe that I am trying to melt:
import numpy as np
import pandas as pd
dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4), index=dates, columns=['A_var1', 'A_var2', 'B_var1', 'B_var2'])
print(df)
            A_var1  A_var2  B_var1  B_var2
2014-01-01     1.0     0.0     0.0     0.0
2014-01-02     0.0     1.0     0.0     0.0
2014-01-03     0.0     0.0     1.0     0.0
2014-01-04     0.0     0.0     0.0     1.0
I want to obtain the following:
           type  var1  var2
2014-01-01    A   1.0   0.0
2014-01-01    B   0.0   0.0
2014-01-02    A   0.0   1.0
2014-01-02    B   0.0   0.0
2014-01-03    A   0.0   0.0
2014-01-03    B   1.0   0.0
2014-01-04    A   0.0   0.0
2014-01-04    B   0.0   1.0
Any idea how to do that efficiently? I know about the melt function, but I can't get it to work in this context.
Many thanks.
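You could use stack on multi-indexed columns: split the column names on '_' into two levels, then stack the first level into the index. The stacked level comes out as level_1, so rename that column to type if you need the exact header from the question.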
In [304]: df.columns = df.columns.str.split('_', expand=True)
In [305]: df.stack(0).reset_index(1)
Out[305]:
           level_1  var1  var2
2014-01-01       A   1.0   0.0
2014-01-01       B   0.0   0.0
2014-01-02       A   0.0   1.0
2014-01-02       B   0.0   0.0
2014-01-03       A   0.0   0.0
2014-01-03       B   1.0   0.0
2014-01-04       A   0.0   0.0
2014-01-04       B   0.0   1.0
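The same stack approach can be written as a standalone script that produces the type header directly. This is only a sketch: naming the column levels before stacking is my own addition, not part of the answer above, and recent pandas versions may emit a FutureWarning about the stack implementation.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4), index=dates,
                  columns=['A_var1', 'A_var2', 'B_var1', 'B_var2'])

# Split 'A_var1' into the two-level column ('A', 'var1').
df.columns = df.columns.str.split('_', expand=True)
# Assumption: name the first level so it is stacked as 'type' rather than 'level_1'.
df.columns.names = ['type', None]

# Move the 'type' level from the columns into the index, then back out as a column.
result = df.stack(level='type').reset_index(level='type')
print(result)
```

Naming the level up front avoids a separate rename step after reset_index.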
One option is the pivot_longer function from pyjanitor, using the .value placeholder:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(names_to=("type", ".value"),
                names_sep="_",
                ignore_index=False,
                sort_by_appearance=True)
           type  var1  var2
2014-01-01    A   1.0   0.0
2014-01-01    B   0.0   0.0
2014-01-02    A   0.0   1.0
2014-01-02    B   0.0   0.0
2014-01-03    A   0.0   0.0
2014-01-03    B   1.0   0.0
2014-01-04    A   0.0   0.0
2014-01-04    B   0.0   1.0
The .value placeholder keeps the part of the column name associated with it (var1, var2) as the header, while the rest (A, B) goes into the type column.
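To make that split concrete, here is a rough pandas-only sketch of the same reshaping using melt followed by pivot. It is not how pyjanitor implements pivot_longer; the intermediate names date, name, and header are made up for illustration, and passing a list to pivot's index assumes pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4), index=dates,
                  columns=['A_var1', 'A_var2', 'B_var1', 'B_var2'])

# Long form: one row per (date, original column name).
long_df = (df.rename_axis('date')
             .reset_index()
             .melt(id_vars='date', var_name='name'))

# Split 'A_var1' into the type ('A') and the future header ('var1').
long_df[['type', 'header']] = long_df['name'].str.split('_', expand=True)

# The 'header' part goes back to the columns (the role .value plays);
# the 'type' part stays in the data as an ordinary column.
manual = (long_df.pivot(index=['date', 'type'], columns='header', values='value')
                 .reset_index('type')
                 .rename_axis(index=None, columns=None))
print(manual)
```

This also shows why melt alone does not get there: melt only lengthens the frame, and a second step is needed to spread var1 and var2 back out into columns.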