Pandas: reshaping a dataframe by splitting column names into a column and a variable
I have the following dataframe that I am trying to melt:
import numpy as np
import pandas as pd
dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4), index=dates, columns=['A_var1', 'A_var2', 'B_var1', 'B_var2'])
print(df)
A_var1 A_var2 B_var1 B_var2
2014-01-01 1.0 0.0 0.0 0.0
2014-01-02 0.0 1.0 0.0 0.0
2014-01-03 0.0 0.0 1.0 0.0
2014-01-04 0.0 0.0 0.0 1.0
I want to obtain the following:
type var1 var2
2014-01-01 A 1.0 0.0
2014-01-01 B 0.0 0.0
2014-01-02 A 0.0 1.0
2014-01-02 B 0.0 0.0
2014-01-03 A 0.0 0.0
2014-01-03 B 1.0 0.0
2014-01-04 A 0.0 0.0
2014-01-04 B 0.0 1.0
Any idea on how to do that efficiently? I know I can use the melt function, but I can't get it to work in this context.
Many thanks,
You could use stack on multi-indexed columns: split the column names on '_' to build a MultiIndex, then stack its first level.
In [304]: df.columns = df.columns.str.split('_', expand=True)
In [305]: df.stack(0).reset_index(1)
Out[305]:
level_1 var1 var2
2014-01-01 A 1.0 0.0
2014-01-01 B 0.0 0.0
2014-01-02 A 0.0 1.0
2014-01-02 B 0.0 0.0
2014-01-03 A 0.0 0.0
2014-01-03 B 1.0 0.0
2014-01-04 A 0.0 0.0
2014-01-04 B 0.0 1.0
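The output above labels the new column level_1 rather than type, because the column level has no name. A minimal sketch of one way to get the requested header, assuming it is acceptable to name the first column level before stacking (the variable names wide and result are only illustrative, and recent pandas versions may emit a FutureWarning about the stack implementation):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4), index=dates,
                  columns=['A_var1', 'A_var2', 'B_var1', 'B_var2'])

# Work on a copy so the original wide frame stays untouched.
wide = df.copy()

# 'A_var1' -> ('A', 'var1'): the columns become a two-level MultiIndex.
wide.columns = wide.columns.str.split('_', expand=True)

# Name the first level so that it surfaces as 'type' instead of 'level_1'.
wide.columns.names = ['type', None]

# Move the 'type' level into the row index, then back out as a regular column.
result = wide.stack('type').reset_index(level='type')
print(result)
```

Naming the level up front avoids a separate rename step after reset_index.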
One option is the pivot_longer function from pyjanitor, using the .value placeholder:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(names_to=("type", ".value"),
                names_sep="_",
                ignore_index=False,
                sort_by_appearance=True)
type var1 var2
2014-01-01 A 1.0 0.0
2014-01-01 B 0.0 0.0
2014-01-02 A 0.0 1.0
2014-01-02 B 0.0 0.0
2014-01-03 A 0.0 0.0
2014-01-03 B 1.0 0.0
2014-01-04 A 0.0 0.0
2014-01-04 B 0.0 1.0
The .value placeholder keeps the part of the column name associated with it as the header, while the rest goes into the type column.
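For comparison, the same split of each column name into a type part and a header part can be written with plain pandas and melt, which the question asked about. This is only a sketch under assumptions: the intermediate names date, name, variable and long_df are hypothetical, and it is not how pivot_longer is implemented internally.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4), index=dates,
                  columns=['A_var1', 'A_var2', 'B_var1', 'B_var2'])

# Melt to long form, keeping the dates as a regular column named 'date'.
long_df = (df.rename_axis('date')
             .reset_index()
             .melt(id_vars='date', var_name='name'))

# Split 'A_var1' into the type part ('A') and the header part ('var1').
long_df[['type', 'variable']] = long_df['name'].str.split('_', expand=True)

# Pivot the header part back out into columns, one row per (date, type).
result = (long_df.pivot(index=['date', 'type'], columns='variable', values='value')
                 .reset_index(level='type')
                 .rename_axis(index=None, columns=None))
print(result)
```

This takes more steps than the stack approach, but it makes explicit which half of each column name ends up as a header and which ends up as a value in the type column.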