Set column names when stacking pandas DataFrame
When stacking a pandas DataFrame, a Series is returned. Normally after I stack a DataFrame, I convert it back into a DataFrame. However, the default names coming from the stacked data make renaming the columns a bit hacky. What I'm looking for is an easier/built-in way to give columns sensible names after stacking.
E.g., for the following DataFrame:
In [64]: df = pd.DataFrame({'id':[1,2,3],
    ...:                    'date':['2015-09-31']*3,
    ...:                    'value':[100, 95, 42],
    ...:                    'value2':[200, 57, 27]}).set_index(['id','date'])
In [65]: df
Out[65]:
               value  value2
id date
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27
I stack and convert it back to a DataFrame like so:
In [68]: df.stack().reset_index()
Out[68]:
   id        date level_2    0
0   1  2015-09-31   value  100
1   1  2015-09-31  value2  200
2   2  2015-09-31   value   95
3   2  2015-09-31  value2   57
4   3  2015-09-31   value   42
5   3  2015-09-31  value2   27
So in order to name these columns appropriately I would need to do something like this:
In [72]: stacked = df.stack()
In [73]: stacked
Out[73]:
id  date
1   2015-09-31  value     100
                value2    200
2   2015-09-31  value      95
                value2     57
3   2015-09-31  value      42
                value2     27
dtype: int64
In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True)
In [88]: stacked.reset_index().rename(columns={0:'value'})
Out[88]:
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27
Ideally, the solution would look something like this:
df.stack(new_index_name='var_name', new_col_name='value')
But looking at the docs, it doesn't look like stack takes any such arguments. Is there an easier/built-in way in pandas to deal with this workflow?
So here's one way that you may find a bit cleaner, using the fact that columns and Series can also carry names.
In [45]: df
Out[45]:
               value  value2
id date
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27
In [46]: df.columns.name = 'var_name'
In [47]: s = df.stack()
In [48]: s.name = 'value'
In [49]: s.reset_index()
Out[49]:
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27
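For anyone who wants to run the steps above outside an IPython session, here is a minimal self-contained sketch using the question's sample data. The variable name long_df is only illustrative. One thing worth noticing is that assigning to df.columns.name changes the original frame, so the name stays on df afterwards.

```python
import pandas as pd

# Same sample data as in the question; dates are kept as plain strings.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "date": ["2015-09-31"] * 3,
        "value": [100, 95, 42],
        "value2": [200, 57, 27],
    }
).set_index(["id", "date"])

df.columns.name = "var_name"  # names the index level that stack() creates
s = df.stack()
s.name = "value"              # names the data column after reset_index()
long_df = s.reset_index()

print(list(long_df.columns))
# Side effect: the columns name now persists on the original frame.
print(df.columns.name)
```

If that side effect is unwanted, working on a copy of the frame (or resetting df.columns.name to None afterwards) should avoid it.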
pd.melt is often useful for converting DataFrames from "wide" to "long" format. You could use pd.melt here if you convert the id and date index levels to columns first:
In [56]: pd.melt(df.reset_index(), id_vars=['id', 'date'], value_vars=['value', 'value2'], var_name='var_name', value_name='value')
Out[56]:
   id        date var_name  value
0   1  2015-09-31    value    100
1   2  2015-09-31    value     95
2   3  2015-09-31    value     42
3   1  2015-09-31   value2    200
4   2  2015-09-31   value2     57
5   3  2015-09-31   value2     27
A piping-friendly alternative to chrisb's answer:
df.stack().rename_axis(['id', 'date', 'var_name']).rename('value').reset_index()
And if explicit is better than implicit:
(
df
.stack()
.rename_axis(index={'id': 'id', 'date': 'date', None: 'var_name'})
.rename('value')
.reset_index()
)
When using the dict mapper, you can skip the names which should stay the same:
df.stack().rename_axis(index={None: 'var_name'}).rename('value').reset_index()
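As a quick check that the list form and the short dict form give the same result, here is a small sketch on the question's sample data. The names by_list and by_dict are made up for the comparison. It relies on the level created by stack being unnamed (None), which holds when the columns axis has no name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "date": ["2015-09-31"] * 3,
        "value": [100, 95, 42],
        "value2": [200, 57, 27],
    }
).set_index(["id", "date"])

# List form: every index level name is spelled out, in order.
by_list = (
    df.stack()
    .rename_axis(["id", "date", "var_name"])
    .rename("value")
    .reset_index()
)

# Dict form: only the unnamed level (None) is renamed.
by_dict = (
    df.stack()
    .rename_axis(index={None: "var_name"})
    .rename("value")
    .reset_index()
)

print(list(by_dict.columns))
print(by_list.values.tolist() == by_dict.values.tolist())
```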
Why not something like this? Sometimes melt is great, but sometimes you want to keep your index, and/or you want to have an index on that new column. This is like @krassowski's answer, but it doesn't require you to know the names of df's indices in advance.
df.stack().rename_axis([*df.index.names, "var_name"]).rename("value")
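To illustrate why keeping the index can be handy, the sketch below builds the same named Series from the question's sample data and then selects from it by level name. The variable names stacked, only_value2, and single are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "date": ["2015-09-31"] * 3,
        "value": [100, 95, 42],
        "value2": [200, 57, 27],
    }
).set_index(["id", "date"])

# The existing index names are reused, so they need not be known in advance.
stacked = df.stack().rename_axis([*df.index.names, "var_name"]).rename("value")

# The result is still a Series with a three-level MultiIndex.
print(list(stacked.index.names))

# Select all rows for one variable via the named level.
only_value2 = stacked.xs("value2", level="var_name")
print(only_value2.tolist())

# Or look up a single cell with a full index tuple.
single = stacked.loc[(1, "2015-09-31", "value")]
print(single)
```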
To avoid a phantom column name when calling stack, just rename the column axis beforehand:
import pandas as pd

df = pd.DataFrame({col: range(3) for col in list("ABC")})
df.rename_axis(columns="lol_goodbye_columns").stack()
   lol_goodbye_columns
0  A                      0
   B                      0
   C                      0
1  A                      1
   B                      1
   C                      1
2  A                      2
   B                      2
   C                      2
dtype: int64
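Applied to the question's sample data, the same idea can be chained into a single expression. This is only a sketch, with long_df as an illustrative name. Since rename_axis returns a new object here, the original df keeps its unnamed columns axis.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "date": ["2015-09-31"] * 3,
        "value": [100, 95, 42],
        "value2": [200, 57, 27],
    }
).set_index(["id", "date"])

long_df = (
    df.rename_axis(columns="var_name")  # name the columns axis before stacking
    .stack()                            # that name becomes the new index level
    .rename("value")                    # name the stacked values
    .reset_index()
)

print(list(long_df.columns))
# The original frame is left untouched.
print(df.columns.name)
```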