
Set column names when stacking pandas DataFrame

When stacking a pandas DataFrame, a Series is returned. Normally after I stack a DataFrame, I convert it back into a DataFrame. However, the default names coming from the stacked data make renaming the columns a bit hacky. What I'm looking for is an easier/built-in way to give columns sensible names after stacking.

E.g., for the following DataFrame:

In [64]: df = pd.DataFrame({'id':[1,2,3], 
    ...:                    'date':['2015-09-31']*3, 
    ...:                    'value':[100, 95, 42], 
    ...:                    'value2':[200, 57, 27]}).set_index(['id','date'])

In [65]: df
Out[65]: 
               value  value2
id date                     
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27

I stack and convert it back to a DataFrame like so:

In [68]: df.stack().reset_index()
Out[68]: 
   id        date level_2    0
0   1  2015-09-31   value  100
1   1  2015-09-31  value2  200
2   2  2015-09-31   value   95
3   2  2015-09-31  value2   57
4   3  2015-09-31   value   42
5   3  2015-09-31  value2   27

So in order to name these columns appropriately, I would need to do something like this:

In [72]: stacked = df.stack()

In [73]: stacked
Out[73]: 
id  date              
1   2015-09-31  value     100
                value2    200
2   2015-09-31  value      95
                value2     57
3   2015-09-31  value      42
                value2     27
dtype: int64

In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True)

In [88]: stacked.reset_index().rename(columns={0:'value'})
Out[88]: 
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27

Ideally, the solution would look something like this:

df.stack(new_index_name='var_name', new_col_name='value')

But looking at the docs, it doesn't look like stack takes any such arguments. Is there an easier/built-in way in pandas to deal with this workflow?

So here's one way that you may find a bit cleaner, using the fact that columns and Series can also carry names.

In [45]: df
Out[45]: 
               value  value2
id date                     
1  2015-09-31    100     200
2  2015-09-31     95      57
3  2015-09-31     42      27

In [46]: df.columns.name = 'var_name'

In [47]: s = df.stack()

In [48]: s.name = 'value'

In [49]: s.reset_index()
Out[49]: 
   id        date var_name  value
0   1  2015-09-31    value    100
1   1  2015-09-31   value2    200
2   2  2015-09-31    value     95
3   2  2015-09-31   value2     57
4   3  2015-09-31    value     42
5   3  2015-09-31   value2     27
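If you do this often, the two naming steps above can be wrapped in a small function. This is only a sketch: stack_named and its parameter names are made up for illustration and are not part of pandas. It uses rename_axis(columns=...) instead of assigning to df.columns.name so the original DataFrame is left untouched.

```python
import pandas as pd


def stack_named(df, var_name="var_name", value_name="value"):
    """Hypothetical helper (not a pandas API): stack df into a long DataFrame."""
    # Name the column axis; the new innermost index level inherits this name.
    stacked = df.rename_axis(columns=var_name).stack()
    # Name the Series so reset_index() labels the value column with it, not 0.
    return stacked.rename(value_name).reset_index()


# Same data as in the question; the dates are kept as plain strings.
df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

long_df = stack_named(df)
print(long_df)
```

Because rename_axis returns a new object, df.columns.name is still None after the call.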

pd.melt is often useful for converting DataFrames from "wide" to "long" format. You could use pd.melt here if you convert the id and date index levels to columns first:

In [56]: pd.melt(df.reset_index(), id_vars=['id', 'date'], value_vars=['value', 'value2'], var_name='var_name', value_name='value')
Out[56]: 
   id        date var_name  value
0   1  2015-09-31    value    100
1   2  2015-09-31    value     95
2   3  2015-09-31    value     42
3   1  2015-09-31   value2    200
4   2  2015-09-31   value2     57
5   3  2015-09-31   value2     27
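Note that melt groups the rows by variable, whereas stack goes row by row. A rough sketch of how the two line up once both are sorted on the same keys follows. One caveat, as far as I know: pandas 2.0 and later raise a ValueError when value_name matches an existing column (here 'value'), so the sketch uses 'amount', an arbitrary name picked for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

# 'amount' is an arbitrary name chosen to avoid clashing with the 'value' column.
melted = pd.melt(df.reset_index(), id_vars=['id', 'date'],
                 var_name='var_name', value_name='amount')

stacked = (df.rename_axis(columns='var_name')
             .stack()
             .rename('amount')
             .reset_index())

# Sort both results on the same keys so they can be compared row by row.
keys = ['id', 'date', 'var_name']
melted_sorted = melted.sort_values(keys).reset_index(drop=True)
stacked_sorted = stacked.sort_values(keys).reset_index(drop=True)

print(melted_sorted)
print(stacked_sorted)
```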

A piping-friendly alternative to chrisb's answer:

df.stack().rename_axis(['id', 'date', 'var_name']).rename('value').reset_index()

And if explicit is better than implicit:

(
    df
    .stack()
    .rename_axis(index={'id': 'id', 'date': 'date', None: 'var_name'})
    .rename('value')
    .reset_index()
)

When using the dict mapper, you can skip the names which should stay the same:

df.stack().rename_axis(index={None: 'var_name'}).rename('value').reset_index()
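The None key works because the index level that stack creates has no name when the column axis is unnamed. A quick sketch to check this, assuming the same df as in the question:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

stacked = df.stack()
# The level created by stack() is unnamed, so the last entry here is None.
print(stacked.index.names)

# The dict maps old level names to new ones; levels not listed keep their names.
renamed = stacked.rename_axis(index={None: 'var_name'})
print(renamed.index.names)
```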

Why not something like this? Sometimes melt is great, but sometimes you want to keep your index, and/or you want to have an index on that new column. This is like @krassowski's answer, but it doesn't require you to know the names of df's indices in advance.

df.stack().rename_axis([*df.index.names, "var_name"]).rename("value")
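To illustrate why keeping the index can be handy, here is a small sketch (the variable names long_s and value2 are just for this example) that selects one variable by its named level with xs:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

# The result stays a Series with a three-level index: id, date, var_name.
long_s = df.stack().rename_axis([*df.index.names, "var_name"]).rename("value")

# Pick out one variable by level name; the id/date levels are preserved.
value2 = long_s.xs("value2", level="var_name")
print(value2)
```

Calling long_s.reset_index() afterwards still gives the flat DataFrame with the id, date, var_name and value columns.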

To avoid a phantom column name when calling stack, just rename the column axis beforehand:

import pandas as pd

df = pd.DataFrame({col: range(3) for col in list("ABC")})
# Naming the column axis first gives the stacked index level that name.
df.rename_axis(columns="lol_goodbye_columns").stack()
   lol_goodbye_columns
0  A                      0
   B                      0
   C                      0
1  A                      1
   B                      1
   C                      1
2  A                      2
   B                      2
   C                      2
dtype: int64
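The same idea extends to the other two columns. As a sketch, rename_axis can name the row index as well, and Series.reset_index accepts a name argument for the value column, so the whole thing fits in one chain. The names 'row', 'letter' and 'n' are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({col: range(3) for col in list("ABC")})

long_df = (
    df
    .rename_axis(index="row", columns="letter")  # name both axes up front
    .stack()
    .reset_index(name="n")  # name= labels the value column instead of 0
)
print(long_df)
```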
