Naming a pandas Series when stacking from a DataFrame

Question

A common workflow I have in pandas is taking data from some numerical function in "wide" form and turning it into a "long"-form DataFrame for plotting and statistical modeling.

By wide form I mean that information about a variable is encoded in the columns. For instance, say I measured some value at each of 5 timepoints in 10 different subjects:

import numpy as np
import pandas as pd

wide_df = pd.DataFrame(np.random.randn(10, 5),
                       index=pd.Series(list("abcdefghij"), name="subject"),
                       columns=pd.Series(np.arange(5) * 2, name="timepoint"))
print(wide_df)


timepoint         0         2         4         6         8
subject                                                    
a         -0.670881  0.959608 -0.480081  0.142092  1.697058
b          2.369493 -0.561081 -0.183635 -0.807523 -0.421347
c         -0.908420  0.629171  0.196728 -0.907443  0.264352
d         -0.390138 -1.821304 -1.994605  0.225164  0.187649
e         -0.860542 -0.998323 -0.490968 -0.815570 -1.009524
f         -0.917390 -0.120567 -0.893095 -0.359155 -0.204112
g          0.557500 -1.522631 -1.175746  0.705043 -0.366932
h         -0.817043  2.204493 -0.305202  0.464969  0.280027
i         -1.137253  0.350984  0.095577  0.468167 -0.058058
j         -0.569986  2.438580 -0.514894  0.860504  1.397393

[10 rows x 5 columns]

The quickest way I know to wrangle this into a long-form DataFrame is to use stack and then reset_index:

long_df = wide_df.stack().reset_index()
print(long_df.head())

 subject  timepoint         0
0       a          0 -0.670881
1       a          2  0.959608
2       a          4 -0.480081
3       a          6  0.142092
4       a          8  1.697058

[5 rows x 3 columns]
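Why the value column ends up as 0: stack returns a Series whose MultiIndex carries the subject and timepoint names, but the Series itself has no name, and Series.reset_index falls back to 0 for an unnamed Series. A minimal check, assuming a seeded generator as a stand-in for the original unseeded random data:

```python
import numpy as np
import pandas as pd

# Rebuild a wide table like the one above; the seed is only for reproducibility
rng = np.random.default_rng(0)
wide_df = pd.DataFrame(rng.standard_normal((10, 5)),
                       index=pd.Series(list("abcdefghij"), name="subject"),
                       columns=pd.Series(np.arange(5) * 2, name="timepoint"))

stacked = wide_df.stack()
# The row and column labels become the two levels of a MultiIndex...
print(list(stacked.index.names))  # ['subject', 'timepoint']
# ...but the stacked Series itself is unnamed
print(stacked.name)               # None

# With no Series name available, reset_index labels the value column 0
long_df = stacked.reset_index()
print(list(long_df.columns))      # ['subject', 'timepoint', 0]
```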

The problem is that my "value" column is now named 0. I could do

long_series = wide_df.stack()
long_series.name = "value"
long_df = long_series.reset_index()

But that is more typing, requires naming an intermediate variable, and mixes method calls with attribute assignment in a way that really breaks up my flow.

Is there a way to do this in one line? I thought df.stack might take a name argument, but it doesn't, and Series objects don't seem to have a set_name method that I can find.
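For what it's worth, two existing Series methods come close to the missing set_name: Series.rename with a scalar argument returns a copy carrying that name, and Series.to_frame takes the column name directly. A sketch of both as one-liners, again assuming seeded stand-in data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
wide_df = pd.DataFrame(rng.standard_normal((10, 5)),
                       index=pd.Series(list("abcdefghij"), name="subject"),
                       columns=pd.Series(np.arange(5) * 2, name="timepoint"))

# Series.rename with a scalar sets the Series name (it does not touch the index)
long_a = wide_df.stack().rename("value").reset_index()

# Series.to_frame turns the Series into a one-column DataFrame with that name
long_b = wide_df.stack().to_frame("value").reset_index()

print(list(long_a.columns))  # ['subject', 'timepoint', 'value']
print(long_a.equals(long_b))  # True
```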

I do know about pandas.melt, but it seems like overkill for this case of "pure" wide table data, and it drops the subject index, which is important. Is there another answer here?
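On the point about melt dropping the index: in pandas 1.1 and later, DataFrame.melt accepts ignore_index=False, which keeps the original row labels so the subject index survives. Note that melt orders rows by column (every subject at timepoint 0 first), whereas stack orders them by row. A sketch assuming a recent pandas and seeded stand-in data; var_name is passed explicitly here for clarity:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
wide_df = pd.DataFrame(rng.standard_normal((10, 5)),
                       index=pd.Series(list("abcdefghij"), name="subject"),
                       columns=pd.Series(np.arange(5) * 2, name="timepoint"))

# ignore_index=False (pandas >= 1.1) repeats the subject index for every melted row
melted = wide_df.melt(var_name="timepoint", value_name="value", ignore_index=False)
print(melted.index.name)      # subject
print(list(melted.columns))   # ['timepoint', 'value']

# Move the subject labels back into a regular column
long_df = melted.reset_index()
print(list(long_df.columns))  # ['subject', 'timepoint', 'value']
```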

Answer 1

There is a name argument to Series.reset_index for exactly this reason:

In [14]: wide_df.stack().reset_index(name='foo')
Out[14]: 
   subject  timepoint       foo
0        a          0 -0.179968
1        a          2  1.559283
2        a          4  1.020142
3        a          6 -0.899663
4        a          8  2.983990
5        b          0  0.586476
6        b          2  0.055108
7        b          4  1.834005
8        b          6  1.226371
9        b          8  0.953103
10       c          0 -0.919273
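The same one-liner as a plain script, checked against the three-line version from the question. The name argument exists on Series.reset_index only (not DataFrame.reset_index), which is why it applies after stack. Seeded random data stands in for the original:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
wide_df = pd.DataFrame(rng.standard_normal((10, 5)),
                       index=pd.Series(list("abcdefghij"), name="subject"),
                       columns=pd.Series(np.arange(5) * 2, name="timepoint"))

# One line: name the value column while moving the index levels into columns
long_df = wide_df.stack().reset_index(name="value")

# The question's version, with an intermediate variable and attribute assignment
long_series = wide_df.stack()
long_series.name = "value"
expected = long_series.reset_index()

print(list(long_df.columns))     # ['subject', 'timepoint', 'value']
print(long_df.equals(expected))  # True
```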

You could also define this yourself if you want (it would be a nice addition to DataFrame):

In [14]: def _melt(self, *args, **kwargs):
   ....:     return pd.melt(self.reset_index(), *args, **kwargs)
   ....: 

In [15]: pd.DataFrame.melt = _melt

In [19]: wide_df.melt('subject',value_name='foo')
Out[19]: 
   subject  timepoint       foo
0        a          0  0.374912
1        b          0 -0.016272
2        c          0 -0.510553
3        d          0 -1.532472
4        e          0 -0.115107
5        f          0 -0.101772
6        g          0 -0.020966
7        h          0  0.427469
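Since pandas 0.20, DataFrame ships with a built-in melt method, so the monkey-patch above is no longer necessary (and assigning pd.DataFrame.melt would replace that built-in). The built-in does not reset the index for you, so a sketch of the equivalent call resets it first; the seeded data is a stand-in, and var_name is passed explicitly rather than relying on the columns' name:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
wide_df = pd.DataFrame(rng.standard_normal((10, 5)),
                       index=pd.Series(list("abcdefghij"), name="subject"),
                       columns=pd.Series(np.arange(5) * 2, name="timepoint"))

# Turn the subject index into a column, then unpivot the timepoint columns
long_df = (wide_df.reset_index()
                  .melt(id_vars="subject", var_name="timepoint", value_name="foo"))
print(long_df.head())
```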

Answer 1: accepted, score 5, posted 2014-03-13 16:24:54
