Add a column to a stacked pandas dataframe
I want to add a days column next to the amax column that holds the difference in days between amin and amax.
Creating the new column this way fails:
df["dates"]["diff"] = df["dates"]["amax"]-df["dates"]["amin"]
Here you can see an example of my dataframe.
I created the dataframe with the following code:
gb = stock_prices.groupby(['stock_name'])
df = gb.agg({'date' : [np.min, np.max]})
You can reset the MultiIndex in the columns and then add the new column diff:
print(df)
activity date name
0 slept 2014-12-02 Elon
1 tripped 2013-08-04 Bill
2 spoke 2012-05-08 Larry
3 swam 2015-04-11 Elon
4 spooked 2014-12-09 Jeff
5 liked 2009-10-23 Larry
6 whistled 2013-09-21 Larry
7 up dog 2011-01-02 Bill
8 smiled 2013-07-28 Larry
9 donated 2014-11-19 Elon
10 grant men paternity leave 2015-10-24 Marissa
11 fondled 2013-08-24 Jeff
#aggregate to min and max date
g = df.groupby(['name']).agg({'date' : [np.max, np.min]})
print(g)
date
amax amin
name
Bill 2013-08-04 2011-01-02
Elon 2015-04-11 2014-11-19
Jeff 2014-12-09 2013-08-24
Larry 2013-09-21 2009-10-23
Marissa 2015-10-24 2015-10-24
#reset columns multiindex
levels = g.columns.levels
labels = g.columns.labels  # renamed to g.columns.codes in pandas 0.24+
g.columns = levels[1][labels[1]]
g['diff'] = g['amax'] - g['amin']
print(g)
amax amin diff
name
Bill 2013-08-04 2011-01-02 945 days
Elon 2015-04-11 2014-11-19 143 days
Jeff 2014-12-09 2013-08-24 472 days
Larry 2013-09-21 2009-10-23 1429 days
Marissa 2015-10-24 2015-10-24 0 days
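As a rough sketch of the same flattening step on a recent pandas version, the snippet below uses `droplevel` instead of rebuilding the columns from levels and labels. This is a swapped-in alternative, not the code from the answer above. It assumes a reduced sample of the rows shown earlier and passes the aggregations as the strings `"max"` and `"min"`, so the resulting columns are named `max` and `min` rather than `amax` and `amin`.

```python
import pandas as pd

# Reduced sample of the rows shown above (assumption: dates parsed as datetimes)
df = pd.DataFrame({
    "name": ["Bill", "Bill", "Elon", "Elon", "Marissa"],
    "date": pd.to_datetime([
        "2013-08-04", "2011-01-02", "2015-04-11", "2014-11-19", "2015-10-24",
    ]),
})

# Aggregating one column with two functions yields two-level columns:
# ('date', 'max') and ('date', 'min')
g = df.groupby("name").agg({"date": ["max", "min"]})

# Drop the outer 'date' level so only 'max' and 'min' remain
g.columns = g.columns.droplevel(0)

# Subtracting two datetime columns gives a Timedelta column
g["diff"] = g["max"] - g["min"]
print(g)
```

Dropping the outer level is only safe here because a single column was aggregated; with several aggregated columns the inner names would collide.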
However, if you do not want to reset the MultiIndex in the columns, use loc:
print(g)
date
amax amin
name
Bill 2013-08-04 2011-01-02
Elon 2015-04-11 2014-11-19
Jeff 2014-12-09 2013-08-24
Larry 2013-09-21 2009-10-23
Marissa 2015-10-24 2015-10-24
g.loc[:, ('date', 'diff')] = g.loc[:, ('date', 'amax')] - g.loc[:, ('date', 'amin')]
print(g)
date
amax amin diff
name
Bill 2013-08-04 2011-01-02 945 days
Elon 2015-04-11 2014-11-19 143 days
Jeff 2014-12-09 2013-08-24 472 days
Larry 2013-09-21 2009-10-23 1429 days
Marissa 2015-10-24 2015-10-24 0 days
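The attempt in the question most likely fails because `df["dates"]["diff"] = ...` first selects `df["dates"]`, which returns a separate sub-frame, and the assignment then lands on that temporary object instead of on `df`. Addressing the column with a single tuple key avoids the intermediate selection. The sketch below illustrates this on a reduced sample of the data above. The extra `days` column and the use of `.dt.days` to get plain integers are additions for illustration, not part of the original answer, and the column names are `max` and `min` because the aggregations are passed as strings.

```python
import pandas as pd

# Reduced sample of the rows shown above (assumption: dates parsed as datetimes)
df = pd.DataFrame({
    "name": ["Bill", "Bill", "Elon", "Elon", "Marissa"],
    "date": pd.to_datetime([
        "2013-08-04", "2011-01-02", "2015-04-11", "2014-11-19", "2015-10-24",
    ]),
})

g = df.groupby("name").agg({"date": ["max", "min"]})

# One tuple key addresses one column of the two-level column index,
# so the assignment happens on g itself rather than on a sub-frame
g[("date", "diff")] = g[("date", "max")] - g[("date", "min")]

# Optional: integer day counts instead of Timedelta values
g[("date", "days")] = g[("date", "diff")].dt.days
print(g)
```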