How to resort a MultiIndex DataFrame by second level
I have a DataFrame with a MultiIndex. The index fields are OptionSymbol (level 0) and QuoteDatetime (level 1). I have sorted and indexed the DataFrame as follows:
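# Sort by symbol (descending), then by quote time (ascending) within each symbol.
# Named sorted_df so the built-in sorted() is not shadowed.
sorted_df = df.sort_values(
    ['OptionSymbol', 'QuoteDatetime'],
    ascending=[False, True]
)
# Move both sort keys into a two-level MultiIndex
indexed = sorted_df.set_index(
    ['OptionSymbol', 'QuoteDatetime'],
    drop=True
)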
The result looks like this:
                                Id  Strike  Expiration OptionType
OptionSymbol QuoteDatetime
ZBYMZ        2013-09-02     234669   170.0  2011-01-22        put
             2013-09-03     234901   170.0  2011-01-22        put
             2013-09-04     235133   170.0  2011-01-22        put
...                            ...     ...         ...        ...
YBWNA        2010-02-12     262202    95.0  2010-02-20       call
             2010-02-16     262454    95.0  2010-02-20       call
             2010-02-17     262707    95.0  2010-02-20       call
...                            ...     ...         ...        ...
XWNAX        2012-07-12     262201    90.0  2010-02-20       call
             2012-07-16     262453    90.0  2010-02-20       call
             2012-07-17     262706    90.0  2010-02-20       call
...                            ...     ...         ...        ...
WWWAX        2012-04-12     262201    90.0  2010-02-20       call
             2012-04-16     262453    90.0  2010-02-20       call
             2012-04-17     262706    90.0  2010-02-20       call
...                            ...     ...         ...        ...
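As expected, the frame is sorted first by OptionSymbol in descending order, and then by QuoteDatetime in ascending order within each OptionSymbol group.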
What I need to do now is resort the groups by their first QuoteDatetime value, so that the result looks like this:
                                Id  Strike  Expiration OptionType
OptionSymbol QuoteDatetime
XBWNA        2010-02-12     262202    95.0  2010-02-20       call
             2010-02-16     262454    95.0  2010-02-20       call
             2010-02-17     262707    95.0  2010-02-20       call
...                            ...     ...         ...        ...
NWWAX        2012-04-12     262201    90.0  2010-02-20       call
             2012-04-16     262453    90.0  2010-02-20       call
             2012-04-17     262706    90.0  2010-02-20       call
...                            ...     ...         ...        ...
BWNAX        2012-07-12     262201    90.0  2010-02-20       call
             2012-07-16     262453    90.0  2010-02-20       call
             2012-07-17     262706    90.0  2010-02-20       call
...                            ...     ...         ...        ...
XBYMZ        2013-09-02     234669   170.0  2011-01-22        put
             2013-09-03     234901   170.0  2011-01-22        put
             2013-09-04     235133   170.0  2011-01-22        put
...                            ...     ...         ...        ...
I have tried various ways of resorting by index level 1, but then I lose the OptionSymbol grouping. How can I do this?
from collections import OrderedDict

import numpy as np
import pandas as pd

# Sample data; the four value columns are random placeholders
df = pd.DataFrame(OrderedDict((
    ('OptionSymbol', pd.Series(['ZBYMZ', 'ZBYMZ', 'ZBYMZ', 'YBWNA', 'YBWNA', 'YBWNA', 'XWNAX', 'XWNAX', 'XWNAX', 'WWWAX', 'WWWAX', 'WWWAX', ])),
    ('QuoteDatetime', pd.Series(['2013-09-02', '2013-09-03', '2013-09-04', '2010-02-12', '2010-02-16', '2010-02-17', '2012-07-12', '2012-07-16', '2012-07-17', '2012-04-12', '2012-04-16', '2012-04-17'])),
    ('Id', pd.Series(np.random.randn(12,))),
    ('Strike', pd.Series(np.random.randn(12,))),
    ('Expiration', pd.Series(np.random.randn(12,))),
    ('OptionType', pd.Series(np.random.randn(12,)))
)))
In this case, oddly enough, using df.sort_index(level=1) works, but on my full data set (more than 20 columns) I lose the OptionSymbol grouping.
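A likely reason for the difference, sketched here as an assumption rather than a diagnosis of the full data set: in the sample above the date ranges of the symbols do not overlap, so ordering purely by date happens to keep each symbol together. When the ranges do overlap, sorting on level 1 interleaves the symbols. The toy frame, the symbols `AAA` and `BBB`, and the `Strike` values below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: the two symbols have overlapping quote dates.
toy = pd.DataFrame({
    "OptionSymbol": ["AAA", "AAA", "BBB", "BBB"],
    "QuoteDatetime": pd.to_datetime(
        ["2012-01-01", "2012-01-03", "2012-01-02", "2012-01-04"]
    ),
    "Strike": [90.0, 90.0, 95.0, 95.0],
}).set_index(["OptionSymbol", "QuoteDatetime"])

# Sorting on level 1 orders rows by date across the whole frame,
# so rows belonging to different symbols end up interleaved.
by_date = toy.sort_index(level=1)
symbols_in_order = by_date.index.get_level_values("OptionSymbol").tolist()
print(symbols_in_order)  # ['AAA', 'BBB', 'AAA', 'BBB']
```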
IIUC, you can simply sort the index by its second level:
In [27]: df.sort_index(level=1)
Out[27]:
                                Id  Strike  Expiration OptionType
OptionSymbol QuoteDatetime
YBWNA        2010-02-12     262202    95.0  2010-02-20       call
             2010-02-16     262454    95.0  2010-02-20       call
             2010-02-17     262707    95.0  2010-02-20       call
WWWAX        2012-04-12     262201    90.0  2010-02-20       call
             2012-04-16     262453    90.0  2010-02-20       call
             2012-04-17     262706    90.0  2010-02-20       call
XWNAX        2012-07-12     262201    90.0  2010-02-20       call
             2012-07-16     262453    90.0  2010-02-20       call
             2012-07-17     262706    90.0  2010-02-20       call
ZBYMZ        2013-09-02     234669   170.0  2011-01-22        put
             2013-09-03     234901   170.0  2011-01-22        put
             2013-09-04     235133   170.0  2011-01-22        put
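If the date ranges of different symbols overlap, sorting on level 1 alone will not keep each symbol's rows together. One possible alternative, which is not part of the answer above and is only a sketch under that assumption, is to compute each symbol's first QuoteDatetime, sort the groups by it, and then sort by date within each group. The toy data and the helper column name `FirstQuote` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with overlapping date ranges across symbols.
toy = pd.DataFrame({
    "OptionSymbol": ["ZZZ", "ZZZ", "AAA", "AAA", "MMM", "MMM"],
    "QuoteDatetime": pd.to_datetime([
        "2012-01-02", "2012-01-05",
        "2012-01-03", "2012-01-04",
        "2012-01-01", "2012-01-06",
    ]),
    "Strike": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# First quote date of each symbol, broadcast back to every row of that symbol.
toy["FirstQuote"] = toy.groupby("OptionSymbol")["QuoteDatetime"].transform("min")

# Order groups by their first quote date (OptionSymbol breaks ties and keeps
# each group contiguous), then order rows by date within the group.
result = (
    toy.sort_values(["FirstQuote", "OptionSymbol", "QuoteDatetime"])
       .drop(columns="FirstQuote")
       .set_index(["OptionSymbol", "QuoteDatetime"])
)

group_order = result.index.get_level_values("OptionSymbol").tolist()
print(group_order)  # ['MMM', 'MMM', 'ZZZ', 'ZZZ', 'AAA', 'AAA']
```

Sorting on plain columns before calling set_index avoids relying on how sort_index treats the remaining index levels.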