Pandas Pivot Table Subsetting
My pivot table looks like this:
```
Symbol     DIA   QQQ        SPY   XLE   DIA   QQQ        SPY   XLE  DIA  QQQ  \
          Open  Open       Open  Open  High  High       High  High  Low  Low
Date
19930129   NaN   NaN  29.083294   NaN   NaN   NaN  29.083294   NaN  NaN  NaN
19930201   NaN   NaN  29.083294   NaN   NaN   NaN  29.269328   NaN  NaN  NaN
19930202   NaN   NaN  29.248658   NaN   NaN   NaN  29.352010   NaN  NaN  NaN
19930203   NaN   NaN  29.372680   NaN   NaN   NaN  29.662066   NaN  NaN  NaN
19930204   NaN   NaN  29.744748   NaN   NaN   NaN  29.827430   NaN  NaN  NaN

Symbol          SPY  XLE    DIA    QQQ        SPY    XLE           DIA  \
                Low  Low  Close  Close      Close  Close  Total Volume
Date
19930129  28.938601  NaN    NaN    NaN  29.062624    NaN           NaN
19930201  29.083294  NaN    NaN    NaN  29.269328    NaN           NaN
19930202  29.186647  NaN    NaN    NaN  29.331340    NaN           NaN
19930203  29.352010  NaN    NaN    NaN  29.641396    NaN           NaN
19930204  29.414021  NaN    NaN    NaN  29.765419    NaN           NaN

Symbol             QQQ           SPY           XLE
          Total Volume  Total Volume  Total Volume
Date
19930129           NaN         15167           NaN
19930201           NaN          7264           NaN
19930202           NaN          3043           NaN
19930203           NaN          8004           NaN
19930204           NaN          8035           NaN
```
How do I subset for a particular day and a particular column value, say the closing prices for all symbols? For 19930129, for example, that would be:

```
19930129   NaN   NaN  29.062624   NaN
```
I tried `pt['Close']`, but it didn't work. Only `pt['SPY']` works, and it gives me the whole table of values for symbol SPY.
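The reason `pt['Close']` fails is that a bare label inside `[]` is matched against the first column level only, which here is `Symbol`; `Close` lives on the second level. A minimal sketch of this, assuming a small hand-built excerpt of the table (values copied from the question, two symbols and two fields only, with a string-valued Date index):

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the question's pivot table: columns are
# (Symbol, field) tuples, so pandas builds a two-level column MultiIndex.
pt = pd.DataFrame(
    {
        ("DIA", "Open"): [np.nan, np.nan],
        ("SPY", "Open"): [29.083294, 29.083294],
        ("DIA", "Close"): [np.nan, np.nan],
        ("SPY", "Close"): [29.062624, 29.269328],
    },
    index=pd.Index(["19930129", "19930201"], name="Date"),
)
pt.columns.names = ["Symbol", None]

# [] matches a bare label against the first column level (Symbol),
# so a symbol works and returns that symbol's sub-table.
spy = pt["SPY"]
print(spy.columns.tolist())  # ['Open', 'Close']

# "Close" only exists on the second level, so the same lookup fails.
try:
    pt["Close"]
    close_failed = False
except KeyError:
    close_failed = True
print(close_failed)  # True
```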
You could use `pd.IndexSlice`:
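```python
import pandas as pd
# was pt.sortlevel(axis=1); DataFrame.sortlevel has since been removed in favour of sort_index
pt = pt.sort_index(axis=1)
pt.loc['19930129', pd.IndexSlice[:, 'Close']]
```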
Using `IndexSlice` requires that the selection axes be fully lexsorted, hence the call to `sortlevel` (`sort_index(axis=1)` in current pandas).
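A runnable sketch of the `IndexSlice` approach, assuming a hypothetical two-symbol excerpt of the table (values copied from the question) whose Date index holds strings, to match the `'19930129'` label used above. The final `droplevel(1)` step is an optional extra, not part of the answer, that re-indexes the result by symbol:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the question's pivot table (string Date index assumed).
pt = pd.DataFrame(
    {
        ("DIA", "Open"): [np.nan, np.nan],
        ("SPY", "Open"): [29.083294, 29.083294],
        ("DIA", "Close"): [np.nan, np.nan],
        ("SPY", "Close"): [29.062624, 29.269328],
    },
    index=pd.Index(["19930129", "19930201"], name="Date"),
)

# Lexsort the column MultiIndex before slicing on it.
pt = pt.sort_index(axis=1)

idx = pd.IndexSlice
# Row label "19930129", every symbol on level 0, only "Close" on level 1.
close_0129 = pt.loc["19930129", idx[:, "Close"]]
print(close_0129)

# The result still carries both column levels; drop the field level
# to get a Series indexed by symbol alone.
by_symbol = close_0129.droplevel(1)
print(by_symbol)
```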
Alternatively, `slice(None)` can also be used to select everything from the first column index level:
```python
# sort_index(axis=1) replaces the removed sortlevel(axis=1)
pt = pt.sort_index(axis=1)
pt.loc['19930129', (slice(None), 'Close')]
```
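A sketch on the same kind of hypothetical excerpt (values from the question, string Date index assumed) checking that `(slice(None), 'Close')` selects exactly what `pd.IndexSlice[:, 'Close']` does, and showing that `:` in the row position returns the closing prices for every date at once:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the question's pivot table.
pt = pd.DataFrame(
    {
        ("DIA", "Open"): [np.nan, np.nan],
        ("SPY", "Open"): [29.083294, 29.083294],
        ("DIA", "Close"): [np.nan, np.nan],
        ("SPY", "Close"): [29.062624, 29.269328],
    },
    index=pd.Index(["19930129", "19930201"], name="Date"),
)
pt = pt.sort_index(axis=1)

# Both spellings build the same key: (slice(None, None, None), "Close").
one_day = pt.loc["19930129", (slice(None), "Close")]
same_day = pt.loc["19930129", pd.IndexSlice[:, "Close"]]
print(one_day.equals(same_day))  # True

# ":" for the rows gives a Date x (Symbol, "Close") frame for every date.
all_closes = pt.loc[:, (slice(None), "Close")]
print(all_closes)
```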
To select the `i`th row by position but the columns by label, you could use

```python
pt.loc[pt.index[i], (slice(None), 'Close')]
```
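A minimal sketch of the positional-row variant on a hypothetical two-row excerpt (values from the question): `pt.index[i]` turns the position `i` into its label, so the rest of the `.loc` call stays purely label-based:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the question's pivot table (string Date index assumed).
pt = pd.DataFrame(
    {
        ("DIA", "Open"): [np.nan, np.nan],
        ("SPY", "Open"): [29.083294, 29.083294],
        ("DIA", "Close"): [np.nan, np.nan],
        ("SPY", "Close"): [29.062624, 29.269328],
    },
    index=pd.Index(["19930129", "19930201"], name="Date"),
)
pt = pt.sort_index(axis=1)

i = 1  # position of the second row
label = pt.index[i]  # position -> label ("19930201")
close_ith = pt.loc[label, (slice(None), "Close")]
print(label)
print(close_ith)
```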
Or, you could use `pt.ix` as Andy Hayden suggests, but be aware that if `pt` has an integer-valued index, then `pt.ix` performs label-based row indexing, not ordinal indexing. So as long as `19930129` (and the other index values) are not integers, i.e. `pt.index` is not an `Int64Index`, you could use

```python
# .ix was deprecated in pandas 0.20 and removed in 1.0; this only runs on old versions
pt.ix[i, (slice(None), 'Close')]
```
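Since `.ix` no longer exists in current pandas, here is a sketch of one way to get the same "row by position, columns by label" selection with `.iloc`. It assumes an integer-valued Date index, exactly the case where `.ix` would have switched to label lookup. Building a boolean column mask from `get_level_values(1)` is this sketch's own choice, not something taken from the answers:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the table, this time with an *integer* Date index.
pt = pd.DataFrame(
    {
        ("DIA", "Close"): [np.nan, np.nan],
        ("SPY", "Close"): [29.062624, 29.269328],
        ("SPY", "Open"): [29.083294, 29.083294],
    },
    index=pd.Index([19930129, 19930201], name="Date"),
)

i = 1
by_label = pt.loc[19930129]  # label-based: the row whose label is 19930129
by_position = pt.iloc[i]     # ordinal: the second row, whatever its label

# Positional row plus label-based columns without .ix: a boolean mask over
# the column level holding the field names, passed to .iloc.
is_close = pt.columns.get_level_values(1) == "Close"
close_ith = pt.iloc[i, is_close]
print(close_ith)
```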
Note that chained indexing, such as `pt.iloc[i].loc[(slice(None), 'Close')]`, should be avoided when performing assignments, since assignment with chained indexing may fail to modify `pt`.
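For assignments, doing the whole selection in one `.loc` call lets pandas write into `pt` itself. A sketch on a hypothetical excerpt (values from the question, string Date index assumed) that fills the missing closing prices of one row with a placeholder `0.0`; the fill value is arbitrary and only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the question's pivot table.
pt = pd.DataFrame(
    {
        ("DIA", "Open"): [np.nan, np.nan],
        ("SPY", "Open"): [29.083294, 29.083294],
        ("DIA", "Close"): [np.nan, np.nan],
        ("SPY", "Close"): [29.062624, 29.269328],
    },
    index=pd.Index(["19930129", "19930201"], name="Date"),
)
pt = pt.sort_index(axis=1)

i = 0
row = pt.index[i]
key = (slice(None), "Close")

# One .loc call covering both axes, so the write lands in pt, unlike a
# chained pt.iloc[i].loc[key] = ... that may only change a temporary copy.
pt.loc[row, key] = pt.loc[row, key].fillna(0.0).to_numpy()
print(pt.loc[row])
```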
An alternative is to use `xs` ("cross-section"); here `df` is the pivot table:

```
In [21]: df.xs(axis=1, level=1, key="Open")
Out[21]:
Symbol    DIA  QQQ        SPY  XLE
Date
19930129  NaN  NaN  29.083294  NaN
19930201  NaN  NaN  29.083294  NaN
19930202  NaN  NaN  29.248658  NaN
19930203  NaN  NaN  29.372680  NaN
19930204  NaN  NaN  29.744748  NaN

In [22]: df.xs(axis=1, level=1, key="Open").loc[19930129]
Out[22]:
Symbol
DIA          NaN
QQQ          NaN
SPY    29.083294
XLE          NaN
Name: 19930129, dtype: float64
```
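Applied to the question's actual goal, closing prices, a sketch on a hypothetical excerpt (values from the question, string Date index assumed). `xs` drops the matched level by default, so the result is indexed by symbol directly:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the question's pivot table.
pt = pd.DataFrame(
    {
        ("DIA", "Open"): [np.nan, np.nan],
        ("SPY", "Open"): [29.083294, 29.083294],
        ("DIA", "Close"): [np.nan, np.nan],
        ("SPY", "Close"): [29.062624, 29.269328],
    },
    index=pd.Index(["19930129", "19930201"], name="Date"),
)
pt.columns.names = ["Symbol", None]

# Cross-section on the unnamed second column level; the level is dropped,
# leaving one column per symbol.
close = pt.xs("Close", axis=1, level=1)
print(close.columns.tolist())  # ['DIA', 'SPY']

close_0129 = close.loc["19930129"]
print(close_0129)
```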
This is somewhat less powerful than unutbu's answer (using `IndexSlice`).
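One thing the slicer form handles that a single `xs` call does not: `xs` takes one key per level, whereas `pd.IndexSlice` also accepts a list of labels on a level. A sketch, on a hypothetical three-field excerpt (values from the question, string Date index assumed), that pulls Open and Close for every symbol in one step:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt with three fields per symbol.
pt = pd.DataFrame(
    {
        ("DIA", "Open"): [np.nan, np.nan],
        ("SPY", "Open"): [29.083294, 29.083294],
        ("DIA", "High"): [np.nan, np.nan],
        ("SPY", "High"): [29.083294, 29.269328],
        ("DIA", "Close"): [np.nan, np.nan],
        ("SPY", "Close"): [29.062624, 29.269328],
    },
    index=pd.Index(["19930129", "19930201"], name="Date"),
)
pt = pt.sort_index(axis=1)

# Every symbol on level 0, a *list* of fields on level 1.
open_close = pt.loc[:, pd.IndexSlice[:, ["Open", "Close"]]]
print(sorted(open_close.columns.tolist()))
```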