Why does indexing behave differently in a pandas DataFrame whose MultiIndex columns have only one level?

Question

Using an example from the pandas documentation found here, the following indexing works perfectly and the result is a pd.Series:

import pandas as pd
tuples = [(1, 'red'), (1, 'blue'),
          (2, 'red'), (2, 'blue')]
columns = pd.MultiIndex.from_tuples(tuples, names=('number', 'color'))
asdf = pd.DataFrame(columns=columns, index=[0, 1])
asdf.loc[:, (1, 'red')]

But if I change the code slightly, removing one level, the same kind of indexing no longer works:

import pandas as pd
tuples = [(1,), (2,)]
columns = pd.MultiIndex.from_tuples(tuples, names=['number'])
asdf = pd.DataFrame(columns=columns, index=[0, 1])
asdf.loc[:, (1,)]

IndexError                                Traceback (most recent call last)
<ipython-input-43-d55399a979fa> in <module>
----> 1 asdf.loc[:, (1,)]

/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1760                 except (KeyError, IndexError, AttributeError):
   1761                     pass
-> 1762             return self._getitem_tuple(key)
   1763         else:
   1764             # we by definition only have the 0th axis

/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1270     def _getitem_tuple(self, tup: Tuple):
   1271         try:
-> 1272             return self._getitem_lowerdim(tup)
   1273         except IndexingError:
   1274             pass

/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
   1371         # we may have a nested tuples indexer here
   1372         if self._is_nested_tuple_indexer(tup):
-> 1373             return self._getitem_nested_tuple(tup)
   1374 
   1375         # we maybe be using a tuple to represent multiple dimensions here

/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_nested_tuple(self, tup)
   1451 
   1452             current_ndim = obj.ndim
-> 1453             obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
   1454             axis += 1
   1455 

/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1963         # fall thru to straight lookup
   1964         self._validate_key(key, axis)
-> 1965         return self._get_label(key, axis=axis)
   1966 
   1967 

/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
    620             # see GH5667
    621             return self.obj._xs(label, axis=axis)
--> 622         elif isinstance(label, tuple) and isinstance(label[axis], slice):
    623             raise IndexingError("no slices here, handle elsewhere")
    624 

IndexError: tuple index out of range

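Moreover, indexing it as asdf.loc[:, 1] raises a TypeError, and, going one step further, indexing it as asdf.loc[:, ((1,),)] works, but the result is a pd.DataFrame, not a pd.Series!

Why does this happen? Thank you very much!

PS: I am interested in "abstracting" my code away from this distinction (one level versus multiple levels in pd.DataFrame.columns). At the company where I work, we sometimes receive client data that needs multiple levels, but sometimes only one level is needed.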

Answer 1

Have you updated your pandas version? In pandas v1.1.0, you can index a one-level MultiIndex the way you did, and the selection returns a pd.Series:

import pandas as pd
tuples = [(1,), (2,)]
columns = pd.MultiIndex.from_tuples(tuples, names=['number'])
asdf = pd.DataFrame(columns=columns, index=[0, 1])
asdf.loc[:, (1,)]

Output:

0    NaN
1    NaN
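
For the abstraction the question's PS asks about, one possible sketch is to resolve the column position by hand and then select with iloc, so the result does not depend on how .loc treats one-level tuples in a given pandas version. The name select_column and the sample frames below are hypothetical and used only for illustration; this is not a pandas API and not something the answer above proposes.

```python
import pandas as pd


def select_column(df, key):
    # Hypothetical helper (not a pandas API): return a single column as a
    # Series whether df.columns is a flat Index or a MultiIndex of any depth.
    if not isinstance(key, tuple):
        key = (key,)  # normalise a scalar label to a 1-tuple
    if isinstance(df.columns, pd.MultiIndex):
        labels = list(df.columns)  # iterating a MultiIndex yields tuples
    else:
        labels = [(label,) for label in df.columns]
    position = labels.index(key)  # raises ValueError if the key is absent
    return df.iloc[:, position]  # a single integer position gives a Series


# Illustrative data (assumed, not from the question).
one_level = pd.DataFrame(
    [[10, 20], [30, 40]],
    columns=pd.MultiIndex.from_tuples([(1,), (2,)], names=["number"]),
)
two_level = pd.DataFrame(
    [[10, 20], [30, 40]],
    columns=pd.MultiIndex.from_tuples(
        [(1, "red"), (1, "blue")], names=("number", "color")
    ),
)

first = select_column(one_level, (1,))
second = select_column(two_level, (1, "blue"))
```

Both calls go through the same code path and return a pd.Series. If the columns contain duplicate labels, this sketch returns only the first match.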

Solution 1: accepted, 2020-08-07 05:17:01