
How to slice into a MultiIndex Pandas DataFrame?

Suppose you have the following data frame:

In [1]: import pandas as pd
In [2]: index = [('California',2000),('California', 2010), ('New York', 2000),
 ('New York', 2000), ('New York', 2010), ('Texas', 2000), ('Texas',2010)]
In [3]: populations = [33871648, 37253956,189765457,19378102,20851820,25145561
     ...: ]
In [4]: pop_df = pd.DataFrame(populations,index=index,columns=["Data"])
In [5]: pop_df
Out[5]:
                         Data
(California, 2000)   33871648
(California, 2010)   37253956
(New York, 2000)    189765457
(New York, 2010)     19378102
(Texas, 2000)        20851820
(Texas, 2010)        25145561

How can one index into this DataFrame to get all of the California data? I tried pop_df[('California',)] and got a KeyError. So I then executed the following and still got a KeyError:

In [6]: index2 = pd.MultiIndex.from_tuples(index)
In [7]: pop_df2 = pop_df.reindex(index2)
In [8]: pop_df2
Out[8]:
                      Data
California 2000   33871648
           2010   37253956
New York   2000  189765457
           2010   19378102
Texas      2000   20851820
           2010   25145561

In [9]: pop_df2['California']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'California'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-141-18a1a54664b0> in <module>
----> 1 pop_df2['California']

~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083
   3084         if tolerance is not None:

KeyError: 'California'

What is the right way to index into a MultiIndex DataFrame?

df['somename'] looks up columns, while df.loc['somename'] looks up index labels. You want:

pop_df2.loc['California']

Output:

          Data
2000  33871648
2010  37253956
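
To make the column-versus-row distinction concrete, here is a minimal sketch. It builds the frame with a MultiIndex directly instead of going through reindex, and the level names "state" and "year" are an assumption added for readability; they do not appear in the question.

```python
import pandas as pd

# Build the frame with a real MultiIndex from the start.
# The level names "state" and "year" are illustrative assumptions.
index = pd.MultiIndex.from_tuples(
    [
        ("California", 2000), ("California", 2010),
        ("New York", 2000), ("New York", 2010),
        ("Texas", 2000), ("Texas", 2010),
    ],
    names=["state", "year"],
)
populations = [33871648, 37253956, 189765457, 19378102, 20851820, 25145561]
df = pd.DataFrame({"Data": populations}, index=index)

# Plain brackets look in the columns, so "Data" works and "California" would not.
data_column = df["Data"]

# .loc looks in the row index; one outer-level label selects all its rows.
california = df.loc["California"]

# A full (state, year) tuple plus a column name addresses a single cell.
ca_2010 = df.loc[("California", 2010), "Data"]
```

Here `california` is indexed by year only, because selecting a single label on the outer level removes that level from the result.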

You also have the xs option, which allows slicing on a different level and can keep the full index hierarchy:

# `drop_level` defaults to True, which behaves
# like `.loc` on the top level
pop_df2.xs('California', level=0, drop_level=False)

Output:

                     Data
California 2000  33871648
           2010  37253956

Or use xs on the second level:

pop_df2.xs(2010, level=1, drop_level=False)

gives you:

                     Data
California 2010  37253956
New York   2010  19378102
Texas      2010  25145561
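
The effect of drop_level is easier to see side by side. The following is a small sketch, assuming the index is built with MultiIndex.from_product and given the hypothetical level names "state" and "year" so that xs can refer to a level by name instead of by position.

```python
import pandas as pd

# Same data as in the question; level names are illustrative assumptions.
index = pd.MultiIndex.from_product(
    [["California", "New York", "Texas"], [2000, 2010]],
    names=["state", "year"],
)
populations = [33871648, 37253956, 189765457, 19378102, 20851820, 25145561]
df = pd.DataFrame({"Data": populations}, index=index)

# Default drop_level=True: the selected level is removed from the result.
dropped = df.xs(2010, level="year")

# drop_level=False: the result keeps both index levels.
kept = df.xs(2010, level="year", drop_level=False)

print(dropped.index.nlevels)  # 1
print(kept.index.nlevels)     # 2
```

Referring to the level by name rather than by number tends to stay readable when the index has more than two levels.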

You want .loc[]. Without it, you are looking for a column named 'California', not an index label.

By the way, you had a typo in your input: one index entry was duplicated. Here is the full code.

In [1]: import pandas as pd
   ...: index = [
   ...: ('California',2000),
   ...: ('California', 2010),
   ...: ('New York', 2000),
   ...: ('New York', 2010),
   ...: ('Texas', 2000),
   ...: ('Texas',2010)
   ...: ]
   ...: populations = [33871648, 37253956,189765457,19378102,20851820,25145561]
   ...: pop_df = pd.DataFrame(populations,index=index,columns=["Data"])
   ...: index2 = pd.MultiIndex.from_tuples(index)
   ...: pop_df2 = pop_df.reindex(index2)
   ...: pop_df2.loc['California']
Out[1]: 
          Data
2000  33871648
2010  37253956
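
A duplicated entry like the one in the question can be caught before the frame is built. This is a sketch of one way to do it, using the question's original seven-entry list; the variable names are hypothetical. It also passes the MultiIndex straight to the DataFrame constructor, which avoids the separate reindex step.

```python
import pandas as pd

# The index list as typed in the question, including the repeated entry.
raw_index = [
    ("California", 2000), ("California", 2010),
    ("New York", 2000), ("New York", 2000), ("New York", 2010),
    ("Texas", 2000), ("Texas", 2010),
]
candidate = pd.MultiIndex.from_tuples(raw_index)

# is_unique is False when any (state, year) pair is repeated.
has_duplicates = not candidate.is_unique

# duplicated() flags the second and later occurrences of each pair.
repeated = candidate[candidate.duplicated()].tolist()

# Drop the repeats, then build the frame on the clean MultiIndex directly.
clean_index = candidate.drop_duplicates()
populations = [33871648, 37253956, 189765457, 19378102, 20851820, 25145561]
df = pd.DataFrame({"Data": populations}, index=clean_index)
```

With the list above, `repeated` contains the single pair ('New York', 2000), and the cleaned index has six entries to match the six population values.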

Try with IndexSlice:

pop_df2.loc[pd.IndexSlice[['California'], :], :]
Out[52]: 
                     Data
California 2000  33871648
           2010  37253956
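
IndexSlice is most useful when the selection involves an inner level or a range of labels. The sketch below assumes the same data with a sorted MultiIndex (label slicing on a MultiIndex generally needs a sorted index); the alias `idx` and the level names are illustrative choices.

```python
import pandas as pd

# Level names are illustrative assumptions, not part of the question.
index = pd.MultiIndex.from_product(
    [["California", "New York", "Texas"], [2000, 2010]],
    names=["state", "year"],
)
populations = [33871648, 37253956, 189765457, 19378102, 20851820, 25145561]
# Sort so that label slices on the index are well defined.
df = pd.DataFrame({"Data": populations}, index=index).sort_index()

idx = pd.IndexSlice

# Every state, but only the year 2010.
year_2010 = df.loc[idx[:, 2010], :]

# A label range on the outer level, all years (both endpoints included).
ca_to_ny = df.loc[idx["California":"New York", :], :]
```

The trailing `, :` after the IndexSlice selects all columns and makes it explicit that the first argument addresses rows.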

Here is a solution. You need to indicate the level:

pop_df2[pop_df2.index.get_level_values(0) == 'California']

#Output:
                     Data
California  2000    33871648
            2010    37253956
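
Because get_level_values returns an ordinary Index, the boolean-mask approach extends to conditions on several levels at once. A minimal sketch, assuming hypothetical level names "state" and "year":

```python
import pandas as pd

# Level names are illustrative assumptions, not part of the question.
index = pd.MultiIndex.from_product(
    [["California", "New York", "Texas"], [2000, 2010]],
    names=["state", "year"],
)
populations = [33871648, 37253956, 189765457, 19378102, 20851820, 25145561]
df = pd.DataFrame({"Data": populations}, index=index)

# Pull each level out as a flat Index of labels, one entry per row.
states = df.index.get_level_values("state")
years = df.index.get_level_values("year")

# Combine element-wise conditions into one boolean mask.
mask = states.isin(["California", "Texas"]) & (years >= 2010)
subset = df[mask]
```

A mask keeps the full index hierarchy in the result and does not depend on the index being sorted.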
