pandas: slicing along first level of multiindex

I've set up a DataFrame with a two-level MultiIndex, but slicing doesn't behave as expected. I realize that this is a very basic problem, so I searched for similar questions:

pandas: slice a MultiIndex by range of secondary index

Python Pandas slice multiindex by second level index (or any other level)

I also looked at the corresponding documentation.

Strangely, none of the proposed solutions work for me. I've set up a simple example to showcase the problem:

import pandas as pd

# this is my DataFrame
frame = pd.DataFrame([
{"a":1, "b":1, "c":"11"},
{"a":1, "b":2, "c":"12"},
{"a":2, "b":1, "c":"21"},
{"a":2, "b":2, "c":"22"},
{"a":3, "b":1, "c":"31"},
{"a":3, "b":2, "c":"32"}])

# now set a and b as multiindex
frame = frame.set_index(["a","b"])

Now I'm trying different ways of slicing the frame. The first two lines work; the third throws an exception:

# selecting a specific cell works
frame.loc[1,2]

# slicing along the second index works
frame.loc[1,:]

# slicing along the first doesn't work
frame.loc[:,1]

It's a TypeError:

TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>

Solution 1: Using tuples of slices

This is proposed in this question: pandas: slice a MultiIndex by range of secondary index

Indeed, you can pass a slice for each level

But that doesn't work for me; it produces the same TypeError as above.

frame.loc[(slice(1,2), 1)]

Solution 2: Using IndexSlice

This is proposed in this question: Python Pandas slice multiindex by second level index (or any other level)

Use an indexer to slice arbitrary values in arbitrary dimensions

Again, that doesn't work for me; it produces the same TypeError.

frame.loc[pd.IndexSlice[:,2]]

I don't understand how this TypeError can be produced. After all, I can use integers to select specific cells, and ranges along the second dimension work fine. Googling for my specific error message doesn't really help. For example, here someone tries to use integers to slice along an index of type float: https://github.com/pandas-dev/pandas/issues/12333

I tried explicitly converting my indices to int, in case the numpy backend stores everything as float by default. But that didn't change anything; afterwards the same errors as above appear:

frame["a"]=frame["a"].apply(lambda x : int(x))
frame["b"]=frame["b"].apply(lambda x : int(x))

type(frame["b"][0])  # it's numpy.int64
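
The index dtypes can also be checked on the MultiIndex levels directly rather than on the columns. The following is a minimal sketch, assuming a and b have already been moved into the index with set_index; the shortened frame and the name level_dtypes are illustrative and not part of the original post.

```python
import pandas as pd

# Shortened version of the example frame, with a and b as index levels
frame = pd.DataFrame(
    {"a": [1, 1, 2], "b": [1, 2, 1], "c": ["11", "12", "21"]}
).set_index(["a", "b"])

# After set_index, a and b are no longer columns, so look at each
# level of the MultiIndex instead of frame["a"] / frame["b"].
level_dtypes = {
    name: frame.index.get_level_values(name).dtype
    for name in frame.index.names
}
print(level_dtypes)
```

If both levels report an integer dtype, the index types are unlikely to be the cause of the error.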

IIUC, you just have to pass : as the column indexer when indexing a MultiIndex DataFrame:

In [40]: frame.loc[pd.IndexSlice[:,2], :]
Out[40]:
      c
a b
1 2  12
2 2  22
3 2  32
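
The reason the trailing : matters appears to be that .loc reads a bare two-element indexer as (rows, columns), so frame.loc[:, 1] looks for the label 1 among the columns rather than in the second index level. Below is a minimal sketch of a few equivalent ways to select b == 1 across all values of a, using the frame from the question; the variable names (idx, by_indexslice, by_tuple, by_xs, ranged) are illustrative only.

```python
import pandas as pd

# Rebuild the frame from the question
frame = pd.DataFrame([
    {"a": 1, "b": 1, "c": "11"},
    {"a": 1, "b": 2, "c": "12"},
    {"a": 2, "b": 1, "c": "21"},
    {"a": 2, "b": 2, "c": "22"},
    {"a": 3, "b": 1, "c": "31"},
    {"a": 3, "b": 2, "c": "32"},
]).set_index(["a", "b"])

idx = pd.IndexSlice

# IndexSlice for the rows, explicit ":" for the columns
by_indexslice = frame.loc[idx[:, 1], :]

# The same selection written with plain slice objects
by_tuple = frame.loc[(slice(None), 1), :]

# xs selects on a single level and drops that level from the result
by_xs = frame.xs(1, level="b")

# A label range on the first level combined with a label on the second
ranged = frame.loc[(slice(1, 2), 1), :]

print(by_indexslice)
print(by_xs)
print(ranged)
```

Specifying both axes in .loc avoids the ambiguity between "tuple of MultiIndex levels" and "row indexer plus column indexer". xs is convenient when only one level is fixed, at the cost of removing that level from the returned index.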
