
Working with set_index in Pandas DataFrame

Using an imported CSV file, I indexed the DataFrame like this:

 rdata.set_index(['race_date', 'track_code', 'race_number', 'horse_name'])

This is what a section of the DataFrame looks like:

 race_date  track_code race_number horse_name          work_date  work_track
 2007-08-24 BM         8           Count Me Twice     2007-05-31         PLN
                                   Count Me Twice     2007-06-09         PLN
                                   Count Me Twice     2007-06-16         PLN
                                   Count Me Twice     2007-06-23         PLN
                                   Count Me Twice     2007-08-05         PLN
                                   Judge's Choice     2007-06-07          BM
                                   Judge's Choice     2007-06-14          BM
                                   Judge's Choice     2007-07-08          BM
                                   Judge's Choice     2007-08-18          BM

Why isn't the 'horse_name' column being grouped like the date, track and race? If this is by design, how can I slice this larger DataFrame by race to get a new DataFrame with 'horse_name' as its index?

It's not a bug. This is exactly how it's intended to work.

A DataFrame has to show every single item in its data. So if the index has one level, that level will be fully expanded. If it has two levels, the first level will be grouped and the second will be fully expanded; if it has three levels, the first two will be grouped and the third will be expanded, and so on.

So this is why the horse name is not grouped: it is the last level of the index. How would you be able to see all the items in the DataFrame if it were also grouped by the horse name? :)
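To make the grouping concrete, here is a minimal sketch on a small made-up sample that reuses the question's column names (the rows are illustrative, not the asker's real file). It assumes the grouping is purely a display feature controlled by the pandas option `display.multi_sparse`: the index still stores a complete tuple for every row, and only the printed output blanks the repeated outer labels.

```python
import pandas as pd

# Made-up sample rows that mirror the question's columns (illustrative only)
rdata = pd.DataFrame({
    "race_date": ["2007-08-24"] * 4,
    "track_code": ["BM"] * 4,
    "race_number": [8] * 4,
    "horse_name": ["Count Me Twice", "Count Me Twice",
                   "Judge's Choice", "Judge's Choice"],
    "work_date": ["2007-05-31", "2007-06-09", "2007-06-07", "2007-06-14"],
    "work_track": ["PLN", "PLN", "BM", "BM"],
})

# set_index returns a new DataFrame, so keep the result
indexed = rdata.set_index(
    ["race_date", "track_code", "race_number", "horse_name"]
)

# Default rendering: repeated labels in the outer levels are left blank,
# while the last level (horse_name) is printed on every row
sparse_text = indexed.to_string()

# Same data with sparsification turned off: every label on every row
with pd.option_context("display.multi_sparse", False):
    full_text = indexed.to_string()

print(sparse_text)
print(full_text)

# The index itself holds a full 4-part key for each row
first_key = indexed.index[0]
print(first_key)
```

Comparing the two printouts shows that nothing is lost or merged in the default view; the blank cells simply mean "same as the row above".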

Try doing:

 rdata.set_index(['race_date', 'track_code', 'race_number'])

or:

 rdata.set_index(['race_date', 'track_code'])

You'll see that the last level of the index is always fully expanded, so that you can see all the items in the DataFrame.
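For the second part of the question (slicing by race to get a DataFrame indexed by 'horse_name'), one possible approach is `DataFrame.xs`, which selects on the given levels and drops them from the result. This is a sketch on made-up data, not necessarily what the answerer had in mind; the second race and its horse names ("Horse A", "Horse B") are hypothetical and only there to have something to filter out.

```python
import pandas as pd

# Made-up sample: two races on the same day (race 9 and its horses are hypothetical)
rdata = pd.DataFrame({
    "race_date": ["2007-08-24"] * 6,
    "track_code": ["BM"] * 6,
    "race_number": [8, 8, 8, 8, 9, 9],
    "horse_name": ["Count Me Twice", "Count Me Twice",
                   "Judge's Choice", "Judge's Choice",
                   "Horse A", "Horse B"],
    "work_date": ["2007-05-31", "2007-06-09", "2007-06-07",
                  "2007-06-14", "2007-07-01", "2007-07-02"],
    "work_track": ["PLN", "PLN", "BM", "BM", "BM", "PLN"],
})

# Build the four-level index and sort it so lookups are efficient
indexed = rdata.set_index(
    ["race_date", "track_code", "race_number", "horse_name"]
).sort_index()

# Pick one race by its three outer keys; xs drops those levels,
# leaving horse_name as the only index level of the result
race = indexed.xs(
    ("2007-08-24", "BM", 8),
    level=["race_date", "track_code", "race_number"],
)

print(race)
```

The result keeps one row per workout, so a horse name can still appear several times in the new index.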
