
Slicing MultiIndex data with Pandas

I have imported a CSV as a multi-indexed DataFrame. Here's a mockup of the data:

import pandas as pd

df = pd.read_csv("coursedata2.csv", index_col=[0, 2])

print(df)

                                        COURSE
ID    Course List
12345 Interior Environments          DESN10000
      Rendering & Present Skills     DESN20065
      Lighting                       DESN20025
22345 Drawing Techniques             DESN10016
      Colour Theory                  DESN14049
      Finishes & Sustainable Issues  DESN12758
      Lighting                       DESN20025
32345 Window Treatments&Soft Furnish DESN27370
42345 Introduction to CADD           INFO16859
      Principles of Drafting         DESN10065
      Drawing Techniques             DESN10016
      The Fundamentals of Design     DESN15436
      Colour Theory                  DESN14049
      Interior Environments          DESN10000
      Drafting                       DESN10123
      Textiles and Applications      DESN10199
      Finishes & Sustainable Issues  DESN12758

[17 rows x 1 columns]
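To experiment without coursedata2.csv, the mockup can be rebuilt in memory. This is a minimal sketch, assuming the CSV's three columns are ID, COURSE and Course List in that order (so index_col=[0, 2] picks ID and Course List as the index); the row values are copied from the printout above, and the trailing space on one "Colour Theory" entry that shows up in df.index is not reproduced.

```python
import pandas as pd

# (ID, Course List, COURSE) triples copied from the mockup printout
rows = [
    (12345, "Interior Environments", "DESN10000"),
    (12345, "Rendering & Present Skills", "DESN20065"),
    (12345, "Lighting", "DESN20025"),
    (22345, "Drawing Techniques", "DESN10016"),
    (22345, "Colour Theory", "DESN14049"),
    (22345, "Finishes & Sustainable Issues", "DESN12758"),
    (22345, "Lighting", "DESN20025"),
    (32345, "Window Treatments&Soft Furnish", "DESN27370"),
    (42345, "Introduction to CADD", "INFO16859"),
    (42345, "Principles of Drafting", "DESN10065"),
    (42345, "Drawing Techniques", "DESN10016"),
    (42345, "The Fundamentals of Design", "DESN15436"),
    (42345, "Colour Theory", "DESN14049"),
    (42345, "Interior Environments", "DESN10000"),
    (42345, "Drafting", "DESN10123"),
    (42345, "Textiles and Applications", "DESN10199"),
    (42345, "Finishes & Sustainable Issues", "DESN12758"),
]

# Same shape as read_csv(..., index_col=[0, 2]): a two-level row index
# (ID, Course List) and a single COURSE column
df = pd.DataFrame(rows, columns=["ID", "Course List", "COURSE"]).set_index(
    ["ID", "Course List"]
)
print(df.shape)  # (17, 1)
```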

I can easily slice it by label using .xs, e.g.:

selected = df.xs(12345, level='ID')
print(selected)

                               COURSE
Course List
Interior Environments       DESN10000
Rendering & Present Skills  DESN20065
Lighting                    DESN20025

[3 rows x 1 columns]
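For the outermost level, the same cross-section can also be taken with .loc, which drops the matched ID level just as .xs does. A minimal sketch on a subset of the mockup rows, comparing the two:

```python
import pandas as pd

# A subset of the mockup rows (IDs 12345 and 22345 only)
df = pd.DataFrame(
    [
        (12345, "Interior Environments", "DESN10000"),
        (12345, "Rendering & Present Skills", "DESN20065"),
        (12345, "Lighting", "DESN20025"),
        (22345, "Drawing Techniques", "DESN10016"),
        (22345, "Colour Theory", "DESN14049"),
    ],
    columns=["ID", "Course List", "COURSE"],
).set_index(["ID", "Course List"])

# Cross-section by label on the named 'ID' level; the ID level is dropped
via_xs = df.xs(12345, level="ID")

# A single label passed to .loc selects on the outermost level
via_loc = df.loc[12345]

print(via_xs.equals(via_loc))  # True
```

.xs remains the more general tool, since its level argument also reaches inner levels (e.g. level='Course List').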


But what I want to do is step through the DataFrame and perform an operation on each block of courses, one ID at a time. The ID values in the real data are fairly random integers, sorted in ascending order.
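Visiting each block of rows that shares an ID is what groupby on an index level does. This is a different technique from the .xs loop in the answer below, offered as an alternative; a minimal sketch on a subset of the mockup rows:

```python
import pandas as pd

# A subset of the mockup rows: three IDs, eight courses
df = pd.DataFrame(
    [
        (12345, "Interior Environments", "DESN10000"),
        (12345, "Rendering & Present Skills", "DESN20065"),
        (12345, "Lighting", "DESN20025"),
        (22345, "Drawing Techniques", "DESN10016"),
        (22345, "Colour Theory", "DESN14049"),
        (22345, "Finishes & Sustainable Issues", "DESN12758"),
        (22345, "Lighting", "DESN20025"),
        (32345, "Window Treatments&Soft Furnish", "DESN27370"),
    ],
    columns=["ID", "Course List", "COURSE"],
).set_index(["ID", "Course List"])

blocks = {}
# groupby on the 'ID' index level yields (ID value, sub-DataFrame) pairs,
# one per distinct ID, in ascending order (sort=True is the default)
for id_value, block in df.groupby(level="ID"):
    # Unlike .xs, each block keeps the ID level; drop it to match .xs output
    blocks[id_value] = block.droplevel("ID")

for id_value, courses in blocks.items():
    print(id_value, courses["COURSE"].tolist())
```

If the per-ID operation is a standard aggregation, no explicit loop is needed at all, e.g. df.groupby(level="ID")["COURSE"].count().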

df.index shows:

df.index
MultiIndex(levels=[[12345, 22345, 32345, 42345],
                   [u'Colour Theory', u'Colour Theory ', u'Drafting',
                    u'Drawing Techniques', u'Finishes & Sustainable Issues',
                    u'Interior Environments', u'Introduction to CADD',
                    u'Lighting', u'Principles of Drafting',
                    u'Rendering & Present Skills', u'Textiles and Applications',
                    u'The Fundamentals of Design', u'Window Treatments&Soft Furnish']],
           labels=[[0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3],
                   [5, 9, 7, 3, 1, 4, 7, 12, 6, 8, 3, 11, 0, 5, 2, 10, 4]],
           names=[u'ID', u'Course List'])
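The labels arrays in that output are not ID values: each entry is a position into the matching levels list. pandas 0.24 renamed labels to codes, so on current versions the same arrays are read from df.index.codes. A sketch, assuming pandas >= 0.24 and a subset of the mockup rows; it also shows that levels keeps values whose rows have been filtered away until remove_unused_levels() is called:

```python
import pandas as pd

# A subset of the mockup rows: three IDs, eight courses
df = pd.DataFrame(
    [
        (12345, "Interior Environments", "DESN10000"),
        (12345, "Rendering & Present Skills", "DESN20065"),
        (12345, "Lighting", "DESN20025"),
        (22345, "Drawing Techniques", "DESN10016"),
        (22345, "Colour Theory", "DESN14049"),
        (22345, "Finishes & Sustainable Issues", "DESN12758"),
        (22345, "Lighting", "DESN20025"),
        (32345, "Window Treatments&Soft Furnish", "DESN27370"),
    ],
    columns=["ID", "Course List", "COURSE"],
).set_index(["ID", "Course List"])

idx = df.index
# levels[0]: each distinct ID once; codes[0]: per row, a position in levels[0]
print(list(idx.levels[0]))  # [12345, 22345, 32345]
print(list(idx.codes[0]))   # [0, 0, 0, 1, 1, 1, 1, 2]

# Indexing the level with the codes rebuilds the per-row ID values
rebuilt = idx.levels[0][idx.codes[0]]
print(rebuilt.equals(idx.get_level_values("ID")))  # True

# Slicing rows keeps every level value, even IDs with no rows left
subset = df.iloc[:3]  # only the ID 12345 rows
print(list(subset.index.levels[0]))  # [12345, 22345, 32345]
print(list(subset.index.remove_unused_levels().levels[0]))  # [12345]
```

This matters for loops over df.index.levels[0]: on a filtered frame they can visit IDs that no longer have rows, and .xs then raises a KeyError. df.index.get_level_values('ID').unique() lists only the IDs still present.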

It seems to me that I should be able to use the first level's labels to step through the DataFrame, i.e. get all the courses for label 0, then 1, then 2, then 3, and so on. But it looks like .xs will not slice by label.
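.xs looks up level values (the IDs themselves), whereas 0, 1, 2, 3 are positions into df.index.levels[0]. A minimal sketch on a subset of the mockup rows, showing that asking .xs for 0 fails and that translating each position into its label first works:

```python
import pandas as pd

# A subset of the mockup rows: three IDs, eight courses
df = pd.DataFrame(
    [
        (12345, "Interior Environments", "DESN10000"),
        (12345, "Rendering & Present Skills", "DESN20065"),
        (12345, "Lighting", "DESN20025"),
        (22345, "Drawing Techniques", "DESN10016"),
        (22345, "Colour Theory", "DESN14049"),
        (22345, "Finishes & Sustainable Issues", "DESN12758"),
        (22345, "Lighting", "DESN20025"),
        (32345, "Window Treatments&Soft Furnish", "DESN27370"),
    ],
    columns=["ID", "Course List", "COURSE"],
).set_index(["ID", "Course List"])

raised = False
try:
    df.xs(0, level="ID")  # 0 is a position, not an ID value
except KeyError:
    raised = True
print("KeyError raised:", raised)  # KeyError raised: True

# Translate each position into its label, then slice by that label
id_level = df.index.levels[0]
sizes = []
for code in range(len(id_level)):
    label = id_level[code]  # 0 -> 12345, 1 -> 22345, 2 -> 32345
    sizes.append(len(df.xs(label, level="ID")))
print(sizes)  # [3, 4, 1]
```

Iterating directly over df.index.levels[0] yields those labels in the same order without the position step, which is what the answer below does.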

Am I missing something?

There may be more efficient ways to do this, depending on what you're trying to do to the data. However, two approaches immediately come to mind:

for id_label in df.index.levels[0]:
    # some_func is your own function; it receives all rows for one ID
    some_func(df.xs(id_label, level='ID'))

and

for id_label in df.index.levels[0]:
    # apply(..., axis=1) calls some_func once per row; keep the result if you need it
    df.xs(id_label, level='ID').apply(some_func, axis=1)

depending on whether you want to operate on the group as a whole or on each row within it.
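Both loops can be tried end to end with stand-in functions. In this sketch count_courses and course_prefix are hypothetical placeholders for some_func, and the rows are a subset of the mockup:

```python
import pandas as pd

# A subset of the mockup rows (IDs 12345, 32345 and two rows of 42345)
df = pd.DataFrame(
    [
        (12345, "Interior Environments", "DESN10000"),
        (12345, "Rendering & Present Skills", "DESN20065"),
        (12345, "Lighting", "DESN20025"),
        (32345, "Window Treatments&Soft Furnish", "DESN27370"),
        (42345, "Introduction to CADD", "INFO16859"),
        (42345, "Principles of Drafting", "DESN10065"),
    ],
    columns=["ID", "Course List", "COURSE"],
).set_index(["ID", "Course List"])


def count_courses(block):
    """Hypothetical group-level function: gets every course row for one ID."""
    return len(block)


def course_prefix(row):
    """Hypothetical row-level function: gets one course row as a Series."""
    return row["COURSE"][:4]  # e.g. "DESN" or "INFO"


per_group = {}
per_row = {}
for id_label in df.index.levels[0]:
    block = df.xs(id_label, level="ID")
    # Approach 1: one call per ID with the whole block
    per_group[id_label] = count_courses(block)
    # Approach 2: one call per row; the Series result is indexed by Course List
    per_row[id_label] = block.apply(course_prefix, axis=1)

print(per_group)  # {12345: 3, 32345: 1, 42345: 2}
print(per_row[42345].tolist())  # ['INFO', 'DESN']
```

Approach 1 yields one value per ID; approach 2 yields, per ID, a Series with one value per course.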

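Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.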

 
© 2020-2024 STACKOOM.COM