
Slicing MultiIndex data with Pandas

I have imported a CSV as a multi-indexed DataFrame. Here's a mockup of the data:

df = pd.read_csv("coursedata2.csv", index_col=[0,2])

print (df)

                                         COURSE
ID    Course List
12345 Interior Environments           DESN10000
      Rendering & Present Skills      DESN20065
      Lighting                        DESN20025
22345 Drawing Techniques              DESN10016
      Colour Theory                   DESN14049
      Finishes & Sustainable Issues   DESN12758
      Lighting                        DESN20025
32345 Window Treatments&Soft Furnish  DESN27370
42345 Introduction to CADD            INFO16859
      Principles of Drafting          DESN10065
      Drawing Techniques              DESN10016
      The Fundamentals of Design      DESN15436
      Colour Theory                   DESN14049
      Interior Environments           DESN10000
      Drafting                        DESN10123
      Textiles and Applications       DESN10199
      Finishes & Sustainable Issues   DESN12758

[17 rows x 1 columns]

I can easily slice it by label using .xs, e.g.:

selected = df.xs(12345, level='ID')
print(selected)

                        COURSE
Course List                          
Interior Environments       DESN10000
Rendering & Present Skills  DESN20065
Lighting                    DESN20025

[3 rows x 1 columns]


But what I want to do is step through the dataframe and perform an operation on each block of courses, by ID. The ID values in the real data are fairly random integers, sorted in ascending order.

df.index shows:

df.index
MultiIndex(levels=[[12345, 22345, 32345, 42345],
                   [u'Colour Theory', u'Colour Theory ', u'Drafting', u'Drawing Techniques', u'Finishes & Sustainable Issues', u'Interior Environments', u'Introduction to CADD', u'Lighting', u'Principles of Drafting', u'Rendering & Present Skills', u'Textiles and Applications', u'The Fundamentals of Design', u'Window Treatments&Soft Furnish']],
           labels=[[0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3],
                   [5, 9, 7, 3, 1, 4, 7, 12, 6, 8, 3, 11, 0, 5, 2, 10, 4]],
           names=[u'ID', u'Course List'])

It seems to me that I should be able to use the first-level index labels to step through the DataFrame, i.e. get all the courses for label 0, then 1, then 2, then 3, and so on. But it looks like .xs will not slice by those labels.

Am I missing something?

So there may be more efficient ways to do this, depending on what you're trying to do to the data. However, there are two approaches which immediately come to mind:

for id_label in df.index.levels[0]:
    some_func(df.xs(id_label, level='ID'))

and

for id_label in df.index.levels[0]:
    df.xs(id_label, level='ID').apply(some_func, axis=1)

depending on whether you want to operate on the group as a whole or on each row within it.
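As one possible "more efficient way", and not something the answer above spells out, the same per-ID iteration can also be sketched with groupby on the index level. This is only a minimal illustration under stated assumptions: the small DataFrame is a hypothetical stand-in for coursedata2.csv built from a few rows of the mock-up, and some_func is a placeholder that just counts rows.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: a few rows taken from the mock-up in the question.
rows = [
    (12345, "Interior Environments", "DESN10000"),
    (12345, "Rendering & Present Skills", "DESN20065"),
    (12345, "Lighting", "DESN20025"),
    (32345, "Window Treatments&Soft Furnish", "DESN27370"),
]
df = (
    pd.DataFrame(rows, columns=["ID", "Course List", "COURSE"])
    .set_index(["ID", "Course List"])
)


def some_func(block):
    # Placeholder operation (an assumption): count the courses in one ID block.
    return len(block)


results = {}
# groupby(level="ID") yields one (ID value, sub-DataFrame) pair per distinct ID.
for id_label, block in df.groupby(level="ID"):
    # Each block keeps both index levels; drop 'ID' so it looks like the .xs result.
    block = block.droplevel("ID")
    results[int(id_label)] = some_func(block)

print(results)
```

Unlike looping over df.index.levels[0], groupby only visits IDs that actually occur in the rows, which may matter if the DataFrame has been filtered and still carries unused level values.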
