Make a new DataFrame from the first row of every MultiIndex group

I have a df:

          pageid
sid vid
 1  ABC     dog
    ABC     dog
    ABC     dog
    ABC     dog
 2  DEF     cat
    DEF     cat
    DEF     pig
    DEF     cat
 3  GHI     pig
    GHI     cat
    GHI     dog
    GHI     dog

Constructor:

import pandas as pd

i = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
     ['ABC', 'ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF', 'DEF', 'GHI', 'GHI',
      'GHI', 'GHI']],
    names=('sid', 'vid')
)

df = pd.DataFrame({
    'pageid': ['dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'pig', 'cat',
               'pig', 'cat', 'dog', 'dog']
}, index=i)

I want to make a brand-new table containing the first pageid of each (sid, vid) group in the MultiIndex.

Expected result (a new table):

sid vid pageid
1   ABC  dog
2   DEF  cat
3   GHI  pig

I tried:

data['initial'] = data.sort_values(['sid','ts'], ascending=True).groupby(level=['sid','vid'])[0]

but I get a memory error, and I'm not sure it's even what I'm looking for.

Group the DataFrame by the index levels, and then select the first row of each group using first().

new_table = df.groupby(level=['sid', 'vid']).first().reset_index()

>>> new_table 

    sid  vid pageid
0    1  ABC    dog
1    2  DEF    cat
2    3  GHI    pig
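One caveat, shown as a hedged sketch: GroupBy.first() returns the first non-null value of each column within a group, which is not always the literal first row. The sketch uses a smaller, hypothetical version of the question's frame with a NaN deliberately placed in the first row of group (2, 'DEF') (that NaN is an assumption added for illustration, not part of the original data). GroupBy.head(1) and a boolean mask built from index.duplicated(keep='first') both keep the literal first row; the mask variant needs no groupby at all and keeps the first occurrence of each (sid, vid) pair even if the pairs are not contiguous.

```python
import numpy as np
import pandas as pd

# Hypothetical, reduced version of the question's frame.
# The np.nan in the first row of group (2, 'DEF') is an illustrative assumption.
i = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2, 3, 3],
     ['ABC', 'ABC', 'DEF', 'DEF', 'GHI', 'GHI']],
    names=('sid', 'vid')
)
df = pd.DataFrame({'pageid': ['dog', 'dog', np.nan, 'cat', 'pig', 'cat']}, index=i)

# first(): first non-null value per column within each group,
# so group (2, 'DEF') gets 'cat' because the NaN is skipped.
by_first = df.groupby(level=['sid', 'vid']).first().reset_index()

# head(1): the literal first row of each group, NaN included.
by_head = df.groupby(level=['sid', 'vid']).head(1).reset_index()

# No groupby: keep only the first occurrence of each (sid, vid) index tuple.
by_dup = df[~df.index.duplicated(keep='first')].reset_index()

print(by_first)
print(by_head)
print(by_dup)
```

When the column has no missing values, as in the question's data, all three give the same table; the difference only matters when a group's first row contains NaN.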
