Make a new dataframe from the first row of every multi-index

Question

I have a df:

          pageid
sid vid
 1  ABC     dog
    ABC     dog
    ABC     dog
    ABC     dog
 2  DEF     cat
    DEF     cat
    DEF     pig
    DEF     cat
 3  GHI     pig
    GHI     cat
    GHI     dog
    GHI     dog

Constructor:

import pandas as pd

i = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
     ['ABC', 'ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF', 'DEF', 'GHI', 'GHI',
      'GHI', 'GHI']],
    names=('sid', 'vid')
)

df = pd.DataFrame({
    'pageid': ['dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'pig', 'cat',
               'pig', 'cat', 'dog', 'dog']
}, index=i)

I want to make a brand-new table containing the first pageid for each (sid, vid) pair in the multi-index.

So result:

New table

sid vid pageid
1   ABC  dog
2   DEF  cat
3   GHI  pig

I tried data['initial'] = data.sort_values(['sid','ts'], ascending=True).groupby(level=['sid','vid'])[0], but I get a memory error and don't know if it's the exact thing I'm looking for.

Answer 1

Group the DataFrame by the index levels, then select the first row of each group using first().

new_table = df.groupby(level=['sid', 'vid']).first().reset_index()

>>> new_table 

    sid  vid pageid
0    1  ABC    dog
1    2  DEF    cat
2    3  GHI    pig
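
One caveat worth knowing: first() returns the first non-null value of each column within a group, so it only matches "the first row" when that row has no missing values. A minimal sketch of two alternatives that keep the literal first row, assuming the same df as in the question (the names by_first, by_head and by_dedup are just illustrative):

```python
import pandas as pd

# Rebuild the example frame from the question.
i = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
     ['ABC'] * 4 + ['DEF'] * 4 + ['GHI'] * 4],
    names=('sid', 'vid'),
)
df = pd.DataFrame({
    'pageid': ['dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'pig', 'cat',
               'pig', 'cat', 'dog', 'dog']
}, index=i)

# first(): first non-null value per column within each group.
by_first = df.groupby(level=['sid', 'vid']).first().reset_index()

# head(1): the literal first row of each group, missing values included.
by_head = df.groupby(level=['sid', 'vid']).head(1).reset_index()

# index.duplicated: keep only rows whose index tuple has not appeared before.
by_dedup = df[~df.index.duplicated(keep='first')].reset_index()
```

On this data all three give the same table, since no pageid is missing. The last form avoids a groupby entirely, which may help if memory is tight, though that is an assumption and not something measured here.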
