I have a df:
        pageid
sid vid
1   ABC    dog
    ABC    dog
    ABC    dog
    ABC    dog
2   DEF    cat
    DEF    cat
    DEF    pig
    DEF    cat
3   GHI    pig
    GHI    cat
    GHI    dog
    GHI    dog
Constructor:
import pandas as pd

i = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
     ['ABC', 'ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF', 'DEF', 'GHI', 'GHI',
      'GHI', 'GHI']],
    names=('sid', 'vid')
)
df = pd.DataFrame({
    'pageid': ['dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'pig', 'cat',
               'pig', 'cat', 'dog', 'dog']
}, index=i)
I want to make a new table that contains the first pageid for each (sid, vid) pair in the MultiIndex.
So result:
New table
sid vid pageid
1   ABC dog
2   DEF cat
3   GHI pig
I tried
data['initial'] = data.sort_values(['sid','ts'],ascending=True).groupby(level=['sid','vid'])[0]
but I get a memory error, and I don't know whether it is even the right approach for what I'm looking for.
Group the DataFrame by the index levels, then select the first row of each group using first(). Calling reset_index() afterwards turns sid and vid back into ordinary columns.
new_table = df.groupby(level=['sid', 'vid']).first().reset_index()
>>> new_table
   sid  vid pageid
0    1  ABC    dog
1    2  DEF    cat
2    3  GHI    pig
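One caveat: GroupBy.first() returns the first non-null value of each column within a group, so it can differ from the literal first row if the data contains NaN. If the literal first row is what you need, here are two alternatives as a minimal sketch, assuming the same df as in the question (the names first_rows and first_rows_gb are only illustrative):

```python
import pandas as pd

# Same frame as in the question.
i = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
     ['ABC'] * 4 + ['DEF'] * 4 + ['GHI'] * 4],
    names=('sid', 'vid'),
)
df = pd.DataFrame({
    'pageid': ['dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'pig', 'cat',
               'pig', 'cat', 'dog', 'dog']
}, index=i)

# Option 1: keep only the first occurrence of each (sid, vid) index pair.
first_rows = df[~df.index.duplicated(keep='first')].reset_index()

# Option 2: take the first row of each group as-is (NaN values included).
first_rows_gb = df.groupby(level=['sid', 'vid']).head(1).reset_index()

print(first_rows)
```

On this sample data both options give the same three rows as first(), because there are no missing values. The duplicated() version also avoids building groups at all, which may help if memory is tight, though that depends on the size and shape of your data.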