Python Pandas - How to select only the first N rows for each unique value of a column

I have a DataFrame from which I need to keep no more than 3 rows for each distinct value in the column Company:

Name    Job      Company
Jimmy   Driver   Amazon
Kate    Driver   Amazon
Jhonny  Weiter   Domino's
Mark    Manager  Amazon
Hugo    Manager  Domino's
Carl    Driver   Amazon
Jimmy   Manager  Amazon
Jimmy   Manager  Domino's
Betty   Driver   Amazon

Which should become:

Name    Job      Company
Jimmy   Driver   Amazon
Kate    Driver   Amazon
Jhonny  Weiter   Domino's
Mark    Manager  Amazon
Hugo    Manager  Domino's
Jimmy   Manager  Domino's

I tried .groupby().size(), but that only returns the number of rows in each group, so I am clearly missing something.

Option 1: Use groupby with head, which keeps the first 3 rows of each group and preserves the original row order:

df.groupby('Company').head(3)
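For reference, here is a minimal, self-contained sketch of Option 1 run against the sample data from the question. The DataFrame is rebuilt by hand from the table above, and the column is assumed to be named exactly "Company" as shown there; the variable name first_three is only illustrative.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame(
    {
        "Name": ["Jimmy", "Kate", "Jhonny", "Mark", "Hugo",
                 "Carl", "Jimmy", "Jimmy", "Betty"],
        "Job": ["Driver", "Driver", "Weiter", "Manager", "Manager",
                "Driver", "Manager", "Manager", "Driver"],
        "Company": ["Amazon", "Amazon", "Domino's", "Amazon", "Domino's",
                    "Amazon", "Amazon", "Domino's", "Amazon"],
    }
)

# Keep at most the first 3 rows of each company, in original row order.
first_three = df.groupby("Company").head(3)
print(first_three)
```

Rows 5, 6 and 8 (the fourth and later Amazon rows) are dropped, which matches the desired output in the question.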

Option 2: You could loop through the unique values in the column and print the first 3 rows for each one:

for company in df['Company'].unique():
    # Rows for this company only, limited to the first 3
    subset = df[df['Company'] == company].head(3)
    print(subset)
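The loop prints one block per company rather than producing a single DataFrame. If a single result is needed, one possible variation (a sketch, not part of the original answer) is to collect the per-company pieces and join them with pd.concat. The names pieces and combined are illustrative, and the column is assumed to be named "Company" as in the question's table.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame(
    {
        "Name": ["Jimmy", "Kate", "Jhonny", "Mark", "Hugo",
                 "Carl", "Jimmy", "Jimmy", "Betty"],
        "Job": ["Driver", "Driver", "Weiter", "Manager", "Manager",
                "Driver", "Manager", "Manager", "Driver"],
        "Company": ["Amazon", "Amazon", "Domino's", "Amazon", "Domino's",
                    "Amazon", "Amazon", "Domino's", "Amazon"],
    }
)

# One slice of at most 3 rows per company.
pieces = [
    df[df["Company"] == company].head(3)
    for company in df["Company"].unique()
]

# Concatenating groups the rows by company; sort_index restores
# the original row order.
combined = pd.concat(pieces).sort_index()
print(combined)
```

On this data the result is the same as groupby('Company').head(3), but the loop scans the frame once per company, so the groupby form is usually the better choice for larger data.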

To get the first 3 rows for one specific company only, filter on that company first:

df[df['Company'] == 'xxxx'].head(3)
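As a quick illustration of the single-company filter, the sketch below substitutes "Amazon" from the sample data for the placeholder value; the variable name amazon_rows is illustrative, and the column is assumed to be named "Company" as in the question's table.

```python
import pandas as pd

# Only the columns needed for this illustration, copied from the question.
df = pd.DataFrame(
    {
        "Name": ["Jimmy", "Kate", "Jhonny", "Mark", "Hugo",
                 "Carl", "Jimmy", "Jimmy", "Betty"],
        "Company": ["Amazon", "Amazon", "Domino's", "Amazon", "Domino's",
                    "Amazon", "Amazon", "Domino's", "Amazon"],
    }
)

# Boolean mask selects one company; head(3) then limits the row count.
amazon_rows = df[df["Company"] == "Amazon"].head(3)
print(amazon_rows)
```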
