
Limit DataFrame rows by value frequency in specific column

I have a basic DataFrame with a 'Streaming Service' column. I want to limit the results to the first 5 records for each service provider. In other words, I want to reduce this DataFrame from possibly thousands of show records to just the last 5 for each streaming service.

import pandas as pd
import numpy as np

# The show list is elided (...) in the original; both lists must have the same length.
data = {'Show Name': ['GameOfThrones', 'StrangerThings', 'Casual', ...],
        'Streaming Service': ['HBO', 'Netflix', 'Hulu']}
df1 = pd.DataFrame(data)

What's the best approach to doing this?

df1.groupby('Streaming Service').head(5)
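To make the difference between "first 5" and "last 5" concrete, here is a minimal sketch on made-up data. The frame `df_demo` and its placeholder show names are assumptions for illustration only, not the asker's data. `groupby(...).head(n)` keeps the first n rows of each group and `groupby(...).tail(n)` keeps the last n, both in the original row order.

```python
import pandas as pd

# Hypothetical sample data: 3 services with 7 placeholder shows each,
# interleaved so that consecutive rows belong to different services.
services = ['HBO', 'Netflix', 'Hulu']
rows = [(f'{s}_show_{i}', s) for i in range(7) for s in services]
df_demo = pd.DataFrame(rows, columns=['Show Name', 'Streaming Service'])

# First 5 rows per service (what the answer above does).
first_five = df_demo.groupby('Streaming Service').head(5)

# Last 5 rows per service (if "last 5" is what is actually wanted).
last_five = df_demo.groupby('Streaming Service').tail(5)

# Every service should now appear at most 5 times.
counts = first_five['Streaming Service'].value_counts()
print(counts)
```

Neither call sorts or re-indexes the result, so the surviving rows keep their original index labels.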

I ended up coming up with my own solution. I had been overcomplicating the problem:

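# df is the full DataFrame (df1 in the question).
service_dfs = []

for c in df['Streaming Service'].unique():
    # Keep the last 100 rows for this service; use tail(5) to keep only 5.
    df_c = df.loc[df['Streaming Service'] == c].tail(100)
    service_dfs.append(df_c)

# Stitch the per-service slices back into one DataFrame.
df = pd.concat(service_dfs)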
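As a rough check, the loop above should select the same rows as a single `groupby(...).tail(n)` call. The only difference is row order: the loop groups rows by service, while `groupby(...).tail(n)` keeps the original order. The sketch below assumes made-up data, and the helper names `last_n_loop` and `last_n_groupby` are hypothetical.

```python
import pandas as pd

def last_n_loop(df, n):
    """Loop-based version: slice each service separately, then concatenate."""
    parts = []
    for c in df['Streaming Service'].unique():
        parts.append(df.loc[df['Streaming Service'] == c].tail(n))
    return pd.concat(parts)

def last_n_groupby(df, n):
    """One-liner version using groupby; rows stay in their original order."""
    return df.groupby('Streaming Service').tail(n)

# Hypothetical sample data: 4 placeholder shows for each of 3 services.
services = ['HBO', 'Netflix', 'Hulu']
rows = [(f'{s}_show_{i}', s) for i in range(4) for s in services]
df_demo = pd.DataFrame(rows, columns=['Show Name', 'Streaming Service'])

loop_result = last_n_loop(df_demo, 2)
groupby_result = last_n_groupby(df_demo, 2)

# Same rows once the loop result is put back into index order.
same_rows = loop_result.sort_index().equals(groupby_result)
print(same_rows)
```

If the per-service ordering produced by the loop is wanted, sorting the `groupby` result by the 'Streaming Service' column afterwards is one option.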


 