Limit DataFrame rows by the frequency of values in a specific column

Question

Essentially, I have a basic dataframe with a 'Streaming Service' column. I want to limit the results to the first 5 records for each service provider. In other words, I want to reduce this dataframe from possibly thousands of show records to just the last 5 for each streaming service.

import pandas as pd
import numpy as np

data = {'Show Name': ['GameOfThrones', 'StrangerThings', 'Casual', ...], 
        'Streaming Service': ['HBO', 'Netflix', 'Hulu']}
df1 = pd.DataFrame(data)

What's the best approach to doing this?

Answer 1

df1.groupby('Streaming Service').head(5)
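
To illustrate what this one-liner does, here is a minimal sketch on made-up data (the `sample` frame, the show names, and the row counts are hypothetical and not from the question). `groupby(...).head(n)` keeps the first `n` rows of each group and `groupby(...).tail(n)` keeps the last `n`, so either reading of the question is covered:

```python
import pandas as pd

# Hypothetical sample data: 7 Netflix rows, 3 HBO rows, 6 Hulu rows.
services = ['Netflix'] * 7 + ['HBO'] * 3 + ['Hulu'] * 6
sample = pd.DataFrame({
    'Show Name': [f'Show{i}' for i in range(len(services))],
    'Streaming Service': services,
})

# First 5 rows per service; groups with fewer than 5 rows are kept whole.
first_five = sample.groupby('Streaming Service').head(5)

# Last 5 rows per service, if the most recent records are wanted instead.
last_five = sample.groupby('Streaming Service').tail(5)

# Number of rows kept for each service.
print(first_five['Streaming Service'].value_counts())
```

Both calls return rows in their original order with the original index, so the result can be used as a filtered view of the input.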

Answer 2

I ended up coming up with my own solution. The problem was overcomplicated:

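service_dfs = []

# Take the last rows of each streaming service, one service at a time
for c in df1['Streaming Service'].unique():
    # tail(100) keeps up to 100 rows; use tail(5) for the 5 asked about in the question
    df_c = df1.loc[df1['Streaming Service'] == c].tail(100)
    service_dfs.append(df_c)
# Stack the per-service slices back into a single DataFrame
df1 = pd.concat(service_dfs)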
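
For comparison, the sketch below runs the loop above next to a single `groupby(...).tail(n)` call on hypothetical data (the `sample` frame and the value `n = 2` are illustrative assumptions). On this data both select the same rows; the difference is ordering, since the loop clusters rows by service while `groupby` keeps the original row order:

```python
import pandas as pd

# Hypothetical sample data, interleaved so that row order matters.
sample = pd.DataFrame({
    'Show Name': [f'Show{i}' for i in range(9)],
    'Streaming Service': ['HBO', 'Netflix', 'Hulu'] * 3,
})
n = 2  # rows to keep per service (an illustrative value)

# Loop-and-concat approach: slice each service, keep its last n rows.
parts = [
    sample.loc[sample['Streaming Service'] == s].tail(n)
    for s in sample['Streaming Service'].unique()
]
looped = pd.concat(parts)

# Single groupby call; rows stay in their original order.
grouped = sample.groupby('Streaming Service').tail(n)

# Compare the two results after restoring the original row order.
same_rows = looped.sort_index().equals(grouped)
print(same_rows)
```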
