Selecting a specific number of results from a column in Pandas

Question

I have a larger data source from which I want to gather the User IDs (column 'A') for a specific group of people, based on the value in column 'B'. I have created a new dataframe with the info I need using:

df2 = df1[df1['B'].isin([8,9,9.5,10,11])]

Now I need to get the first 40 values from col 'A' for value 8 in col 'B', then the first 32 values from col 'A' for value 9, and so on. This works because my data is already sorted by the most relevant users; I just need to pick out X of them for each value in col 'B'.

Ideally, I want the output to be in this format:

 A   B 
ID1  8
ID2  8
. . 
ID41 9 
ID42 9

I thought of using this, for example:

df2[(df2['B']== 8)][0:40]

But then I have to slice the dataframe X times to get all the User IDs for the values I need. There must be a quick way to specify the number of rows to take for each value in col 'B' without slicing once per value.

Thanks in advance!

Answer 1

First we need to build a dict that maps each value in col 'B' to the number of rows to keep, then just do a groupby with head:

# Map each value in column 'B' to the number of rows to keep
d = {8: 40, 9: 32}

out = df.groupby('B').apply(lambda x : x.head(d[x['B'].iloc[0]])).reset_index(drop=True)
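
For anyone who wants to try the groupby-with-head idea on a toy frame, here is a minimal sketch. It is not the exact line above: it iterates over the groups and concatenates the pieces, which avoids reading the grouping column inside `apply`. The sample data and the small counts in `d` are made up for illustration, and values of 'B' with no entry in `d` are simply skipped (an assumption about the desired behaviour).

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "A": [f"ID{i}" for i in range(1, 11)],
    "B": [8, 8, 8, 8, 9, 9, 9, 10, 10, 10],
})

# Rows to keep per value of B (small numbers, for illustration only).
d = {8: 2, 9: 1}

# Take the first d[key] rows of each group, preserving the existing row order.
# Groups whose key is not in d (here: 10) are skipped.
parts = [
    group.head(d[key])
    for key, group in df.groupby("B", sort=False)
    if key in d
]
out = pd.concat(parts).reset_index(drop=True)
print(out)
```

Because `head` keeps rows in their existing order, this relies on the data already being sorted by relevance, as stated in the question.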

Or try with cumcount:

out = df[df.groupby('B').cumcount() < df.B.map(d)]
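
To see why the cumcount version works, it may help to split it into its two parts on a small frame. This is only a sketch with made-up data and counts; the names `position`, `limit` and `result` are mine, not from the answer.

```python
import pandas as pd

# Hypothetical sample data; 9.5 and 10 have no entry in d on purpose.
df = pd.DataFrame({
    "A": [f"ID{i}" for i in range(1, 9)],
    "B": [8, 8, 8, 9, 9, 9.5, 10, 10],
})
d = {8: 2, 9: 1}

# Running position of each row within its B group: 0, 1, 2, ...
position = df.groupby("B").cumcount()

# Per-row limit looked up from d; NaN where the B value is not a key of d.
limit = df["B"].map(d)

# Keep rows whose position is below the limit. Comparisons against NaN are
# False, so rows with unmapped B values are dropped.
result = df[position < limit]
print(result)
```

Unlike the groupby-with-head version, this keeps the original index and row order, and it does not raise a KeyError for values of 'B' that are missing from `d`.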

Answer 1 (2, accepted, 2021-05-27 15:56:34)
