
Selecting a specific number of results from columns in Pandas

I have a large data source from which I want to gather the user IDs (column 'A') for a specific group of people, based on the value in column 'B'. I have created a new DataFrame with the information I need using:

df2 = df1[df1['B'].isin([8,9,9.5,10,11])] 

Now I need to get the first 40 values from column 'A' for value 8 in column 'B', then the first 32 values from column 'A' for value 9, and so on. Taking the first rows is enough because my data is already sorted by the most relevant users; I just need to pick out X of them for each value in column 'B'.

Ideally, I want the output to be in this format:

 A   B 
ID1  8
ID2  8
. . 
ID41 9 
ID42 9

I thought of using something like this:

df2[df2['B'] == 8][0:40]

but then I have to slice the DataFrame X times to get all the user IDs for the values I need. There must be a quicker way to specify how many rows to take for each value in column 'B' without slicing once per value.

Thanks in advance!

First we need to build a dict that maps each value in 'B' to the number of rows wanted, then do a groupby with head:

d = {8:40,9:32}

out = df.groupby('B').apply(lambda x : x.head(d[x['B'].iloc[0]])).reset_index(drop=True)
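A minimal sketch of the same per-group head idea on toy data, written as a plain loop over the groups instead of apply. The lookup d[...] in the one-liner above raises a KeyError for any value of 'B' that has no entry in d, and the way apply handles the grouping column may differ between pandas versions, so this variant skips missing keys explicitly. The sample data and the small counts in d are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: 5 users for each value of B, already sorted by relevance.
df = pd.DataFrame({
    "A": [f"ID{i}" for i in range(1, 16)],
    "B": [8] * 5 + [9] * 5 + [10] * 5,
})

# Number of rows to keep per value of B (small numbers for illustration).
d = {8: 3, 9: 2}

# Take the first d[key] rows of each group; groups missing from d are skipped.
parts = [group.head(d[key]) for key, group in df.groupby("B") if key in d]
out = pd.concat(parts, ignore_index=True)
print(out)
```

Here the rows for B == 10 are dropped because 10 is not a key of d.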

Or try with cumcount:

out = df[df.groupby('B').cumcount() < df.B.map(d)]
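To see why the cumcount version works, here is a small sketch with the intermediate steps named. cumcount numbers the rows within each group starting from 0, and map(d) looks up the limit for each row; values of 'B' that are not in d map to NaN, so the comparison is False and those rows are dropped. The toy data and the names position and limit are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data with the values of B interleaved.
df = pd.DataFrame({
    "A": [f"ID{i}" for i in range(1, 9)],
    "B": [8, 9, 8, 9, 8, 10, 9, 8],
})

# Number of rows to keep per value of B (small numbers for illustration).
d = {8: 2, 9: 1}

position = df.groupby("B").cumcount()  # 0-based position of each row within its group
limit = df["B"].map(d)                 # per-row limit; NaN where B is not a key of d

# Keep rows whose position within the group is below the limit for that group.
out = df[position < limit]

# Optional: group the result by B while preserving the original order within each value.
out_sorted = out.sort_values("B", kind="stable").reset_index(drop=True)
print(out_sorted)
```

Unlike the groupby and head version, the boolean mask keeps the rows in their original order, so the stable sort at the end is only needed if the output should be grouped by 'B'.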
