Selecting a specific number of results from columns in Pandas
I have a larger data source where I'm looking to gather the User IDs (column 'A') for a specific group of people based on the value in column 'B', so I have created a new dataframe with the info that I need using:
df2 = df1[df1['B'].isin([8,9,9.5,10,11])]
Now I need to get the first 40 values from col 'A' for value 8 in col 'B', then the first 32 values from col 'A' for value 9, and so on. I can do this because my data is already sorted by the most relevant users; I just need to pick out X of them per value in col 'B'.
Ideally, I want the output to be in this format:
A B
ID1 8
ID2 8
. .
ID41 9
ID42 9
I thought of using this, for example:
df2[(df2['B']== 8)][0:40]
but then I have to slice the dataframe once per value to get all the User IDs I need. There must be a quicker way to specify how many rows to take for each value in col 'B' without slicing for each one.
Thanks in advance!
First, build a dict that maps each value in column 'B' to the number of rows to keep, then use groupby with head:
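# value in 'B' -> number of rows to keep; every value present in 'B' needs an entry,
# otherwise the lookup d[...] below raises a KeyError
d = {8: 40, 9: 32}
out = df.groupby('B').apply(lambda x: x.head(d[x['B'].iloc[0]])).reset_index(drop=True)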
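The groupby and head idea above can also be written as an explicit loop over the groups. The following is only a sketch under assumptions: the sample frame, the IDs, and the small quotas in `d` are made up for illustration and are not from the question. Looping over the groups avoids relying on how `groupby.apply` handles the grouping column, which I believe has changed in recent pandas releases, and it lets values of 'B' without a quota be skipped instead of raising a KeyError.

```python
import pandas as pd

# Hypothetical sample data, assumed to be already sorted by relevance.
df2 = pd.DataFrame({
    "A": ["ID1", "ID2", "ID3", "ID4", "ID5", "ID6"],
    "B": [8, 8, 8, 9, 9, 10],
})

# Value in 'B' -> number of rows to keep (small quotas for illustration).
d = {8: 2, 9: 1}

# Take the first d[key] rows of each group; values of 'B' that have
# no entry in d (here: 10) are skipped.
parts = [group.head(d[key]) for key, group in df2.groupby("B") if key in d]
out = pd.concat(parts).reset_index(drop=True)

print(out)
```

Because `groupby` sorts the group keys by default, the result comes out grouped by 'B' in ascending order, which matches the output layout asked for in the question.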
Or try with cumcount:
out = df[df.groupby('B').cumcount() < df.B.map(d)]
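To see why the cumcount version works, here is a small sketch on made-up data (the frame and the quotas in `d` are assumptions for illustration only). `cumcount` numbers the rows within each group starting at 0, and `map(d)` looks up the quota for each row, so keeping rows where the position is below the quota keeps the first N rows per value.

```python
import pandas as pd

# Hypothetical sample data; rows are interleaved to show that order is kept.
df2 = pd.DataFrame({
    "A": ["ID1", "ID2", "ID3", "ID4", "ID5", "ID6"],
    "B": [8, 9, 8, 9, 8, 10],
})

# Value in 'B' -> number of rows to keep (small quotas for illustration).
d = {8: 2, 9: 1}

# Position of each row within its 'B' group, starting at 0.
position = df2.groupby("B").cumcount()

# Quota for each row; values of 'B' missing from d become NaN.
quota = df2["B"].map(d)

# Comparisons against NaN are False, so rows without a quota are dropped.
out = df2[position < quota]

print(out)
```

This keeps the rows in their original order and with their original index. If the result should be grouped by 'B' as in the desired output, something like `out.sort_values("B", kind="stable")` afterwards should preserve the relevance order within each value.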