DataFrame: Get the top n values of each type

I have a set of data like the one below:

ID  Type    value_1 value_2
1   A   12  89
2   A   13  78
3   A   11  92
4   A   9   79
5   B   15  83
6   B   34  91
7   B   2   87
8   B   3   86
9   B   7   85
10  C   9   83
11  C   3   85
12  C   2   87
13  C   12  88
14  C   11  82

I want to get the top 3 members of each Type according to value_1. The only solution that occurs to me is this: first, put each Type's data into its own dataframe, sort it by value_1, and take the top 3; then merge the results together. But is there any simpler method to solve it? For ease of discussion, my code is below:

#coding:utf-8
import pandas as pd
_data = [
    ["1","A",12,89],
    ["2","A",13,78],
    ["3","A",11,92],
    ["4","A",9,79],
    ["5","B",15,83],
    ["6","B",34,91],
    ["7","B",2,87],
    ["8","B",3,86],
    ["9","B",7,85],
    ["10","C",9,83],
    ["11","C",3,85],
    ["12","C",2,87],
    ["13","C",12,88],
    ["14","C",11,82]
]
head= ["ID","type","value_1","value_2"]
df = pd.DataFrame(_data, columns=head)

Then we can use groupby with tail after sort_values:

newdf=df.sort_values(['type','value_1']).groupby('type').tail(3)
newdf
    ID type  value_1  value_2
2    3    A       11       92
0    1    A       12       89
1    2    A       13       78
8    9    B        7       85
4    5    B       15       83
5    6    B       34       91
9   10    C        9       83
13  14    C       11       82
12  13    C       12       88
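Note that the output above lists each group's top 3 in ascending order of value_1. If you would rather see the largest value first, one possible variant (a sketch only; the names `rows` and `top3_desc` are mine, not from the question) is to sort value_1 in descending order within each type and use head instead of tail:

```python
import pandas as pd

# Same sample data as in the question.
rows = [
    ["1", "A", 12, 89], ["2", "A", 13, 78], ["3", "A", 11, 92],
    ["4", "A", 9, 79], ["5", "B", 15, 83], ["6", "B", 34, 91],
    ["7", "B", 2, 87], ["8", "B", 3, 86], ["9", "B", 7, 85],
    ["10", "C", 9, 83], ["11", "C", 3, 85], ["12", "C", 2, 87],
    ["13", "C", 12, 88], ["14", "C", 11, 82],
]
df = pd.DataFrame(rows, columns=["ID", "type", "value_1", "value_2"])

# Sort type ascending and value_1 descending, then keep the
# first 3 rows of every type (i.e. the 3 largest value_1).
top3_desc = (
    df.sort_values(["type", "value_1"], ascending=[True, False])
      .groupby("type")
      .head(3)
)
print(top3_desc)
```

Both head and tail keep the original index labels, so the rows can still be traced back to df.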

Sure! DataFrame.groupby can split a dataframe into parts by the group fields, and the apply function can apply a UDF (user-defined function) to each group.

df.groupby('type', as_index=False, group_keys=False)\
    .apply(lambda x: x.sort_values('value_1', ascending=False).head(3))
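To make the split / apply / combine idea more concrete, here is a rough sketch that spells the three steps out by hand. It is not the one-liner above, just an illustration of the same idea; it swaps sort_values + head for DataFrame.nlargest, and the names `rows`, `parts` and `top3` are my own.

```python
import pandas as pd

# Same sample data as in the question.
rows = [
    ["1", "A", 12, 89], ["2", "A", 13, 78], ["3", "A", 11, 92],
    ["4", "A", 9, 79], ["5", "B", 15, 83], ["6", "B", 34, 91],
    ["7", "B", 2, 87], ["8", "B", 3, 86], ["9", "B", 7, 85],
    ["10", "C", 9, 83], ["11", "C", 3, 85], ["12", "C", 2, 87],
    ["13", "C", 12, 88], ["14", "C", 11, 82],
]
df = pd.DataFrame(rows, columns=["ID", "type", "value_1", "value_2"])

parts = []
# Split: iterating over a groupby yields (group key, sub-dataframe) pairs.
for type_name, group in df.groupby("type"):
    # Apply: keep the 3 rows with the largest value_1 in this group.
    parts.append(group.nlargest(3, "value_1"))

# Combine: stack the per-group results back into one dataframe.
top3 = pd.concat(parts)
print(top3)
```

This is essentially the manual approach from the question, with groupby doing the splitting instead of filtering each type by hand.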

