How to get top n records from each category in a Python dataframe?

The data is sorted in descending order on column 'id' in the following dataframe:

id   Name     version     copies   price
6    MSFT       10.0        5       100   
6    TSLA       10.0        10      200
6    ORCL       10.0        15      300

5    MSFT       10.0        20      400
5    TSLA       10.0        25      500
5    ORCL       10.0        30      600

4    MSFT       10.0        35      700
4    TSLA       10.0        40      800
4    ORCL       10.0        45      900

3    MSFT       5.0         50      1000 
3    TSLA       5.0         55      1100
3    ORCL       5.0         60      1200

2    MSFT       5.0         65      1300
2    TSLA       5.0         70      1400
2    ORCL       5.0         75      1500

1    MSFT       15.0        80      1600
1    TSLA       15.0        85      1700
1    ORCL       15.0        90      1800
...

Based on the input 'n', I would like to filter the above data such that, if the input is '2', the resulting dataframe looks like:

Name     version     copies   price
MSFT       10.0        5       100   
TSLA       10.0        10      200
ORCL       10.0        15      300

MSFT       10.0        20      400
TSLA       10.0        25      500
ORCL       10.0        30      600

MSFT       5.0         50      1000 
TSLA       5.0         55      1100
ORCL       5.0         60      1200

MSFT       5.0         65      1300
TSLA       5.0         70      1400
ORCL       5.0         75      1500

MSFT       15.0        80      1600
TSLA       15.0        85      1700
ORCL       15.0        90      1800

Basically, only the top 'n' groups of 'id' for each version should be present in the resulting dataframe. If a version has fewer than n distinct ids (e.g. version 15.0 has only one group, with id = 1), then all of its id groups should be present.

I tried using groupby and head, but it didn't work for me. I have no other idea how to get this to work.

I really appreciate any help with this, thank you.

You can use groupby.transform on the column version and factorize the column id to get an incremental value (from 0 to ...) for each id within each group, then compare it to your n and use loc with this mask to select the wanted rows.

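n = 2
# factorize numbers each distinct id 0, 1, ... within a version, in order of appearance
mask = df.groupby('version')['id'].transform(lambda x: pd.factorize(x)[0]) < n
print(df.loc[mask])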
    id  Name  version  copies  price
0    6  MSFT     10.0       5    100
1    6  TSLA     10.0      10    200
2    6  ORCL     10.0      15    300
3    5  MSFT     10.0      20    400
4    5  TSLA     10.0      25    500
5    5  ORCL     10.0      30    600
9    3  MSFT      5.0      50   1000
10   3  TSLA      5.0      55   1100
11   3  ORCL      5.0      60   1200
12   2  MSFT      5.0      65   1300
13   2  TSLA      5.0      70   1400
14   2  ORCL      5.0      75   1500
15   1  MSFT     15.0      80   1600
16   1  TSLA     15.0      85   1700
17   1  ORCL     15.0      90   1800
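
For anyone who wants to run this end to end, here is a minimal sketch of the factorize approach above. The way the sample frame is rebuilt and the helper name top_n_id_groups are my own assumptions for illustration, not part of the original answer. It relies on the frame already being sorted by id in descending order, as stated in the question, because factorize numbers values in order of first appearance.

```python
import pandas as pd

# Rebuild the sample frame from the question (ids 6..1, three tickers each).
id_to_version = {6: 10.0, 5: 10.0, 4: 10.0, 3: 5.0, 2: 5.0, 1: 15.0}
rows = []
copies = 5
for id_, version in id_to_version.items():
    for name in ["MSFT", "TSLA", "ORCL"]:
        rows.append({"id": id_, "Name": name, "version": version,
                     "copies": copies, "price": copies * 20})
        copies += 5
df = pd.DataFrame(rows)


def top_n_id_groups(frame, n):
    """Hypothetical helper: keep rows whose id is among the first n distinct ids of each version."""
    # Within each version, label distinct ids 0, 1, 2, ... in order of appearance.
    id_code = frame.groupby("version")["id"].transform(lambda s: pd.factorize(s)[0])
    # Codes below n belong to the first n id groups; smaller versions keep everything.
    return frame.loc[id_code < n]


result = top_n_id_groups(df, 2)
print(result)
```

If the frame is not sorted, sorting it first (for example with sort_values on id, descending) should give the same selection, since only the order of appearance within each version matters.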

Another option is to use groupby.head once you drop_duplicates to keep unique version-id pairs, then use the selected version-id pairs in a merge.

df.merge(df[['version','id']].drop_duplicates().groupby('version').head(n))
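
The merge variant can be broken into its steps as below. This is only a sketch: the reduced three-column frame and the intermediate names pairs and keep are assumptions for illustration. Like the first approach, it assumes the frame is already sorted by id in descending order, since head takes the first rows it sees.

```python
import pandas as pd

# Reduced sample frame (assumption: only the columns needed for the selection).
names = ["MSFT", "TSLA", "ORCL"]
id_to_version = {6: 10.0, 5: 10.0, 4: 10.0, 3: 5.0, 2: 5.0, 1: 15.0}
df = pd.DataFrame(
    [(id_, name, version) for id_, version in id_to_version.items() for name in names],
    columns=["id", "Name", "version"],
)
n = 2

# One row per distinct (version, id) pair, in the frame's existing order.
pairs = df[["version", "id"]].drop_duplicates()
# First n pairs per version; a version with fewer than n ids keeps all of them.
keep = pairs.groupby("version").head(n)
# Inner merge on the shared columns keeps only rows whose pair was selected.
result = df.merge(keep, on=["version", "id"])
print(result)
```

One difference from the loc-based mask: merge returns a fresh 0..k-1 index, whereas loc keeps the original row labels.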
