Pandas: Filter grouped Series by count of occurrences

I have a dataframe, which I use groupby on for further data aggregation:

import pandas as pd
test_df = pd.DataFrame(data={"id": [1,2,2,3,3], "review_id": [1,2,3,4,5], "text": ["good", "bad", "nice", "awesome", "dont buy"]})
grouped_df = test_df.groupby(by=["id", "review_id"]).apply(lambda x: [x["text"]])

Which gives me the following series:

id  review_id
1   1                [[good]]
2   2                 [[bad]]
    3                [[nice]]
3   4             [[awesome]]
    5            [[dont buy]]
dtype: object

Now I need a way to further reduce this series, as I only want ids with more than one review, so id 1 should be dropped. I just don't know how to use aggregate() or apply() for this task.

How can I achieve this?

Let us use transform:

out = grouped_df[grouped_df.groupby(level=0).transform('size')>1]
id  review_id
2   2                 [[bad]]
    3                [[nice]]
3   4             [[awesome]]
    5            [[dont buy]]
dtype: object
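
Why this works: grouped_df.groupby(level=0).transform('size') counts the rows within each id group (level 0 of the MultiIndex) and broadcasts that count back to every row, so the comparison > 1 yields a boolean mask aligned with grouped_df. A minimal self-contained sketch of the same steps, using the sample data from the question:

import pandas as pd

test_df = pd.DataFrame(data={"id": [1, 2, 2, 3, 3],
                             "review_id": [1, 2, 3, 4, 5],
                             "text": ["good", "bad", "nice", "awesome", "dont buy"]})
grouped_df = test_df.groupby(by=["id", "review_id"]).apply(lambda x: [x["text"]])

# per-id row count, broadcast to every (id, review_id) entry
counts = grouped_df.groupby(level=0).transform("size")

# keep only the entries whose id occurs more than once
out = grouped_df[counts > 1]
print(out)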

Or use duplicated:

out = grouped_df[grouped_df.index.get_level_values(0).duplicated(keep=False)]
id  review_id
2   2                 [[bad]]
    3                [[nice]]
3   4             [[awesome]]
    5            [[dont buy]]
dtype: object
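
Here index.get_level_values(0) extracts the id for every entry, and duplicated(keep=False) marks all ids that appear more than once, producing the same mask. If you would rather drop the single-review ids before building the series at all, a possible alternative (my own sketch, not part of the original answer) is groupby('id').filter on the original frame:

import pandas as pd

test_df = pd.DataFrame(data={"id": [1, 2, 2, 3, 3],
                             "review_id": [1, 2, 3, 4, 5],
                             "text": ["good", "bad", "nice", "awesome", "dont buy"]})

# drop ids that have only one review, then group as before
filtered = test_df.groupby("id").filter(lambda g: len(g) > 1)
out = filtered.groupby(by=["id", "review_id"]).apply(lambda x: [x["text"]])
print(out)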
