I have a DataFrame that I group with groupby for further aggregation:
import pandas as pd
test_df = pd.DataFrame(data={"id": [1,2,2,3,3], "review_id": [1,2,3,4,5], "text": ["good", "bad", "nice", "awesome", "dont buy"]})
grouped_df = test_df.groupby(by=["id", "review_id"]).apply(lambda x: [x["text"]])
This gives me the following Series:
id  review_id
1   1                [[good]]
2   2                 [[bad]]
    3                [[nice]]
3   4             [[awesome]]
    5            [[dont buy]]
dtype: object
Now I need a way to reduce this Series further: I only want ids with more than one review, so id 1 should be dropped. I just don't know how to use aggregate() or apply() for this task.
How can I achieve this?
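Use transform: group by the first index level (id), broadcast each group's size back to its rows with transform('size'), and keep the rows where that size is greater than 1: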
out = grouped_df[grouped_df.groupby(level=0).transform('size')>1]
id  review_id
2   2                 [[bad]]
    3                [[nice]]
3   4             [[awesome]]
    5            [[dont buy]]
dtype: object
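To see what the transform step produces before the filter is applied, here is a minimal self-contained sketch. It assumes a simplified stand-in for grouped_df, named reviews here (a hypothetical name), which has the same (id, review_id) MultiIndex but plain strings as values instead of nested lists.

```python
import pandas as pd

test_df = pd.DataFrame(
    data={
        "id": [1, 2, 2, 3, 3],
        "review_id": [1, 2, 3, 4, 5],
        "text": ["good", "bad", "nice", "awesome", "dont buy"],
    }
)

# Assumption: simplified stand-in for grouped_df with the same MultiIndex
reviews = test_df.set_index(["id", "review_id"])["text"]

# Number of rows per id, repeated on every row that belongs to that id
sizes = reviews.groupby(level=0).transform("size")
print(sizes.tolist())  # [1, 2, 2, 2, 2]

# Boolean mask: keep only rows whose id occurs more than once
out = reviews[sizes > 1]
print(out.index.tolist())  # [(2, 2), (2, 3), (3, 4), (3, 5)]
```

Because transform returns a result aligned to the original index, the comparison sizes > 1 can be used directly as a row mask.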
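Or use duplicated on the first index level. With keep=False every occurrence of a repeated id is marked, so ids that appear only once are dropped: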
out = grouped_df[grouped_df.index.get_level_values(0).duplicated(keep=False)]
id  review_id
2   2                 [[bad]]
    3                [[nice]]
3   4             [[awesome]]
    5            [[dont buy]]
dtype: object
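As a quick check on the duplicated approach, the sketch below prints the intermediate mask and compares the result with the size-based filter. It again assumes a simplified stand-in Series called reviews (a hypothetical name) rather than the original grouped_df with nested lists.

```python
import pandas as pd

test_df = pd.DataFrame(
    data={
        "id": [1, 2, 2, 3, 3],
        "review_id": [1, 2, 3, 4, 5],
        "text": ["good", "bad", "nice", "awesome", "dont buy"],
    }
)

# Assumption: simplified stand-in for grouped_df with the same MultiIndex
reviews = test_df.set_index(["id", "review_id"])["text"]

# Values of the first index level (id), one entry per row
ids = reviews.index.get_level_values(0)

# keep=False flags every row whose id appears more than once
mask = ids.duplicated(keep=False)
print(mask.tolist())  # [False, True, True, True, True]

by_duplicated = reviews[mask]

# Cross-check against the size-based filter
by_size = reviews[reviews.groupby(level=0).transform("size") > 1]
print(by_duplicated.equals(by_size))  # True
```

The duplicated variant works on the index alone and does not need a second groupby, which may be preferable when only the id counts matter.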