How to subset a dataframe based on multiple values of groupby list

I have a dataframe like the one below:

ID,color
1, Yellow
1, Red
1, Green
2, Red
2, np.nan
3, Green
3, Red
3, Green
4, Yellow
4, Red
5, Green
5, np.nan
6, Red
7, Red
8, Green
8, Yellow

import pandas as pd

fd = pd.read_clipboard(sep=',', skipinitialspace=True)  # drop the space after each comma
fd = fd.groupby('ID',as_index=False)['color'].aggregate(lambda x: list(x))

As you can see in the input dataframe, some IDs have multiple colors associated with them.

Now, I would like to create a subset of the dataframe containing only the IDs that have both Yellow and Green.

So I tried the following and got the list of colors for each ID:

fd.groupby('ID',as_index=False)['color'].aggregate(lambda x: list(x))

I would like to check for values such as Yellow and Green in the groupby list and then subset the dataframe.

I expect my output to look like the following (only two IDs have Yellow and Green together):

ID
1
1
8
8

Update

The input dataframe looks like the one below:

[image of the input dataframe]

Filter the rows whose color is Yellow or Green, then group the dataframe on ID and transform color with nunique to find the IDs that have 2 unique colors.

s = df[df['color'].isin(['Yellow', 'Green'])]
s.loc[s.groupby('ID')['color'].transform('nunique').eq(2), 'ID']

Result

0     1
2     1
14    8
15    8
Name: ID, dtype: int64
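
To try the filter-then-nunique approach without the clipboard, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the question's sample data, assuming the `np.nan` entries are real missing values. The names `wanted`, `has_both` and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question
# (assumption: np.nan marks a missing color)
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 8],
    'color': ['Yellow', 'Red', 'Green', 'Red', np.nan, 'Green', 'Red', 'Green',
              'Yellow', 'Red', 'Green', np.nan, 'Red', 'Red', 'Green', 'Yellow'],
})

wanted = ['Yellow', 'Green']  # illustrative name

# Keep only the rows whose color is one of the wanted colors
s = df[df['color'].isin(wanted)]

# Per ID, count the distinct wanted colors and keep IDs that have all of them
has_both = s.groupby('ID')['color'].transform('nunique').eq(len(wanted))
result = s.loc[has_both, 'ID']
print(result.tolist())
```

Comparing against `len(wanted)` rather than a hard-coded 2 lets the same lines work if the list of required colors grows, provided the list has no duplicates.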

Update as per the new requirements. Here I'm assuming df1 is the input dataframe obtained after groupby:

s = pd.DataFrame([*df1['color']])
df1[s.mask(~s.isin(['Yellow', 'Green'])).nunique(1).eq(2)]

Result:

   ID                 color
0   1  [Yellow, Red, Green]
7   8       [Green, Yellow]
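
The one-liner above packs several steps together, so here is a sketch that spells them out. The `df1` below is a small hypothetical stand-in for the grouped frame, not the full data from the question, and the intermediate names `only_wanted`, `n_wanted` and `subset` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df1: one row per ID, colors collected into lists
df1 = pd.DataFrame({
    'ID': [1, 2, 4, 8],
    'color': [['Yellow', 'Red', 'Green'],
              ['Red', float('nan')],
              ['Yellow', 'Red'],
              ['Green', 'Yellow']],
})

# Spread each list across columns; shorter lists are padded with missing values
s = pd.DataFrame([*df1['color']])

# Blank out every cell that is not Yellow or Green
only_wanted = s.mask(~s.isin(['Yellow', 'Green']))

# Count the distinct remaining values in each row (missing values are ignored)
n_wanted = only_wanted.nunique(axis=1)

# Rows with exactly two distinct wanted colors contain both Yellow and Green
subset = df1[n_wanted.eq(2)]
print(subset['ID'].tolist())
```

This relies on `s` and `df1` sharing the same default integer index, so the boolean mask built from `s` lines up with the rows of `df1`.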

From your input dataframe, you can use:

colors = ['Yellow', 'Green']
out = df[df['color'].apply(lambda x: set(x).issuperset(colors))]
print(out)

# Output
   ID                 color
0   1  [Yellow, Red, Green]
7   8       [Green, Yellow]
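
The same set-based check can also be written from the other direction, asking whether the required colors are a subset of each row's list. The following is only a sketch on a small hypothetical grouped frame. The name `has_all` is illustrative, and `colors` is a set here instead of the list used above.

```python
import pandas as pd

# Hypothetical grouped frame: one list of colors per ID
df = pd.DataFrame({
    'ID': [1, 3, 8],
    'color': [['Yellow', 'Red', 'Green'],
              ['Green', 'Red', 'Green'],
              ['Green', 'Yellow']],
})

colors = {'Yellow', 'Green'}

# True when every required color appears somewhere in the row's list
has_all = df['color'].apply(colors.issubset)
out = df[has_all]
print(out['ID'].tolist())
```

Because `set.issubset` accepts any iterable, this version does not need to build a new set for every row.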
