How to subset a dataframe based on multiple values in a groupby list

Question

I have a dataframe like the one below:

ID,color
1, Yellow
1, Red
1, Green
2, Red
2, np.nan
3, Green
3, Red
3, Green
4, Yellow
4, Red
5, Green
5, np.nan
6, Red
7, Red
8, Green
8, Yellow

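import pandas as pd
# skipinitialspace drops the blank after each comma; 'np.nan' is read as a missing value
fd = pd.read_clipboard(sep=',', skipinitialspace=True, na_values=['np.nan'])
fd = fd.groupby('ID', as_index=False)['color'].aggregate(list)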

As you can see in the input dataframe, some IDs have multiple colors associated with them.

Now, I would like to create a subset of the dataframe containing only the IDs that have both Yellow and Green.

So, I tried the following and got the list of colors for each ID:

fd.groupby('ID',as_index=False)['color'].aggregate(lambda x: list(x))

I would like to check for values such as Yellow and Green in the groupby list and then subset the dataframe.

I expect my output to look as shown below (only two IDs have both Yellow and Green).

Update

The input dataframe looks like the one below:

Answer 1

Filter the rows whose color is Yellow or Green, then group the dataframe on ID and transform color with nunique to find the IDs that have 2 unique colors.

s = df[df['color'].isin(['Yellow', 'Green'])]
s.loc[s.groupby('ID')['color'].transform('nunique').eq(2), 'ID']

Result:

0     1
2     1
14    8
15    8
Name: ID, dtype: int64
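
For readers who want to run this without the clipboard, here is a minimal sketch of the same filter-then-`nunique` idea. The sample rows are rebuilt inline from the question, and the names `wanted`, `has_all`, `ids` and `subset` are illustrative, not from the original answer. It assumes `wanted` has no duplicates, so that `len(wanted)` is the number of distinct colors required.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, rebuilt inline (no clipboard needed).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 8],
    "color": ["Yellow", "Red", "Green", "Red", np.nan, "Green", "Red", "Green",
              "Yellow", "Red", "Green", np.nan, "Red", "Red", "Green", "Yellow"],
})

wanted = ["Yellow", "Green"]  # assumed to contain no duplicates

# Keep only the rows whose color is one of the wanted values.
s = df[df["color"].isin(wanted)]

# An ID qualifies when its filtered rows contain every wanted color.
has_all = s.groupby("ID")["color"].transform("nunique").eq(len(wanted))
ids = s.loc[has_all, "ID"].unique().tolist()

# All original rows (including other colors) for the qualifying IDs.
subset = df[df["ID"].isin(ids)]
print(ids)
print(subset)
```

Comparing against `len(wanted)` rather than a hard-coded 2 lets the same sketch work for a longer list of required colors.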

Update as per the new requirements. Here I'm assuming df1 is the input dataframe obtained after the groupby:

s = pd.DataFrame([*df1['color']])
df1[s.mask(~s.isin(['Yellow', 'Green'])).nunique(1).eq(2)]

Result:

   ID                 color
0   1  [Yellow, Red, Green]
7   8       [Green, Yellow]
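
The two-line version above is dense, so here is a step-by-step sketch of what it appears to be doing. The `df1` below is a hypothetical stand-in built by hand to match the grouped data. The intermediate names (`wide`, `only_wanted`, `n_found`) are illustrative, and `where(cond)` is used in place of the equivalent `mask(~cond)`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: one row per ID, colors already collected into lists.
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "color": [
        ["Yellow", "Red", "Green"], ["Red", np.nan], ["Green", "Red", "Green"],
        ["Yellow", "Red"], ["Green", np.nan], ["Red"], ["Red"], ["Green", "Yellow"],
    ],
})

wanted = ["Yellow", "Green"]  # assumed to contain no duplicates

# Step 1: spread each list across columns; shorter lists are padded with missing values.
wide = pd.DataFrame(df1["color"].tolist(), index=df1.index)

# Step 2: keep only the wanted colors; every other cell becomes NaN.
only_wanted = wide.where(wide.isin(wanted))

# Step 3: count the distinct wanted colors in each row (NaN is ignored).
n_found = only_wanted.nunique(axis=1)

# Rows that contain every wanted color.
result = df1[n_found.eq(len(wanted))]
print(result)
```

Passing `index=df1.index` keeps the helper frame aligned with `df1`, which matters if `df1` does not have a default 0..n-1 index.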

Answer 2

Starting from your input dataframe (the one with a list of colors per ID), you can use:

colors = ['Yellow', 'Green']
out = df[df['color'].apply(lambda x: set(x).issuperset(colors))]
print(out)

# Output
   ID                 color
0   1  [Yellow, Red, Green]
7   8       [Green, Yellow]
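
As an end-to-end illustration, the sketch below starts from the raw rows, collects the colors per ID, and then applies the set test. The names `raw` and `has_both` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Raw rows from the question, rebuilt inline.
raw = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 8],
    "color": ["Yellow", "Red", "Green", "Red", np.nan, "Green", "Red", "Green",
              "Yellow", "Red", "Green", np.nan, "Red", "Red", "Green", "Yellow"],
})

# One row per ID, with that ID's colors collected into a list.
df = raw.groupby("ID", as_index=False)["color"].agg(list)

colors = ["Yellow", "Green"]

# True where the ID's set of colors contains every entry of `colors`.
has_both = df["color"].apply(lambda x: set(x).issuperset(colors))
out = df[has_both]
print(out)
```

Because the test is a set comparison, duplicates and missing values inside a list do not affect the result, and adding more entries to `colors` needs no other change.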
