Split dataframe based on one column value containing multiple category values in another column

I am trying to create two dataframes from the following data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['Prod1', 'Prod2', 'Prod3', 'Prod2', 'Prod5', 'Prod3'] * 4,
                   'Inv_Type': ['X', 'Y'] * 12,
                   'Quant': np.random.randint(2, 20, size=24)})

# Sort by product so the groups are easier to see
df.sort_values('Product', inplace=True, ignore_index=True)

They need to be separated based on whether a Product has both an X and a Y associated with it, or only X's or only Y's.

Desired output:

df1 = df[df['Product'] == 'Prod3']
df2 = df[df['Product'].str.contains('Prod1|Prod2|Prod5', na=False)]

I have tried numerous groupby attempts with filters, but I am obviously missing something.

m = df.groupby("Product")["Inv_Type"].transform(lambda x: len(x.unique()) == 1)

df1 = df[~m]
df2 = df[m]
print(df1)
print(df2)

Prints:

   Product Inv_Type  Quant
12   Prod3        X      4
13   Prod3        Y     18
14   Prod3        Y     11
15   Prod3        X      5
16   Prod3        Y      5
17   Prod3        X      3
18   Prod3        X     16
19   Prod3        Y     11

   Product Inv_Type  Quant
0    Prod1        X      5
1    Prod1        X      6
2    Prod1        X      8
3    Prod1        X     17
4    Prod2        Y      3
5    Prod2        Y     13
6    Prod2        Y      9
7    Prod2        Y      8
8    Prod2        Y      7
9    Prod2        Y      5
10   Prod2        Y     18
11   Prod2        Y     11
20   Prod5        X      4
21   Prod5        X     15
22   Prod5        X     10
23   Prod5        X      6
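
To see why the mask above lines up with the rows of df, here is a minimal sketch on a small hand-made frame (the name toy and its values are illustrative assumptions, not data from the question). It contrasts agg, which gives one value per Product, with transform, which repeats that value on every row of the group so it can be used directly as a boolean index.

```python
import pandas as pd

# Illustrative data only: A has both types, B and C have a single type
toy = pd.DataFrame({
    "Product":  ["A", "A", "B", "B", "C"],
    "Inv_Type": ["X", "Y", "X", "X", "Y"],
})

# Same test as in the answer: does the group contain exactly one Inv_Type?
def single_type(x):
    return len(x.unique()) == 1

# agg: one boolean per group, indexed by Product
per_group = toy.groupby("Product")["Inv_Type"].agg(single_type)

# transform: the group's boolean broadcast to each row, indexed like toy
per_row = toy.groupby("Product")["Inv_Type"].transform(single_type)

print(per_group)
print(per_row)
```

Because per_row shares its index with toy, toy[per_row] and toy[~per_row] select whole products rather than individual rows.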

You can create a custom boolean key to group by and store the two resulting dataframes in a dictionary. Assuming there are only two values in Inv_Type, we can use nunique to find any group that has more than one value.

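# Key 1 holds products with more than one Inv_Type, key 0 holds the rest.
# The boolean Series is passed directly as the grouping key so that each
# group label is a scalar that int() can convert.
dfs = {int(grp): data for grp, data
       in df.groupby(df.groupby('Product')['Inv_Type'].transform('nunique') > 1)}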


print(dfs[1])

   Product Inv_Type  Quant
12   Prod3        X      2
13   Prod3        Y     12
14   Prod3        Y      2
15   Prod3        X     19
16   Prod3        Y      6
17   Prod3        X      5
18   Prod3        X      4
19   Prod3        Y     13

print(dfs[0])

   Product Inv_Type  Quant
0    Prod1        X     16
1    Prod1        X     13
2    Prod1        X      8
3    Prod1        X     16
4    Prod2        Y     14
5    Prod2        Y     10
6    Prod2        Y      4
7    Prod2        Y     13
8    Prod2        Y      7
9    Prod2        Y     16
10   Prod2        Y     13
11   Prod2        Y     11
20   Prod5        X     11
21   Prod5        X     10
22   Prod5        X     13
23   Prod5        X     10
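
One thing to watch with the dictionary approach: if no product has both types (or every product does), one of the two keys is simply absent and dfs[1] or dfs[0] raises a KeyError. A minimal sketch of one way to guard against that, using a hypothetical helper split_by_mixed (the name and its defaults are assumptions for illustration) that always returns both frames:

```python
import pandas as pd

def split_by_mixed(df, key="Product", col="Inv_Type"):
    """Hypothetical helper: return (mixed, single) dataframes.

    mixed  -> rows of groups with more than one distinct value in col
    single -> rows of groups with exactly one distinct value in col
    """
    is_mixed = df.groupby(key)[col].transform("nunique") > 1
    groups = {bool(flag): part for flag, part in df.groupby(is_mixed)}
    empty = df.iloc[0:0]  # zero rows, same columns as df
    return groups.get(True, empty), groups.get(False, empty)

# Illustrative data in which no product has both types
toy = pd.DataFrame({
    "Product":  ["A", "A", "B", "B"],
    "Inv_Type": ["X", "X", "Y", "Y"],
    "Quant":    [1, 2, 3, 4],
})

mixed, single = split_by_mixed(toy)
print(len(mixed), len(single))
```

Returning an empty frame with the original columns keeps downstream code working without a special case for the missing key.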

We can also do it with a boolean mask and a pandas built-in aggregate function (for better execution speed) instead of a custom lambda function (which is not optimized and is slower), as follows:

mask = df.groupby("Product")["Inv_Type"].transform('nunique') > 1
df1 = df[mask]
df2 = df[~mask]

Result:

print(df1)


   Product Inv_Type  Quant
12   Prod3        X     15
13   Prod3        Y     19
14   Prod3        Y     16
15   Prod3        X     12
16   Prod3        Y      9
17   Prod3        X      8
18   Prod3        X      8
19   Prod3        Y      7



print(df2)


   Product Inv_Type  Quant
0    Prod1        X     17
1    Prod1        X     12
2    Prod1        X      9
3    Prod1        X      9
4    Prod2        Y      2
5    Prod2        Y     16
6    Prod2        Y     16
7    Prod2        Y      9
8    Prod2        Y     17
9    Prod2        Y     12
10   Prod2        Y     12
11   Prod2        Y     13
20   Prod5        X      2
21   Prod5        X     19
22   Prod5        X     16
23   Prod5        X     18
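
The speed claim can be checked on your own machine. The following is a rough sketch, assuming a synthetic frame named big with about 2,000 products (sizes and names are arbitrary choices for illustration). It confirms that the lambda and the built-in 'nunique' produce the same mask and then times each with timeit. Actual timings depend on the pandas version and the data, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: sizes are arbitrary, chosen only to make the gap visible
rng = np.random.default_rng(0)
n = 20_000
big = pd.DataFrame({
    "Product": rng.integers(0, 2_000, size=n).astype(str),
    "Inv_Type": rng.choice(["X", "Y"], size=n),
})

def with_lambda():
    # Python-level function called once per group
    return big.groupby("Product")["Inv_Type"].transform(lambda x: len(x.unique()) == 1)

def with_builtin():
    # Built-in aggregation referenced by name
    return big.groupby("Product")["Inv_Type"].transform("nunique") == 1

# Both approaches should flag exactly the same rows
same = bool((with_lambda().astype(bool) == with_builtin()).all())
print("same mask:", same)

print("lambda :", timeit.timeit(with_lambda, number=3))
print("builtin:", timeit.timeit(with_builtin, number=3))
```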
