Split dataframe based on one column value containing multiple category values in another column
I am trying to create two dataframes from the following data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product':['Prod1','Prod2','Prod3','Prod2','Prod5','Prod3']*4,
'Inv_Type': ['X', 'Y']*12,
'Quant': np.random.randint(2,20, size=24)})
df.sort_values('Product', inplace=True, ignore_index=True)  # sorted only to make the groups easier to see
They need to be separated based on whether the Products have both an X and a Y associated with them, or only X's or only Y's.
Desired Output:
df1 = df[df['Product'] == 'Prod3']
df2 = df[df['Product'].str.contains('Prod1|Prod2|Prod5', na=False)]
I have tried numerous groupby attempts with filters, but I am obviously missing something.
m = df.groupby("Product")["Inv_Type"].transform(lambda x: len(x.unique()) == 1)
df1 = df[~m]
df2 = df[m]
print(df1)
print(df2)
Prints:
Product Inv_Type Quant
12 Prod3 X 4
13 Prod3 Y 18
14 Prod3 Y 11
15 Prod3 X 5
16 Prod3 Y 5
17 Prod3 X 3
18 Prod3 X 16
19 Prod3 Y 11
Product Inv_Type Quant
0 Prod1 X 5
1 Prod1 X 6
2 Prod1 X 8
3 Prod1 X 17
4 Prod2 Y 3
5 Prod2 Y 13
6 Prod2 Y 9
7 Prod2 Y 8
8 Prod2 Y 7
9 Prod2 Y 5
10 Prod2 Y 18
11 Prod2 Y 11
20 Prod5 X 4
21 Prod5 X 15
22 Prod5 X 10
23 Prod5 X 6
You can create a custom boolean key to group by, and build the two separate data frames inside a dictionary. Assuming that there are only two values in your Inv_Type column, we can use nunique to find any group that has more than one value.
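# Group by a boolean Series (True where the product has more than one Inv_Type).
# The Series is passed directly rather than wrapped in a list, because pandas 2.x
# returns tuple keys for a length-1 list grouper, which would break int(grp).
dfs = {int(grp) : data for grp,data
       in df.groupby(df.groupby('Product')['Inv_Type'].transform('nunique') > 1)}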
print(dfs[1])
Product Inv_Type Quant
12 Prod3 X 2
13 Prod3 Y 12
14 Prod3 Y 2
15 Prod3 X 19
16 Prod3 Y 6
17 Prod3 X 5
18 Prod3 X 4
19 Prod3 Y 13
print(dfs[0])
Product Inv_Type Quant
0 Prod1 X 16
1 Prod1 X 13
2 Prod1 X 8
3 Prod1 X 16
4 Prod2 Y 14
5 Prod2 Y 10
6 Prod2 Y 4
7 Prod2 Y 13
8 Prod2 Y 7
9 Prod2 Y 16
10 Prod2 Y 13
11 Prod2 Y 11
20 Prod5 X 11
21 Prod5 X 10
22 Prod5 X 13
23 Prod5 X 10
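To make the role of nunique concrete, here is a minimal sketch on a small fixed frame. The data and the names n_types, mixed and single are made up for illustration and are not from the question. It shows that transform('nunique') broadcasts the per-product count of distinct Inv_Type values back onto every row, which is what makes the comparison usable as a row mask.

```python
import pandas as pd

# Small deterministic frame (illustrative data, not the question's random data)
df = pd.DataFrame({
    "Product": ["Prod1", "Prod1", "Prod3", "Prod3", "Prod2"],
    "Inv_Type": ["X", "X", "X", "Y", "Y"],
    "Quant": [5, 6, 4, 18, 3],
})

# Number of distinct Inv_Type values per product, repeated on every row of that product
n_types = df.groupby("Product")["Inv_Type"].transform("nunique")

# True for rows whose product has more than one Inv_Type
mask = n_types > 1

mixed = df[mask]     # products with both X and Y
single = df[~mask]   # products with only one Inv_Type
```

Because the mask only tests for "more than one distinct value", it should keep working if Inv_Type later gains a third category, although "mixed" would then mean "more than one type" rather than strictly "both X and Y".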
We can also do it with a boolean mask and the pandas built-in aggregate function 'nunique' (for better execution speed) instead of a custom lambda function (which is called in Python once per group and is therefore slower), as follows:
mask = df.groupby("Product")["Inv_Type"].transform('nunique') > 1
df1 = df[mask]
df2 = df[~mask]
Result:
print(df1)
Product Inv_Type Quant
12 Prod3 X 15
13 Prod3 Y 19
14 Prod3 Y 16
15 Prod3 X 12
16 Prod3 Y 9
17 Prod3 X 8
18 Prod3 X 8
19 Prod3 Y 7
print(df2)
Product Inv_Type Quant
0 Prod1 X 17
1 Prod1 X 12
2 Prod1 X 9
3 Prod1 X 9
4 Prod2 Y 2
5 Prod2 Y 16
6 Prod2 Y 16
7 Prod2 Y 9
8 Prod2 Y 17
9 Prod2 Y 12
10 Prod2 Y 12
11 Prod2 Y 13
20 Prod5 X 2
21 Prod5 X 19
22 Prod5 X 16
23 Prod5 X 18
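Since the question mentions trying groupby with filters, here is a hedged sketch of how that route could look, using GroupBy.filter on a small fixed frame. The data and the names mixed and single are illustrative only, and the sketch assumes the index is unique so that drop can remove exactly the rows already selected.

```python
import pandas as pd

# Small deterministic frame (illustrative data, not the question's random data)
df = pd.DataFrame({
    "Product": ["Prod1", "Prod1", "Prod3", "Prod3", "Prod2"],
    "Inv_Type": ["X", "X", "X", "Y", "Y"],
    "Quant": [5, 6, 4, 18, 3],
})

# Keep every row of the products whose group has more than one distinct Inv_Type
mixed = df.groupby("Product").filter(lambda g: g["Inv_Type"].nunique() > 1)

# Everything else: drop the rows already selected (assumes a unique index)
single = df.drop(mixed.index)
```

This reads close to the wording of the problem, but it calls a Python function once per group, so the transform('nunique') mask above is likely the better choice when there are many products.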