Find possible unique combinations of 3 columns in pandas
I am trying to find all possible combinations of 3 variable columns in pandas. The sample df looks like this:
Variable_Name Variable1 Variable2 Variable3
0 X 6.0% 8.0% 10.0%
1 Y 3.0% 4.0% 5.0%
2 Z 1.0% 3.0% 5.0%
Each combination must only take a variable's own values and must not move values to other columns; e.g., using 4.0% as an 'X' would be incorrect.
I tried itertools.combinations, itertools.product, and itertools.permutations, but these give ALL possible combinations.
I would like the result to look like this, giving 27 possible combinations:
Y X Z
0 3.0% 6.0% 1.0%
1 3.0% 6.0% 3.0%
2 3.0% 6.0% 5.0%
3 3.0% 8.0% 1.0%
4 3.0% 8.0% 3.0%
5 3.0% 8.0% 5.0%
6 3.0% 10.0% 1.0%
7 3.0% 10.0% 3.0%
8 3.0% 10.0% 5.0%
9 4.0% 8.0% 3.0%
10 4.0% 8.0% 1.0%
11 4.0% 8.0% 5.0%
12 4.0% 6.0% 1.0%
13 4.0% 6.0% 3.0%
14 4.0% 6.0% 5.0%
15 4.0% 10.0% 1.0%
16 4.0% 10.0% 3.0%
17 4.0% 10.0% 5.0%
18 5.0% 10.0% 5.0%
19 5.0% 10.0% 1.0%
20 5.0% 10.0% 3.0%
21 5.0% 8.0% 1.0%
22 5.0% 8.0% 3.0%
23 5.0% 8.0% 5.0%
24 5.0% 6.0% 1.0%
25 5.0% 6.0% 3.0%
26 5.0% 6.0% 5.0%
Any help would be appreciated.
Let's try successively cross merging each variable's values:
from functools import reduce

import pandas as pd

df = pd.DataFrame({'Variable_Name': {0: 'X', 1: 'Y', 2: 'Z'},
                   'Variable1': {0: '6.0%', 1: '3.0%', 2: '1.0%'},
                   'Variable2': {0: '8.0%', 1: '4.0%', 2: '3.0%'},
                   'Variable3': {0: '10.0%', 1: '5.0%', 2: '5.0%'}})

# Save Var Names for later
var_names = df['Variable_Name']

# Get Variables Options in Own Rows
new_df = df.set_index('Variable_Name').stack() \
    .droplevel(1, 0) \
    .reset_index()

# Get Collection of DataFrames each with its own variable
dfs = tuple(new_df[new_df['Variable_Name'].eq(v)]
            .drop(columns=['Variable_Name']) for v in var_names)

# Successive Cross Merges
new_df = reduce(lambda left, right: pd.merge(left, right, how='cross'), dfs)

# Fix Column Names
new_df.columns = var_names

# Fix Axis Names
new_df = new_df.rename_axis(None, axis=1)

# For Display
print(new_df.to_string())
Output:
X Y Z
0 6.0% 3.0% 1.0%
1 6.0% 3.0% 3.0%
2 6.0% 3.0% 5.0%
3 6.0% 4.0% 1.0%
4 6.0% 4.0% 3.0%
5 6.0% 4.0% 5.0%
6 6.0% 5.0% 1.0%
7 6.0% 5.0% 3.0%
8 6.0% 5.0% 5.0%
9 8.0% 3.0% 1.0%
10 8.0% 3.0% 3.0%
11 8.0% 3.0% 5.0%
12 8.0% 4.0% 1.0%
13 8.0% 4.0% 3.0%
14 8.0% 4.0% 5.0%
15 8.0% 5.0% 1.0%
16 8.0% 5.0% 3.0%
17 8.0% 5.0% 5.0%
18 10.0% 3.0% 1.0%
19 10.0% 3.0% 3.0%
20 10.0% 3.0% 5.0%
21 10.0% 4.0% 1.0%
22 10.0% 4.0% 3.0%
23 10.0% 4.0% 5.0%
24 10.0% 5.0% 1.0%
25 10.0% 5.0% 3.0%
26 10.0% 5.0% 5.0%
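As a cross-check on the cross-merge result, itertools.product can also produce the restricted set, provided it is given one list of values per variable rather than one flat pool of values. The following is only a sketch under that assumption; the names wide, options, and combos are illustrative and do not come from the answer above.

```python
from itertools import product

import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'Variable_Name': ['X', 'Y', 'Z'],
    'Variable1': ['6.0%', '3.0%', '1.0%'],
    'Variable2': ['8.0%', '4.0%', '3.0%'],
    'Variable3': ['10.0%', '5.0%', '5.0%'],
})

# One list of candidate values per variable (assumes variable names are unique).
wide = df.set_index('Variable_Name')
options = [wide.loc[name].tolist() for name in wide.index]

# product(*options) picks exactly one value from each variable's own list,
# so a value belonging to Y can never end up in the X column.
combos = pd.DataFrame(list(product(*options)), columns=list(wide.index))

print(combos.shape)
```

Passing every value as a single flat iterable is what mixes values across variables; unpacking one list per variable keeps each value in its own column.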
You can use a CROSS JOIN. In pandas, use pd.merge() or pd.DataFrame.join() with the parameter how='cross'. Before cross joining, though, you need to place each variable in its own dataframe in long (unpivoted) format; your table is in wide (pivoted) format.
df_X = df.loc[df['Variable_Name'] == 'X', ['Variable1', 'Variable2', 'Variable3']].T
df_Y = df.loc[df['Variable_Name'] == 'Y', ['Variable1', 'Variable2', 'Variable3']].T
df_Z = df.loc[df['Variable_Name'] == 'Z', ['Variable1', 'Variable2', 'Variable3']].T
cross_join_df = df_X.join(df_Y, how='cross').join(df_Z, how='cross')
cross_join_df.columns = ['X','Y','Z']
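To see what how='cross' does in isolation, here is a minimal sketch on two small made-up frames. The names left, right, and pairs are illustrative only, and the sketch assumes pandas 1.2 or later, where how='cross' became available.

```python
import pandas as pd

# Two small hypothetical frames in long format: one column, one row per option.
left = pd.DataFrame({'X': ['6.0%', '8.0%']})
right = pd.DataFrame({'Y': ['3.0%', '4.0%', '5.0%']})

# how='cross' takes no join keys: every row of `left` is paired
# with every row of `right`.
pairs = pd.merge(left, right, how='cross')

print(len(pairs))  # number of left rows times number of right rows
```

Each additional cross join multiplies the row count by the number of options in the frame being joined, which is how three variables with three options each reach 27 rows.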
If you need to use the code in a loop, it would look like this:
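variables = df['Variable_Name'].unique()
columns_to_cross = ['Variable1', 'Variable2', 'Variable3']
# Start from the first variable's options, transposed into long format
cross_join_df = df.loc[df['Variable_Name'] == variables[0], columns_to_cross].T
for var in variables[1:]:
    # Cross join the next variable's options onto the running result
    to_join_df = df.loc[df['Variable_Name'] == var, columns_to_cross].T
    cross_join_df = pd.merge(cross_join_df, to_join_df, how='cross')
# Name the columns after the variables once all joins are done
cross_join_df.columns = variables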
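If the loop has to work for any number of variables, one possible alternative is pd.MultiIndex.from_product, which builds the Cartesian product in a single call. This is only a sketch: cross_options is a hypothetical helper name, not part of pandas, and it assumes the variable names are unique and that every column other than the name column holds an option.

```python
import pandas as pd

df = pd.DataFrame({
    'Variable_Name': ['X', 'Y', 'Z'],
    'Variable1': ['6.0%', '3.0%', '1.0%'],
    'Variable2': ['8.0%', '4.0%', '3.0%'],
    'Variable3': ['10.0%', '5.0%', '5.0%'],
})


def cross_options(frame, name_col='Variable_Name'):
    """Hypothetical helper: one output column per variable, one row per combination."""
    # Assumes the values in name_col are unique, so .loc returns a single row.
    wide = frame.set_index(name_col)
    index = pd.MultiIndex.from_product(
        [wide.loc[name].tolist() for name in wide.index],
        names=list(wide.index),
    )
    # Turn the index levels into ordinary columns.
    return index.to_frame(index=False)


result = cross_options(df)
print(result.shape)
```

Compared with repeated merges, this avoids creating intermediate dataframes, at the cost of being less explicit about each join.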