I am trying to find all possible combinations of 3 variable columns in pandas. The sample df looks like this:
Variable_Name Variable1 Variable2 Variable3
0 X 6.0% 8.0% 10.0%
1 Y 3.0% 4.0% 5.0%
2 Z 1.0% 3.0% 5.0%
Each combination must only take values that belong to that variable and must not move values between variables, e.g. using 4.0% as an 'X' value would be incorrect.
I tried itertools.combinations, itertools.product, and itertools.permutations, but these results give ALL possible combinations.
I would want the results to look like this, giving 27 possible combinations:
Y X Z
0 3.0% 6.0% 1.0%
1 3.0% 6.0% 3.0%
2 3.0% 6.0% 5.0%
3 3.0% 8.0% 1.0%
4 3.0% 8.0% 3.0%
5 3.0% 8.0% 5.0%
6 3.0% 10.0% 1.0%
7 3.0% 10.0% 3.0%
8 3.0% 10.0% 5.0%
9 4.0% 8.0% 3.0%
10 4.0% 8.0% 1.0%
11 4.0% 8.0% 5.0%
12 4.0% 6.0% 1.0%
13 4.0% 6.0% 3.0%
14 4.0% 6.0% 5.0%
15 4.0% 10.0% 1.0%
16 4.0% 10.0% 3.0%
17 4.0% 10.0% 5.0%
18 5.0% 10.0% 5.0%
19 5.0% 10.0% 1.0%
20 5.0% 10.0% 3.0%
21 5.0% 8.0% 1.0%
22 5.0% 8.0% 3.0%
23 5.0% 8.0% 5.0%
24 5.0% 6.0% 1.0%
25 5.0% 6.0% 3.0%
26 5.0% 6.0% 5.0%
Any help will be appreciated.
Let's try successively cross merging each variable's values:
from functools import reduce
import pandas as pd
df = pd.DataFrame({'Variable_Name': {0: 'X', 1: 'Y', 2: 'Z'},
                   'Variable1': {0: '6.0%', 1: '3.0%', 2: '1.0%'},
                   'Variable2': {0: '8.0%', 1: '4.0%', 2: '3.0%'},
                   'Variable3': {0: '10.0%', 1: '5.0%', 2: '5.0%'}})
# Save Var Names for later
var_names = df['Variable_Name']
# Get Variables Options in Own Rows
new_df = df.set_index('Variable_Name').stack() \
    .droplevel(1, 0) \
    .reset_index()
# Get Collection of DataFrames each with its own variable
dfs = tuple(new_df[new_df['Variable_Name'].eq(v)]
            .drop(columns=['Variable_Name']) for v in var_names)
# Successive Cross Merges
new_df = reduce(lambda left, right: pd.merge(left, right, how='cross'), dfs)
# Fix Column Names
new_df.columns = var_names
# Fix Axis Names
new_df = new_df.rename_axis(None, axis=1)
# For Display
print(new_df.to_string())
Output:
X Y Z
0 6.0% 3.0% 1.0%
1 6.0% 3.0% 3.0%
2 6.0% 3.0% 5.0%
3 6.0% 4.0% 1.0%
4 6.0% 4.0% 3.0%
5 6.0% 4.0% 5.0%
6 6.0% 5.0% 1.0%
7 6.0% 5.0% 3.0%
8 6.0% 5.0% 5.0%
9 8.0% 3.0% 1.0%
10 8.0% 3.0% 3.0%
11 8.0% 3.0% 5.0%
12 8.0% 4.0% 1.0%
13 8.0% 4.0% 3.0%
14 8.0% 4.0% 5.0%
15 8.0% 5.0% 1.0%
16 8.0% 5.0% 3.0%
17 8.0% 5.0% 5.0%
18 10.0% 3.0% 1.0%
19 10.0% 3.0% 3.0%
20 10.0% 3.0% 5.0%
21 10.0% 4.0% 1.0%
22 10.0% 4.0% 3.0%
23 10.0% 4.0% 5.0%
24 10.0% 5.0% 1.0%
25 10.0% 5.0% 3.0%
26 10.0% 5.0% 5.0%
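For comparison, here is a minimal sketch of the same result with itertools.product. It assumes product is given one list of values per variable, rather than all nine values at once. The names `options` and `combos` are only for illustration and do not come from the answer above.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({
    "Variable_Name": ["X", "Y", "Z"],
    "Variable1": ["6.0%", "3.0%", "1.0%"],
    "Variable2": ["8.0%", "4.0%", "3.0%"],
    "Variable3": ["10.0%", "5.0%", "5.0%"],
})

# One list of candidate values per variable, keyed by variable name:
# {'X': ['6.0%', '8.0%', '10.0%'], 'Y': [...], 'Z': [...]}
options = df.set_index("Variable_Name").T.to_dict("list")

# product(*lists) picks exactly one value from each list, so a value
# can never end up under a different variable.
combos = pd.DataFrame(list(product(*options.values())), columns=list(options))

print(combos.shape)  # (27, 3)
```

Because each list holds only one variable's values, the row count is 3 * 3 * 3 = 27, the same as the cross merge.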
You can use a CROSS JOIN. In pandas, use pd.merge() or pd.DataFrame.join() with the parameter how='cross' (available since pandas 1.2). Before cross joining, though, you need to place each variable in its own dataframe in long (unpivoted) format; your table is in wide (pivoted) format.
df_X = df.loc[df['Variable_Name'] == 'X', ['Variable1', 'Variable2', 'Variable3']].T
df_Y = df.loc[df['Variable_Name'] == 'Y', ['Variable1', 'Variable2', 'Variable3']].T
df_Z = df.loc[df['Variable_Name'] == 'Z', ['Variable1', 'Variable2', 'Variable3']].T
cross_join_df = df_X.join(df_Y, how='cross').join(df_Z, how='cross')
cross_join_df.columns = ['X','Y','Z']
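To illustrate what "long format" means here, a rough sketch using DataFrame.melt follows. The code above transposes with .T instead, so this is only an alternative way to reach the same per-variable frames. The column names `Option` and `Value` and the `per_variable` dict are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Variable_Name": ["X", "Y", "Z"],
    "Variable1": ["6.0%", "3.0%", "1.0%"],
    "Variable2": ["8.0%", "4.0%", "3.0%"],
    "Variable3": ["10.0%", "5.0%", "5.0%"],
})

# Wide -> long: one row per (variable, option) pair, 9 rows in total.
long_df = df.melt(id_vars="Variable_Name", var_name="Option", value_name="Value")

# Split the long table into one single-column frame per variable;
# these are the frames that would then be cross joined.
per_variable = {
    name: group[["Value"]].rename(columns={"Value": name}).reset_index(drop=True)
    for name, group in long_df.groupby("Variable_Name", sort=False)
}

print(long_df)
```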
If you need to do this in a loop, it would look like this:
variables = df['Variable_Name'].unique()
columns_to_cross = ['Variable1', 'Variable2', 'Variable3']
cross_join_df = df.loc[df['Variable_Name'] == variables[0], columns_to_cross].T
for var in variables[1:]:
    to_join_df = df.loc[df['Variable_Name'] == var, columns_to_cross].T
    cross_join_df = pd.merge(cross_join_df, to_join_df, how='cross')
cross_join_df.columns = variables