Find possible unique combinations of 3 columns in pandas

I am trying to find all possible combinations of 3 variable columns in pandas. The sample df looks like this:

          Variable_Name Variable1 Variable2 Variable3
0                  X      6.0%      8.0%     10.0%
1                  Y      3.0%      4.0%      5.0%
2                  Z      1.0%      3.0%      5.0%

The combinations must take values only from each variable's own values and must not move values to other columns; e.g. using 4.0% as an 'X' would be incorrect.

I tried itertools.combinations, itertools.product and itertools.permutations, but these give ALL possible combinations.
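
For comparison, itertools.product yields exactly the 27 wanted rows when it receives one iterable per variable (that variable's own row of values) instead of one pooled list of all nine values. A minimal sketch, rebuilding the sample df inline; the names per_variable and combos are illustrative only:

```python
import itertools

import pandas as pd

df = pd.DataFrame({'Variable_Name': ['X', 'Y', 'Z'],
                   'Variable1': ['6.0%', '3.0%', '1.0%'],
                   'Variable2': ['8.0%', '4.0%', '3.0%'],
                   'Variable3': ['10.0%', '5.0%', '5.0%']})

value_cols = ['Variable1', 'Variable2', 'Variable3']

# One list of candidate values per variable, taken only from that variable's row
per_variable = df[value_cols].to_numpy().tolist()

# The Cartesian product picks exactly one value from each variable's own list
combos = pd.DataFrame(list(itertools.product(*per_variable)),
                      columns=list(df['Variable_Name']))
print(len(combos))  # 27 = 3 * 3 * 3
```

The difference from the failed attempts is only in what is passed in: product(*per_variable) keeps X values in the X position, Y values in the Y position, and so on.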

I would like the result to look like this, giving 27 possible combinations:

     Y      X     Z
0   3.0%   6.0%  1.0%
1   3.0%   6.0%  3.0%
2   3.0%   6.0%  5.0%
3   3.0%   8.0%  1.0%
4   3.0%   8.0%  3.0%
5   3.0%   8.0%  5.0%
6   3.0%  10.0%  1.0%
7   3.0%  10.0%  3.0%
8   3.0%  10.0%  5.0%
9   4.0%   8.0%  3.0%
10  4.0%   8.0%  1.0%
11  4.0%   8.0%  5.0%
12  4.0%   6.0%  1.0%
13  4.0%   6.0%  3.0%
14  4.0%   6.0%  5.0%
15  4.0%  10.0%  1.0%
16  4.0%  10.0%  3.0%
17  4.0%  10.0%  5.0%
18  5.0%  10.0%  5.0%
19  5.0%  10.0%  1.0%
20  5.0%  10.0%  3.0%
21  5.0%   8.0%  1.0%
22  5.0%   8.0%  3.0%
23  5.0%   8.0%  5.0%
24  5.0%   6.0%  1.0%
25  5.0%   6.0%  3.0%
26  5.0%   6.0%  5.0%

Any help will be appreciated.

Try successively cross-merging each variable's values:

from functools import reduce

import pandas as pd

df = pd.DataFrame({'Variable_Name': {0: 'X', 1: 'Y', 2: 'Z'},
                   'Variable1': {0: '6.0%', 1: '3.0%', 2: '1.0%'},
                   'Variable2': {0: '8.0%', 1: '4.0%', 2: '3.0%'},
                   'Variable3': {0: '10.0%', 1: '5.0%', 2: '5.0%'}})

# Save the variable names for later
var_names = df['Variable_Name']

# Long format: one row per (variable, candidate value) pair
new_df = (df.set_index('Variable_Name')
            .stack()
            .droplevel(1)  # drop the Variable1/2/3 level, keep Variable_Name
            .reset_index(name='value'))

# One single-column DataFrame per variable, with the column named after the
# variable so the merges below never produce clashing suffixed column names
dfs = tuple(new_df.loc[new_df['Variable_Name'].eq(v), ['value']]
            .rename(columns={'value': v})
            for v in var_names)

# Successive cross merges (how='cross' requires pandas >= 1.2)
new_df = reduce(lambda left, right: pd.merge(left, right, how='cross'), dfs)

# For display
print(new_df.to_string())

Output:

        X     Y     Z
0    6.0%  3.0%  1.0%
1    6.0%  3.0%  3.0%
2    6.0%  3.0%  5.0%
3    6.0%  4.0%  1.0%
4    6.0%  4.0%  3.0%
5    6.0%  4.0%  5.0%
6    6.0%  5.0%  1.0%
7    6.0%  5.0%  3.0%
8    6.0%  5.0%  5.0%
9    8.0%  3.0%  1.0%
10   8.0%  3.0%  3.0%
11   8.0%  3.0%  5.0%
12   8.0%  4.0%  1.0%
13   8.0%  4.0%  3.0%
14   8.0%  4.0%  5.0%
15   8.0%  5.0%  1.0%
16   8.0%  5.0%  3.0%
17   8.0%  5.0%  5.0%
18  10.0%  3.0%  1.0%
19  10.0%  3.0%  3.0%
20  10.0%  3.0%  5.0%
21  10.0%  4.0%  1.0%
22  10.0%  4.0%  3.0%
23  10.0%  4.0%  5.0%
24  10.0%  5.0%  1.0%
25  10.0%  5.0%  3.0%
26  10.0%  5.0%  5.0%
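
To see why splitting by variable works, it may help to inspect the long-format frame that the stack step produces before any merging. A small sketch on the same sample data; the column name 'value' is a choice made here, not something pandas requires:

```python
import pandas as pd

df = pd.DataFrame({'Variable_Name': ['X', 'Y', 'Z'],
                   'Variable1': ['6.0%', '3.0%', '1.0%'],
                   'Variable2': ['8.0%', '4.0%', '3.0%'],
                   'Variable3': ['10.0%', '5.0%', '5.0%']})

# stack() turns the wide table into a Series indexed by (Variable_Name, VariableN)
stacked = df.set_index('Variable_Name').stack()

# Dropping the inner level leaves one row per (variable, candidate value) pair
long_df = stacked.droplevel(1).reset_index(name='value')
print(long_df)
```

Each variable then contributes exactly its own three rows to the cross merges, which is what keeps 4.0% out of the X column.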

You can use a CROSS JOIN. In pandas, use pd.merge() or pd.DataFrame.join() with the parameter how='cross' (available since pandas 1.2). Before cross joining, however, each variable's values need to be in their own DataFrame in long (unpivoted) format, because your table is in wide (pivoted) format.

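# Select each variable's row and transpose it into a one-column (long) frame
df_X = df.loc[df['Variable_Name'] == 'X', ['Variable1', 'Variable2', 'Variable3']].T
df_Y = df.loc[df['Variable_Name'] == 'Y', ['Variable1', 'Variable2', 'Variable3']].T
df_Z = df.loc[df['Variable_Name'] == 'Z', ['Variable1', 'Variable2', 'Variable3']].T

# Cross join X with Y, then the result with Z: 3 * 3 * 3 = 27 rows
cross_join_df = df_X.join(df_Y, how='cross').join(df_Z, how='cross')
# The transposed columns are labelled 0, 1, 2 (the original row index), so rename them
cross_join_df.columns = ['X', 'Y', 'Z']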
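
As a minimal illustration of the two building blocks used above (sample data rebuilt inline): transposing one selected row gives a 3x1 frame whose single column is labelled with the original row index, and a cross join pairs every row of the left frame with every row of the right, so the row counts multiply.

```python
import pandas as pd

df = pd.DataFrame({'Variable_Name': ['X', 'Y', 'Z'],
                   'Variable1': ['6.0%', '3.0%', '1.0%'],
                   'Variable2': ['8.0%', '4.0%', '3.0%'],
                   'Variable3': ['10.0%', '5.0%', '5.0%']})
cols = ['Variable1', 'Variable2', 'Variable3']

# Selecting one row and transposing gives a 3x1 frame; its column label is
# the original row index (0 for X), not the variable name
df_X = df.loc[df['Variable_Name'] == 'X', cols].T
df_Y = df.loc[df['Variable_Name'] == 'Y', cols].T
print(df_X.shape, list(df_X.columns))  # (3, 1) [0]

# No join keys: every row of df_X is paired with every row of df_Y
xy = df_X.join(df_Y, how='cross')
print(xy.shape)  # (9, 2)
```

This is also why the answer renames the columns at the end: the labels 0, 1, 2 carry no meaning once the frames are joined.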

To handle any number of variables without writing one line per variable, the same approach can go in a loop:

variables = df['Variable_Name'].unique()
columns_to_cross = ['Variable1', 'Variable2', 'Variable3']

# Start from the first variable's values as a one-column frame
cross_join_df = df.loc[df['Variable_Name'] == variables[0], columns_to_cross].T
for var in variables[1:]:
    # Cross join each remaining variable onto the accumulated result
    to_join_df = df.loc[df['Variable_Name'] == var, columns_to_cross].T
    cross_join_df = pd.merge(cross_join_df, to_join_df, how='cross')
cross_join_df.columns = variables
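
A possibly shorter alternative, not taken from either answer, is pandas.MultiIndex.from_product, which builds the same Cartesian product from one list per variable; to_frame(index=False) then turns the index levels into ordinary columns. A sketch comparing it with the cross-join loop, rebuilding the sample data so it runs on its own (product_df is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'Variable_Name': ['X', 'Y', 'Z'],
                   'Variable1': ['6.0%', '3.0%', '1.0%'],
                   'Variable2': ['8.0%', '4.0%', '3.0%'],
                   'Variable3': ['10.0%', '5.0%', '5.0%']})
columns_to_cross = ['Variable1', 'Variable2', 'Variable3']

# Cross-join loop from the answer
variables = df['Variable_Name'].unique()
cross_join_df = df.loc[df['Variable_Name'] == variables[0], columns_to_cross].T
for var in variables[1:]:
    to_join_df = df.loc[df['Variable_Name'] == var, columns_to_cross].T
    cross_join_df = pd.merge(cross_join_df, to_join_df, how='cross')
cross_join_df.columns = variables

# Alternative: one list of candidate values per variable, then a Cartesian product
value_lists = df[columns_to_cross].to_numpy().tolist()
product_df = (pd.MultiIndex.from_product(value_lists, names=list(df['Variable_Name']))
              .to_frame(index=False))

# Same rows in the same order (first variable varies slowest)
print(product_df.values.tolist() == cross_join_df.values.tolist())  # True
```

Both keep the first variable as the outermost loop, so the row order matches the cross-join output shown earlier.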

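Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.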

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM