Find possible unique combinations of 3 columns in pandas
I am trying to find all possible combinations of 3 variable columns in pandas. A sample df looks like this:
  Variable_Name Variable1 Variable2 Variable3
0             X      6.0%      8.0%     10.0%
1             Y      3.0%      4.0%      5.0%
2             Z      1.0%      3.0%      5.0%
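Each combination may only take a variable's value from that variable's own column; values must not move to other columns. For example, using 4.0% for "X" would be incorrect.
I tried itertools.combinations, itertools.product, and itertools.permutations, but these give all possible combinations, including ones that mix values across variables.
I would like the result to look like this, giving 27 possible combinations: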
       Y      X     Z
0   3.0%   6.0%  1.0%
1   3.0%   6.0%  3.0%
2   3.0%   6.0%  5.0%
3   3.0%   8.0%  1.0%
4   3.0%   8.0%  3.0%
5   3.0%   8.0%  5.0%
6   3.0%  10.0%  1.0%
7   3.0%  10.0%  3.0%
8   3.0%  10.0%  5.0%
9   4.0%   8.0%  3.0%
10  4.0%   8.0%  1.0%
11  4.0%   8.0%  5.0%
12  4.0%   6.0%  1.0%
13  4.0%   6.0%  3.0%
14  4.0%   6.0%  5.0%
15  4.0%  10.0%  1.0%
16  4.0%  10.0%  3.0%
17  4.0%  10.0%  5.0%
18  5.0%  10.0%  5.0%
19  5.0%  10.0%  1.0%
20  5.0%  10.0%  3.0%
21  5.0%   8.0%  1.0%
22  5.0%   8.0%  3.0%
23  5.0%   8.0%  5.0%
24  5.0%   6.0%  1.0%
25  5.0%   6.0%  3.0%
26  5.0%   6.0%  5.0%
Any help would be greatly appreciated.
Let's try successive cross merges of each variable's values:
from functools import reduce

import pandas as pd

df = pd.DataFrame({'Variable_Name': {0: 'X', 1: 'Y', 2: 'Z'},
                   'Variable1': {0: '6.0%', 1: '3.0%', 2: '1.0%'},
                   'Variable2': {0: '8.0%', 1: '4.0%', 2: '3.0%'},
                   'Variable3': {0: '10.0%', 1: '5.0%', 2: '5.0%'}})

# Save Var Names for later
var_names = df['Variable_Name']

# Get Variables Options in Own Rows
new_df = df.set_index('Variable_Name').stack() \
    .droplevel(1, 0) \
    .reset_index()

# Get Collection of DataFrames each with its own variable
dfs = tuple(new_df[new_df['Variable_Name'].eq(v)]
            .drop(columns=['Variable_Name']) for v in var_names)

# Successive Cross Merges
new_df = reduce(lambda left, right: pd.merge(left, right, how='cross'), dfs)

# Fix Column Names
new_df.columns = var_names

# Fix Axis Names
new_df = new_df.rename_axis(None, axis=1)

# For Display
print(new_df.to_string())
Output:
        X     Y     Z
0    6.0%  3.0%  1.0%
1    6.0%  3.0%  3.0%
2    6.0%  3.0%  5.0%
3    6.0%  4.0%  1.0%
4    6.0%  4.0%  3.0%
5    6.0%  4.0%  5.0%
6    6.0%  5.0%  1.0%
7    6.0%  5.0%  3.0%
8    6.0%  5.0%  5.0%
9    8.0%  3.0%  1.0%
10   8.0%  3.0%  3.0%
11   8.0%  3.0%  5.0%
12   8.0%  4.0%  1.0%
13   8.0%  4.0%  3.0%
14   8.0%  4.0%  5.0%
15   8.0%  5.0%  1.0%
16   8.0%  5.0%  3.0%
17   8.0%  5.0%  5.0%
18  10.0%  3.0%  1.0%
19  10.0%  3.0%  3.0%
20  10.0%  3.0%  5.0%
21  10.0%  4.0%  1.0%
22  10.0%  4.0%  3.0%
23  10.0%  4.0%  5.0%
24  10.0%  5.0%  1.0%
25  10.0%  5.0%  3.0%
26  10.0%  5.0%  5.0%
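As an aside that is not part of the answer above: itertools.product can also give the 27 rows, as long as it receives one list of values per variable rather than all values pooled together. The following is a minimal sketch under that assumption; the names options and combos are only illustrative.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({
    'Variable_Name': ['X', 'Y', 'Z'],
    'Variable1': ['6.0%', '3.0%', '1.0%'],
    'Variable2': ['8.0%', '4.0%', '3.0%'],
    'Variable3': ['10.0%', '5.0%', '5.0%'],
})

# One list of candidate values per variable, keyed by variable name
# (illustrative name, not from the original answers)
options = df.set_index('Variable_Name').T.to_dict('list')

# Cartesian product: each tuple takes exactly one value from each variable's list
combos = pd.DataFrame(list(product(*options.values())), columns=list(options))

print(combos.shape)  # (27, 3)
```

Because each variable contributes its own list, a value such as 4.0% can only ever appear in the Y column.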
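You can use a CROSS JOIN. In pandas, you can use pd.merge() or pd.DataFrame.join() with the parameter how='cross'. Before cross joining, however, you need to put each variable in its own dataframe in long (unpivoted) format; your table is in wide (pivoted) format.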
df_X = df.loc[df['Variable_Name'] == 'X', ['Variable1', 'Variable2', 'Variable3']].T
df_Y = df.loc[df['Variable_Name'] == 'Y', ['Variable1', 'Variable2', 'Variable3']].T
df_Z = df.loc[df['Variable_Name'] == 'Z', ['Variable1', 'Variable2', 'Variable3']].T
cross_join_df = df_X.join(df_Y, how='cross').join(df_Z, how='cross')
cross_join_df.columns = ['X','Y','Z']
If you need to do this in a loop, the code would look like this.
variables = df['Variable_Name'].unique()
columns_to_cross = ['Variable1', 'Variable2', 'Variable3']
cross_join_df = df.loc[df['Variable_Name'] == variables[0], columns_to_cross].T
for var in variables[1:]:
    to_join_df = df.loc[df['Variable_Name'] == var, columns_to_cross].T
    cross_join_df = pd.merge(cross_join_df, to_join_df, how='cross')
cross_join_df.columns = variables
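The loop can also be wrapped in a small reusable function. This is only a sketch: cross_combinations and its signature are hypothetical and not from the answers. It assumes the variable names are unique and that pandas is version 1.2 or later, where how='cross' is available.

```python
from functools import reduce

import pandas as pd


def cross_combinations(df, name_col, value_cols):
    """Return every combination that takes one value from each row of df.

    Hypothetical helper; assumes the values in name_col are unique.
    """
    # One single-column frame per variable, with the variable name as the column
    frames = [
        pd.DataFrame({row[name_col]: [row[c] for c in value_cols]})
        for _, row in df.iterrows()
    ]
    # Successive cross joins build the Cartesian product
    return reduce(lambda left, right: left.merge(right, how='cross'), frames)


df = pd.DataFrame({
    'Variable_Name': ['X', 'Y', 'Z'],
    'Variable1': ['6.0%', '3.0%', '1.0%'],
    'Variable2': ['8.0%', '4.0%', '3.0%'],
    'Variable3': ['10.0%', '5.0%', '5.0%'],
})

result = cross_combinations(df, 'Variable_Name',
                            ['Variable1', 'Variable2', 'Variable3'])
print(len(result))  # 27
```

Naming each frame's column after its variable up front avoids the separate column-renaming step, at the cost of requiring unique variable names.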