Apply a function to every combination of columns

Question

I have this dataframe of values (ignore the numbers):

| A  | B  | C  | D |
| 5  | 7  | 15 | 9 |
| 13 | 12 | 15 | 9 |
| 15 | 9  | 15 | 9 |
| 5  | 7  | 15 | 9 |
| 13 | 12 | 15 | 9 |

I would like to apply a function(y, x), where (y, x) refers to every possible pair of columns in the dataframe (where y is not the same as x), and then save the output from the function into a dictionary.

output = {
  "A_B": ["blah", "blah", "blah"],
  "A_C": ["blah", "blah", "blah"],
  "A_D": ["blah", "blah", "blah"],
}

The dictionary should store the output ["blah", "blah", "blah"] as a list under each key.

Is there any way I can do this?

I suppose I have to write a for loop over every possible combination of the columns and then apply the function to each one, but I'm not sure how to approach this. Appreciate any help. Thanks!

Answer 1

You can use itertools to get all the unique pairs of columns, and then loop over them to apply an operation to each pair:

import itertools

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(4,4), columns=['A','B','C','D'])

output = {}
for x, y in itertools.combinations(df.columns, 2):
    output[f'{x}_{y}'] = (df[x] * df[y]).to_list()

print(output)

Result:

{'A_B': [0.5373750437559887,
  0.12077240904054208,
  0.02128148027667116,
  0.007578133428536173],
 'A_C': [0.039529837951733815,
  0.6965081691767399,
  0.04341804790548652,
  0.06767107788435821],
 'A_D': [0.07986784691848457,
  0.6510775100893785,
  0.05386515105603322,
  0.031171732070095028],
 'B_C': [0.03800661675931452,
  0.1710653815593833,
  0.136685425122361,
  0.07191805478874766],
 'B_D': [0.0767902629130131,
  0.15990741762557956,
  0.16957420765210682,
  0.033128042362621186],
 'C_D': [0.00564876743811126,
  0.9222041985664514,
  0.3459618868451074,
  0.2958261893932231]}

Here the operation is multiplication (*), but you could replace it with another function, e.g.:

# find the maximum across the two columns
output = {}
for x, y in itertools.combinations(df.columns, 2):
    output[f'{x}_{y}'] = np.max(pd.concat([df[x], df[y]]))

print(output)
# {'A_B': 0.8884565587865458, 'A_C': 0.8687553149454967, 'A_D': 0.9452913147551872, 'B_C': 0.8884565587865458, 'B_D': 0.9452913147551872, 'C_D': 0.9452913147551872}
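
Note that itertools.combinations yields each unordered pair once, so you get A_B but not B_A. If your function is not symmetric and you need both orders, itertools.permutations(df.columns, 2) yields every ordered pair of distinct columns instead. A minimal sketch that wraps the loop in a reusable helper; the name apply_to_column_pairs, its ordered flag, and the small sample frame are made up for illustration and are not part of pandas:

```python
import itertools

import pandas as pd


def apply_to_column_pairs(df, func, ordered=False):
    """Hypothetical helper: call func(df[y], df[x]) for each pair of distinct columns."""
    pair_iter = (
        itertools.permutations(df.columns, 2)   # ordered pairs: A_B and B_A
        if ordered
        else itertools.combinations(df.columns, 2)  # unordered pairs: A_B only
    )
    return {f"{y}_{x}": func(df[y], df[x]) for y, x in pair_iter}


# Small sample frame (assumed values, chosen only for the demonstration)
df = pd.DataFrame({"A": [5, 13, 15], "B": [7, 12, 9], "C": [15, 15, 15]})

# A non-symmetric function: element-wise difference, stored as a list
diffs = apply_to_column_pairs(df, lambda y, x: (y - x).to_list(), ordered=True)

print(diffs["A_B"])  # [-2, 1, 6]
print(diffs["B_A"])  # [2, -1, -6]
```

With ordered=False the helper behaves like the loops above and returns one entry per unordered pair.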
