
Multiplying all combinations of columns

I am trying to find an efficient way to multiply every combination of columns in a pandas DataFrame. I have managed to do this with itertools, but it slows down dramatically as the DataFrame grows. I will need to run this on a DataFrame of shape roughly (100, 1000).

Working example with a smaller DataFrame:

import numpy as np
import pandas as pd
from itertools import combinations_with_replacement

df = pd.DataFrame(np.random.randn(3, 10))
new_df = pd.DataFrame()

# one output column per pair of input columns (a column paired with itself included)
for p in combinations_with_replacement(df.columns, 2):
    title = p
    new_df[title] = df[p[0]] * df[p[1]]

Does anybody have any suggestions on how this could be achieved?

Combining an index view with array.prod(axis) runs ~100 times faster than the loop:

def f1():
    # with loop
    new_df = pd.DataFrame()
    for p in combinations_with_replacement(df.columns, 2):
        title = p
        new_df[title] = df[p[0]] * df[p[1]]
    return new_df

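def f2():
    # vectorized: index every column pair at once, then multiply along the pair axis
    n = len(df.columns)
    # positions (i, j) with i <= j, i.e. the upper triangle including the diagonal
    ix = np.indices((n, n))[:, ~np.tri(n, k=-1, dtype=bool)]
    # df.values.T[ix.T] has shape (pairs, 2, rows); prod(1) multiplies the two columns of each pair
    # the column tuples are positions, which equal the labels only for default integer columns
    return pd.DataFrame(df.values.T[ix.T].prod(1).T, columns=list(map(tuple, ix.T)))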
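
For comparison, here is a minimal sketch of the same idea written with np.triu_indices, which may be easier to read than the np.indices/np.tri mask. The function name pairwise_products and the choice to label the output with a MultiIndex built from the original column labels are my own assumptions, not part of the answer above, and I have not benchmarked it against f2.

```python
import numpy as np
import pandas as pd
from itertools import combinations_with_replacement


def pairwise_products(df):
    """Products of every column pair (i <= j), one pair per output column.

    Hypothetical helper for illustration; not taken from the answer above.
    """
    values = df.to_numpy()
    # Positions of the upper triangle, diagonal included, in row-major order.
    # This is the same order that combinations_with_replacement yields.
    i, j = np.triu_indices(values.shape[1])
    # Fancy indexing selects the left and right column of every pair at once.
    products = values[:, i] * values[:, j]
    # Label each output column with the pair of original column labels.
    labels = pd.MultiIndex.from_arrays([df.columns[i], df.columns[j]])
    return pd.DataFrame(products, index=df.index, columns=labels)


df = pd.DataFrame(np.random.randn(3, 10))
result = pairwise_products(df)
```

Keep in mind that a DataFrame with 1000 columns has 1000 * 1001 / 2 = 500,500 such pairs, so with 100 rows of float64 the result alone takes about 400 MB, whichever method builds it.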
