Manual Feature Engineering in Pandas - Mean of 1 Column vs All Other Columns

Hard to describe this one, but for every column in a dataframe, I want to create a new column that contains the row-wise mean of the current column and the one next to it, then the mean of that first column and the next one down the line, and so on. Running Python 3.6.

For example, given this dataframe:

[image: starting dataset]

I would like to get this output:

[image: ending dataset]

The exact order of the added columns at the end isn't important, but it needs to handle every possible combination of means between all columns, with a depth of 2 (i.e. compare one column to another). Ideally, I would like to have the depth set as a separate variable, so I could use a depth of 3, where it would do the same thing but compare 3 columns to one another.

Ideas? Thanks!

UPDATE

I got this to work, but I'm wondering if there's a computationally faster way of doing it. I basically just created two nested loops over the same columns to compare 1 column vs the rest, skipping same-column comparisons and pairs already added in the reverse order:

import pandas as pd

eng_features = pd.DataFrame()

for col in df.columns:
    for col2 in df.columns:

        # Don't compare same columns, or inversed same columns
        if col == col2 or (str(col2) + '_' + str(col)) in eng_features:
            continue
        eng_features[str(col) + '_' + str(col2)] = df[[col, col2]].mean(axis=1)

# Append the new columns once, after both loops have finished
df = pd.concat([df, eng_features], axis=1)

Use itertools, a module in the Python standard library for working with iterators:

from itertools import permutations

# permutations yields every ordered pair, so both (a, b) and (b, a) are added
for col1, col2 in permutations(df.columns, r=2):
    df[f'Mean_of_{col1}-{col2}'] = df[[col1, col2]].mean(axis=1)

and you will get what you need:

   a  b  c  Mean_of_a-b  Mean_of_a-c  Mean_of_b-a  Mean_of_b-c  Mean_of_c-a  \
0  1  1  0          1.0          0.5          1.0          0.5          0.5   
1  0  1  0          0.5          0.0          0.5          0.5          0.0   
2  1  1  0          1.0          0.5          1.0          0.5          0.5   

   Mean_of_c-b  
0          0.5  
1          0.5  
2          0.5  
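
The question says the order within a pair doesn't matter and asks for the depth to be a variable. One possible variation, offered only as a sketch, is to swap permutations for itertools.combinations, which yields each unordered group of columns once, and to pass the depth as the group size. The function name add_combination_means and its depth and sep parameters are hypothetical names chosen for illustration, and the example frame is reconstructed from the printed output above:

```python
from itertools import combinations

import pandas as pd


def add_combination_means(df, depth=2, sep="-"):
    """Return df plus one row-wise mean column per unordered group of `depth` columns.

    Hypothetical helper for illustration; not part of pandas.
    """
    new_cols = {
        "Mean_of_" + sep.join(map(str, cols)): df[list(cols)].mean(axis=1)
        for cols in combinations(df.columns, depth)
    }
    # Build all new columns first, then concatenate once
    # instead of inserting columns into df one at a time.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


# Example frame matching the a, b, c values shown in the output above
df = pd.DataFrame({"a": [1, 0, 1], "b": [1, 1, 1], "c": [0, 0, 0]})

pairs = add_combination_means(df, depth=2)    # a-b, a-c, b-c
triples = add_combination_means(df, depth=3)  # a-b-c
print(pairs.columns.tolist())
```

With depth=2 this adds three columns for the three-column frame rather than six, because a-b and b-a are the same mean. Whether it is noticeably faster than the nested loops will depend on the number of columns, so it is worth timing on real data.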
