
Manual Feature Engineering in Pandas - Mean of 1 Column vs All Other Columns

This one is hard to describe: for every column in a dataframe, I want to create a new column containing the row-wise mean of that column and one other column, and repeat this for each of the remaining columns in turn. Running Python 3.6.

For example, given this dataframe:

Starting_Dataset

I would like to get this output:

Ending_Dataset

The exact order of the added columns at the end isn't important, but the solution needs to cover every possible combination of columns, with a depth of 2 (i.e. the mean of one column and one other column). Ideally, I would like to set the depth as a separate variable, so that a depth of 3 would do the same thing with groups of 3 columns.

Ideas? Thanks!

UPDATE

I got this to work, but I'm wondering if there's a computationally faster way of doing it. I basically wrote two nested loops over the same columns to pair each column with every other one, skipping a column paired with itself and pairs that were already added in reverse order:

import pandas as pd

eng_features = pd.DataFrame(index=df.index)

for col in df.columns:
    for col2 in df.columns:
        # Skip a column paired with itself, and skip a pair whose
        # reversed name (col2_col) has already been added
        if col == col2 or f'{col2}_{col}' in eng_features:
            continue
        eng_features[f'{col}_{col2}'] = df[[col, col2]].mean(axis=1)

# Append the engineered columns once, after both loops have finished
df = pd.concat([df, eng_features], axis=1)

Use itertools, a module in Python's standard library for working with iterators:

from itertools import permutations

for col1, col2 in permutations(df.columns, r=2):
    df[f'Mean_of_{col1}-{col2}'] = df[[col1,col2]].mean(axis=1)

and you will get what you need:

   a  b  c  Mean_of_a-b  Mean_of_a-c  Mean_of_b-a  Mean_of_b-c  Mean_of_c-a  \
0  1  1  0          1.0          0.5          1.0          0.5          0.5   
1  0  1  0          0.5          0.0          0.5          0.5          0.0   
2  1  1  0          1.0          0.5          1.0          0.5          0.5   

   Mean_of_c-b  
0          0.5  
1          0.5  
2          0.5  
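
Note that permutations yields each pair in both orders, so Mean_of_a-b and Mean_of_b-a hold the same values. If the reversed duplicates are not wanted, and the depth should be configurable as the question asks, itertools.combinations is one possible alternative. The following is only a sketch: the helper name add_mean_features, the depth parameter, and the sample values are made up for illustration and are not part of the original answer.

```python
from itertools import combinations

import pandas as pd


def add_mean_features(df, depth=2):
    """Return a copy of df with one row-wise mean column per unordered
    group of `depth` original columns (hypothetical helper)."""
    new_cols = {}
    for group in combinations(df.columns, depth):
        name = "Mean_of_" + "-".join(str(c) for c in group)
        new_cols[name] = df[list(group)].mean(axis=1)
    # Build all new columns first, then attach them in a single concat
    features = pd.DataFrame(new_cols, index=df.index)
    return pd.concat([df, features], axis=1)


# Sample data assumed for illustration
df = pd.DataFrame({"a": [1, 0, 1], "b": [1, 1, 1], "c": [0, 0, 0]})
pairs = add_mean_features(df, depth=2)    # adds a-b, a-c, b-c
triples = add_mean_features(df, depth=3)  # adds a-b-c
```

Collecting the new columns in a dict and concatenating once avoids inserting into the dataframe repeatedly, which tends to matter as the number of column groups grows.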
