How to apply a function to an entire pandas df in creating additional columns?

My question is a follow-up to a previous post, which has since had some updates:

Problem

Given a (400 * 18) pandas DataFrame, I would like to produce a (400 * 153) DataFrame containing every unique pair of the 18 existing columns ( 18!/(2!(18-2)!) = 153 ). So far I have managed this for one row, but I am not able to apply it correctly to all 400 observations.
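As a quick sanity check on the pair count, here is a minimal sketch that computes the binomial coefficient directly and also counts the pairs that `combinations` actually yields. The variable names are illustrative only and do not come from the original post.

```python
from itertools import combinations
from math import comb

n_columns = 18  # number of columns in the original frame

# Closed form: 18! / (2! * (18 - 2)!)
n_pairs = comb(n_columns, 2)

# Same count obtained by enumerating the pairs of column positions
pairs_enumerated = sum(1 for _ in combinations(range(n_columns), 2))

print(n_pairs, pairs_enumerated)
```

Both numbers should agree, which confirms that each row is expected to produce 153 pairs.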

Attempt so far

My code:

#df2 = (400 * 18) df
#list(combinations(df2.iloc[i], 2)): yields 153 unique pairs of the 18 variables

rows = {} #storing data in a dict as 400 keys which have  a list of 153 pairs
vars = {} #final dictionary for the computed pairs to be stored in

#For each pair I need to compute 400 values because n=400
for j in range(0,153): 
  for i in range(0,400):
      #grabbing all unique pairs of the variables and storing it as a list for each observation
      rows[i] = list(combinations(df2.iloc[i], 2)) 
      for x in rows.keys():
        #Performing the computation of each pair in each row
        vars[x] = rows[x][j][0] * rows[x][j][1] 

#vars yields a dictionary containing 400 keys each 
#only having one value within them rather than 153 values

My approach was to create a dictionary with 400 keys, each holding 153 values, where each of the 153 values is the product of one of the pairs acquired previously. So far I am getting a dictionary of 400 keys that each hold only 1 value, as opposed to the 153 I want.
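The single value per key appears to come from `vars[x] = ...` assigning a scalar that is overwritten on every pass of the outer `j` loop, so only the product of the last pair survives. A minimal sketch of a loop that keeps all products per row instead, assuming a small stand-in frame (the three-column data and the name `products` are hypothetical, not from the post):

```python
from itertools import combinations

import pandas as pd

# Small stand-in for the 400 x 18 frame (hypothetical data)
df2 = pd.DataFrame({'A': [0, 1, 2],
                    'B': [3, 4, 5],
                    'C': [6, 7, 8]})

products = {}  # row position -> list with one product per unique pair
for i in range(len(df2)):
    # All unique pairs of the values in row i
    pairs = combinations(df2.iloc[i], 2)
    # Build the full list for this row in one step, so nothing is overwritten
    products[i] = [a * b for a, b in pairs]

print(products)
```

With 18 columns, each list would hold 153 products rather than the 3 produced by this toy frame.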

You can simply apply across each row (axis=1) and build a new Series from the index of each combination and its result.

Here's a sample with 5 columns producing 10 ( 5!/(2!(5-2)!) ) results.

Currently, the column names are generated from the index provided by enumerate, but you could also modify the keys to give more meaningful column names.

from itertools import combinations

import pandas as pd

df2 = pd.DataFrame({'A': [0, 1, 2],
                    'B': [3, 4, 5],
                    'C': [6, 7, 8],
                    'D': [9, 10, 11],
                    'E': [12, 13, 14]})

new_df = df2.apply(lambda s:
                   pd.Series(
                       {i: c for i, c in enumerate(combinations(s.values, 2))}
                   ),
                   axis=1)

# For Display
print(new_df.to_string(index=False))

Output:

     0      1       2       3      4       5       6       7       8        9
(0, 3) (0, 6)  (0, 9) (0, 12) (3, 6)  (3, 9) (3, 12)  (6, 9) (6, 12)  (9, 12)
(1, 4) (1, 7) (1, 10) (1, 13) (4, 7) (4, 10) (4, 13) (7, 10) (7, 13) (10, 13)
(2, 5) (2, 8) (2, 11) (2, 14) (5, 8) (5, 11) (5, 14) (8, 11) (8, 14) (11, 14)
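The output above holds the pairs themselves, whereas the question asks for the product of each pair. One possible variation, sketched here as an assumption rather than as part of the answer, swaps the row-wise apply for column-wise multiplication and uses the paired column names as keys. The `'A*B'` naming scheme and the name `product_df` are hypothetical.

```python
from itertools import combinations

import pandas as pd

df2 = pd.DataFrame({'A': [0, 1, 2],
                    'B': [3, 4, 5],
                    'C': [6, 7, 8],
                    'D': [9, 10, 11],
                    'E': [12, 13, 14]})

# One vectorized multiplication per unique pair of column names;
# the key doubles as a descriptive column name such as 'A*B'.
product_df = pd.DataFrame(
    {f'{a}*{b}': df2[a] * df2[b] for a, b in combinations(df2.columns, 2)}
)

# For Display
print(product_df.to_string(index=False))
```

Because each product is computed on whole columns at once, this avoids calling a Python function per row, which may matter more as the frame grows.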
