
Pandas: Elementwise multiplication of two dataframes

I know how to do element-by-element multiplication between two Pandas dataframes. However, things get more complicated when the dimensions of the two dataframes are not compatible. For instance, below df * df2 is straightforward, but df * df3 is a problem:

import pandas as pd

df = pd.DataFrame({'col1': [1.0] * 5,
                   'col2': [2.0] * 5,
                   'col3': [3.0] * 5}, index=range(1, 6))
df2 = pd.DataFrame({'col1': [10.0] * 5,
                    'col2': [100.0] * 5,
                    'col3': [1000.0] * 5}, index=range(1, 6))
df3 = pd.DataFrame({'col1': [0.1] * 5}, index=range(1, 6))

df.mul(df2, 1) # same shape and labels: element-by-element multiplication, no problems

df.mul(df3, 1) # shapes differ (df is 5x3, df3 is 5x1): col2 and col3 have no match and become NaN
   col1  col2  col3
1   0.1   NaN   NaN
2   0.1   NaN   NaN
3   0.1   NaN   NaN
4   0.1   NaN   NaN
5   0.1   NaN   NaN

In the above situation, how can I multiply every column of df with df3.col1?

My attempt: I tried to replicate df3.col1 len(df.columns.values) times to get a dataframe with the same dimensions as df:

df3 = pd.DataFrame([df3.col1 for n in range(len(df.columns.values)) ])
df3
        1    2    3    4    5
col1  0.1  0.1  0.1  0.1  0.1
col1  0.1  0.1  0.1  0.1  0.1
col1  0.1  0.1  0.1  0.1  0.1

But this creates a dataframe of dimensions 3 * 5, whereas I am after 5 * 3. I know I can take the transpose with df3.T to get what I need, but I think this is not the fastest way.

In [161]: pd.DataFrame(df.values*df2.values, columns=df.columns, index=df.index)
Out[161]: 
   col1  col2  col3
1    10   200  3000
2    10   200  3000
3    10   200  3000
4    10   200  3000
5    10   200  3000

A simpler way to do this is to multiply the dataframe whose column names you want to keep by the values (i.e. the NumPy array) of the other, like so:

In [63]: df * df2.values
Out[63]: 
   col1  col2  col3
1    10   200  3000
2    10   200  3000
3    10   200  3000
4    10   200  3000
5    10   200  3000

This way you do not have to write all that new-dataframe boilerplate.
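The same array-based idea can be stretched to the single-column case from the question. The following is only a minimal sketch, assuming the df and df3 defined in the question and that both frames have their rows in the same order (NumPy arrays carry no index, so nothing is aligned by label). The names scaled and result are illustrative and do not come from the original answers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0] * 5,
                   'col2': [2.0] * 5,
                   'col3': [3.0] * 5}, index=range(1, 6))
df3 = pd.DataFrame({'col1': [0.1] * 5}, index=range(1, 6))

# df3[['col1']].to_numpy() has shape (5, 1); NumPy broadcasts it
# across the three columns of the (5, 3) array taken from df.
scaled = df.to_numpy() * df3[['col1']].to_numpy()

# Re-attach the row and column labels of df, since plain arrays have none.
result = pd.DataFrame(scaled, index=df.index, columns=df.columns)
print(result)
```

Because alignment is skipped entirely, this is only safe when the row order of the two frames is known to match.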

This works for me:

mul = df.mul(df3.col1, axis=0)

Or, when you want to subtract or divide instead:

sub = df.sub(df3.col1, axis=0)
div = df.div(df3.col1, axis=0)

This also works with a NaN in df (e.g. if you first apply df.loc[1, 'col2'] = np.nan).
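To illustrate the NaN case, here is a small sketch that assumes the df and df3 from the question; the label-based assignment with df.loc is my choice for setting the missing value and is not necessarily how the answerer did it. The NaN in df stays NaN in the product, while every other cell is multiplied as usual.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0] * 5,
                   'col2': [2.0] * 5,
                   'col3': [3.0] * 5}, index=range(1, 6))
df3 = pd.DataFrame({'col1': [0.1] * 5}, index=range(1, 6))

# Put a single missing value into the first row (index label 1).
df.loc[1, 'col2'] = np.nan

# Row-wise multiplication: the NaN propagates, the rest is computed normally.
mul = df.mul(df3['col1'], axis=0)
print(mul)
```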

To make use of Pandas broadcasting, you can use multiply:

df.multiply(df3['col1'], axis=0)
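Why axis=0 matters can be seen by comparing it with the default. This is a hedged sketch using the question's df and df3; the variable names factor, by_columns and by_rows are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1.0] * 5,
                   'col2': [2.0] * 5,
                   'col3': [3.0] * 5}, index=range(1, 6))
df3 = pd.DataFrame({'col1': [0.1] * 5}, index=range(1, 6))
factor = df3['col1']  # a Series indexed 1..5

# Default (axis='columns'): the Series index (1..5) is matched against the
# column labels ('col1'..'col3'). Nothing matches, so every value is NaN.
by_columns = df.multiply(factor)

# axis=0: the Series index is matched against the row index of df,
# so each row is scaled by its own factor across all columns.
by_rows = df.multiply(factor, axis=0)

print(by_rows)
```

Unlike multiplying by a raw NumPy array, this variant aligns on row labels, so it remains correct even if the two objects list their rows in a different order.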

Another way is to create a list of single-column dataframes and join them:

cols = [pd.DataFrame(df[col] * df3.col1, columns=[col]) for col in df]
mul = cols[0].join(cols[1:])


 