Python: Divide each row of a DataFrame by another DataFrame vector

I have a DataFrame (df1) with 2000 rows x 500 columns (excluding the index), and I want to divide each of its rows by another DataFrame (df2) with 1 row x 500 columns. Both have the same column headers. I tried:

df1.divide(df2) and df1.divide(df2, axis='index'), along with several other approaches, and I always get a DataFrame with NaN in every cell. What argument am I missing in the function df.divide ?

Instead of passing the whole of df2 as in df.divide(df2, axis='index') , you need to pass a single row of df2 as a Series (e.g. df2.iloc[0] ) and align it with the columns of df1:

import pandas as pd

data1 = {"a":[1.,3.,5.,2.],
         "b":[4.,8.,3.,7.],
         "c":[5.,45.,67.,34]}
data2 = {"a":[4.],
         "b":[2.],
         "c":[11.]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2) 

# Divide every row of df1 by the single row of df2, matching values by column label
df1.div(df2.iloc[0], axis='columns')

Or you can use df1/df2.values[0,:] , which divides by position and ignores the column labels.
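To illustrate the difference between dividing by a Series and dividing by a raw array, here is a small sketch. The data and the reordered columns of df2 are made up for illustration; they are not from the question. With a Series, pandas matches values to columns by label, while the .values form matches them purely by position.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1., 3.], "b": [4., 8.], "c": [5., 45.]})
# Hypothetical divisor whose columns are in a different order than df1's
df2 = pd.DataFrame({"c": [5.], "a": [1.], "b": [4.]})

# Series divisor: each value is matched to df1's column with the same label
by_label = df1.div(df2.iloc[0], axis="columns")

# NumPy divisor: values are applied left to right, labels are ignored
by_position = df1 / df2.values[0, :]
```

When both frames are known to share the same column order, as in the question, the two forms should give the same result; the label-based form is the safer default when that is not guaranteed.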

You can divide by a Series, i.e. the first row of df2:

In [11]: df = pd.DataFrame([[1., 2.], [3., 4.]], columns=['A', 'B'])

In [12]: df2 = pd.DataFrame([[5., 10.]], columns=['A', 'B'])

In [13]: df.div(df2)
Out[13]: 
     A    B
0  0.2  0.2
1  NaN  NaN

In [14]: df.div(df2.iloc[0])
Out[14]: 
     A    B
0  0.2  0.2
1  0.6  0.4

A small clarification, just in case: the reason you got NaN everywhere, while Andy's first example ( df.div(df2) ) works for the first row, is that div tries to match indexes (and columns). In Andy's example, index 0 is found in both DataFrames, so the division is carried out for that row; index 1 is not, so a row of NaN is added. This behavior should be even more obvious if you run the following (only the 't' row is divided):

import numpy as np
import pandas as pd

df_a = pd.DataFrame(np.random.rand(3, 5), index=['x', 'y', 't'])
df_b = pd.DataFrame(np.random.rand(2, 5), index=['z', 't'])
df_a.div(df_b)

So in your case, the index label of the only row of df2 was apparently not present in df1. "Luckily", the column headers are the same in both DataFrames, so when you slice out the first row, you get a Series whose index is made up of the column headers of df2. This is what eventually allows the division to take place properly.
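The situation described above can be reproduced with a minimal sketch. The row labels 'r1', 'r2' and 'total' are assumptions chosen for illustration; the question does not say what the actual index labels were.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [2., 4.], "b": [6., 8.]}, index=["r1", "r2"])
# Assumed for illustration: df2's only row has a label that df1 lacks
df2 = pd.DataFrame({"a": [2.], "b": [2.]}, index=["total"])

# DataFrame / DataFrame: no row label is shared, so every cell is NaN
all_nan = df1.div(df2)

# DataFrame / Series: the Series index ('a', 'b') lines up with df1's columns
result = df1.div(df2.iloc[0])
```

Here all_nan contains only NaN, while result holds each row of df1 divided by the row of df2.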

For a case where both the index and the columns are matched (only cells whose row label and column label exist in both DataFrames are divided):

df_a = pd.DataFrame(np.random.rand(3,5), index= ['x', 'y', 't'], columns = range(5))
df_b = pd.DataFrame(np.random.rand(2,5), index= ['z','t'], columns = [1,2,3,4,5])
df_a.div(df_b)
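Since the frames above are filled with random numbers, the alignment can be easier to check with fixed values. The following is a small sketch with made-up constant data and smaller shapes than the example above:

```python
import pandas as pd

# Constant-valued frames so the result is easy to read
df_a = pd.DataFrame(1.0, index=["x", "y", "t"], columns=[0, 1, 2])
df_b = pd.DataFrame(2.0, index=["z", "t"], columns=[1, 2, 3])

# The result covers the union of both indexes and both sets of columns
out = df_a.div(df_b)

# Count the cells that were actually divided (label present in both frames)
n_filled = int(out.notna().sum().sum())
```

Only the cells in row 't' and columns 1 and 2 are shared by both frames, so those are the only cells that hold a number; everything else is NaN.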

If you want to divide each row of a column by a specific value, you could try:

df['column_name'] = df['column_name'].div(10000)

For me, this code divided each row of 'column_name' by 10,000.

To divide a single row (with one or more columns), we need to do the following:

df.loc['index_value'] = df.loc['index_value'].div(10000)
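Both scalar divisions can be tried on a toy frame. This is only a sketch: the column names 'x' and 'y', the row labels 'r1' and 'r2', and the values are hypothetical, and copies are used so that the two operations do not affect each other.

```python
import pandas as pd

df = pd.DataFrame({"x": [10000., 20000.], "y": [30000., 40000.]},
                  index=["r1", "r2"])

# Divide every value of one column by a scalar
col_scaled = df.copy()
col_scaled["x"] = col_scaled["x"].div(10000)

# Divide every value of one row by a scalar
row_scaled = df.copy()
row_scaled.loc["r2"] = row_scaled.loc["r2"].div(10000)
```

In col_scaled only column 'x' changes, and in row_scaled only row 'r2' changes.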

