简体   繁体   中英

Python: Divide each row of a DataFrame by another DataFrame vector

I have a DataFrame (df1) with a dimension 2000 rows x 500 columns (excluding the index) for which I want to divide each row by another DataFrame (df2) with dimension 1 rows X 500 columns . Both have the same column headers. I tried:

df.divide(df2) and df.divide(df2, axis='index') and multiple other solutions and I always get a df with nan values in every cell. What argument am I missing in the function df.divide ?

In df.divide(df2, axis='index') , you need to provide the axis/row of df2 (ex. df2.iloc[0] ).

import pandas as pd

data1 = {"a":[1.,3.,5.,2.],
         "b":[4.,8.,3.,7.],
         "c":[5.,45.,67.,34]}
data2 = {"a":[4.],
         "b":[2.],
         "c":[11.]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2) 

df1.div(df2.iloc[0], axis='columns')

or you can use df1/df2.values[0,:]

You can divide by the series ie the first row of df2:

In [11]: df = pd.DataFrame([[1., 2.], [3., 4.]], columns=['A', 'B'])

In [12]: df2 = pd.DataFrame([[5., 10.]], columns=['A', 'B'])

In [13]: df.div(df2)
Out[13]: 
     A    B
0  0.2  0.2
1  NaN  NaN

In [14]: df.div(df2.iloc[0])
Out[14]: 
     A    B
0  0.2  0.2
1  0.6  0.4

Small clarification just in case: the reason why you got NaN everywhere while Andy's first example ( df.div(df2) ) works for the first line is div tries to match indexes (and columns). In Andy's example, index 0 is found in both dataframes, so the division is made, not index 1 so a line of NaN is added. This behavior should appear even more obvious if you run the following (only the 't' line is divided):

df_a = pd.DataFrame(np.random.rand(3,5), index= ['x', 'y', 't'])
df_b = pd.DataFrame(np.random.rand(2,5), index= ['z','t'])
df_a.div(df_b)

So in your case, the index of the only row of df2 was apparently not present in df1. "Luckily", the column headers are the same in both dataframes, so when you slice the first row, you get a series, the index of which is composed by the column headers of df2. This is what eventually allows the division to take place properly.

For a case with index and column matching:

df_a = pd.DataFrame(np.random.rand(3,5), index= ['x', 'y', 't'], columns = range(5))
df_b = pd.DataFrame(np.random.rand(2,5), index= ['z','t'], columns = [1,2,3,4,5])
df_a.div(df_b)

If you want to divide each row of a column with a specific value you could try:

df['column_name'] = df['column_name'].div(10000)

For me, this code divided each row of 'column_name' with 10,000.

要划分一行(具有单列或多列),我们需要执行以下操作:

df.loc['index_value'] = df.loc['index_value'].div(10000)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM