Replace values in dataframe from another dataframe with Pandas

Question

I have 3 dataframes: df1, df2 and df3. I am trying to fill the NaN values of df1 with values taken from df2. Which values are taken from df2 depends on the output of a simple function (mul_val), which processes data stored in df3.

I was able to get the result I want, but I would like to find a simpler, more readable way to write it.

Here is what I have so far:

import pandas as pd
import numpy as np

# simple function
def mul_val(a,b):
    return a*b

# dataframe 1
data = {'Name':['PINO','PALO','TNCO' ,'TNTO','CUCO' ,'FIGO','ONGF','LABO'],
        'Id'  :[  10  ,  9   ,np.nan ,  14   , 3    ,np.nan,  7   ,np.nan]}
df1 = pd.DataFrame(data)

# dataframe 2
infos = {'Info_a':[10,20,30,40,70,80,90,50,60,80,40,50,20,30,15,11],
         'Info_b':[10,30,30,60,10,85,99,50,70,20,30,50,20,40,16,17]}
df2 = pd.DataFrame(infos)

dic = {'Name': {0: 'FIGO', 1: 'TNCO'}, 
       'index': {0: [5, 6], 1: [11, 12, 13]}}
df3 = pd.DataFrame(dic)

#---------------Modify from here in the most efficient way!-----------------

for idx,row in df3.iterrows():
    store_val = []
    print(row['Name'])
    for j in row['index']:
        store_val.append([mul_val(df2['Info_a'][j],df2['Info_b'][j]),j])
    store_val = np.asarray(store_val)

    # - Identify which is the index of minimum value of the first column
    indx_min_val = np.argmin(store_val[:,0])

    # - Get the value relative number contained in the second column
    col_value = row['index'][indx_min_val]

    # - Write the selected index into df1 for the row with the same row['Name']
    df1.loc[df1['Name'] == row['Name'], 'Id'] = col_value

By printing store_val at every iteration I get:

FIGO
[[6800    5]   
 [8910    6]]
TNCO
[[2500   11]
 [ 400   12]
 [1200   13]]

Let's do a simple example: for FIGO, the minimum of 6800 and 8910 is 6800, so I select the corresponding index, 5, and write it into the Id column of df1. Repeating this operation for the remaining rows of df3 (here there are only 2 rows, but there could be many more), the final result should look like this:

In[0]: before           In[0]: after
Out[0]:                 Out[0]: 
     Id  Name                Id  Name
0  10.0  PINO           0  10.0  PINO
1   9.0  PALO           1   9.0  PALO
2   NaN  TNCO  ----->   2  12.0  TNCO
3  14.0  TNTO           3  14.0  TNTO
4   3.0  CUCO           4   3.0  CUCO
5   NaN  FIGO  ----->   5   5.0  FIGO
6   7.0  ONGF           6   7.0  ONGF
7   NaN  LABO           7   NaN  LABO

Note: you can also remove the for loops if needed and use different containers to store the data (lists, arrays...); the important thing is that the final result is still a dataframe.
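
The FIGO arithmetic in the example above can be checked with a short sketch. It assumes the same df2 as in the question; the names products, figo_candidates and figo_id are only illustrative and do not appear in the original code.

```python
import pandas as pd

# Same df2 as in the question
df2 = pd.DataFrame({
    'Info_a': [10, 20, 30, 40, 70, 80, 90, 50, 60, 80, 40, 50, 20, 30, 15, 11],
    'Info_b': [10, 30, 30, 60, 10, 85, 99, 50, 70, 20, 30, 50, 20, 40, 16, 17],
})

# Element-wise product of the two columns (what mul_val computes row by row)
products = df2['Info_a'] * df2['Info_b']

# Candidate rows of df2 for FIGO, as listed in df3['index'] (illustrative name)
figo_candidates = [5, 6]

# idxmin returns the index label of the smallest product among the candidates
figo_id = products.loc[figo_candidates].idxmin()
print(products.loc[figo_candidates].tolist(), figo_id)
```

The two products are 6800 and 8910, so the selected label is 5, which is the value expected in the Id column for FIGO.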

Answer 1

I can offer two similar options that achieve the same result as your loop in a couple of lines:

1. Using apply and fillna() (fillna is faster than combine_first by a factor of two):

  df3['Id'] = df3.apply(lambda row: (df2.Info_a*df2.Info_b).loc[row['index']].idxmin(), axis=1)
  df1 = df1.set_index('Name').fillna(df3.set_index('Name')).reset_index()

2. Using a function (a lambda cannot contain an assignment, so you have to apply a named function):

def f(row):
    df1.loc[df1.Name==row['Name'], 'Id'] = (df2.Info_a*df2.Info_b).loc[row['index']].idxmin()
df3.apply(f, axis=1)

or a slight variant that does not rely on global definitions:

def f(row, df1, df2):
    df1.loc[df1.Name==row['Name'], 'Id'] = (df2.Info_a*df2.Info_b).loc[row['index']].idxmin()
df3.apply(f, args=(df1,df2,), axis=1)
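
For readers on a recent pandas, here is a self-contained sketch of the same idea as option 1. Two things are worth knowing: on current versions Series.argmin returns an integer position rather than an index label, so the sketch uses idxmin, and .ix has been removed in favour of .loc. The sketch also swaps the set_index/fillna step for Series.map followed by fillna, which is a variation of mine rather than the exact code above; the names products, best and result are illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question
df1 = pd.DataFrame({
    'Name': ['PINO', 'PALO', 'TNCO', 'TNTO', 'CUCO', 'FIGO', 'ONGF', 'LABO'],
    'Id':   [10, 9, np.nan, 14, 3, np.nan, 7, np.nan],
})
df2 = pd.DataFrame({
    'Info_a': [10, 20, 30, 40, 70, 80, 90, 50, 60, 80, 40, 50, 20, 30, 15, 11],
    'Info_b': [10, 30, 30, 60, 10, 85, 99, 50, 70, 20, 30, 50, 20, 40, 16, 17],
})
df3 = pd.DataFrame({'Name': {0: 'FIGO', 1: 'TNCO'},
                    'index': {0: [5, 6], 1: [11, 12, 13]}})

# Product of the two columns, computed once for all rows of df2
products = df2['Info_a'] * df2['Info_b']

# For each name in df3, the df2 label whose product is smallest
best = df3.set_index('Name')['index'].apply(lambda idx: products.loc[idx].idxmin())

# Fill only the missing Ids; names absent from df3 (e.g. LABO) stay NaN
result = df1.copy()
result['Id'] = result['Id'].fillna(result['Name'].map(best))
print(result)
```

Working on a copy keeps df1 untouched, which makes it easy to compare the before and after frames.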

Note that your solution, even though much more verbose, takes the least time on this small dataset (7.5 ms versus 9.5 ms for each of mine). It makes sense that the speeds are similar, since in both cases the work is a loop over the rows of df3.
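
If the row-by-row pass over df3 ever becomes the bottleneck, one possible alternative is to flatten the lists first and let groupby pick the minimum. This is only a sketch and not something I have timed: it assumes pandas 0.25 or later for DataFrame.explode, and the names long, products, best_ids and result are illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question
df1 = pd.DataFrame({
    'Name': ['PINO', 'PALO', 'TNCO', 'TNTO', 'CUCO', 'FIGO', 'ONGF', 'LABO'],
    'Id':   [10, 9, np.nan, 14, 3, np.nan, 7, np.nan],
})
df2 = pd.DataFrame({
    'Info_a': [10, 20, 30, 40, 70, 80, 90, 50, 60, 80, 40, 50, 20, 30, 15, 11],
    'Info_b': [10, 30, 30, 60, 10, 85, 99, 50, 70, 20, 30, 50, 20, 40, 16, 17],
})
df3 = pd.DataFrame({'Name': {0: 'FIGO', 1: 'TNCO'},
                    'index': {0: [5, 6], 1: [11, 12, 13]}})

products = df2['Info_a'] * df2['Info_b']

# One row per (Name, candidate index) pair; reset the index so labels are unique
long = df3.explode('index').reset_index(drop=True)
long['index'] = long['index'].astype(int)

# Look up the product for every candidate in one shot
long['product'] = products.loc[long['index']].to_numpy()

# Row of `long` holding the smallest product within each Name
best_rows = long.loc[long.groupby('Name')['product'].idxmin()]
best_ids = best_rows.set_index('Name')['index']

# Fill only the missing Ids, leaving names without candidates as NaN
result = df1.copy()
result['Id'] = result['Id'].fillna(result['Name'].map(best_ids))
print(result)
```

Whether this beats the apply-based versions depends on how many rows df3 has and how long the lists are, so it is worth measuring on real data before switching.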

solution1
1 ACCEPTED 2016-11-24 14:04:12