Fill column with info from another dataframe

Question

I'm trying to use zipcode data from one dataframe to fill the missing (NaN) zipcode values in another dataframe. I'm using the street name and the house number of each address to search for the best zipcode match, but my functions only work when I hardcode the values.

Here is some dummy data for the first dataframe:

import numpy as np
import pandas as pd

sufixo = ['ST', 'ST', 'AV', 'ST', 'AV']
logradouro = ['JEFF', '9TH', 'CRAZY', 'SEXY', 'TEST']
number = [123, 444, 1204, 40, 55]
zipcode = [None, None, None, None, None]

dataset = list(zip(sufixo, logradouro, number, zipcode))
df = pd.DataFrame(data=dataset, columns=['suffix', 's_name', 'number', 'zipcode'])

Now the second one:

street_name = ['CRAZY AV', 'SEXY ST', '9TH ST', 'JEFF ST', 'TEST AV', 'CRAZY AV', 'SEXY ST', 'TEST AV']
number = [100, 23, 666, 24, 54, 1200, 39, 100]
zipcode = [11122, 11133, 11166, 11100, 11144, 11155, 11199, 11177]

dataset = list(zip(street_name, number, zipcode))
df2 = pd.DataFrame(data=dataset, columns=['street_name', 'number', 'zipcode'])

The function for getting the nearest number:

def find_nearest(array, value):
    # label of the entry whose value is closest to `value`
    idx = (np.abs(array - value)).idxmin()
    return array[idx]
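`idxmin()` returns an index label, not a position, which matters here because filtering `df2` by street keeps the original row labels. A minimal sketch that makes the label lookup explicit with `.loc`, reusing the dummy values from `df2`; `find_nearest_loc`, `numbers`, `streets` and `crazy` are hypothetical names for illustration only.

```python
import numpy as np
import pandas as pd

def find_nearest_loc(array, value):
    # idxmin() gives the *label* of the smallest distance,
    # so look the value up by label with .loc
    idx = (np.abs(array - value)).idxmin()
    return array.loc[idx]

numbers = pd.Series([100, 23, 666, 24, 54, 1200, 39, 100])
streets = pd.Series(['CRAZY AV', 'SEXY ST', '9TH ST', 'JEFF ST',
                     'TEST AV', 'CRAZY AV', 'SEXY ST', 'TEST AV'])

# The filtered Series keeps labels 0 and 5, not 0 and 1
crazy = numbers[streets == 'CRAZY AV']
nearest = find_nearest_loc(crazy, 1204)
print(nearest)  # 1200
```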

The function for concatenating the street name:

def concat_st_name(row):
    return row['s_name'] + " " + row['suffix']

df['combo_name'] = df.apply(concat_st_name, axis=1)

And my failing function that tries to get a decent zipcode:

def zip_finder(row):
    return df2['zipcode'][(df2['street_name'] == row['combo_name']) &
                          (df2['number'] == find_nearest(df2[df2['street_name'] == row['combo_name']]['number'], row['number']))]
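Each call of `zip_finder` returns a boolean-filtered Series that keeps `df2`'s row labels, not a single zipcode. With the dummy data the five rows match `df2` labels 2 to 6, so `apply` builds a five-column DataFrame, which cannot be assigned to one column. A minimal sketch of a row-wise version that returns one scalar per row; `zip_finder_scalar` and the `np.nan` fallback for streets missing from `df2` are assumptions of this sketch, not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'suffix': ['ST', 'ST', 'AV', 'ST', 'AV'],
                   's_name': ['JEFF', '9TH', 'CRAZY', 'SEXY', 'TEST'],
                   'number': [123, 444, 1204, 40, 55]})
df2 = pd.DataFrame({'street_name': ['CRAZY AV', 'SEXY ST', '9TH ST', 'JEFF ST',
                                    'TEST AV', 'CRAZY AV', 'SEXY ST', 'TEST AV'],
                    'number': [100, 23, 666, 24, 54, 1200, 39, 100],
                    'zipcode': [11122, 11133, 11166, 11100, 11144, 11155, 11199, 11177]})
df['combo_name'] = df['s_name'] + ' ' + df['suffix']

def zip_finder_scalar(row):
    # Rows of df2 on the same street as this row
    candidates = df2[df2['street_name'] == row['combo_name']]
    if candidates.empty:
        return np.nan  # hypothetical fallback: no street match, leave it missing
    # Label of the candidate whose house number is closest
    best = (candidates['number'] - row['number']).abs().idxmin()
    # Returning a single value makes apply(axis=1) produce one column
    return candidates.loc[best, 'zipcode']

df['ziptest'] = df.apply(zip_finder_scalar, axis=1)
print(df[['combo_name', 'number', 'ziptest']])
```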

When I try to apply it with df['ziptest'] = df.apply(zip_finder, axis=1)

I get: ValueError: Wrong number of items passed 5, placement implies 1

If I first create a ziptest column filled with 0's, I get a new dataframe like this:

I'm new to Pandas and I think I'm failing to understand the logic of the apply method.
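The rule behind that error, as a minimal sketch on a toy frame (the column names `a`, `x` and `y` are arbitrary): with `axis=1`, `apply` returns a Series when the function returns a scalar, but a DataFrame when the function returns a Series, with one column per label of the returned Series.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Function returns a scalar -> apply(axis=1) gives a Series, one value per row
as_series = df.apply(lambda row: row['a'] * 10, axis=1)

# Function returns a Series -> apply(axis=1) gives a DataFrame,
# with one column per label in the returned Series
as_frame = df.apply(lambda row: pd.Series({'x': row['a'], 'y': -row['a']}), axis=1)

print(type(as_series).__name__, as_series.tolist())    # Series [10, 20, 30]
print(type(as_frame).__name__, list(as_frame.columns))  # DataFrame ['x', 'y']
```

Only the first shape fits into a single column such as `df['ziptest']`.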

Answer 1

IIUC, you can just use merge_asof

df['street_name']=df['s_name'] + " " + df['suffix']

pd.merge_asof(df.sort_values('number').drop(columns='zipcode'),
              df2.sort_values('number'),
              by='street_name', on='number', direction='nearest')
Out[1176]: 
  suffix s_name  number street_name  zipcode
0     ST   SEXY      40     SEXY ST    11199
1     AV   TEST      55     TEST AV    11144
2     ST   JEFF     123     JEFF ST    11100
3     ST    9TH     444      9TH ST    11166
4     AV  CRAZY    1204    CRAZY AV    11155
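Because `merge_asof` returns the rows sorted by `number`, its result does not line up row-for-row with the original `df`. A minimal sketch of writing the matches back into `df['zipcode']` in the original order, filling only the missing values; carrying the row labels through `reset_index()` and using `fillna` are choices made for this sketch, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'suffix': ['ST', 'ST', 'AV', 'ST', 'AV'],
                   's_name': ['JEFF', '9TH', 'CRAZY', 'SEXY', 'TEST'],
                   'number': [123, 444, 1204, 40, 55],
                   'zipcode': [None, None, None, None, None]})
df2 = pd.DataFrame({'street_name': ['CRAZY AV', 'SEXY ST', '9TH ST', 'JEFF ST',
                                    'TEST AV', 'CRAZY AV', 'SEXY ST', 'TEST AV'],
                    'number': [100, 23, 666, 24, 54, 1200, 39, 100],
                    'zipcode': [11122, 11133, 11166, 11100, 11144, 11155, 11199, 11177]})
df['street_name'] = df['s_name'] + ' ' + df['suffix']

# An all-None column is object dtype; make it numeric so None becomes NaN
df['zipcode'] = df['zipcode'].astype('float64')

# merge_asof needs both frames sorted by the 'on' key; keep the original
# row labels in an 'index' column so results can be mapped back
left = df.drop(columns='zipcode').reset_index().sort_values('number')
matched = pd.merge_asof(left, df2.sort_values('number'),
                        by='street_name', on='number', direction='nearest')

# Align the matches to df's rows by label and fill only the missing zipcodes
df['zipcode'] = df['zipcode'].fillna(matched.set_index('index')['zipcode'])
print(df)
```

Rows whose street does not appear in `df2` get no match from `merge_asof`, so their zipcode simply stays NaN.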

solution1
0 ACCEPTED 2018-03-27 03:39:34

Fill column with info from another dataframe

Question

1 answers

solution1 0 ACCPTED 2018-03-27 03:39:34

solution1
0 ACCPTED 2018-03-27 03:39:34