I'm trying to use zipcode data from one dataframe to fill in the missing (NaN) zipcode values in another dataframe. I'm using the street name and the address number to find the best zipcode match, but my functions only work when I hardcode the values.
Here is some dummy data for the first dataframe:
import numpy as np
import pandas as pd

sufixo = ['ST', 'ST', 'AV', 'ST', 'AV']
logradouro = ['JEFF', '9TH', 'CRAZY', 'SEXY', 'TEST']
number = [123, 444, 1204, 40, 55]
zipcode = [None, None, None, None, None]
dataset = list(zip(sufixo, logradouro, number, zipcode))
df = pd.DataFrame(data=dataset, columns=['suffix', 's_name', 'number', 'zipcode'])
Now the second one:
street_name = ['CRAZY AV', 'SEXY ST', '9TH ST', 'JEFF ST', 'TEST AV', 'CRAZY AV', 'SEXY ST', 'TEST AV']
number = [100, 23, 666, 24, 54, 1200, 39, 100]
zipcode = [11122, 11133, 11166, 11100, 11144, 11155, 11199, 11177]
dataset = list(zip(street_name, number, zipcode))
df2 = pd.DataFrame(data=dataset, columns=['street_name', 'number', 'zipcode'])
The function for getting the nearest number:
def find_nearest(array, value):
    idx = (np.abs(array-value)).idxmin()
    return array[idx]
The function for concatenating the street name:
def concat_st_name(row):
    return row['s_name'] + " " + row['suffix']
df['combo_name'] = df.apply(concat_st_name, axis=1)
And my failing function that tries to get a decent zipcode:
def zip_finder(row):
    return df2['zipcode'][(df2['street_name'] == row['combo_name']) &
                          (df2['number'] == find_nearest(df2[df2['street_name'] == row['combo_name']]['number'], row['number']))]
When I try to apply this with df['ziptest'] = df.apply(zip_finder, axis=1)
I get: ValueError: Wrong number of items passed 5, placement implies 1
If I create a ziptest column filled with 0's beforehand, I get a new dataframe like this:
I'm new to Pandas and I think I'm failing to understand the logic of the apply method.
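On the apply error itself: zip_finder returns a one-element Series for each row (boolean indexing gives back a Series, not a scalar), so apply with axis=1 most likely assembles those into a multi-column DataFrame, which cannot be assigned to a single column. A minimal sketch of a version that returns one scalar per row is below. It rebuilds the dummy data so it runs on its own, and returning NaN for a street with no match in df2 is my assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'suffix': ['ST', 'ST', 'AV', 'ST', 'AV'],
    's_name': ['JEFF', '9TH', 'CRAZY', 'SEXY', 'TEST'],
    'number': [123, 444, 1204, 40, 55],
})
df2 = pd.DataFrame({
    'street_name': ['CRAZY AV', 'SEXY ST', '9TH ST', 'JEFF ST',
                    'TEST AV', 'CRAZY AV', 'SEXY ST', 'TEST AV'],
    'number': [100, 23, 666, 24, 54, 1200, 39, 100],
    'zipcode': [11122, 11133, 11166, 11100, 11144, 11155, 11199, 11177],
})
df['combo_name'] = df['s_name'] + ' ' + df['suffix']

def zip_finder(row):
    # All df2 rows on the same street as this address
    candidates = df2[df2['street_name'] == row['combo_name']]
    if candidates.empty:
        return np.nan  # assumption: no street match means no zipcode
    # Label of the candidate whose number is closest to this row's number
    nearest_idx = (candidates['number'] - row['number']).abs().idxmin()
    # .loc with a row label and a column name returns a scalar, not a Series
    return candidates.loc[nearest_idx, 'zipcode']

df['ziptest'] = df.apply(zip_finder, axis=1)
```

This keeps the row-by-row logic from the question, so it will be slow on large frames.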
IIUC, you can just use merge_asof:
df['street_name']=df['s_name'] + " " + df['suffix']
pd.merge_asof(df.sort_values('number').drop(columns='zipcode'),
              df2.sort_values('number'),
              by='street_name', on='number', direction='nearest')
Out[1176]:
  suffix s_name  number street_name  zipcode
0     ST   SEXY      40     SEXY ST    11199
1     AV   TEST      55     TEST AV    11144
2     ST   JEFF     123     JEFF ST    11100
3     ST    9TH     444      9TH ST    11166
4     AV  CRAZY    1204    CRAZY AV    11155
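Note that the merged result comes back sorted by number with a fresh index, so the rows no longer line up with the original df. If the zipcodes need to go back into df in its original order, one possible sketch is to carry the original index through the merge and realign afterwards. The variable names left, right and matched are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'suffix': ['ST', 'ST', 'AV', 'ST', 'AV'],
    's_name': ['JEFF', '9TH', 'CRAZY', 'SEXY', 'TEST'],
    'number': [123, 444, 1204, 40, 55],
    'zipcode': [None, None, None, None, None],
})
df2 = pd.DataFrame({
    'street_name': ['CRAZY AV', 'SEXY ST', '9TH ST', 'JEFF ST',
                    'TEST AV', 'CRAZY AV', 'SEXY ST', 'TEST AV'],
    'number': [100, 23, 666, 24, 54, 1200, 39, 100],
    'zipcode': [11122, 11133, 11166, 11100, 11144, 11155, 11199, 11177],
})

left = (df.drop(columns='zipcode')
          .assign(street_name=df['s_name'] + ' ' + df['suffix'])
          .reset_index()            # original row labels move to an 'index' column
          .sort_values('number'))   # merge_asof needs the 'on' key sorted
right = df2.sort_values('number')

matched = pd.merge_asof(left, right, on='number', by='street_name',
                        direction='nearest')

# Realign the matched zipcodes with df's original row order
df['zipcode'] = matched.set_index('index')['zipcode'].reindex(df.index)
```

A street that does not appear in df2 at all would end up with NaN here, which also turns the zipcode column into floats.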