Fill column with info from another dataframe

I'm trying to use zipcode data from one dataframe to fill in the missing (NaN) zipcode values of another dataframe. I'm using the street name and the address number to find the best zipcode match, but my functions only work when I hardcode the values.

Here is some dummy data for the first dataframe:

import numpy as np
import pandas as pd

sufixo = ['ST', 'ST', 'AV', 'ST', 'AV']
logradouro = ['JEFF', '9TH', 'CRAZY', 'SEXY', 'TEST']
number = [123, 444, 1204, 40, 55]
zipcode = [None, None, None, None, None]

dataset = list(zip(sufixo, logradouro, number, zipcode))
df = pd.DataFrame(data=dataset, columns=['suffix', 's_name', 'number', 'zipcode'])

Now the second one:

street_name = ['CRAZY AV', 'SEXY ST', '9TH ST', 'JEFF ST', 'TEST AV', 'CRAZY AV', 'SEXY ST', 'TEST AV']
number = [100, 23, 666, 24, 54, 1200, 39, 100]
zipcode = [11122, 11133, 11166, 11100, 11144, 11155, 11199, 11177]

dataset = list(zip(street_name, number, zipcode))
df2 = pd.DataFrame(data=dataset, columns=['street_name', 'number', 'zipcode'])

The function for getting the nearest number:

def find_nearest(array, value):
    idx = (np.abs(array-value)).idxmin()
    return array[idx]

The function for concatenating the street name:

def concat_st_name(row):
    return row['s_name'] + " " + row['suffix']

df['combo_name'] = df.apply(concat_st_name, axis=1)

And my failing function that tries to get a decent zipcode:

def zip_finder(row):
        return df2['zipcode'][(df2['street_name'] == row['combo_name']) &
                              (df2['number'] == find_nearest(df2[df2['street_name'] == row['combo_name']]['number'], row['number']))]

When I try to apply it with df['ziptest'] = df.apply(zip_finder, axis=1), I get:

ValueError: Wrong number of items passed 5, placement implies 1

If I create a ziptest column filled with 0's beforehand, I get a new dataframe like this: [image: "My failure exposed"]

I'm new to Pandas and I think I'm failing to understand the logic of the apply method.

IIUC, you can just use merge_asof:

df['street_name']=df['s_name'] + " " + df['suffix']

pd.merge_asof(df.sort_values('number').drop(columns='zipcode'), df2.sort_values('number'), by='street_name', on='number', direction='nearest')
Out[1176]: 
  suffix s_name  number street_name  zipcode
0     ST   SEXY      40     SEXY ST    11199
1     AV   TEST      55     TEST AV    11144
2     ST   JEFF     123     JEFF ST    11100
3     ST    9TH     444      9TH ST    11166
4     AV  CRAZY    1204    CRAZY AV    11155
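
Note that the merged result comes back sorted by number, not in the original row order of df. If the goal is to fill df's own zipcode column in place, one possible approach is to carry the original index through the merge and realign afterwards. The following is only a sketch built on the dummy data from the question; the variable names left, right and matched are illustrative and not part of the original answer, and it assumes every street in df has at least one entry in df2.

```python
import pandas as pd

# Dummy data copied from the question.
df = pd.DataFrame({
    'suffix': ['ST', 'ST', 'AV', 'ST', 'AV'],
    's_name': ['JEFF', '9TH', 'CRAZY', 'SEXY', 'TEST'],
    'number': [123, 444, 1204, 40, 55],
    'zipcode': [None] * 5,
})
df2 = pd.DataFrame({
    'street_name': ['CRAZY AV', 'SEXY ST', '9TH ST', 'JEFF ST',
                    'TEST AV', 'CRAZY AV', 'SEXY ST', 'TEST AV'],
    'number': [100, 23, 666, 24, 54, 1200, 39, 100],
    'zipcode': [11122, 11133, 11166, 11100, 11144, 11155, 11199, 11177],
})

# Build the same key that df2 uses for its street names.
df['street_name'] = df['s_name'] + ' ' + df['suffix']

# merge_asof needs both frames sorted by the "on" key. reset_index() keeps
# the original row labels in an 'index' column so they survive the sort.
left = df.drop(columns='zipcode').reset_index().sort_values('number')
right = df2.sort_values('number')

# For each row of left, pick the df2 row with the same street_name whose
# number is closest.
matched = pd.merge_asof(left, right, by='street_name', on='number',
                        direction='nearest')

# Realign the matched zipcodes with df's original row order.
df['zipcode'] = matched.set_index('index')['zipcode'].reindex(df.index)
print(df)
```

With this data the zipcodes are the same ones shown in the output above, only attached to the rows of df in their original order.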

