Python equivalent of R's code

Question

I posted a question along the same lines yesterday. This is a slightly modified version of it. previous question here .

I have 2 dataframes as follows:

data1 looks like this:

id          address       
1          11123451
2          78947591

data2 looks like the following:

lowerbound_address   upperbound_address    place
78392888                 89000000            X
10000000                 20000000            Y

I want to create another column in data1 called "place" which contains the place the id is from. There will be many ids coming from the same place. And some ids don't have a match.

The addresses here are float values.

What I am actually looking for in Python is an equivalent of this in R . It's easier to code the following in R . But I am unsure of how to code this in Python . Can someone help me with this?

data_place = rep(NA, nrow(data1))
for (i in (1:nrow(data1)){
tmp = as.character(data2[data1$address[i] >= data2$lowerbound_address & data1$address[i] <= data2$upperbound_address, "place"])
if(length(tmp)==1) {data_place[i] = tmp}
}

data$place = data_place

Answer 1

Something like this would work.

import pandas as pd
import numpy as np

# The below section is only used to import data

from io import StringIO

data = """      
id          address       
1          11123451
2          78947591
3          50000000
"""

data2 = """
lowerbound_address   upperbound_address    place
78392888                 89000000            X
10000000                 20000000            Y
"""

# The above section is only used to import data

df = pd.read_csv(StringIO(data), delimiter='\s+')
df2 = pd.read_csv(StringIO(data2), delimiter='\s+')

df['new']=np.nan

df['new'][(df['address'] > df2['lowerbound_address'][0]) & (df['address'] < df2['upperbound_address'][0])] = 'X'
df['new'][(df['address'] > df2['lowerbound_address'][1]) & (df['address'] < df2['upperbound_address'][1])] = 'Y'

In addition to pandas , we used numpy for np.nan .

All I have done was create a new column and assign NaN to it. Then created two criteria to assign either X or 'Y' based on the upper and lower boundaries in the second data (last two lines).

Final results:

   id   address  new
0   1  11123451    Y
1   2  78947591    X
2   3  50000000  NaN

Answer 2

Do a merge_asof and then replace all those times that the address is out of bounds with nan.

data1.sort_values('address', inplace = True)
data2.sort_values('lowerbound_address', inplace=True)
data3 = pd.merge_asof(data1, data2, left_on='address', right_on='lowerbound_address')
data3['place'] = data3['place'].where(data3.address <= data3.upperbound_address)
data3.drop(['lowerbound_address', 'upperbound_address'], axis=1)

Output

   id   address place
0   1  11123451     Y
1   3  50000000   NaN
2   2  78947591     X

Python equivalent of R's code

Question

2 answers

solution1
2 ACCPTED 2016-12-24 03:57:07

solution2
0 2016-12-24 03:09:40

Python equivalent of R's code

Question

2 answers

solution1 2 ACCPTED 2016-12-24 03:57:07

solution2 0 2016-12-24 03:09:40

solution1
2 ACCPTED 2016-12-24 03:57:07

solution2
0 2016-12-24 03:09:40