简体   繁体   中英

Python equivalent of R's code

I posted a question along the same lines yesterday. This is a slightly modified version of it. previous question here .

I have 2 dataframes as follows:

data1 looks like this:

id          address       
1          11123451
2          78947591

data2 looks like the following:

lowerbound_address   upperbound_address    place
78392888                 89000000            X
10000000                 20000000            Y

I want to create another column in data1 called "place" which contains the place the id is from. There will be many ids coming from the same place. And some ids don't have a match.

The addresses here are float values.

What I am actually looking for in Python is an equivalent of this in R . It's easier to code the following in R . But I am unsure of how to code this in Python . Can someone help me with this?

data_place = rep(NA, nrow(data1))
for (i in (1:nrow(data1)){
tmp = as.character(data2[data1$address[i] >= data2$lowerbound_address & data1$address[i] <= data2$upperbound_address, "place"])
if(length(tmp)==1) {data_place[i] = tmp}
}

data$place = data_place

Something like this would work.

import pandas as pd
import numpy as np

# The below section is only used to import data

from io import StringIO

data = """      
id          address       
1          11123451
2          78947591
3          50000000
"""

data2 = """
lowerbound_address   upperbound_address    place
78392888                 89000000            X
10000000                 20000000            Y
"""

# The above section is only used to import data

df = pd.read_csv(StringIO(data), delimiter='\s+')
df2 = pd.read_csv(StringIO(data2), delimiter='\s+')

df['new']=np.nan

df['new'][(df['address'] > df2['lowerbound_address'][0]) & (df['address'] < df2['upperbound_address'][0])] = 'X'
df['new'][(df['address'] > df2['lowerbound_address'][1]) & (df['address'] < df2['upperbound_address'][1])] = 'Y'

In addition to pandas , we used numpy for np.nan .

All I have done was create a new column and assign NaN to it. Then created two criteria to assign either X or 'Y' based on the upper and lower boundaries in the second data (last two lines).

Final results:

   id   address  new
0   1  11123451    Y
1   2  78947591    X
2   3  50000000  NaN

Do a merge_asof and then replace all those times that the address is out of bounds with nan.

data1.sort_values('address', inplace = True)
data2.sort_values('lowerbound_address', inplace=True)
data3 = pd.merge_asof(data1, data2, left_on='address', right_on='lowerbound_address')
data3['place'] = data3['place'].where(data3.address <= data3.upperbound_address)
data3.drop(['lowerbound_address', 'upperbound_address'], axis=1)

Output

   id   address place
0   1  11123451     Y
1   3  50000000   NaN
2   2  78947591     X

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM