
Trying to compare two data frames based on difference between latitude and longitude in each data frame

I am trying to compare lat & long coordinates in two data frames. If latitude_fuze is within .01 of latitude_air and longitude_fuze is within .01 of longitude_air, then I want to update the field df_result['Type'] to read 'Airport'. Basically, I have a DF with airport lat & long coordinates, and if these coordinates are very similar to the lat & long coordinates that I have in my business DF, I want to add a flag to the business DF to indicate that this is an airport.

Here is the code that I am testing.

lat1 = df_result['latitude_fuze']
lon1 = df_result['longitude_fuze']
lat2 = df_airports['latitude_air']
lon2 = df_airports['longitude_air']

fuze_rows=range(df_result.shape[0])
air_rows=range(df_airports.shape[0])

for r in fuze_rows:
    lat = df_result.loc[r,lat1]
    max_lat = lat + .01
    min_lat = lat - .01
    lon = df_result.loc[r,lon1]
    max_lon = lon + .01
    min_lon = lon - .01
    for a in air_rows:
        if (min_lat <= df_airports.loc[a,lat2] <= max_lat) and (min_lon <= df_airports.loc[a,lon2] <= max_lon):
            df_result['Type'] = 'Airport'

Here are two sample data frames:

# Import pandas library 
import pandas as pd 
  
# initialize list of lists 
data = [['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'], 
        ['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
        ['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
        ['NY', 'NY', 'New York', '40.76', '73.98'],
        ['NY', 'NY', 'New York', '40.76', '73.98']] 
  
# Create the pandas DataFrame 
df_result = pd.DataFrame(data, columns = ['state', 'city', 'county','latitude_fuze','longitude_fuze']) 
# print dataframe. 
df_result

And...

data = [['New York', 'JFK', '40.64', '-73.78'], 
        ['New York', 'JFK', '40.64', '-73.78'],
        ['Los Angeles', 'LAX', '33.94', '-118.41'],
        ['Chicago', 'ORD', '41.98', '-87.90'],
        ['San Francisco', 'SFO', '37.62', '-122.38']] 
  
# Create the pandas DataFrame 
df_airports = pd.DataFrame(data, columns = ['municipality_name', 'airport_code', 'latitude_air','longitude_air']) 
# print dataframe. 
df_airports

When running this code, I get this error:

KeyError: "None of [Float64Index([40.719515, 40.719515, 40.719515,  40.75682,  40.75682,  40.75682,\n               40.75682,  40.75682,  40.75682,   40.7646,\n              ...\n                40.0006,   40.0006,   40.0006,   40.0006,   40.0006,   40.0006,\n                40.0006, 39.742417, 39.742417, 39.742417],\n             dtype='float64', length=1720)] are in the [index]"

If using KNN or the Haversine method to do the calculation is better, I'm open to that. I'm not looking for distances here, but rather similarities in lat & long numbers. If I do need to calculate the distance to make this work correctly, please let me know. Thanks everyone.

I'm not sure what approach you need to take, as I'm not 100% clear on what you're trying to do. However, something like this might be helpful for getting your current approach working:

# join the two dataframes - must be the same length
df = pd.concat([df_result, df_airports], axis=1)

# cast latitudes and longitudes to numeric
cols = ["latitude_fuze", "latitude_air", "longitude_fuze", "longitude_air"]
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce', axis=1)

# create a mask where our conditions are met (difference between lat fuze and lat air < 0.1 and difference between long fuze and long air < 0.1)
mask = ((abs(df["latitude_fuze"] - df["latitude_air"]) < 0.1) & (abs(df["longitude_fuze"] - df["longitude_air"]) < 0.1))

# fill the type column
df.loc[mask, 'Type'] = "Airport"
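
Two notes on the original loop. The KeyError most likely comes from `df_result.loc[r, lat1]`: `lat1` is the whole latitude column (a Series of values), so `.loc` tries to look those floats up as labels. Using the column name instead, e.g. `df_result.loc[r, 'latitude_fuze']`, avoids that. Also, `df_result['Type'] = 'Airport'` sets the flag on every row, not just the matching one.

The `concat` approach above only compares row i of one frame with row i of the other. If the goal is to flag a business when any airport is within the tolerance, one option is to compare all pairs with NumPy broadcasting. This is only a sketch: the small frames stand in for the question's data, the second business row is a made-up point near JFK, and `TOL` and `to_float_array` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Stand-ins for the question's frames; coordinates are strings, as in the samples.
df_result = pd.DataFrame(
    [["NY", "Uniondale", "40.72", "-73.59"],
     ["NY", "Jamaica", "40.645", "-73.785"]],  # hypothetical row close to JFK
    columns=["state", "city", "latitude_fuze", "longitude_fuze"],
)
df_airports = pd.DataFrame(
    [["New York", "JFK", "40.64", "-73.78"],
     ["Los Angeles", "LAX", "33.94", "-118.41"],
     ["Chicago", "ORD", "41.98", "-87.90"]],
    columns=["municipality_name", "airport_code", "latitude_air", "longitude_air"],
)

TOL = 0.01  # tolerance in degrees, taken from the question


def to_float_array(series):
    """Convert a column of numeric strings to a float array (bad values become NaN)."""
    return pd.to_numeric(series, errors="coerce").to_numpy(dtype=float)


lat_f = to_float_array(df_result["latitude_fuze"])
lon_f = to_float_array(df_result["longitude_fuze"])
lat_a = to_float_array(df_airports["latitude_air"])
lon_a = to_float_array(df_airports["longitude_air"])

# Boolean matrices of shape (n_businesses, n_airports):
# entry [i, j] is True when business i is within TOL of airport j.
lat_close = np.abs(lat_f[:, None] - lat_a[None, :]) < TOL
lon_close = np.abs(lon_f[:, None] - lon_a[None, :]) < TOL

# A business is flagged if at least one airport matches on both coordinates.
is_airport = (lat_close & lon_close).any(axis=1)

# Only the matching rows get the flag; the others stay NaN.
df_result.loc[is_airport, "Type"] = "Airport"
print(df_result)
```

This builds an n_businesses x n_airports boolean matrix, which is fine for a couple of thousand rows but grows quickly for large frames. Note that a fixed tolerance in degrees is not a fixed distance on the ground, since a degree of longitude shrinks away from the equator; if real distances matter, a Haversine-based comparison would be the more accurate route.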


 