Trying to compare two data frames based on difference between latitude and longitude in each data frame
I am trying to compare lat & long coordinates in two data frames. If latitude_fuze is within .01 of latitude_air and longitude_fuze is within .01 of longitude_air, then I want to update the field df_result['Type'] to read 'Airport'. Basically, I have a DF with airport lat & long coordinates, and if these coordinates are very similar to the lat & long coordinates that I have in my business DF, I want to add a flag to the business DF to indicate that this is an airport.
Here is the code that I am testing.
lat1 = df_result['latitude_fuze']
lon1 = df_result['longitude_fuze']
lat2 = df_airports['latitude_air']
lon2 = df_airports['longitude_air']
fuze_rows=range(df_result.shape[0])
air_rows=range(df_airports.shape[0])
for r in fuze_rows:
    lat = df_result.loc[r,lat1]
    max_lat = lat + .01
    min_lat = lat - .01
    lon = df_result.loc[r,lon1]
    max_lon = lon + .01
    min_lon = lon - .01
    for a in air_rows:
        if (min_lat <= df_airports.loc[a,lat2] <= max_lat) and (min_lon <= df_airports.loc[a,lon2] <= max_lon):
            df_result['Type'] = 'Airport'
Here are two sample data frames:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
['NY', 'NY', 'New York', '40.76', '73.98'],
['NY', 'NY', 'New York', '40.76', '73.98']]
# Create the pandas DataFrame
df_result = pd.DataFrame(data, columns = ['state', 'city', 'county','latitude_fuze','longitude_fuze'])
# print dataframe.
df_result
And...
data = [['New York', 'JFK', '40.64', '-73.78'],
['New York', 'JFK', '40.64', '-73.78'],
['Los Angeles', 'LAX', '33.94', '-118.41'],
['Chicago', 'ORD', '41.98', '-87.90'],
['San Francisco', 'SFO', '37.62', '-122.38']]
# Create the pandas DataFrame
df_airports = pd.DataFrame(data, columns = ['municipality_name', 'airport_code', 'latitude_air','longitude_air'])
# print dataframe.
df_airports
When running this code, I get this error:
KeyError: "None of [Float64Index([40.719515, 40.719515, 40.719515, 40.75682, 40.75682, 40.75682,\n 40.75682, 40.75682, 40.75682, 40.7646,\n ...\n 40.0006, 40.0006, 40.0006, 40.0006, 40.0006, 40.0006,\n 40.0006, 39.742417, 39.742417, 39.742417],\n dtype='float64', length=1720)] are in the [index]"
If using KNN or the Haversine method to do the calculation is better, I'm open to that. I'm not looking for distances here, but rather similarities in lat & long numbers. If I do need to calculate the distance to make this work correctly, please let me know. Thanks everyone.
I'm not sure what approach you need to take, as I'm not 100% clear on what you're trying to do. However, something like this might be helpful for getting your current approach working:
# join the two dataframes - must be the same length
df = pd.concat([df_result, df_airports], axis=1)
# cast latitudes and longitudes to numeric
cols = ["latitude_fuze", "latitude_air", "longitude_fuze", "longitude_air"]
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce', axis=1)
# create a mask where our conditions are met (difference between lat fuze and lat air < 0.1 and difference between long fuze and long air < 0.1)
mask = ((abs(df["latitude_fuze"] - df["latitude_air"]) < 0.1) & (abs(df["longitude_fuze"] - df["longitude_air"]) < 0.1))
# fill the type column
df.loc[mask, 'Type'] = "Airport"
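The KeyError in the question appears to come from passing the whole Series (lat1, lon1) as the column label in df_result.loc[r, lat1], so pandas looks for the latitude values themselves as labels. Separately, if the goal is to flag a business row when it is close to any airport, not just the airport in the same row position, one possible alternative is to compare every pair with NumPy broadcasting. The following is only a sketch under assumptions: the sample rows are made up for illustration (the third business row is a hypothetical location near JFK), the TOL name is mine, and the 0.01 tolerance is taken from the question rather than from the code above.

```python
import numpy as np
import pandas as pd

# Illustrative data shaped like the frames in the question
# (coordinates stored as strings; third business row is hypothetical).
df_result = pd.DataFrame(
    [['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
     ['NY', 'NY', 'New York', '40.76', '-73.98'],
     ['NY', 'Queens', 'Queens', '40.645', '-73.785']],
    columns=['state', 'city', 'county', 'latitude_fuze', 'longitude_fuze'])
df_airports = pd.DataFrame(
    [['New York', 'JFK', '40.64', '-73.78'],
     ['Los Angeles', 'LAX', '33.94', '-118.41']],
    columns=['municipality_name', 'airport_code', 'latitude_air', 'longitude_air'])

TOL = 0.01  # assumed tolerance in degrees, as stated in the question

# Convert the string columns to float arrays; unparseable values become NaN.
lat_f = pd.to_numeric(df_result['latitude_fuze'], errors='coerce').to_numpy()
lon_f = pd.to_numeric(df_result['longitude_fuze'], errors='coerce').to_numpy()
lat_a = pd.to_numeric(df_airports['latitude_air'], errors='coerce').to_numpy()
lon_a = pd.to_numeric(df_airports['longitude_air'], errors='coerce').to_numpy()

# Broadcasting builds a (n_business, n_airports) boolean matrix:
# entry [i, j] is True when business i is within TOL of airport j on both axes.
close = (np.abs(lat_f[:, None] - lat_a[None, :]) < TOL) & \
        (np.abs(lon_f[:, None] - lon_a[None, :]) < TOL)

# A business row is flagged if it is close to at least one airport.
is_airport = close.any(axis=1)
df_result.loc[is_airport, 'Type'] = 'Airport'
```

This does not require the two frames to have the same length, but it does build a matrix with one entry per business/airport pair, so memory grows with the product of the two row counts. Note also that a fixed tolerance in degrees is not a fixed distance on the ground; if actual distance matters, a Haversine-based check would be the more appropriate tool.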