Find Closest Value to Input in Pandas Dataframe
The first rows of my dataframe are shown below. The columns are longitude, latitude, and value. The dataframe has 30 million rows.
longitude latitude value
-179.979166666666657 89.9791666666666714 -3.39999995214436425e+38
-179.9375 89.9791666666666714 -3.39999995214436425e+38
-179.895833333333343 89.9791666666666714 -3.39999995214436425e+38
-179.854166666666657 89.9791666666666714 -3.39999995214436425e+38
-179.8125 89.9791666666666714 -3.39999995214436425e+38
-179.770833333333343 89.9791666666666714 -3.39999995214436425e+38
-179.729166666666657 89.9791666666666714 -3.39999995214436425e+38
-179.6875 89.9791666666666714 -3.39999995214436425e+38
-179.645833333333343 89.9791666666666714 -3.39999995214436425e+38
I am trying to find the longitude and latitude point closest to a given input, and then print the value associated with that point. I have tried converting the dataframe into an array and then searching for the point with the minimum distance using this algorithm:
def match(lon, lat):
    # mintemparr: the dataframe converted to an array of [lon, lat, value] rows
    min_dist = float('inf')
    minindex = -1
    for x in range(len(mintemparr)):
        # Manhattan distance between the input point and row x
        dist = (abs(float(lon) - float(mintemparr[x][0]))
                + abs(float(lat) - float(mintemparr[x][1])))
        if dist < min_dist:
            min_dist = dist
            minindex = x
    result = mintemparr[minindex][2]
    return result
However, this is very slow. Is there a more direct way to search for the closest value within pandas, rather than converting the dataframe into an array?
Thanks in advance.
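def find_closest(df, lat, lon):
    # Manhattan distance from (lon, lat) to every row, as a Series aligned with df
    dist = (df['latitude'] - lat).abs() + (df['longitude'] - lon).abs()
    # idxmin gives the index label of the smallest distance; loc fetches that row
    return df.loc[dist.idxmin()]
>>> find_closest(df, 90, -179)
longitude -1.796458e+02
latitude 8.997917e+01
value -3.400000e+38
Name: 8, dtype: float64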
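The same Manhattan-distance search can also be run on the raw NumPy arrays behind the columns, which replaces the Python loop from the question with one vectorized expression. This is a sketch, not a benchmarked improvement: `find_closest_np` is a hypothetical name, and it assumes the question's column names `longitude` and `latitude`. Because `np.argmin` returns a position rather than a label, the row is fetched with `iloc`.

```python
import numpy as np
import pandas as pd


def find_closest_np(df: pd.DataFrame, lat: float, lon: float) -> pd.Series:
    """Hypothetical helper: the same Manhattan-distance search as find_closest,
    run on plain NumPy arrays instead of pandas Series."""
    lons = df['longitude'].to_numpy()
    lats = df['latitude'].to_numpy()
    # |lon_i - lon| + |lat_i - lat| for every row at once, no Python loop
    dist = np.abs(lons - lon) + np.abs(lats - lat)
    # argmin gives a position (the first one on ties), so index with iloc, not loc
    return df.iloc[int(np.argmin(dist))]


# Three of the sample rows from the question
df = pd.DataFrame({
    'longitude': [-179.979166666666657, -179.9375, -179.895833333333343],
    'latitude': [89.9791666666666714] * 3,
    'value': [-3.39999995214436425e+38] * 3,
})
print(find_closest_np(df, 90, -179.9).name)  # 2
```

Either version still scans all 30 million rows once per query; the gain over the original loop comes from doing that scan in compiled code.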
You can do this with pandas. I compute the differences between each row and the given longitude and latitude, take the square root of the sum of the squared differences (the Euclidean distance), and store it in a new column. Then I select the row with the minimum distance. Assuming your dataframe is called df and has columns latitude, longitude, and value:
import numpy as np
lon = -179.979164
lat = 89.979162
# Euclidean distance from (lon, lat) to every row, stored in a new column
df['sumofdiff'] = np.sqrt((df['longitude'] - lon) ** 2 + (df['latitude'] - lat) ** 2)
# Select the row(s) whose distance equals the minimum
df[df.sumofdiff == df.sumofdiff.min()]
longitude latitude value sumofdiff
0 -179.979167 89.979167 -3.400000e+38 0.000005
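A variant of the same Euclidean search, offered as a sketch: `closest_euclidean` is a hypothetical name, and `np.hypot` stands in for the explicit square root of summed squares. Unlike the boolean filter `df[df.sumofdiff == df.sumofdiff.min()]`, which returns every tied row as a DataFrame and leaves a `sumofdiff` column in `df`, `idxmin` returns a single row and `df` is not modified. Like both answers, this measures distance in raw degrees, which is not the same as ground distance, especially near the poles where the sample rows sit.

```python
import numpy as np
import pandas as pd


def closest_euclidean(df: pd.DataFrame, lon: float, lat: float) -> pd.Series:
    """Hypothetical helper: row nearest to (lon, lat) by Euclidean distance
    in degrees, without adding a helper column to df."""
    # np.hypot(dx, dy) = sqrt(dx**2 + dy**2), applied element-wise to the Series
    dist = np.hypot(df['longitude'] - lon, df['latitude'] - lat)
    # idxmin returns one index label (the first, if several rows tie)
    return df.loc[dist.idxmin()]


# Three of the sample rows from the question
df = pd.DataFrame({
    'longitude': [-179.979166666666657, -179.9375, -179.895833333333343],
    'latitude': [89.9791666666666714] * 3,
    'value': [-3.39999995214436425e+38] * 3,
})
print(closest_euclidean(df, -179.979164, 89.979162).name)  # 0
```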