I have a DataFrame with three columns: 'longitude', 'latitude', and 'country'. For some longitude/latitude pairs, the value in the country column is 'unknown'. Here is an overview of the DataFrame:
longitude latitude country
-76.250000 83.083333 China
-76.166667 83.083333 unknown
-76.083333 83.083333 USA
-76.000000 83.083333 India
-75.916667 83.083333 unknown
-68.166667 -55.500000 unknown
-67.666667 -55.500000 UK
-68.166667 -55.583333 Chile
-68.083333 -55.583333 Canada
-67.500000 -55.666667 unknown
For each row with an unknown country, I want to find the row with a known country whose longitude and latitude are closest in Euclidean distance, and replace 'unknown' with that row's country name. Is there an efficient way to do that?
Your example is not representative. The only country value you have is Chile. However, something like the following should work:
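import pandas as pd
from scipy.spatial import distance

def euclidean(point, others):
    # point: one row as [longitude, latitude, country]; others: all rows with a known country.
    # Compare coordinates only (drop the country column) and return the nearest row's country.
    dists = distance.cdist(point[None, :-1].astype(float), others[:, :-1].astype(float))
    return others[dists.argmin(), 2]

unknown = df[df["country"].eq("unknown")]
known = df[df["country"].ne("unknown")]
matches = unknown.apply(lambda row: euclidean(row.to_numpy(), known.to_numpy()), axis=1)
df["country"] = df["country"].where(df["country"].ne("unknown"), matches)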
>>> df
longitude latitude country
0 -76.250000 83.083333 China
1 -76.166667 83.083333 China
2 -76.083333 83.083333 USA
3 -76.000000 83.083333 India
4 -75.916667 83.083333 India
5 -68.166667 -55.500000 Chile
6 -67.666667 -55.500000 UK
7 -68.166667 -55.583333 Chile
8 -68.083333 -55.583333 Canada
9 -67.500000 -55.666667 UK
Timing on a frame built by repeating the sample 1,000 times:
big_df = pd.concat([df]*1000)
unknown = big_df[big_df["country"].eq("unknown")]
known = big_df[big_df["country"].ne("unknown")]
>>> %timeit unknown.apply(lambda row: euclidean(row.to_numpy(), known.to_numpy()), axis=1)
847 µs ± 26.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
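If the row-by-row apply becomes the bottleneck, one option is to compute all distances in a single cdist call instead of one call per unknown row. The following is only a sketch under assumptions: the function name fill_unknown_countries and its placeholder parameter are made up for illustration, the column names are taken from the question, and the frame is rebuilt here from the sample data so the snippet runs on its own.

```python
import pandas as pd
from scipy.spatial import distance


def fill_unknown_countries(df, placeholder="unknown"):
    """Return a copy of df where placeholder countries take the nearest known country.

    Hypothetical helper for illustration; "nearest" means plain Euclidean
    distance on the (longitude, latitude) columns, as in the question.
    """
    out = df.copy()
    is_unknown = out["country"].eq(placeholder).to_numpy()
    if not is_unknown.any() or is_unknown.all():
        return out  # nothing to fill, or no known rows to fill from

    coords = out[["longitude", "latitude"]].to_numpy(dtype=float)
    countries = out["country"].to_numpy()

    # One call builds the full (n_unknown x n_known) distance matrix.
    dists = distance.cdist(coords[is_unknown], coords[~is_unknown])
    nearest = dists.argmin(axis=1)  # position of the closest known row

    # Assign by position, so repeated index labels do not matter.
    filled = countries.copy()
    filled[is_unknown] = countries[~is_unknown][nearest]
    out["country"] = filled
    return out


# Sample data copied from the question.
df = pd.DataFrame({
    "longitude": [-76.250000, -76.166667, -76.083333, -76.000000, -75.916667,
                  -68.166667, -67.666667, -68.166667, -68.083333, -67.500000],
    "latitude": [83.083333, 83.083333, 83.083333, 83.083333, 83.083333,
                 -55.500000, -55.500000, -55.583333, -55.583333, -55.666667],
    "country": ["China", "unknown", "USA", "India", "unknown",
                "unknown", "UK", "Chile", "Canada", "unknown"],
})

result = fill_unknown_countries(df)
print(result)
```

The distance matrix holds one entry per (unknown, known) pair, so memory grows with the product of the two counts. For very large frames, a KD-tree such as scipy.spatial.cKDTree avoids materialising that matrix. Note also that Euclidean distance on raw longitude/latitude is what the question asks for, but it is not a true distance on the globe.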