I have attached a sample of my dataset. I have minimal pandas experience, so I'm struggling to formulate the problem.
What I'm trying to do is populate the 'dist' column (the Cartesian distance between p1 = (lat1, long1) and p2 = (lat2, long2)) for each index, based on the state and the county.
Each county may have multiple p1's. We use the one nearest to p2 when computing the distance. When a county doesn't have a p1 value, we simply use the next one that comes in the sequence.
How do I set up this problem concisely? I can imagine running an iterator over the county/state, but I can't get beyond that.
[EDIT] Here is the data frame head as suggested below. (Ignore the mismatch from the picture)
    lat1  long1  state            county   lat2  long2
0      .      .     AK    Aleutians West   11.0   23.0
1      .      .     AK      Wade Hampton   33.0   11.0
2      .      .     AK       North Slope   55.0   11.0
3      .      .     AK   Kenai Peninsula   44.0   11.0
4      .      .     AK         Anchorage   11.0   11.0
5      1      2     AK         Anchorage    NaN    NaN
6      .      .     AK         Anchorage   55.0   44.0
7      3      4     AK         Anchorage    NaN    NaN
8      .      .     AK         Anchorage    3.0    2.0
9      .      .     AK         Anchorage    5.0   11.0
10     .      .     AK         Anchorage   42.0   22.0
11     .      .     AK         Anchorage   11.0    2.0
12     .      .     AK         Anchorage  444.0    1.0
13     .      .     AK         Anchorage    1.0    2.0
14     0      2     AK         Anchorage    NaN    NaN
15     .      .     AK         Anchorage    1.0    1.0
16     .      .     AK         Anchorage  111.0   11.0
Here's how I would do it using Shapely, the engine underlying Geopandas, and I'm going to use randomized data.
from shapely.geometry import LineString
import pandas as pd
import random

def gen_random():
    # 20 random integer coordinates between 1 and 100
    return [random.randint(1, 100) for _ in range(20)]

j = {"x1": gen_random(), "y1": gen_random(),
     "x2": gen_random(), "y2": gen_random()}
df = pd.DataFrame(j)

def get_distance(k):
    # k is one row; the length of the segment from (x1, y1) to (x2, y2)
    # is the Cartesian distance between the two points
    lstr = LineString([(k.x1, k.y1), (k.x2, k.y2)])
    return lstr.length

df["Dist"] = df.apply(get_distance, axis=1)
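For plain Cartesian distance, the same per-row result can probably be obtained without Shapely by using NumPy directly on the columns. This is only a sketch of an alternative, not part of the approach above; the frame name `pts` and its values are made up for illustration, while the column names follow the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample points; columns named as in the example above.
pts = pd.DataFrame({"x1": [0, 1], "y1": [0, 1],
                    "x2": [3, 4], "y2": [4, 5]})

# np.hypot(dx, dy) computes sqrt(dx**2 + dy**2) element-wise,
# so the whole column is filled in one vectorized call.
pts["Dist"] = np.hypot(pts["x2"] - pts["x1"], pts["y2"] - pts["y1"])
```

On large frames this tends to be faster than `apply` with `axis=1`, since no Python function is called per row.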
Shapely: http://toblerity.org/shapely/manual.html#introduction
Geopandas: http://geopandas.org/
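The question also asks for the nearest p1 within each state/county, which the snippet above does not cover. One possible way to set that up is sketched below with pandas and NumPy. It rests on assumptions that are not confirmed by the question: the "." placeholders are treated as missing values, and "the next one that comes in the sequence" is read as the next row further down the frame that has a p1. The function name `nearest_p1_distance` and the sample rows are hypothetical, and the index is assumed to be unique.

```python
import numpy as np
import pandas as pd

def nearest_p1_distance(df):
    """Distance from each p2 row to the nearest p1 in the same state/county."""
    # Treat "." and other non-numeric placeholders as missing (assumption).
    p1_all = df[["lat1", "long1"]].apply(pd.to_numeric, errors="coerce")
    p2_all = df[["lat2", "long2"]].apply(pd.to_numeric, errors="coerce")
    has_p1 = p1_all.notna().all(axis=1)
    has_p2 = p2_all.notna().all(axis=1)

    # Fallback for counties without any p1: the next p1 further down the
    # frame (one assumed reading of "the next one in the sequence").
    fallback = p1_all[has_p1].reindex(df.index).bfill()

    dist = pd.Series(np.nan, index=df.index, name="dist")
    for _, grp in df.groupby(["state", "county"], sort=False):
        idx = grp.index
        p2_idx = idx[has_p2.loc[idx].to_numpy()]
        if len(p2_idx) == 0:
            continue
        p2 = p2_all.loc[p2_idx].to_numpy(dtype=float)
        p1 = p1_all.loc[idx[has_p1.loc[idx].to_numpy()]].to_numpy(dtype=float)
        if len(p1):
            # Pairwise distances, shape (n_p2, n_p1); keep the minimum per p2.
            d = np.hypot(p2[:, None, 0] - p1[None, :, 0],
                         p2[:, None, 1] - p1[None, :, 1])
            dist.loc[p2_idx] = d.min(axis=1)
        else:
            fb = fallback.loc[p2_idx].to_numpy(dtype=float)
            dist.loc[p2_idx] = np.hypot(p2[:, 0] - fb[:, 0], p2[:, 1] - fb[:, 1])
    return dist

# Hypothetical rows in the same layout as the question's frame.
sample = pd.DataFrame({
    "lat1":   [".", ".", 0, ".", 6],
    "long1":  [".", ".", 0, ".", 8],
    "state":  ["AK"] * 5,
    "county": ["North Slope", "Anchorage", "Anchorage", "Anchorage", "Anchorage"],
    "lat2":   [3.0, 3.0, np.nan, 6.0, np.nan],
    "long2":  [4.0, 4.0, np.nan, 9.0, np.nan],
})
sample["dist"] = nearest_p1_distance(sample)
```

Rows that only carry a p1 keep NaN in 'dist', and a county with no p1 and no later p1 anywhere in the frame also ends up with NaN. If the intended fallback rule is different, only the `fallback` line should need to change.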