
Calculating distance between column values in pandas dataframe

I have attached a sample of my dataset. I have minimal pandas experience, so I'm struggling to formulate the problem.

[image: sample of the dataset]

What I'm trying to do is populate the 'dist' column (the Cartesian distance between p1 = (lat1, long1) and p2 = (lat2, long2)) for each index, based on the state and the county.

Each county may have multiple p1's. We use the one nearest to p2 when computing the distance. When a county doesn't have a p1 value, we simply use the next one that comes in the sequence.

How do I set up this problem concisely? I can imagine running an iterator over the county/state, but I can't get beyond that.

[EDIT] Here is the data frame head as suggested below. (Ignore the mismatch from the picture)

   lat1 long1 state           county   lat2  long2
0     .     .    AK   Aleutians West   11.0   23.0
1     .     .    AK     Wade Hampton   33.0   11.0
2     .     .    AK      North Slope   55.0   11.0
3     .     .    AK  Kenai Peninsula   44.0   11.0
4     .     .    AK        Anchorage   11.0   11.0
5     1     2    AK        Anchorage    NaN    NaN
6     .     .    AK        Anchorage   55.0   44.0
7     3     4    AK        Anchorage    NaN    NaN
8     .     .    AK        Anchorage    3.0    2.0
9     .     .    AK        Anchorage    5.0   11.0
10    .     .    AK        Anchorage   42.0   22.0
11    .     .    AK        Anchorage   11.0    2.0
12    .     .    AK        Anchorage  444.0    1.0
13    .     .    AK        Anchorage    1.0    2.0
14    0     2    AK        Anchorage    NaN    NaN
15    .     .    AK        Anchorage    1.0    1.0
16    .     .    AK        Anchorage  111.0   11.0

Here's how I would do it using Shapely, the engine underlying GeoPandas, with randomized data.

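from shapely.geometry import LineString
import pandas as pd
import random


def gen_random():
    # 20 random integer coordinates between 1 and 100
    return [random.randint(1, 100) for _ in range(20)]

# Random stand-in data: p1 = (x1, y1), p2 = (x2, y2)
j = {"x1": gen_random(), "y1": gen_random(),
     "x2": gen_random(), "y2": gen_random()}
df = pd.DataFrame(j)


def get_distance(k):
    # Length of the straight segment from p1 to p2 (Euclidean distance)
    lstr = LineString([(k.x1, k.y1), (k.x2, k.y2)])
    return lstr.length

# Apply the function row by row
df["Dist"] = df.apply(get_distance, axis=1)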
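
If Shapely is not a hard requirement, the same straight-line distance can probably be computed without a row-wise apply. The following is only a sketch using NumPy's hypot on whole columns; the small frame and its values are made up for illustration, with the column names taken from the snippet above.

```python
import numpy as np
import pandas as pd

# Made-up sample points: p1 = (x1, y1), p2 = (x2, y2)
df = pd.DataFrame({"x1": [0, 1, 2], "y1": [0, 1, 2],
                   "x2": [3, 1, 5], "y2": [4, 1, 6]})

# Euclidean distance between p1 and p2, computed on whole columns at once
df["Dist"] = np.hypot(df["x2"] - df["x1"], df["y2"] - df["y1"])
```

On large frames this usually runs much faster than apply with axis=1, since the arithmetic happens in NumPy rather than in a Python loop over rows.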

Shapely: http://toblerity.org/shapely/manual.html#introduction
Geopandas: http://geopandas.org/
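
The snippet above computes one distance per row, but the question also asks for the nearest p1 within each state/county. A rough sketch of that part, under some assumptions: the "." entries mean "missing", distances are plain Euclidean, and the helper name nearest_p1_distance is hypothetical. Counties with no p1 at all are left as NaN here, because the fallback rule ("the next one that comes in the sequence") depends on how the rows are ordered.

```python
import numpy as np
import pandas as pd

# Small made-up frame in the question's layout; "." marks a missing p1.
df = pd.DataFrame({
    "lat1":   [".", ".", 1, ".", 3, "."],
    "long1":  [".", ".", 2, ".", 4, "."],
    "state":  ["AK"] * 6,
    "county": ["North Slope", "Anchorage", "Anchorage",
               "Anchorage", "Anchorage", "Anchorage"],
    "lat2":   [55.0, 11.0, np.nan, 55.0, np.nan, 3.0],
    "long2":  [11.0, 11.0, np.nan, 44.0, np.nan, 2.0],
})

# Turn the "." placeholders into NaN so both columns are numeric.
df[["lat1", "long1"]] = df[["lat1", "long1"]].apply(pd.to_numeric, errors="coerce")


def nearest_p1_distance(group):
    """Distance from each p2 in the group to the group's nearest p1 (hypothetical helper)."""
    out = pd.Series(np.nan, index=group.index)
    p1 = group[["lat1", "long1"]].dropna().to_numpy(dtype=float)
    has_p2 = group[["lat2", "long2"]].notna().all(axis=1)
    if len(p1) == 0 or not has_p2.any():
        # No p1 (or no p2) in this county: leave the distances as NaN.
        return out
    p2 = group.loc[has_p2, ["lat2", "long2"]].to_numpy(dtype=float)
    # Pairwise differences, shape (n_p2, n_p1, 2), reduced to distances.
    diff = p2[:, None, :] - p1[None, :, :]
    out[has_p2] = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return out


df["dist"] = np.nan
for _, group in df.groupby(["state", "county"]):
    # Assignment aligns on the original row index.
    df.loc[group.index, "dist"] = nearest_p1_distance(group)
```

Rows that only carry a p1 (no p2) keep NaN in 'dist', as does the county without any p1; the fallback for that last case would have to be added once the intended ordering is clear.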

