
Using pandas as a distance matrix and then getting a sub-dataframe of relevant distances

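I have created a pandas DataFrame that holds the distance between location i and location j. Given a start point P1 and an end point P2, I want to find the sub-dataframe (distance matrix) that has P1 and P2 on one axis and the remaining indices on the other.

I am using a pandas DataFrame because I think it is the most efficient way to do this.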

import pandas as pd

# dm_dict: distance matrix in dict form, where dm_dict[i][j] is the distance from i to j
dm_df = pd.DataFrame.from_dict(dm_dict)
P1 = dm_df.max(axis=0).idxmax()  # column that contains the largest distance
P2 = dm_df[P1].idxmax()          # location farthest from P1
route = [P1, P2]
# the line below is the one that comes back full of NaN
remaining_locs = dm_df[dm_df[~dm_df.isin(route)].isin(route)]
while not_done:
    # go through remaining_locs until all the locations have been added
    pass

There is no error message, but remaining_locs is full of NaN instead of being a DataFrame of distances.

Using dm_df[~dm_df.isin(route)].isin(route) seems to give me an accurate boolean df.
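A likely reason for the NaN result, sketched on a tiny made-up matrix (the labels A, B, C and the distances are hypothetical): DataFrame.isin compares the cell values against the list, not the row or column labels. The cells are distances and route holds labels, so the final mask is False everywhere, and indexing a DataFrame with an all-False boolean DataFrame masks every cell to NaN.

```python
import pandas as pd

# Hypothetical toy matrix; labels and distances are illustrative only.
dm_df = pd.DataFrame(
    [[0.0, 3.0, 4.0],
     [3.0, 0.0, 5.0],
     [4.0, 5.0, 0.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)
route = ["A", "C"]

# isin() tests the cell values, so no distance ever matches a label.
value_mask = dm_df.isin(route)
print(value_mask.values.any())

# The expression from the question: the inner mask is all False,
# so boolean indexing replaces every cell with NaN.
remaining_locs = dm_df[dm_df[~dm_df.isin(route)].isin(route)]
print(remaining_locs.isna().all().all())

# Testing the labels instead gives a usable row mask.
in_route = dm_df.index.isin(route)
sub = dm_df.loc[~in_route, route]  # rows: not on the route, columns: route
print(sub)
```

Selecting by label with .loc keeps the distances intact, which is what the boolean-mask approach was trying to achieve.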


Sample data. Technically the real metric is a radial distance, but Euclidean should be fine for populating the matrix:

import numpy

def dist(i, j):
    a = numpy.array((i[1], i[2]))
    b = numpy.array((j[1], j[2]))
    return numpy.linalg.norm(a-b)

locations = [
    ("Ottawa", 45.424722,-75.695),
    ("Edmonton", 53.533333,-113.5),
    ("Victoria", 48.428611,-123.365556), 
    ("Winnipeg", 49.899444,-97.139167), 
    ("Fredericton",  49.899444,-97.139167), 
    ("StJohns", 47.561389, -52.7125),
    ("Halifax", 44.647778, -63.571389), 
    ("Toronto", 43.741667, -79.373333),
    ("Charlottetown",46.238889, -63.129167),
    ("QuebecCity",46.816667, -71.216667 ),
    ("Regina", 50.454722, -104.606667),
    ("Yellowknife", 62.442222, -114.3975),
    ("Iqaluit", 63.748611, -68.519722)
]

dm_dict = {i: {j: dist(i, j) for j in locations if j != i} for i in locations}
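The dictionary above is keyed by the full (name, lat, lon) tuples. As a sketch, assuming plain city names are preferable as row and column labels, the same comprehension can be keyed on the first tuple element. The shortened locations list is only there to keep the example small.

```python
import numpy as np
import pandas as pd

# Shortened copy of the sample data, for illustration only.
locations = [
    ("Ottawa", 45.424722, -75.695),
    ("Edmonton", 53.533333, -113.5),
    ("Victoria", 48.428611, -123.365556),
]

def dist(i, j):
    # Euclidean distance on the (lat, lon) pair, as in the sample above.
    a = np.array((i[1], i[2]))
    b = np.array((j[1], j[2]))
    return float(np.linalg.norm(a - b))

# Key by city name (element 0) instead of by the whole tuple.
dm_dict = {
    i[0]: {j[0]: dist(i, j) for j in locations if j != i}
    for i in locations
}
dm_df = pd.DataFrame.from_dict(dm_dict)

# The i == j pairs were skipped, so the diagonal is NaN in the frame.
print(dm_df.round(3))
```

With name labels, a route such as ["Ottawa", "Victoria"] can be passed directly to label-based selection.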

It looks like you want scipy's distance_matrix:

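import pandas as pd
from scipy.spatial import distance_matrix

df = pd.DataFrame(locations)

# columns 1 and 2 hold the coordinates; column 0 holds the city name
x = df[[1,2]]
dm = pd.DataFrame(distance_matrix(x,x),
                  index=df[0],
                  columns=df[0])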

Output:

+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
|                |  Ottawa    | Edmonton   | Victoria   | Winnipeg   | Fredericton  |  StJohns   |  Halifax   |  Toronto   | Charlottetown  | QuebecCity  |  Regina    | Yellowknife  |  Iqaluit  |
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
| 0              |            |            |            |            |              |            |            |            |                |             |            |              |           |
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
| Ottawa         | 0.000000   | 38.664811  | 47.765105  | 21.906059  | 21.906059    | 23.081609  | 12.148481  | 4.045097   | 12.592181      | 4.689667    | 29.345960  | 42.278586    | 19.678657 |
| Edmonton       | 38.664811  | 0.000000   | 11.107987  | 16.759535  | 16.759535    | 61.080146  | 50.713108  | 35.503607  | 50.896264      | 42.813477   | 9.411122   | 8.953983     | 46.125669 |
| Victoria       | 47.765105  | 11.107987  | 0.000000   | 26.267600  | 26.267600    | 70.658378  | 59.913580  | 44.241193  | 60.276176      | 52.173796   | 18.867990  | 16.637528    | 56.945306 |
| Winnipeg       | 21.906059  | 16.759535  | 26.267600  | 0.000000   | 0.000000     | 44.488147  | 33.976105  | 18.802741  | 34.206429      | 26.105163   | 7.488117   | 21.334745    | 31.794214 |
| Fredericton    | 21.906059  | 16.759535  | 26.267600  | 0.000000   | 0.000000     | 44.488147  | 33.976105  | 18.802741  | 34.206429      | 26.105163   | 7.488117   | 21.334745    | 31.794214 |
| StJohns        | 23.081609  | 61.080146  | 70.658378  | 44.488147  | 44.488147    | 0.000000   | 11.242980  | 26.933071  | 10.500284      | 18.519147   | 51.974763  | 63.454538    | 22.625084 |
| Halifax        | 12.148481  | 50.713108  | 59.913580  | 33.976105  | 33.976105    | 11.242980  | 0.000000   | 15.827902  | 1.651422       | 7.946971    | 41.444115  | 53.851052    | 19.731392 |
| Toronto        | 4.045097   | 35.503607  | 44.241193  | 18.802741  | 18.802741    | 26.933071  | 15.827902  | 0.000000   | 16.434995      | 8.717042    | 26.111037  | 39.703942    | 22.761342 |
| Charlottetown  | 12.592181  | 50.896264  | 60.276176  | 34.206429  | 34.206429    | 10.500284  | 1.651422   | 16.434995  | 0.000000       | 8.108112    | 41.691201  | 53.767927    | 18.320711 |
| QuebecCity     | 4.689667   | 42.813477  | 52.173796  | 26.105163  | 26.105163    | 18.519147  | 7.946971   | 8.717042   | 8.108112       | 0.000000    | 33.587610  | 45.921044    | 17.145385 |
| Regina         | 29.345960  | 9.411122   | 18.867990  | 7.488117   | 7.488117     | 51.974763  | 41.444115  | 26.111037  | 41.691201      | 33.587610   | 0.000000   | 15.477744    | 38.457705 |
| Yellowknife    | 42.278586  | 8.953983   | 16.637528  | 21.334745  | 21.334745    | 63.454538  | 53.851052  | 39.703942  | 53.767927      | 45.921044   | 15.477744  | 0.000000     | 45.896374 |
| Iqaluit        | 19.678657  | 46.125669  | 56.945306  | 31.794214  | 31.794214    | 22.625084  | 19.731392  | 22.761342  | 18.320711      | 17.145385   | 38.457705  | 45.896374    | 0.000000  |
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
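With a labeled square matrix like the one above, the sub-dataframe the question asks for can be taken with label-based selection. The following is a minimal sketch on a four-city subset. The matrix is built with NumPy broadcasting as a stand-in for scipy's distance_matrix so that the snippet only needs numpy and pandas, and the names xy, d and sub are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Four cities from the sample data, to keep the output small.
locations = [
    ("Ottawa", 45.424722, -75.695),
    ("Edmonton", 53.533333, -113.5),
    ("Victoria", 48.428611, -123.365556),
    ("StJohns", 47.561389, -52.7125),
]
df = pd.DataFrame(locations)

# Pairwise Euclidean distances via broadcasting (stand-in for distance_matrix).
xy = df[[1, 2]].to_numpy()
d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
dm = pd.DataFrame(d, index=df[0], columns=df[0])

# P1 and P2: the pair that is farthest apart, as in the question.
P1 = dm.max(axis=0).idxmax()
P2 = dm[P1].idxmax()
route = [P1, P2]

# Rows: the two route endpoints; columns: every other location.
sub = dm.loc[route, dm.columns.difference(route)]
print(sub)
```

Index.difference returns the labels that are not in route, so the selection never needs a boolean mask over the distance values.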

I'm pretty sure this is what I was looking for:

filtered = dm_df.filter(items=route,axis=1).filter(items=set(locations).difference(set(route)), axis=0)

filtered is a [2 rows x 10 columns] df, and from there I can find the minimum.
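Finding the minimum in such a frame could look like the sketch below. The filtered frame here is a hypothetical stand-in with made-up distances, laid out with the two route endpoints as rows and the remaining locations as columns.

```python
import pandas as pd

# Hypothetical filtered frame: rows are the two route endpoints,
# columns are the locations not yet on the route (values are made up).
filtered = pd.DataFrame(
    [[11.1, 47.8, 26.3],
     [61.1, 23.1, 44.5]],
    index=["Victoria", "StJohns"],
    columns=["Edmonton", "Ottawa", "Winnipeg"],
)

# Closest remaining location for each endpoint, and its distance.
nearest = filtered.idxmin(axis=1)
nearest_dist = filtered.min(axis=1)

# Single smallest entry overall: first the column, then the row within it.
location = filtered.min(axis=0).idxmin()
endpoint = filtered[location].idxmin()
print(nearest.to_dict(), (endpoint, location))
```

The (endpoint, location) pair identifies which remaining location is closest to the current route and which end of the route it is closest to.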
