Calculate sum of distances travelled for each unique ID

I have a DataFrame with three columns. One column contains x-coordinates and another contains y-coordinates. There is also a 'trackiD' column, which associates each pair of x and y coordinates with a specific, unique track ID.

    trackiD   X_COORDINATES     Y_COORDINATES
        
     2        542.299805        23.388090
     2        544.108215        23.575758
     2        545.300598        23.962421
     2        546.417053        25.049328
     2        546.198669        24.830357
     2        546.724915        24.916084
     2        547.037048        24.918982
     2        547.011963        24.785202
     2        547.649231        24.845772
     3        547.600525        24.613401
     3        547.891479        24.268734
     3        548.580505        24.459103
     3        548.144409        23.915531
     3        548.626770        23.922005
     4        548.527222        24.134670
     4        548.504211        23.642254
     4        548.936584        24.028818
     4        548.627869        23.295454

What I am trying to do is the following:

  • Take each pair of consecutive x and y coordinates and calculate the increment of distance traveled between them using the Pythagorean distance formula, sqrt((x2-x1)^2 + (y2-y1)^2). Add each distance increment to a list, then take the sum of all increments in the list to get the total distance traveled. Importantly, this calculation is done only within the set of coordinates belonging to one unique trackiD: calculate the sum of the distance increments for trackiD 2, then repeat the same process separately for trackiD 3, 4 and so forth, ultimately storing the total distance traveled for each unique track ID in a new list.
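
Written out, the quantity described above for a single track would be the following, assuming the track's n points (x_1, y_1), ..., (x_n, y_n) are already listed in the order they were visited (the index notation is introduced here only for illustration):

```latex
D_{\text{track}} = \sum_{i=1}^{n-1} \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}
```

A track with n points therefore contributes n-1 increments, and a track with a single point has a total distance of 0.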

Here is my current code. It runs, but the issue is that it outputs a list containing just one single, large, likely incorrect value (displayed below). The expression assigned to the 'value' variable is also displayed across multiple lines here on Stack Overflow, but this is not the case when I run it in Jupyter Notebook.

    def pythag_dis(U_id):
        c = data.Unique_id == U_id
        df = data[c]
        df.reset_index(inplace=True)
        k = sorted(df.trackId.unique())
        i = 0
        j = 1
        length = len(k)
        while i < length:
            condition = df.trackId == k[i]
            df2 = df[condition]
            df2.reset_index(inplace=True)
            value = math.sqrt(
                (df.Object_Center_0.iloc[j] - df.Object_Center_0.iloc[i])**2
                + (df.Object_Center_1.iloc[j] - df.Object_Center_1.iloc[i])**2)
            mylist = []
            mylist.append(value)
            fulldistance = sum(mylist)
            mylist2 = []
            mylist2.append(fulldistance)
            i += 1
        return mylist2

    pythag_dis('1CCM0701')

OUTPUT: [1976.075585650214]

First create two new columns, X_SHIFTED and Y_SHIFTED, that hold the previous point's coordinates within each track ID. We do this by combining df.groupby and df.shift:

df[['X_SHIFTED', 'Y_SHIFTED']] = df.groupby('trackiD').shift()
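
To make the effect of the grouped shift concrete, here is a minimal sketch on a small made-up frame (the `demo` data is hypothetical and only reuses the column names from the question). It suggests that, with the default `periods=1`, each row receives the previous row's coordinates from the same track, and the first row of every track gets NaN:

```python
import pandas as pd

# Hypothetical two-track frame, used only to illustrate the shift behaviour.
demo = pd.DataFrame({
    "trackiD": [1, 1, 1, 2, 2],
    "X_COORDINATES": [0.0, 3.0, 6.0, 10.0, 10.0],
    "Y_COORDINATES": [0.0, 4.0, 8.0, 0.0, 5.0],
})

# shift() with the default periods=1 moves values down one row within each
# group, so every row sees the previous point of the same track.
shifted = demo.groupby("trackiD")[["X_COORDINATES", "Y_COORDINATES"]].shift()
demo["X_SHIFTED"] = shifted["X_COORDINATES"]
demo["Y_SHIFTED"] = shifted["Y_COORDINATES"]

print(demo)
```

Because the shift is done per group, the last point of one track never leaks into the first row of the next track.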

Then, simply use the Euclidean distance formula between the points (X_COORDINATES, Y_COORDINATES) and (X_SHIFTED, Y_SHIFTED). We can do this using df.apply row-wise (axis=1), along with math.dist (available from Python 3.8):

import math

df['DIST'] = df.apply(
    lambda row: math.dist(
        (row['X_COORDINATES'], row['Y_COORDINATES']),
        (row['X_SHIFTED'], row['Y_SHIFTED'])
    ), axis=1)

Output:

    trackiD  X_COORDINATES  Y_COORDINATES   X_SHIFTED  Y_SHIFTED      DIST
0         2     542.299805      23.388090         NaN        NaN       NaN
1         2     544.108215      23.575758  542.299805  23.388090  1.818122
2         2     545.300598      23.962421  544.108215  23.575758  1.253509
3         2     546.417053      25.049328  545.300598  23.962421  1.558152
4         2     546.198669      24.830357  546.417053  25.049328  0.309257
5         2     546.724915      24.916084  546.198669  24.830357  0.533183
6         2     547.037048      24.918982  546.724915  24.916084  0.312146
7         2     547.011963      24.785202  547.037048  24.918982  0.136112
8         2     547.649231      24.845772  547.011963  24.785202  0.640140
9         3     547.600525      24.613401         NaN        NaN       NaN
10        3     547.891479      24.268734  547.600525  24.613401  0.451054
11        3     548.580505      24.459103  547.891479  24.268734  0.714841
12        3     548.144409      23.915531  548.580505  24.459103  0.696886
13        3     548.626770      23.922005  548.144409  23.915531  0.482404
14        4     548.527222      24.134670         NaN        NaN       NaN
15        4     548.504211      23.642254  548.527222  24.134670  0.492953
16        4     548.936584      24.028818  548.504211  23.642254  0.579981
17        4     548.627869      23.295454  548.936584  24.028818  0.795693
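
Row-wise `df.apply` tends to be slow on large frames. As a possible alternative, the sketch below swaps in a different technique: `groupby(...).diff()` for the per-track coordinate differences and `numpy.hypot` for the distance. The `demo` frame is hypothetical and uses simple 3-4-5 steps so the result is easy to check by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical two-track frame with 3-4-5 steps, for illustration only.
demo = pd.DataFrame({
    "trackiD": [1, 1, 1, 2, 2],
    "X_COORDINATES": [0.0, 3.0, 6.0, 10.0, 10.0],
    "Y_COORDINATES": [0.0, 4.0, 8.0, 0.0, 5.0],
})

# Differences between consecutive points within each track;
# the first row of every track is NaN because it has no predecessor.
steps = demo.groupby("trackiD")[["X_COORDINATES", "Y_COORDINATES"]].diff()

# np.hypot(dx, dy) computes sqrt(dx**2 + dy**2) element-wise; NaN stays NaN.
demo["DIST"] = np.hypot(steps["X_COORDINATES"], steps["Y_COORDINATES"])

# Summing per track skips the NaN rows by default.
totals = demo.groupby("trackiD")["DIST"].sum()
print(totals)
```

This avoids the intermediate X_SHIFTED and Y_SHIFTED columns, at the cost of not keeping the previous point visible next to each row.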

To get each track's sum of distances, you can then use:

df.groupby('trackiD')['DIST'].sum()

Output:

trackiD
2    6.560621
3    2.345185
4    1.868628
Name: DIST, dtype: float64
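
The question asks for the totals in a list, while the grouped sum returns a pandas Series. A minimal sketch of the conversion follows, using hypothetical DIST values rather than the real data (the names `distance_list` and `distance_by_track` are made up for this example):

```python
import pandas as pd

# Hypothetical DIST values; NaN marks the first point of each track.
demo = pd.DataFrame({
    "trackiD": [1, 1, 1, 2, 2],
    "DIST": [float("nan"), 5.0, 5.0, float("nan"), 5.0],
})

# NaN entries are skipped by sum() by default.
totals = demo.groupby("trackiD")["DIST"].sum()

# One total per track, ordered by sorted trackiD (groupby sorts keys by default).
distance_list = totals.tolist()

# Mapping from trackiD to total distance, if the IDs should be kept.
distance_by_track = totals.to_dict()

print(distance_list)
print(distance_by_track)
```

Keeping the dictionary (or the Series itself) is probably the safer choice when the track IDs matter, since a bare list loses the association between a total and its track.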

A possible solution using Pandas: I use pandas groupby with shift to pair each point with the next one in its track, calculate the distance between them, and then sum the distances within each group:

import math
import numpy as np
import pandas as pd

def distance(row):
    x1, y1, x2, y2 = row["X_COORDINATES"], row["Y_COORDINATES"], row["X2"], row["Y2"]
    if np.isnan(x2) or np.isnan(y2):
        return 0
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

df["X2"] = df.groupby("trackiD")["X_COORDINATES"].shift(-1)
df["Y2"] = df.groupby("trackiD")["Y_COORDINATES"].shift(-1)

df["distance"] = df.apply(distance, axis=1)
df.groupby("trackiD")["distance"].sum()

Output:

trackiD
2    6.560621
3    2.345185
4    1.868628
Name: distance, dtype: float64

Test dataframe:

df = pd.DataFrame(
    {
        "trackiD": {
            0: 2,
            1: 2,
            2: 2,
            3: 2,
            4: 2,
            5: 2,
            6: 2,
            7: 2,
            8: 2,
            9: 3,
            10: 3,
            11: 3,
            12: 3,
            13: 3,
            14: 4,
            15: 4,
            16: 4,
            17: 4,
        },
        "X_COORDINATES": {
            0: 542.299805,
            1: 544.108215,
            2: 545.300598,
            3: 546.417053,
            4: 546.198669,
            5: 546.724915,
            6: 547.037048,
            7: 547.011963,
            8: 547.649231,
            9: 547.600525,
            10: 547.891479,
            11: 548.580505,
            12: 548.144409,
            13: 548.62677,
            14: 548.527222,
            15: 548.504211,
            16: 548.936584,
            17: 548.627869,
        },
        "Y_COORDINATES": {
            0: 23.38809,
            1: 23.575758,
            2: 23.962421,
            3: 25.049328,
            4: 24.830357,
            5: 24.916084,
            6: 24.918982,
            7: 24.785202,
            8: 24.845772,
            9: 24.613401,
            10: 24.268734,
            11: 24.459103,
            12: 23.915531,
            13: 23.922005,
            14: 24.13467,
            15: 23.642254,
            16: 24.028818,
            17: 23.295454,
        },
    }
)
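
As a cross-check that does not depend on pandas, the total for a single track can be recomputed in plain Python. The sketch below hard-codes the four points of trackiD 4 from the sample data and sums the distances between consecutive points with `math.dist` (Python 3.8 or later); the variable names are chosen only for this example:

```python
import math

# The four (x, y) points of trackiD 4, copied from the sample data.
track_4 = [
    (548.527222, 24.134670),
    (548.504211, 23.642254),
    (548.936584, 24.028818),
    (548.627869, 23.295454),
]

# Pair each point with its successor and add up the straight-line distances.
total = sum(math.dist(p, q) for p, q in zip(track_4, track_4[1:]))
print(round(total, 6))
```

The printed value should agree with the trackiD 4 entry in the grouped output, up to rounding in the last digit.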

