Calculate sum of distances travelled for each unique ID
I have a data frame with three columns. One column contains x-coordinates and another contains y-coordinates. As you can see, there is also a 'trackid' column, which associates each pair of x and y coordinates with a specific, unique trackid.
trackiD X_COORDINATES Y_COORDINATES
2 542.299805 23.388090
2 544.108215 23.575758
2 545.300598 23.962421
2 546.417053 25.049328
2 546.198669 24.830357
2 546.724915 24.916084
2 547.037048 24.918982
2 547.011963 24.785202
2 547.649231 24.845772
3 547.600525 24.613401
3 547.891479 24.268734
3 548.580505 24.459103
3 548.144409 23.915531
3 548.626770 23.922005
4 548.527222 24.134670
4 548.504211 23.642254
4 548.936584 24.028818
4 548.627869 23.295454
What I am trying to do is calculate, for each unique trackid, the sum of the distances travelled between consecutive points.
Here is my current code. It runs, but it outputs a list containing just one large, likely incorrect value (displayed below). The 'value' assignment also appears split across multiple lines here on Stack Overflow, but that is not the case when I run it in Jupyter Notebook.
def pythag_dis(U_id):
    c = data.Unique_id == U_id
    df = data[c]
    df.reset_index(inplace = True)
    k = sorted(df.trackId.unique())
    i = 0
    j = 1
    length = len(k)
    while i < length:
        condition = df.trackId == k[i]
        df2 = df[condition]
        df2.reset_index(inplace = True)
        value = math.sqrt(
            (df.Object_Center_0.iloc[j] - df.Object_Center_0.iloc[i])**2 +
            (df.Object_Center_1.iloc[j] - df.Object_Center_1.iloc[i])**2)
        mylist = []
        mylist.append(value)
        fulldistance = sum(mylist)
        mylist2 = []
        mylist2.append(fulldistance)
        i+=1
    return mylist2

pythag_dis('1CCM0701')
OUTPUT: [1976.075585650214]
First create two new columns, X_SHIFTED and Y_SHIFTED, that hold the previous point's coordinates within each track ID (the first point of each track has no predecessor, so it gets NaN). We do this by combining df.groupby and df.shift:
df[['X_SHIFTED', 'Y_SHIFTED']] = df.groupby('trackiD').shift()
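To see what the groupby/shift combination produces, here is a minimal sketch on a tiny made-up frame. The name demo and its values are illustrative assumptions, not the question's data; only the column names are taken from the question.

```python
import pandas as pd

# Tiny illustrative frame (hypothetical values, not the question's data).
demo = pd.DataFrame({
    "trackiD": [1, 1, 1, 2, 2],
    "X_COORDINATES": [0.0, 3.0, 6.0, 10.0, 10.0],
    "Y_COORDINATES": [0.0, 4.0, 8.0, 0.0, 5.0],
})

grouped = demo.groupby("trackiD")

# shift() moves values down by one row inside each group, so every row
# receives the coordinates of the previous point on the same track.
demo["X_SHIFTED"] = grouped["X_COORDINATES"].shift()
demo["Y_SHIFTED"] = grouped["Y_COORDINATES"].shift()

print(demo)
```

The first row of each track ends up with NaN in both shifted columns, because shifting never pulls values across a group boundary.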
Then, simply use the Euclidean distance formula between the points (X_COORDINATES, Y_COORDINATES) and (X_SHIFTED, Y_SHIFTED). We can do this using df.apply row-wise (axis=1), along with math.dist:
import math

df['DIST'] = df.apply(
    lambda row: math.dist(
        (row['X_COORDINATES'], row['Y_COORDINATES']),
        (row['X_SHIFTED'], row['Y_SHIFTED'])
    ), axis=1)
Output:
trackiD X_COORDINATES Y_COORDINATES X_SHIFTED Y_SHIFTED DIST
0 2 542.299805 23.388090 NaN NaN NaN
1 2 544.108215 23.575758 542.299805 23.388090 1.818122
2 2 545.300598 23.962421 544.108215 23.575758 1.253509
3 2 546.417053 25.049328 545.300598 23.962421 1.558152
4 2 546.198669 24.830357 546.417053 25.049328 0.309257
5 2 546.724915 24.916084 546.198669 24.830357 0.533183
6 2 547.037048 24.918982 546.724915 24.916084 0.312146
7 2 547.011963 24.785202 547.037048 24.918982 0.136112
8 2 547.649231 24.845772 547.011963 24.785202 0.640140
9 3 547.600525 24.613401 NaN NaN NaN
10 3 547.891479 24.268734 547.600525 24.613401 0.451054
11 3 548.580505 24.459103 547.891479 24.268734 0.714841
12 3 548.144409 23.915531 548.580505 24.459103 0.696886
13 3 548.626770 23.922005 548.144409 23.915531 0.482404
14 4 548.527222 24.134670 NaN NaN NaN
15 4 548.504211 23.642254 548.527222 24.134670 0.492953
16 4 548.936584 24.028818 548.504211 23.642254 0.579981
17 4 548.627869 23.295454 548.936584 24.028818 0.795693
To get each track's sum of distances, you can then use:
df.groupby('trackiD')['DIST'].sum()
Output:
trackiD
2 6.560621
3 2.345185
4 1.868628
Name: DIST, dtype: float64
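As a possible alternative to the row-wise apply, the same per-track totals can be computed with whole-column operations. This is only a sketch under stated assumptions: it swaps in groupby(...).diff() and numpy.hypot in place of shift and math.dist, and the demo frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (hypothetical values, not the question's data).
demo = pd.DataFrame({
    "trackiD": [1, 1, 1, 2, 2],
    "X_COORDINATES": [0.0, 3.0, 6.0, 10.0, 10.0],
    "Y_COORDINATES": [0.0, 4.0, 8.0, 0.0, 5.0],
})

# Difference between consecutive points within each track;
# the first row of every track is NaN because it has no predecessor.
steps = demo.groupby("trackiD")[["X_COORDINATES", "Y_COORDINATES"]].diff()

# Euclidean length of every step, computed for the whole column at once.
demo["DIST"] = np.hypot(steps["X_COORDINATES"], steps["Y_COORDINATES"])

# sum() skips NaN by default, so each track's first row contributes nothing.
totals = demo.groupby("trackiD")["DIST"].sum()
print(totals)
```

On large frames this tends to be faster than apply with axis=1, since no Python function is called per row.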
A possible solution using pandas: I use groupby with shift to pair each point with the next point on the same track, calculate the distance between them, and then sum the distances within each group:
import math

import numpy as np
import pandas as pd


def distance(row):
    x1, y1, x2, y2 = row["X_COORDINATES"], row["Y_COORDINATES"], row["X2"], row["Y2"]
    if np.isnan(x2) or np.isnan(y2):
        return 0
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


df["X2"] = df.groupby("trackiD")["X_COORDINATES"].shift(-1)
df["Y2"] = df.groupby("trackiD")["Y_COORDINATES"].shift(-1)
df["distance"] = df.apply(distance, axis=1)
df.groupby("trackiD")["distance"].sum()
Output:
trackiD
2 6.560621
3 2.345185
4 1.868628
Name: distance, dtype: float64
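Pairing each point with the next one (shift(-1)) or with the previous one (shift()) should give the same per-track totals, since both walk the same segments. The sketch below checks this on a small made-up frame; demo and the helper total_length are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (hypothetical values, not the question's data).
demo = pd.DataFrame({
    "trackiD": [1, 1, 1, 2, 2],
    "X_COORDINATES": [0.0, 3.0, 6.0, 10.0, 10.0],
    "Y_COORDINATES": [0.0, 4.0, 8.0, 0.0, 5.0],
})
grouped = demo.groupby("trackiD")


def total_length(periods):
    """Per-track path length, pairing each point with its neighbour `periods` rows away."""
    dx = grouped["X_COORDINATES"].shift(periods) - demo["X_COORDINATES"]
    dy = grouped["Y_COORDINATES"].shift(periods) - demo["Y_COORDINATES"]
    # Rows without a neighbour are NaN; sum() skips them by default.
    return np.hypot(dx, dy).groupby(demo["trackiD"]).sum()


backward = total_length(1)   # pair with the previous point
forward = total_length(-1)   # pair with the next point
print(backward)
print(forward)
```

Because sum() ignores NaN by default, an explicit check that returns 0 for rows without a neighbour is optional when the distances are only going to be summed.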
Test dataframe:
df = pd.DataFrame(
{
"trackiD": {
0: 2,
1: 2,
2: 2,
3: 2,
4: 2,
5: 2,
6: 2,
7: 2,
8: 2,
9: 3,
10: 3,
11: 3,
12: 3,
13: 3,
14: 4,
15: 4,
16: 4,
17: 4,
},
"X_COORDINATES": {
0: 542.299805,
1: 544.108215,
2: 545.300598,
3: 546.417053,
4: 546.198669,
5: 546.724915,
6: 547.037048,
7: 547.011963,
8: 547.649231,
9: 547.600525,
10: 547.891479,
11: 548.580505,
12: 548.144409,
13: 548.62677,
14: 548.527222,
15: 548.504211,
16: 548.936584,
17: 548.627869,
},
"Y_COORDINATES": {
0: 23.38809,
1: 23.575758,
2: 23.962421,
3: 25.049328,
4: 24.830357,
5: 24.916084,
6: 24.918982,
7: 24.785202,
8: 24.845772,
9: 24.613401,
10: 24.268734,
11: 24.459103,
12: 23.915531,
13: 23.922005,
14: 24.13467,
15: 23.642254,
16: 24.028818,
17: 23.295454,
},
}
)