[Python, pandas]: get km run per day per vehicle

Question

I am trying to output a table that shows the km run per day per vehicle, but the calculations I am getting are wrong.

Here is a small snippet of the data I have. Each vehicle sends its current odometer readout to a central server a few hundred times per day.

TS                          DATE               VEH          ODOMETER
2021-03-12 12:58:15.500     2021-03-12         008              2932
2021-03-12 00:00:21.700     2021-03-12         002             64253
2021-03-12 00:02:21.500     2021-03-12         002             64254
2021-03-12 00:03:41.400     2021-03-12         002             64255
2021-03-12 00:05:38.200     2021-03-12         002             64256
...                         ...                ...             ...
2021-03-12 23:55:48.100     2021-03-12         002             64953
2021-03-12 00:05:38.200     2021-03-13         002             64954

As should be clear from the example above, the difference between the first and last odometer readout for vehicle 002 on 2021-03-12 is 64953 - 64253 = 700 km, but the first readout on the next day is 64954. So when I group by day and take the diff() of the odometer values within each day, the km driven between the last readout of one day and the first readout of the next disappear, as shown below.

def grp_odo(dfObj):
    dfObj['ODOMETER'] = dfObj['ODOMETER'].astype(int)
    dfObj["km"] = dfObj.groupby(["DATE","VEH"])["ODOMETER"].diff()

    sum_km = dfObj.groupby(["DATE","VEH"], as_index=False)["km"].sum()

    return sum_km

dfodo = grp_odo(df[['DATE','VEH', 'ODOMETER']].loc[(~pd.isna(df['ODOMETER']))])

Print the grouped odometer difference per day:

print(dfodo)

            DATE      VEH     km
0     2021-01-01      001  523.0
1     2021-01-01      002  700.0
2     2021-01-01      003  781.0
3     2021-01-01      004    2.0
4     2021-01-01      005  553.0
...          ...      ...    ...
3375  2021-04-09      034  802.0
3376  2021-04-09      035  615.0
3377  2021-04-09      036  778.0
3378  2021-04-09      038  425.0
3379  2021-04-09      039  386.0

Print the total km run by vehicle 002 from start to finish of the grouped data:

print(dfodo[dfodo.VEH== "002"].sum())
km: 36796

Print the difference between the highest and lowest odometer value in the dataframe:

print(df[df.VEH== "002"].groupby('VEH')['ODOMETER'].agg(np.ptp))

VEH
002    36898
Name: ODOMETER, dtype: int64

I would like an output where 2021-03-12 is 701 km, i.e. the difference between the first value of 2021-03-12 and the first value of 2021-03-13. Is this possible?
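
To see where the km go missing, the two groupings can be compared side by side on the sample rows. This is only a minimal sketch: the names per_day, per_veh and lost_km are made up for illustration, and it assumes the rows of each vehicle are already in chronological order.

```python
import pandas as pd

# The seven sample rows from the question (TS column left out)
df = pd.DataFrame({
    "DATE": ["2021-03-12"] * 6 + ["2021-03-13"],
    "VEH": ["008", "002", "002", "002", "002", "002", "002"],
    "ODOMETER": [2932, 64253, 64254, 64255, 64256, 64953, 64954],
})

# Diff inside each (DATE, VEH) group, as in grp_odo(): the first readout
# of every day has no predecessor in its group, so it becomes NaN
per_day = df.groupby(["DATE", "VEH"])["ODOMETER"].diff()

# Diff inside each vehicle only: the step across midnight is kept
per_veh = df.groupby("VEH")["ODOMETER"].diff()

lost_km = per_veh.sum() - per_day.sum()
print(lost_km)  # 1.0
```

One such step is dropped per vehicle per day boundary, which would be consistent with the gap between 36898 and 36796 over the full data set.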

Answer 1

Input:

>>> df
                       TS       DATE  VEH  ODOMETER
0 2021-03-12 12:58:15.500 2021-03-12  008    2932.0
1 2021-03-12 00:00:21.700 2021-03-12  002   64253.0
2 2021-03-12 00:02:21.500 2021-03-12  002   64254.0
3 2021-03-12 00:03:41.400 2021-03-12  002   64255.0
4 2021-03-12 00:05:38.200 2021-03-12  002   64256.0
5 2021-03-12 23:55:48.100 2021-03-12  002   64953.0
6 2021-03-12 00:05:38.200 2021-03-13  002   64954.0

Output:

>>> df.assign(TOTAL=df.groupby("VEH")["ODOMETER"].shift(-1) - df["ODOMETER"]) \
      .groupby(["DATE", "VEH"])["TOTAL"].sum()
DATE        VEH
2021-03-12  002    701.0
            008      0.0
2021-03-13  002      0.0
Name: TOTAL, dtype: float64
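
Two things are worth keeping in mind with this approach: shift(-1) relies on the rows of each vehicle being in chronological order, and sum() turns groups without any known step into 0.0. The sketch below is one way to handle both, under stated assumptions: the rows are sorted by VEH and TS first, and min_count=1 keeps NaN where nothing is known. The TS of the last row is assumed to be 2021-03-13 00:05:38.200 so that it agrees with its DATE, and the names step and daily are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "TS": pd.to_datetime([
        "2021-03-12 12:58:15.500", "2021-03-12 00:00:21.700",
        "2021-03-12 00:02:21.500", "2021-03-12 00:03:41.400",
        "2021-03-12 00:05:38.200", "2021-03-12 23:55:48.100",
        "2021-03-13 00:05:38.200",  # assumed: date part changed to match DATE
    ]),
    "DATE": ["2021-03-12"] * 6 + ["2021-03-13"],
    "VEH": ["008", "002", "002", "002", "002", "002", "002"],
    "ODOMETER": [2932, 64253, 64254, 64255, 64256, 64953, 64954],
})

# Shuffle the rows to mimic readouts arriving out of order
df = df.sample(frac=1, random_state=0)

# Restore chronological order per vehicle before shifting
df = df.sort_values(["VEH", "TS"])

# Distance from each readout to the same vehicle's next readout
step = df.groupby("VEH")["ODOMETER"].shift(-1) - df["ODOMETER"]

# min_count=1 keeps NaN (instead of 0.0) when a group has no known step
daily = step.groupby([df["DATE"], df["VEH"]]).sum(min_count=1)
print(daily)
```

With this, 2021-03-12 for vehicle 002 is still 701.0, while the two groups that have no following readout show NaN rather than 0.0.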

Answer 2

import pandas as pd

df = pd.DataFrame({'DATE': ['2021-03-12', '2021-03-12', '2021-03-12', '2021-03-12', '2021-03-12', '2021-03-12', '2021-03-13'],
                   'VEH': ['008', '002', '002', '002', '002', '002', '002'],
                   'ODOMETER': [2932, 64253, 64254, 64255, 64256, 64953, 64954]})

df.sort_values(['VEH', 'DATE'], inplace=True)


         DATE  VEH  ODOMETER
1  2021-03-12  002     64253
2  2021-03-12  002     64254
3  2021-03-12  002     64255
4  2021-03-12  002     64256
5  2021-03-12  002     64953
6  2021-03-13  002     64954
0  2021-03-12  008      2932

Take the minimum ODOMETER value for each (vehicle, date) group:

dff = df.groupby(['VEH', 'DATE'], as_index=False).agg({'ODOMETER': 'min'})

   VEH        DATE  ODOMETER
0  002  2021-03-12     64253
1  002  2021-03-13     64954
2  008  2021-03-12      2932

Then take the difference between consecutive dates within each vehicle group:

dff['TOTAL_DIST'] = dff.groupby('VEH')['ODOMETER'].shift(-1)-dff.groupby('VEH')['ODOMETER'].shift(0)

   VEH        DATE  ODOMETER  TOTAL_DIST
0  002  2021-03-12     64253       701.0
1  002  2021-03-13     64954         NaN
2  008  2021-03-12      2932         NaN
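
The last recorded day of every vehicle ends up as NaN here, because there is no next day to subtract from. A possible variation, which is an assumption about what might be wanted and not something the question asks for, is to fall back to the last readout of that same day. The sketch below assumes the odometer never decreases, so min and max are the first and last readout of a day. The column names first_odo and last_odo are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "DATE": ["2021-03-12"] * 6 + ["2021-03-13"],
    "VEH": ["008", "002", "002", "002", "002", "002", "002"],
    "ODOMETER": [2932, 64253, 64254, 64255, 64256, 64953, 64954],
})

# First and last readout per vehicle and day (odometer assumed monotonic)
daily = (df.groupby(["VEH", "DATE"], as_index=False)
           .agg(first_odo=("ODOMETER", "min"), last_odo=("ODOMETER", "max")))

# First readout of the following day, NaN on a vehicle's last day
next_first = daily.groupby("VEH")["first_odo"].shift(-1)

# Use the next day's first readout where available, otherwise the
# same day's last readout
daily["TOTAL_DIST"] = next_first.fillna(daily["last_odo"]) - daily["first_odo"]
print(daily)
```

On the sample data this gives 701.0 for vehicle 002 on 2021-03-12 and 0.0 for the two single-readout days.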

Answer 3

You can use two consecutive df.groupby() calls together with GroupBy.first() and shift(), as follows:

df_daily = df.groupby(['DATE', 'VEH'], as_index=False)['ODOMETER'].first()
df_daily['km_diff'] = df_daily.groupby('VEH')['ODOMETER'].shift(-1) - df_daily.groupby('VEH')['ODOMETER'].shift(0)

Test Run

Test data construction

import pandas as pd

cols = ['TS', 'DATE', 'VEH', 'ODOMETER']
data = [
    ['2021-03-12 12:58:15.500', '2021-03-12', '008', 2932],
    ['2021-03-13 12:58:15.500', '2021-03-13', '008', 3032],  # added test row
    ['2021-03-12 00:00:21.700', '2021-03-12', '002', 64253],
    ['2021-03-12 00:02:21.500', '2021-03-12', '002', 64254],
    ['2021-03-12 00:03:41.400', '2021-03-12', '002', 64255],
    ['2021-03-12 00:05:38.200', '2021-03-12', '002', 64256],
    ['2021-03-12 23:55:48.100', '2021-03-12', '002', 64953],
    ['2021-03-12 00:05:38.200', '2021-03-13', '002', 64954],
]
df = pd.DataFrame(data, columns=cols)
print(df)


                        TS        DATE  VEH  ODOMETER
0  2021-03-12 12:58:15.500  2021-03-12  008      2932
1  2021-03-13 12:58:15.500  2021-03-13  008      3032       <=== Added this test data
2  2021-03-12 00:00:21.700  2021-03-12  002     64253
3  2021-03-12 00:02:21.500  2021-03-12  002     64254
4  2021-03-12 00:03:41.400  2021-03-12  002     64255
5  2021-03-12 00:05:38.200  2021-03-12  002     64256
6  2021-03-12 23:55:48.100  2021-03-12  002     64953
7  2021-03-12 00:05:38.200  2021-03-13  002     64954

Run the new code

df_daily = df.groupby(['DATE', 'VEH'], as_index=False)['ODOMETER'].first()
df_daily['km_diff'] = df_daily.groupby('VEH')['ODOMETER'].shift(-1) - df_daily.groupby('VEH')['ODOMETER'].shift(0)

Result:

print(df_daily)



         DATE  VEH  ODOMETER  km_diff
0  2021-03-12  002     64253    701.0
1  2021-03-12  008      2932    100.0
2  2021-03-13  002     64954      NaN
3  2021-03-13  008      3032      NaN
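
Since the question asks for a table of km per day per vehicle, the long result can also be reshaped into one row per date and one column per vehicle. This is just a sketch using DataFrame.pivot on the same test data (TS column left out). The name table is made up for illustration.

```python
import pandas as pd

cols = ['DATE', 'VEH', 'ODOMETER']
data = [
    ['2021-03-12', '008', 2932],
    ['2021-03-13', '008', 3032],
    ['2021-03-12', '002', 64253],
    ['2021-03-12', '002', 64254],
    ['2021-03-12', '002', 64255],
    ['2021-03-12', '002', 64256],
    ['2021-03-12', '002', 64953],
    ['2021-03-13', '002', 64954],
]
df = pd.DataFrame(data, columns=cols)

# First readout per day and vehicle, then difference to the next day
df_daily = df.groupby(['DATE', 'VEH'], as_index=False)['ODOMETER'].first()
df_daily['km_diff'] = df_daily.groupby('VEH')['ODOMETER'].shift(-1) - df_daily['ODOMETER']

# Wide layout: dates as rows, vehicles as columns
table = df_daily.pivot(index='DATE', columns='VEH', values='km_diff')
print(table)
```

The last date of each vehicle stays NaN in this layout, because no following day is available to subtract from.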

3 answers

solution1
2 ACCEPTED 2021-04-15 13:59:20

solution2
1 2021-04-15 13:52:20

solution3
1 2021-04-15 14:03:15
