Iterate over rows, compare dates and append to list

Question

Here is an example of my csv file:

EMPL_NO,ADRESSE,DIST_KM,DIST_MIN,DATE_INTERNE_INSPECTION
5,H4N 1P9,541,60,2023-06-03
5,H4N 1P9,541,60,2024-06-03
5,H4N 1P9,541,60,2023-05-29
5,H4N 1P9,541,60,2024-05-29
5,H4N 1P9,541,60,2023-06-05
5,H4N 1P9,541,60,2024-06-05
5,H4N 1P9,541,60,2026-06-05
12,H4N 1G4,503,40,2021-06-05
12,H4N 1G4,503,40,2023-06-05

EMPL_NO is my 'primary key'.

For every EMPL_NO, I need to compare the corresponding dates with each other and split them into groups. Dates that are at most 90 days apart should go in the same group; any date that is further away should go in a separate group.

For example, the expected output for the df shown above is:

5, 2023-06-03, 2023-05-29, 2023-06-05
5, 2024-06-03, 2024-05-29, 2024-06-05
5, 2026-06-05
12, 2021-06-05
12, 2023-06-05

Can I get a little help if possible?

Answer 1

Here is a non-pandas solution:

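import csv
import itertools as it
import datetime as dt

fn = 'data.csv'  # placeholder: path to your CSV file
day_range = 90

with open(fn, newline='') as f_in:
    r = csv.reader(f_in)
    header = next(r)  # skip the header row
    # Sort by employee number, then by date (ISO dates sort correctly as strings)
    data = sorted(r, key=lambda x: (int(x[0]), x[-1]))
    for k, v in it.groupby(data, key=lambda x: x[0]):
        grp = list(v)
        # Convert the date strings to datetime.date objects
        for sl in grp:
            sl[-1] = dt.date.fromisoformat(sl[-1])
        while grp:
            # Start a new run, then extend it while the next date is
            # within day_range days of the last date already in the run
            rng = [grp.pop(0)]
            while grp and (grp[0][-1] - rng[-1][-1]).days <= day_range:
                rng.append(grp.pop(0))
            print('{}, {}'.format(k, ', '.join(str(e[-1]) for e in rng)))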

With a file containing the example data, this prints (dates within each group are in sorted order):

5, 2023-05-29, 2023-06-03, 2023-06-05
5, 2024-05-29, 2024-06-03, 2024-06-05
5, 2026-06-05
12, 2021-06-05
12, 2023-06-05
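The grouping rule used above can be isolated into a small helper. This is only a sketch: the name split_by_gap and its max_gap_days parameter are made up for illustration and are not part of the original answer. Note that each date is compared with the previous date in the run, not with the first one, so a long chain of closely spaced dates stays in a single group.

```python
import datetime as dt

def split_by_gap(dates, max_gap_days=90):
    """Split dates into runs whose consecutive sorted dates are
    at most max_gap_days apart (hypothetical helper)."""
    runs = []
    for d in sorted(dates):
        # Extend the current run if the gap to its last date is small enough
        if runs and (d - runs[-1][-1]).days <= max_gap_days:
            runs[-1].append(d)
        else:
            runs.append([d])
    return runs

# Dates for EMPL_NO 5 from the question
sample = [dt.date.fromisoformat(s) for s in (
    "2023-06-03", "2024-06-03", "2023-05-29", "2024-05-29",
    "2023-06-05", "2024-06-05", "2026-06-05")]

for run in split_by_gap(sample):
    print(", ".join(map(str, run)))
```

If groups should instead span at most 90 days from their first date, the comparison would need to use runs[-1][0] rather than runs[-1][-1].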

Here is the same thing in Pandas:

import pandas as pd
import itertools as it

fn = 'data.csv'  # placeholder: path to your CSV file
day_range = 90

data = pd.read_csv(fn, parse_dates=['DATE_INTERNE_INSPECTION'])
data.sort_values(by=['EMPL_NO', 'DATE_INTERNE_INSPECTION'], inplace=True)

# Start a new group number whenever the gap to the previous row exceeds day_range.
# The diff also spans employee boundaries, which is why EMPL_NO is part of the key below.
data['group'] = (data['DATE_INTERNE_INSPECTION'].diff()
                 > pd.Timedelta(days=day_range)).cumsum()

for k, v in it.groupby(data.iterrows(),
                       key=lambda row: (row[1]['EMPL_NO'], row[1]['group'])):
    dates = ', '.join(str(row[1]['DATE_INTERNE_INSPECTION'].date()) for row in v)
    print('{}, {}'.format(k[0], dates))
# same output

To add fields to the output, change the final print line in either version.
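As an illustration of adding a field, here is a sketch that puts ADRESSE in front of the dates. It is a variant of the non-pandas version, not the original code: it reads the question's sample data from an in-memory string, and the names SAMPLE, format_run and grouped_lines are made up for this example.

```python
import csv
import io
import itertools as it
import datetime as dt

# Sample data copied from the question
SAMPLE = """EMPL_NO,ADRESSE,DIST_KM,DIST_MIN,DATE_INTERNE_INSPECTION
5,H4N 1P9,541,60,2023-06-03
5,H4N 1P9,541,60,2024-06-03
5,H4N 1P9,541,60,2023-05-29
5,H4N 1P9,541,60,2024-05-29
5,H4N 1P9,541,60,2023-06-05
5,H4N 1P9,541,60,2024-06-05
5,H4N 1P9,541,60,2026-06-05
12,H4N 1G4,503,40,2021-06-05
12,H4N 1G4,503,40,2023-06-05
"""

def format_run(empl, run):
    # The extra field goes here: ADRESSE is taken from the first row of the run
    dates = ", ".join(str(r["date"]) for r in run)
    return "{}, {}, {}".format(empl, run[0]["ADRESSE"], dates)

def grouped_lines(f_in, day_range=90):
    rows = list(csv.DictReader(f_in))
    for row in rows:
        row["date"] = dt.date.fromisoformat(row["DATE_INTERNE_INSPECTION"])
    rows.sort(key=lambda r: (int(r["EMPL_NO"]), r["date"]))
    lines = []
    for empl, grp in it.groupby(rows, key=lambda r: r["EMPL_NO"]):
        run = []
        for row in grp:
            # Close the current run when the gap to its last date is too large
            if run and (row["date"] - run[-1]["date"]).days > day_range:
                lines.append(format_run(empl, run))
                run = []
            run.append(row)
        lines.append(format_run(empl, run))
    return lines

for line in grouped_lines(io.StringIO(SAMPLE)):
    print(line)
```

Using csv.DictReader here lets the output code refer to columns by name, which may be easier to maintain than positional indexes once more fields are added.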

solution1
1 ACCEPTED 2021-11-11 14:33:27
