OK, so here's an example dataset:
returntime = '9:00'
data1 = {'Name': 'jim', 'cardriven': '20123', 'time': '7:30'}
data2 = {'Name': 'bob', 'cardriven': '20123', 'time': '10:30'}
data3 = {'Name': 'jim', 'cardriven': '201111', 'time': '8:30'}
data4 = {'Name': 'bob', 'cardriven': '201314', 'time': '9:30'}
My problem is that I need to loop over these dictionaries, find the car that both of them have driven, and then compare the times they drove it to see who returned the car closest to 9:00.
I have tried many loops, created lists, etc., but I know there has to be a simple way to just say:
for [data1, data2, ...], who returned the car closest to the time, and here is the info from that record.
Thanks in advance.
Maybe you can try using only one dict,
where each entry is another dict and the key is the name of the driver or an ID code.
Then you can loop over that dict and find out which entries drove the same car.
Here's a simplified example of what I mean:
returntime = '9:00'
data1 = {'Name': 'jim', 'cardriven': '20123', 'time': '7:30'}
data2 = {'Name': 'bob', 'cardriven': '20123', 'time': '10:30'}
data3 = {'Name': 'jim', 'cardriven': '201111', 'time': '8:30'}
# Avoid naming the variable `dict`, which would shadow the built-in type.
records = {}
records[0] = data1
records[1] = data2
records[2] = data3
for record in records.values():
    if record['cardriven'] == '20123':
        print(record['Name'])
Output:
jim
bob
Also a pro tip: you can store the time in the dict as a datetime object, which makes comparing times much easier.
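As a rough illustration of that tip, here is a minimal sketch that parses the 'H:MM' strings with datetime.strptime and measures how far each one is from the return time. The helper names parse_time and distance_from are made up for this example.

```python
from datetime import datetime

def parse_time(text):
    """Parse an 'H:MM' string into a datetime (the date part defaults to 1900-01-01)."""
    return datetime.strptime(text, "%H:%M")

def distance_from(target, text):
    """Absolute gap between two clock times, as a timedelta."""
    return abs(parse_time(text) - parse_time(target))

returntime = "9:00"
for time_str in ["7:30", "8:30", "9:30"]:
    # Smaller timedelta means closer to the return time.
    print(time_str, distance_from(returntime, time_str))
```

Because the results are timedelta objects, they can be compared directly with < and >, or passed to min() as a key.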
This iterates through the data you offered and builds a dictionary keyed by car, keeping for each car the record whose time is closest to the goal.
import datetime

returntime = "09:00"
data = [
    dict(name="Jim", cardriven="20123", time="7:30"),
    dict(name="Bob", cardriven="20123", time="10:30"),
    dict(name="Jim", cardriven="201111", time="8:30"),
    dict(name="Bob", cardriven="201314", time="9:30"),
]

def parsedelta(s):
    # Parse as hours:minutes ("%M:%S" would read "9:00" as 9 minutes, 0 seconds).
    t = datetime.datetime.strptime(s, "%H:%M")
    return datetime.timedelta(hours=t.hour, minutes=t.minute)

deltareturn = parsedelta(returntime)

def diffreturn(s):
    # Absolute distance from the return time, in seconds.
    return abs(deltareturn.seconds - parsedelta(s).seconds)

cars = {}
for datum in data:
    car = datum["cardriven"]
    if car not in cars:
        cars[car] = datum
        continue
    # Keep the record closest to the return time; on a tie the earlier record stays.
    if diffreturn(datum["time"]) < diffreturn(cars[car]["time"]):
        cars[car] = datum
print(cars)
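The question asks specifically about the car that both people drove, while the loop above also keeps cars that only one person drove. Here is a minimal sketch of one way to narrow that down, assuming the same record layout; the names drivers_by_car and shared_cars are made up for this example.

```python
from collections import defaultdict

data = [
    dict(name="Jim", cardriven="20123", time="7:30"),
    dict(name="Bob", cardriven="20123", time="10:30"),
    dict(name="Jim", cardriven="201111", time="8:30"),
    dict(name="Bob", cardriven="201314", time="9:30"),
]

# Collect the distinct drivers seen for each car.
drivers_by_car = defaultdict(set)
for datum in data:
    drivers_by_car[datum["cardriven"]].add(datum["name"])

# A car counts as shared when more than one distinct person drove it.
shared_cars = {car for car, drivers in drivers_by_car.items() if len(drivers) > 1}
print(shared_cars)
```

With cars from the loop above, something like {car: rec for car, rec in cars.items() if car in shared_cars} would then leave only the closest record for each shared car.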
Since we want to find a car both of them drove, we could build a dictionary where each key is a car and each value is a list of (name, time) pairs, plus a list of the cars both drove. Then compare the times and see who returned it closest to returntime.
from datetime import datetime

temp = {}
both_drove = []
for data in [data1, data2, data3, data4]:
    if data['cardriven'] in temp:
        # Car seen before: record this driver and mark the car as shared.
        temp[data['cardriven']].append((data['Name'], data['time']))
        both_drove.append(data['cardriven'])
    else:
        temp[data['cardriven']] = [(data['Name'], data['time'])]

target = datetime.strptime(returntime, '%H:%M')
for car in both_drove:
    # Assumes exactly two entries per shared car, as in the sample data.
    p1, p2 = temp[car]
    if abs(datetime.strptime(p1[1], '%H:%M') - target) > abs(datetime.strptime(p2[1], '%H:%M') - target):
        print(p2)
    else:
        print(p1)
Output:
('jim', '7:30')
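NB: It's not clear which is closer to returntime, 10:30 or 7:30. Both are 90 minutes away, so the comparison falls through to the else branch and prints the first entry.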
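If ties like this one matter, one option is to report every record that shares the smallest distance rather than picking one. The sketch below is only an illustration under that assumption; closest_records is a hypothetical helper, and it also copes with more than two drivers for the same car.

```python
from datetime import datetime

def closest_records(records, returntime, fmt="%H:%M"):
    """Return all (name, time) pairs tied for the smallest distance to returntime.

    Assumes `records` is a non-empty list of (name, time_string) tuples.
    """
    target = datetime.strptime(returntime, fmt)

    def distance(record):
        return abs(datetime.strptime(record[1], fmt) - target)

    best = min(distance(record) for record in records)
    return [record for record in records if distance(record) == best]

# Both entries are 90 minutes from 9:00, so both come back.
print(closest_records([("jim", "7:30"), ("bob", "10:30")], "9:00"))
# Here only one entry is closest.
print(closest_records([("jim", "8:45"), ("bob", "10:30")], "9:00"))
```

The caller can then decide what a list with more than one entry means, for example applying a tie-break rule or reporting both drivers.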
The test data is a bit funky for the question. You are basically looking for a group-by-and-sort approach, but 2 out of the 3 groups in your test data have only a single entry. Furthermore, for car 20123, the times are equidistant (delta_min in my answer below) from the returntime. In this case, the sort_values step below won't affect the order. If you know how equidistant entries should be ranked, that is a next step you can work on.
Nevertheless, I think the best course of action is to convert it into a pandas DataFrame and create a pipeline. For this data:
data1 = {"Name":'jim', "cardriven": '20123', "time":'7:30'}
data2 = {"Name":'bob', "cardriven": '20123', "time":'10:30'}
data3 = {"Name":'jim', "cardriven": '201111', "time":'8:30'}
data4 = {"Name":'bob', "cardriven": '201314', "time":'9:30'}
We can design a pipeline that uses a modified version of the excellent parsedelta function proposed in ljmc's answer.
import datetime
import pandas as pd

data = pd.DataFrame([data1, data2, data3, data4])
#   Name cardriven   time
# 0  jim     20123   7:30
# 1  bob     20123  10:30
# 2  jim    201111   8:30
# 3  bob    201314   9:30

def timedelta(time):
    # Minutes since midnight for an "H:MM" string.
    t = datetime.datetime.strptime(time, "%H:%M")
    return datetime.timedelta(hours=t.hour, minutes=t.minute).seconds / 60

returntime = '9:00'
latest_entries = (
    data
    .assign(delta_min=lambda d: abs(d["time"].apply(timedelta) - timedelta(returntime)))
    .sort_values("delta_min")
    .drop("delta_min", axis=1)  # comment this out if you want the minute difference
    .drop_duplicates(subset="cardriven")
)
print(latest_entries)
Which gives us:
  Name cardriven  time
2  jim    201111  8:30
0  jim     20123  7:30
3  bob    201314  9:30
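Going further, we could simplify the pipeline by passing the timedelta function directly as the key parameter in the sort_values step. We also split the timedelta function in two: _timedelta converts a time string to minutes, and timedelta returns the absolute difference from the return time.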
def _timedelta(tm):
    t = datetime.datetime.strptime(tm, "%H:%M")
    return datetime.timedelta(hours=t.hour, minutes=t.minute).seconds / 60

def timedelta(time, rtrn_time):
    return abs(_timedelta(time) - _timedelta(rtrn_time))

returntime = '9:00'
latest_entries = (
    data
    .sort_values("time", key=lambda d: d.apply(timedelta, rtrn_time=returntime))
    .drop_duplicates(subset="cardriven")
)
print(latest_entries)
  Name cardriven  time
2  jim    201111  8:30
0  jim     20123  7:30
3  bob    201314  9:30
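For the tie on car 20123 noted earlier, one possible rule is to prefer the earlier return. The following is only a sketch under that assumption: the tie-break rule, the to_minutes helper, and the minutes column are not part of the answer above. Sorting on two columns puts the smaller distance first and, among equal distances, the earlier time first.

```python
import pandas as pd

def to_minutes(hhmm):
    """Convert an 'H:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

data = pd.DataFrame([
    {"Name": "jim", "cardriven": "20123", "time": "7:30"},
    {"Name": "bob", "cardriven": "20123", "time": "10:30"},
    {"Name": "jim", "cardriven": "201111", "time": "8:30"},
    {"Name": "bob", "cardriven": "201314", "time": "9:30"},
])
returntime = "9:00"

closest = (
    data
    .assign(minutes=lambda d: d["time"].map(to_minutes))
    .assign(delta_min=lambda d: (d["minutes"] - to_minutes(returntime)).abs())
    # Primary key: distance from the return time; secondary key: earlier time wins ties.
    .sort_values(["delta_min", "minutes"])
    .drop_duplicates(subset="cardriven")
    .drop(columns=["minutes", "delta_min"])
)
print(closest)
```

Passing ascending=[True, False] to sort_values would flip the secondary key and prefer the later return instead.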