I have two lists. The first, items, contains around 250 tuples, each with 3 elements: (path_to_a_file, size_in_bytes, modified_time).
The second list, result, contains up to 250 elements and is the result of a database query that looks up rows based on the paths in the items list. The number of elements in result depends on whether those files are already in the database. Each element in result is a row object returned from a SQLAlchemy query, with attributes for the row values (path, mtime and hash are the ones I'm interested in here).
What I'm trying to do is filter out all the elements in items that are in result and have the same mtime (keeping track of how many were filtered and their total size), and make a new list of the items that either have a different mtime or don't exist in result. Items with a different mtime need to be stored as (path, size, mtime_from_result, hash_from_result), and items which weren't in the database as (path, size, mtime, None).
I hope I'm not making this too localised but I felt I needed to explain what I'm trying to accomplish to ask the question.
I want to try and make this loop as fast as possible but the most important part is making it work as expected.
Is it safe to remove items from the lists as I iterate over them? I noticed iterating forwards has a weird outcome but iterating backwards seems to be ok. Is there a better approach?
I'm removing items that I've matched up (i.path == j[0]) because I know the relationship is 1 to 1 and it's not going to match again, so by reducing the lists I can iterate over them faster in the next iteration, and more importantly I get left with all the unmatched items.
I can't help feeling there's a much nicer solution that I'm overlooking, perhaps with list comprehensions or generators.
send_items=[]
for i in result[::-1]:
    for j in items[::-1]:
        if i.path==j[0]:
            result.remove(i) #I think this remove is possibly pointless?
            items.remove(j)
            if i.mtime==j[2]:
                self.num_skipped+=1
                self.size_skipped+=j[1]
            else:
                send_items.append((j[0],j[1],i.mtime,i.hash))
            break
send_items.extend(((j[0],j[1],j[2],None) for j in items))
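On the "weird outcome" when iterating forwards: a minimal sketch, using made-up numbers rather than the real items, of why removing from the list you are looping over skips elements, and why looping over a slice copy (as result[::-1] and items[::-1] do) avoids that.

```python
# Hypothetical data, only to show the iteration behaviour.
nums = [1, 2, 3, 4]
for n in nums:           # iterating the same list that is being mutated
    nums.remove(n)       # later elements shift left, so the iterator skips one
leftover_forward = nums  # 2 and 4 were never visited

nums = [1, 2, 3, 4]
for n in nums[::-1]:     # iterating a reversed copy; the copy is never mutated
    nums.remove(n)
leftover_copy = nums     # every element was visited and removed

print(leftover_forward, leftover_copy)
```

The direction is not what makes the second loop safe; the slice creates a separate list, so removals from nums cannot disturb the iteration.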
I'd do this as:
def get_send_items(items, results):
    send_items = []
    results_dict = {i.path:i for i in results}
    for p, s, m in items:
        result = results_dict.get(p)
        if result is None:
            send_items.append((p, s, m, None))
        elif result.mtime != m:
            send_items.append((p, s, result.mtime, result.hash))
    return send_items
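The question also asks to keep track of how many items were skipped and their total size. One possible way to fold that into the same loop is sketched below. The name get_send_items_with_stats, the Row namedtuple (a stand-in for the SQLAlchemy row objects) and the sample data are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the SQLAlchemy row objects described in the question.
Row = namedtuple("Row", ["path", "mtime", "hash"])


def get_send_items_with_stats(items, results):
    """Return (send_items, num_skipped, size_skipped)."""
    rows_by_path = {row.path: row for row in results}
    send_items = []
    num_skipped = 0
    size_skipped = 0
    for path, size, mtime in items:
        row = rows_by_path.get(path)
        if row is None:
            # Not in the database yet.
            send_items.append((path, size, mtime, None))
        elif row.mtime != mtime:
            # In the database, but with a different mtime.
            send_items.append((path, size, row.mtime, row.hash))
        else:
            # Unchanged: skip it and update the counters.
            num_skipped += 1
            size_skipped += size
    return send_items, num_skipped, size_skipped


# Made-up sample data.
items = [("a", 10, 100), ("b", 20, 200), ("c", 30, 300)]
results = [Row("a", 100, "h1"), Row("b", 250, "h2")]
send_items, num_skipped, size_skipped = get_send_items_with_stats(items, results)
```

The caller can then add the two counters to self.num_skipped and self.size_skipped.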
Here is my analysis of your solution (assuming both result and items are of length N):
result[::-1] creates a copy of result, so calling result.remove(i) doesn't affect the iteration. You only loop over result once, so removing elements from it is pointless; it only creates extra work. (result[::] would also create a copy of result, without reversing it.)
items.remove(j) does not improve the asymptotic cost. remove() takes O(N) time because it scans the list for the element. It runs at most once per pass of the outer loop, so the nested loops stay at O(N^2) overall rather than growing to O(N^3), but it is still extra linear work on every match, and items[::-1] copies the whole list on every pass as well.
First of all, I am assuming that the file path identifies a file, i.e. that paths are unique.
We create a dictionary of the results, so we can easily check for membership, and check the values associated with it.
dict_results = {file: (size, modified_time) for file, size, modified_time in results}
We can then use a list comprehension to filter out the elements you don't want:
[(file, size, modified_time) for file, size, modified_time in items if (file not in dict_results) or (not dict_results[file][1] == modified_time)]
Eg:
>>> results = [(1, 1, 1), (2, 2, 3)]
>>> items = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
>>> dict_results = {file: (size, modified_time) for file, size, modified_time in results}
>>> [(file, size, modified_time) for file, size, modified_time in items if (file not in dict_results) or (not dict_results[file][1] == modified_time)]
[(2, 2, 2), (3, 3, 3)]
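The example above treats results as plain 3-tuples and returns the original item tuples. As a rough sketch of how the same idea might look with attribute-style rows and the 4-tuples the question asks for: Row, to_send and the sample data here are assumptions made for illustration, not part of the original code.

```python
from collections import namedtuple

# Hypothetical stand-in for the SQLAlchemy row objects.
Row = namedtuple("Row", ["path", "mtime", "hash"])

# Made-up sample data.
items = [("a", 10, 100), ("b", 20, 200), ("c", 30, 300)]
results = [Row("a", 100, "h1"), Row("b", 250, "h2")]

rows_by_path = {row.path: row for row in results}


def to_send(path, size, mtime):
    """Build the 4-tuple for an item that has to be sent."""
    row = rows_by_path.get(path)
    if row is None:
        return (path, size, mtime, None)
    return (path, size, row.mtime, row.hash)


send_items = [
    to_send(path, size, mtime)
    for path, size, mtime in items
    if path not in rows_by_path or rows_by_path[path].mtime != mtime
]
```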
How about putting the results into a set, as Marcin suggests, and using a generator expression to filter the items:
mtimes_set = set(result[2] for result in results)
send_items = (item for item in items if item[2] not in mtimes_set)
I misunderstood the path part (the set above only compares mtimes). This can still be done, although it gets a bit ugly around the last set of brackets:
path_dict = dict((result[0], result) for result in results)
send_items = (item for item in items if item[0] in path_dict and path_dict[item[0]][2] != item[2])
Here I am creating a dictionary of seen paths, then a generator returning those items whose path is in the dict and whose mtime differs. Note that items whose path is not in path_dict are skipped by this condition. It could easily be changed to return the matching path_dict entry instead of the item.
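A small sketch of that last variation, returning the matching path_dict entry rather than the item. The tuple layout (path, size, mtime) for results and the sample data are assumptions; only indexes 0 and 2 are taken from the code above.

```python
# Made-up sample data; results are assumed to be (path, size, mtime) tuples.
results = [("a", 10, 100), ("b", 20, 250)]
items = [("a", 10, 100), ("b", 20, 200), ("c", 30, 300)]

path_dict = {result[0]: result for result in results}

# Generator yielding the stored result for every item whose mtime differs.
changed = (
    path_dict[item[0]]
    for item in items
    if item[0] in path_dict and path_dict[item[0]][2] != item[2]
)

# A generator can only be consumed once, so materialise it if it is needed again.
changed_rows = list(changed)
```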
First stab:
items_dict = dict((el[0], el[1:]) for el in items)  # path -> (size, mtime)
new = []
modified = []
other = []
for res in result:
    put_to = None
    item = items_dict.get(res.path, (None, None))
    if item == (None, None):  # compare with ==; "is" tests identity, which is not guaranteed for tuples
        put_to = new  # path from result was not found in items
    elif res.mtime != item[1]:
        put_to = modified
    else:
        put_to = other
    put_to.append((res.path, item))