I have a data structure in this form:
Song = namedtuple('Song', ['fullpath', 'tags']) # tags is a dictionary
Album = namedtuple('Album', ['album_key', 'songs'])
The data structure is a list of Albums.
There are thousands of albums, with 10-20 songs in each.
I'm looking for matches:
for new_album in new_albums:
    for old_album in old_albums:
        if new_album.album_key == old_album.album_key:
            for new_song in new_album.songs:
                for old_song in old_album.songs:
                    if new_song.fullpath == old_song.fullpath:
                        # do something
                        break
This is inefficient, mainly because it restarts the loop through old_albums for each new_album. One solution is to use dictionaries, but I need to sort, and OrderedDict only preserves key insertion order. Another is to change the list to a dictionary, process it, then change it back to a list, but that does not seem ideal.
Is there a better way?
You don't have to convert the data into a new format, but you can still use a dict for finding matches:
paths = {}
for a_id, album in enumerate(albums):
    for s_id, song in enumerate(album.songs):
        if song.fullpath not in paths:
            paths[song.fullpath] = (a_id, s_id)
        else:
            # do something
            break
When you reach the # do something branch, paths[song.fullpath] gives you a tuple: [0] is the index of the album and [1] is the index of the song that matches. So:
matched_album, matched_song = paths[song.fullpath]
print albums[matched_album].songs[matched_song], "matches!"
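The same dict idea can also be applied to the new-versus-old comparison from the question. The following is only a sketch in Python 3 under a few assumptions: find_matches is a hypothetical helper name, album_key is assumed to be unique among the old albums, and "do something" is stood in for by collecting the matched pairs. The album lists are never converted, so their order is untouched.

```python
from collections import namedtuple

Song = namedtuple('Song', ['fullpath', 'tags'])    # tags is a dictionary
Album = namedtuple('Album', ['album_key', 'songs'])


def find_matches(new_albums, old_albums):
    """Return (new_song, old_song) pairs that share album_key and fullpath.

    Hypothetical helper; assumes album_key is unique among old_albums.
    """
    # Build the lookup once: album_key -> {fullpath -> old song}.
    # The lists themselves are only read, so their order is preserved.
    old_index = {}
    for old_album in old_albums:
        songs_by_path = old_index.setdefault(old_album.album_key, {})
        for old_song in old_album.songs:
            # Keep the first song seen for a path, like the break in the nested loops.
            songs_by_path.setdefault(old_song.fullpath, old_song)

    matches = []
    for new_album in new_albums:
        songs_by_path = old_index.get(new_album.album_key)
        if songs_by_path is None:
            continue  # no old album with this key
        for new_song in new_album.songs:
            old_song = songs_by_path.get(new_song.fullpath)
            if old_song is not None:
                matches.append((new_song, old_song))  # stand-in for "do something"
    return matches
```

With this layout each song is visited once while building the index and once while matching, instead of rescanning old_albums for every new album.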
Does this help?