I am a Python newbie and I have been trying to sort (and extract) values from a list of tuples based on the values in another list, but so far my code seems really slow.
So, I have a list like so:
x = ["d5b44796d43c4bf5a0f252aeb49738f5", "04d0e11f8ceb4b128fa723181369ba1a", "6244dd8bfee44a61800a25d9f2e6f743", "662ae26640a44a37816daa6e85ef4972", "7d5e1f59f7984495877a059bea643954"]
Then I have a list of tuples like so:
y = [(31, u'dir/04d0e11f8ceb4b128fa723181369ba1a.mov'), (32, u'dir/d5b44796d43c4bf5a0f252aeb49738f5.pdf'), (66, u'dir/6244dd8bfee44a61800a25d9f2e6f743.jpg'), (34, u'dir/662ae26640a44a37816daa6e85ef4972.doc'), (33, u'dir/7d5e1f59f7984495877a059bea643954.ppt')]
I would like to get the id from y if the element in x is present in y[i][1]. So, something like this:
id_list = []
for i in x:
    for j in y:
        if i in j[1]:
            try:
                id_list.append(j[0])
            except:
                pass
            break
        else:
            pass
I get:
id_list = [32, 31, 66, 34, 33]
Also, the result has to maintain the order in x. The above loop does this.
The problem is that the above code is very slow (ashamed of it!): my x has thousands of elements, and so does y.
So I guess my question is: is there a better way to write the above code? I was thinking of iterators here but was not entirely sure how to write one in this case.
id_list = [j[0] for j in sorted(y, key=lambda e: x.index(e[1].split('/')[-1].split('.')[0]))]
This can be improved if x is a dict, since lookup will be faster, so we'll use OrderedDict to maintain the order:
import collections
from os.path import basename, splitext
x = collections.OrderedDict((e, i) for i, e in enumerate(x))
id_list = [j[0] for j in sorted(y, key=lambda e: x[splitext(basename(e[1]))[0]])]
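As a minimal sketch of the same idea, assuming every file name in y also appears in x: only lookups are needed for the sort key, so a plain dict from name to position seems to be enough. The name `position` is made up here for illustration, and x is left untouched as a list.

```python
from os.path import basename, splitext

x = ["d5b44796d43c4bf5a0f252aeb49738f5", "04d0e11f8ceb4b128fa723181369ba1a",
     "6244dd8bfee44a61800a25d9f2e6f743", "662ae26640a44a37816daa6e85ef4972",
     "7d5e1f59f7984495877a059bea643954"]
y = [(31, 'dir/04d0e11f8ceb4b128fa723181369ba1a.mov'),
     (32, 'dir/d5b44796d43c4bf5a0f252aeb49738f5.pdf'),
     (66, 'dir/6244dd8bfee44a61800a25d9f2e6f743.jpg'),
     (34, 'dir/662ae26640a44a37816daa6e85ef4972.doc'),
     (33, 'dir/7d5e1f59f7984495877a059bea643954.ppt')]

# Hypothetical helper mapping: each name in x -> its index in x.
position = {name: index for index, name in enumerate(x)}

# Sort y by the position of its file name (without directory or extension) in x.
# Raises KeyError if a file name in y is missing from x (assumed not to happen).
ordered = sorted(y, key=lambda pair: position[splitext(basename(pair[1]))[0]])
id_list = [item_id for item_id, _ in ordered]
print(id_list)  # [32, 31, 66, 34, 33]
```

Building the mapping takes one pass over x, after which each key lookup is a constant-time dict access instead of a linear x.index() scan.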
In [3]: y1 = [(elem[0], elem[1].split('/')[-1].split('.')[0]) for elem in y]
In [4]: res = [(i, j[0]) for i in x for j in y1 if i == j[1]]
In [5]: res
Out[5]:
[('d5b44796d43c4bf5a0f252aeb49738f5', 32),
 ('04d0e11f8ceb4b128fa723181369ba1a', 31),
 ('6244dd8bfee44a61800a25d9f2e6f743', 66),
 ('662ae26640a44a37816daa6e85ef4972', 34),
 ('7d5e1f59f7984495877a059bea643954', 33)]
In [6]: [elem[1] for elem in res]
Out[6]: [32, 31, 66, 34, 33]
If you want to maintain the order in x, you need to extract all the ids in y and put them in a set, then iterate over x to check whether each item is in the set:
>>> x = ["d5b44796d43c4bf5a0f252aeb49738f5", "04d0e11f8ceb4b128fa723181369ba1a", "6244dd8bfee44a61800a25d9f2e6f743", "662ae26640a44a37816daa6e85ef4972", "7d5e1f59f7984495877a059bea643954"]
>>> y = [(31, u'dir/04d0e11f8ceb4b128fa723181369ba1a.mov'), (32, u'dir/d5b44796d43c4bf5a0f252aeb49738f5.pdf'), (66, u'dir/6244dd8bfee44a61800a25d9f2e6f743.jpg'), (34, u'dir/662ae26640a44a37816daa6e85ef4972.doc'), (33, u'dir/7d5e1f59f7984495877a059bea643954.ppt')]
>>> import re
>>> s = set()
>>> for e in y:
...     r = re.match(r'^dir/(.*)\.', e[1])
...     if r:
...         s.add(r.group(1))
...
>>> [e for e in x if e in s]
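Note that the final expression above yields the matching names from x rather than the numeric ids. A possible variation, sketched here under the assumption that all paths look like dir/NAME.EXT, stores each id in a dict keyed by the captured name (the name `id_by_name` is hypothetical) and then walks x in order:

```python
import re

x = ["d5b44796d43c4bf5a0f252aeb49738f5", "04d0e11f8ceb4b128fa723181369ba1a",
     "6244dd8bfee44a61800a25d9f2e6f743", "662ae26640a44a37816daa6e85ef4972",
     "7d5e1f59f7984495877a059bea643954"]
y = [(31, 'dir/04d0e11f8ceb4b128fa723181369ba1a.mov'),
     (32, 'dir/d5b44796d43c4bf5a0f252aeb49738f5.pdf'),
     (66, 'dir/6244dd8bfee44a61800a25d9f2e6f743.jpg'),
     (34, 'dir/662ae26640a44a37816daa6e85ef4972.doc'),
     (33, 'dir/7d5e1f59f7984495877a059bea643954.ppt')]

# Record the id for each captured name instead of only its membership.
id_by_name = {}
for item_id, path in y:
    match = re.match(r'^dir/(.*)\.', path)
    if match:
        id_by_name[match.group(1)] = item_id

# Walk x in its own order; names without an entry in y are skipped.
id_list = [id_by_name[name] for name in x if name in id_by_name]
print(id_list)  # [32, 31, 66, 34, 33]
```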
x = ["d5b44796d43c4bf5a0f252aeb49738f5", "04d0e11f8ceb4b128fa723181369ba1a", "6244dd8bfee44a61800a25d9f2e6f743", "662ae26640a44a37816daa6e85ef4972", "7d5e1f59f7984495877a059bea643954"]
xset = set(x)
y = [(31, u'dir/04d0e11f8ceb4b128fa723181369ba1a.mov'), (32, u'dir/d5b44796d43c4bf5a0f252aeb49738f5.pdf'), (66, u'dir/6244dd8bfee44a61800a25d9f2e6f743.jpg'), (34, u'dir/662ae26640a44a37816daa6e85ef4972.doc'), (33, u'dir/7d5e1f59f7984495877a059bea643954.ppt')]
print([num for num, path in y if path.split('/')[1].split('.')[0] in xset])
Using [:-4], as in this answer, may not be a good idea: what if we have a file such as dir/04d0e11f8ceb4b128fa723181369ba1a.rmvb? I'd suggest using os.path.splitext(os.path.basename(thefilepath))[0] to get the file name.
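To illustrate the difference, here is a small sketch comparing the two approaches on a three-letter and a four-letter extension. The helper name `stem` and the two sample paths are only for illustration.

```python
import os.path

def stem(filepath):
    """Return the file name without its directory or final extension."""
    return os.path.splitext(os.path.basename(filepath))[0]

three = 'dir/04d0e11f8ceb4b128fa723181369ba1a.mov'
four = 'dir/04d0e11f8ceb4b128fa723181369ba1a.rmvb'

# Slicing off four characters assumes a dot plus a three-letter extension.
print(os.path.basename(three)[:-4])  # 04d0e11f8ceb4b128fa723181369ba1a
print(os.path.basename(four)[:-4])   # 04d0e11f8ceb4b128fa723181369ba1a. (stray dot)

# splitext does not depend on the extension length.
print(stem(three))  # 04d0e11f8ceb4b128fa723181369ba1a
print(stem(four))   # 04d0e11f8ceb4b128fa723181369ba1a
```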
So my idea is to map each element to its id first; yy should be:
{u'7d5e1f59f7984495877a059bea643954': 33, u'6244dd8bfee44a61800a25d9f2e6f743': 66, u'662ae26640a44a37816daa6e85ef4972': 34, u'04d0e11f8ceb4b128fa723181369ba1a': 31, u'd5b44796d43c4bf5a0f252aeb49738f5': 32}
Then we get the id using yy[element], and the order stays the same as in x.
The solution:
from os import path
yy = {path.splitext(path.basename(j))[0]:i for (i, j) in y}
xx = [yy[i] for i in x]
print(xx)
# output
[32, 31, 66, 34, 33]
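For inputs in the thousands, a rough way to compare the nested loop from the question with the dictionary lookup is to time both on synthetic data. The sketch below is only an assumption-laden illustration: the names are random hex strings from uuid4, the ids and the .dat extension are made up, and the timings will vary by machine.

```python
import timeit
import uuid
from os import path

# Synthetic data (assumed for illustration): 1000 random 32-character hex names.
names = [uuid.uuid4().hex for _ in range(1000)]
x = list(names)
# y holds the same names in reverse order, with made-up ids and extension.
y = [(i, 'dir/%s.dat' % name) for i, name in enumerate(reversed(names))]

def nested_loop(x, y):
    """The question's approach: scan y for every element of x."""
    id_list = []
    for name in x:
        for item_id, filepath in y:
            if name in filepath:
                id_list.append(item_id)
                break
    return id_list

def dict_lookup(x, y):
    """Build a name -> id mapping once, then look up each element of x."""
    yy = {path.splitext(path.basename(filepath))[0]: item_id
          for item_id, filepath in y}
    return [yy[name] for name in x]

# Both approaches should agree before their timings are compared.
assert nested_loop(x, y) == dict_lookup(x, y)

print('nested loop: %.4f s' % timeit.timeit(lambda: nested_loop(x, y), number=1))
print('dict lookup: %.4f s' % timeit.timeit(lambda: dict_lookup(x, y), number=1))
```

The nested loop does work proportional to len(x) * len(y), while the dictionary version makes one pass over each list, so the gap should widen as the inputs grow.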