Python: Efficient way of matching slices of strings between two lists
Suppose I have two lists of files with similar names, like so:
images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']
How can I efficiently remove the elements that have no match? I want to end up with the following:
images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2']
I tried the following:
setA = set([x[-4:] for x in images])
setB = set([x[-4:] for x in masks])
matches = setA.union(setB)
elems = list(matches)
for elem in elems:
    result = [x for x in images if x.endswith(elem)]
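But this is rather naive and slow, since I need to iterate over lists of about 100k elements. Any idea how to do this efficiently?
First, since you want the endings that both lists share, you should use the intersection, not the union: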
matches = setA.intersection(setB)
Then, matches is already a set, so don't convert it to a list and loop over it. Instead, loop over images and masks and check for set membership.
imgres = [x for x in images if x[-4:] in matches]
mskres = [x for x in masks if x[-4:] in matches]
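As a minimal, self-contained sketch of the approach above, the two steps can be wrapped in one function. The name filter_matching and its key parameter are hypothetical and not part of the original answer. The default key reuses the question's last-four-characters slice, which assumes every suffix is exactly that long; a different key can be passed when that does not hold.

```python
def filter_matching(images, masks, key=lambda name: name[-4:]):
    """Keep only the names whose key appears in both lists."""
    # Build the set of keys shared by both lists.
    common = {key(x) for x in images} & {key(x) for x in masks}
    # Filter each list with a set membership test, preserving order.
    kept_images = [x for x in images if key(x) in common]
    kept_masks = [x for x in masks if key(x) in common]
    return kept_images, kept_masks


images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']
kept_images, kept_masks = filter_matching(images, masks)
print(kept_images)
print(kept_masks)
```

Each list is scanned twice and set lookups take constant time on average, so the total work grows linearly with the list sizes, which should be fine for about 100k elements.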
Your solution is basically as good as it gets. You can improve it to a single pass, though, if you store an intermediate map, image_map:
# map each 4-character suffix to the original image name
image_map = {x[-4:]: x for x in images}
# store all our matches here
matches = []
# loop through the other file names
for mask in masks:
    # if the mask's suffix is in the map, we have a match
    if mask[-4:] in image_map:
        # save the mask
        matches.append(mask)
        # save the original image name too
        matches.append(image_map[mask[-4:]])
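Note that the loop above puts masks and images alternately into one flat list. If two separate lists are wanted, as in the question, one possible variant collects (image, mask) pairs and then splits them. This is only a sketch: the names pairs, matched_images and matched_masks are introduced here for illustration, and it assumes each suffix occurs at most once per list (with duplicates, the dict keeps only the last image for a given suffix).

```python
images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']

# Map each 4-character suffix to the image name that carries it.
image_map = {x[-4:]: x for x in images}

# Single pass over masks: keep (image, mask) pairs whose suffix is known.
pairs = [(image_map[m[-4:]], m) for m in masks if m[-4:] in image_map]

# Split the pairs into two parallel lists (both empty if nothing matched).
matched_images = [img for img, _ in pairs]
matched_masks = [msk for _, msk in pairs]
print(pairs)
```

Keeping the pairs together also makes it easy to process each image with its corresponding mask later on.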