
Python: Efficient way of matching slices of strings between two lists

Let's say I have two lists of files with similar names, like so:

images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']

How can I efficiently remove the elements that don't match? I want to end up with the following:

images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2']

I've tried the following:

setA = set([x[-4:] for x in images])
setB = set([x[-4:] for x in masks])

matches = setA.union(setB)

elems = list(matches)

for elem in elems:
    result = [x for x in images if x.endswith(elem)]

But this is rather naïve and slow, as I need to iterate through a list of ~100k elements. Any idea how I can implement this efficiently?

First of all, since you want the endings common to both lists, you should use intersection, not union:

matches = setA.intersection(setB)

matches is then already a set, so instead of converting it to a list and looping over it, loop over images and masks and check for set membership:

imgres = [x for x in images if x[-4:] in matches]
mskres = [x for x in masks if x[-4:] in matches]
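
Putting the two steps together, a minimal runnable sketch using the sample lists from the question might look like this. It assumes, as the question does, that the last four characters of each name identify the file:

```python
images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']

# Suffixes that occur in both lists (set intersection, not union)
matches = {x[-4:] for x in images} & {x[-4:] for x in masks}

# Keep only the names whose suffix is shared; original order is preserved
imgres = [x for x in images if x[-4:] in matches]
mskres = [x for x in masks if x[-4:] in matches]

print(imgres)  # ['image_im_1', 'image_im_2']
print(mskres)  # ['mask_im_1', 'mask_im_2']
```

Set membership tests take constant time on average, so each list is scanned only a fixed number of times, rather than rescanning images once per suffix as in the original loop.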

Your solution is basically as good as it gets. You can, however, reduce it to a single pass if you store an intermediate map, image_map:

# store dict of mapping to original name
image_map = {x[-4:]: x for x in images}

# store all our matches here
matches = []

# loop through your other file names
for mask in masks:

    # if this then we have a match!
    if mask[-4:] in image_map:

        # save the mask
        matches.append(mask)

        # get the original image name
        matches.append(image_map[mask[-4:]])
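
One caveat with a fixed x[-4:] slice: once the numeric part grows past a few digits, as it probably will with ~100k files, unrelated names can share their last four characters ('image_im_12345' and 'mask_im_2345' both end in '2345'). The sketch below is one possible variant of the dictionary approach. It assumes every name has the form <prefix>_<shared id> and uses everything after the first underscore as the key. The helper shared_key and the longer sample names are hypothetical and not part of the original question:

```python
def shared_key(name):
    # Hypothetical helper: everything after the first underscore,
    # e.g. 'image_im_12345' -> 'im_12345'
    return name.partition('_')[2]

images = ['image_im_1', 'image_im_2', 'image_im_12345']
masks = ['mask_im_1', 'mask_im_3', 'mask_im_12345', 'mask_im_2345']

# Map each key to its original image name
image_map = {shared_key(x): x for x in images}

# Single pass over masks, collecting (image, mask) pairs that share a key
pairs = [(image_map[shared_key(m)], m)
         for m in masks if shared_key(m) in image_map]

# Split the pairs back into two aligned lists
imgres = [img for img, _ in pairs]
mskres = [msk for _, msk in pairs]

print(imgres)  # ['image_im_1', 'image_im_12345']
print(mskres)  # ['mask_im_1', 'mask_im_12345']
```

Keeping the results as pairs, rather than appending masks and images alternately to one list, makes it easy to recover two aligned lists afterwards. Note that the results follow the order of masks, and that if two images share a key, the dictionary keeps only the last one.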
