Python: an efficient way to match string slices between two lists

Question

Suppose I have two lists of files with similar names, like this:

images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']

How can I efficiently remove the elements that have no match? I want to end up with the following:

images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2']

I have tried the following:

setA = set([x[-4:] for x in images])
setB = set([x[-4:] for x in masks])

matches = setA.union(setB)

elems = list(matches)

for elem in elems:
    result = [x for x in images if x.endswith(elem)]

But this is rather naive and slow, since I need to iterate over lists of about 100k elements. Any idea how to do this efficiently?

Answer 1

First, since you want the endings that both lists have in common, you should use the intersection, not the union:

matches = setA.intersection(setB)

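matches is then already a set, so don't convert it to a list and loop over it. Instead, iterate over images and masks and check set membership, which takes constant time on average: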

imgres = [x for x in images if x[-4:] in matches]
mskres = [x for x in masks if x[-4:] in matches]
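For reference, here is a minimal self-contained sketch of the approach above, wrapped in a function. The names filter_common and suffix_key and the key parameter are hypothetical and not part of the original answer. The default key is the same x[-4:] slice. The second call assumes names of the form "<kind>_<rest>" and keys on everything after the first underscore, which may be safer with 100k files, because a fixed four-character slice would treat 'im_2345' and 'im_12345' as the same ending.

```python
def filter_common(images, masks, key=lambda name: name[-4:]):
    """Return (images, masks) restricted to names whose key occurs in both lists."""
    # Build each key set once; set membership tests are O(1) on average.
    image_keys = {key(name) for name in images}
    mask_keys = {key(name) for name in masks}
    matches = image_keys & mask_keys  # same as image_keys.intersection(mask_keys)

    # One pass over each list, preserving the original order.
    kept_images = [name for name in images if key(name) in matches]
    kept_masks = [name for name in masks if key(name) in matches]
    return kept_images, kept_masks


def suffix_key(name):
    """Hypothetical key: everything after the first underscore ('mask_im_12' -> 'im_12')."""
    return name.partition('_')[2]


images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']

imgres, mskres = filter_common(images, masks)
print(imgres)
print(mskres)

# Assumed longer names, where the last four characters alone would collide.
big_images = ['image_im_2345', 'image_im_12345']
big_masks = ['mask_im_12345']
big_imgres, big_mskres = filter_common(big_images, big_masks, key=suffix_key)
print(big_imgres)
```

Both lists are scanned twice (once to build the key sets, once to filter), so the total work grows linearly with the list sizes.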

Answer 2

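Your solution is basically about as good as it gets. You can improve it so that it runs in a single pass, though, if you store an intermediate map, image_map: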

# store dict of mapping to original name
image_map = {x[-4:]: x for x in images}

# store all our matches here
matches = []

# loop through your other file names
for mask in masks:

    # if the mask's suffix is a key in image_map, we have a match!
    if mask[-4:] in image_map:

        # save the mask
        matches.append(mask)

        # get the original image name
        matches.append(image_map[mask[-4:]])
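
The loop above leaves matches as a flat list that alternates mask and image names. If you would rather keep each image together with its mask, one possible variation (a sketch only; pair_by_suffix and its width parameter are hypothetical names, not from the original answer) collects tuples instead. It assumes, as the answer does, that the last four characters identify a file.

```python
def pair_by_suffix(images, masks, width=4):
    """Return (image, mask) tuples for names that share the same trailing slice."""
    # Map each image's suffix to its full name.
    # If two images share a suffix, the later one overwrites the earlier one.
    image_map = {name[-width:]: name for name in images}

    pairs = []
    for mask in masks:
        image = image_map.get(mask[-width:])
        if image is not None:
            pairs.append((image, mask))
    return pairs


images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']

pairs = pair_by_suffix(images, masks)
print(pairs)
```

Images that have no corresponding mask never appear in the result, because only suffixes seen while looping over masks are looked up.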
