
How to sort a list of file paths to match another list of paths in Python?

I have two lists that contain file paths:

lst_A =['/home/data_A/test_AA_123.jpg',
        '/home/data_A/test_AB_234.jpg',
        '/home/data_A/test_BB_321.jpg',
        '/home/data_A/test_BC_112.jpg',
       ]

lst_B =['/home/data_B/test_AA_222.jpg',
        '/home/data_B/test_CC_444.jpg',
        '/home/data_B/test_AB_555.jpg',
        '/home/data_B/test_BC_777.jpg',
       ]

Based on lst_A, I want to sort lst_B so that, at each position, the paths in A and B share the same first and second parts of the basename (here, test_xx). So the expected sorted list B is:

lst_B =['/home/data_B/test_AA_222.jpg',
        '/home/data_B/test_AB_555.jpg',
        '/home/data_B/test_CC_444.jpg',
        '/home/data_B/test_BC_777.jpg',
       ]

In addition, I want to mark the positions where the two lists share the same first and second parts of the basename (such as test_xx), so the indicator array should be:

array_same =[1,1,0,1]

How can I do this in Python? I have tried the .sort() method, but it returns an unexpected result. Thanks.

Update: This is my solution

import os
lst_A =['/home/data_A/test_AA_123.jpg',
        '/home/data_A/test_AB_234.jpg',
        '/home/data_A/test_BB_321.jpg',
        '/home/data_A/test_BC_112.jpg',
       ]

lst_B =['/home/data_B/test_AA_222.jpg',
        '/home/data_B/test_CC_444.jpg',
        '/home/data_B/test_AB_555.jpg',
        '/home/data_B/test_BC_777.jpg']

lst_B_sort=[]
same_array=[]
for ind_a, a_name in enumerate(lst_A):
  for ind_b, b_name in enumerate(lst_B):
    print (os.path.basename(b_name).split('_')[1])
    if os.path.basename(b_name).split('_')[1] in os.path.basename(a_name):
        lst_B_sort.append(b_name)
        same_array.append(1)
print(lst_B_sort)
print(same_array)
Output:

['/home/data_B/test_AA_222.jpg', '/home/data_B/test_AB_555.jpg', '/home/data_B/test_BC_777.jpg']
[1, 1, 1]

The unmatched element is missing because I never append anything when the names do not match.

Loop through lst_A, get the filename prefix, then append the element from lst_B with the same prefix to the result list.

Create a set of all the elements from lst_B, and when you add a path to the result, remove it from the set. Then at the end you can go through this set, filling in the blank spaces in the result where there were no matches.

import os

lst_A =['/home/data_A/test_AA_123.jpg',
        '/home/data_A/test_AB_234.jpg',
        '/home/data_A/test_BB_321.jpg',
        '/home/data_A/test_BC_112.jpg',
       ]

lst_B =['/home/data_B/test_AA_222.jpg',
        '/home/data_B/test_CC_444.jpg',
        '/home/data_B/test_AB_555.jpg',
        '/home/data_B/test_BC_777.jpg',
       ]

new_lst_B = []
same_array = []
set_B = set(lst_B)  # Paths from lst_B not yet placed in the result
for fn in lst_A:
    # Everything up to and including the last underscore, e.g. test_AA_
    prefix = "_".join(os.path.basename(fn).split('_')[:-1]) + '_'
    try:
        found_B = next(x for x in lst_B if os.path.basename(x).startswith(prefix))
        new_lst_B.append(found_B)
        same_array.append(1)
        set_B.remove(found_B)
    except StopIteration:  # No match found
        new_lst_B.append(None)  # Placeholder to fill in
        same_array.append(0)
# Fill the placeholders with the unmatched paths (set order is arbitrary)
for missed in set_B:
    index = new_lst_B.index(None)
    new_lst_B[index] = missed


We will discuss the issue with a SIMPLE technique followed by an APPLIED solution.

SIMPLE

We just focus on sorting the names given a key.

Given

Simple names and a key list:

lst_a = "AA AB BB BC EE".split()
lst_b = "AA DD CC AB BC".split()

key_list = [1, 1, 0, 1, 0]

Code

same = sorted(set(lst_a) & set(lst_b))
diff = sorted(set(lst_b) - set(same))

isame, idiff = iter(same), iter(diff)
[next(isame) if x else next(idiff) for x in key_list]
# ['AA', 'AB', 'CC', 'BC', 'DD']

lst_b gets sorted so that the elements it shares with lst_a come first, in the positions marked by the key. Remnants are inserted into the remaining positions.


Details

This problem mainly reduces to sorting the intersection of names from both lists. The intersection is a set of common elements called same. The remnants are in a set called diff. We sort same and diff, and here's what they look like:

same
# ['AA', 'AB', 'BC']
diff
# ['CC', 'DD']

Now we just want to pull a value from either list, in order, according to the key. We start by iterating over key_list. If the value is 1, pull from the isame iterator. Otherwise, pull from idiff.
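The example above takes key_list as given. As a minimal sketch, it could also be derived from the two lists, assuming both lists have the same length and contain unique names (so the number of zeros equals the number of leftover names). The names names_b and result are introduced here for illustration only, and the comprehension is written out as an explicit loop:

```python
lst_a = "AA AB BB BC EE".split()
lst_b = "AA DD CC AB BC".split()

# 1 where the name at that position in lst_a also appears in lst_b, else 0
names_b = set(lst_b)
key_list = [1 if name in names_b else 0 for name in lst_a]

same = sorted(set(lst_a) & names_b)   # names present in both lists
diff = sorted(names_b - set(same))    # names only in lst_b
isame, idiff = iter(same), iter(diff)

result = []
for flag in key_list:
    # Pull the next shared name for a 1, the next leftover name for a 0
    result.append(next(isame) if flag else next(idiff))

print(key_list)  # [1, 1, 0, 1, 0]
print(result)    # ['AA', 'AB', 'CC', 'BC', 'DD']
```

Note that the shared names only line up with lst_a position by position when lst_a is itself in sorted order, as it is here.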

Now that we have the basic technique, we can apply it to the more complicated path example.


APPLIED

Applying this idea to more complicated path-strings:

Given

import pathlib 


lst_a = "foo/t_AA_a.jpg foo/t_AB_a.jpg foo/t_BB_a.jpg foo/t_BC_a.jpg foo/t_EE_a.jpg".split()
lst_b = "foo/t_AA_b.jpg foo/t_DD_b.jpg foo/t_CC_b.jpg foo/t_AB_b.jpg foo/t_BC_b.jpg".split()

key_list = [1, 1, 0, 1, 0]

# Helper
def get_name(s_path):
    """Return the shared 'name' from a string path.

    Examples
    --------
    >>> get_name("foo/test_xx_a.jpg")
    'test_xx'

    """
    return pathlib.Path(s_path).stem.rsplit("_", maxsplit=1)[0]

Code

Map the names to paths:

name_path_a = {get_name(p): p for p in lst_a}
name_path_b = {get_name(p): p for p in lst_b}

Names are in dict keys, so directly substitute sets with dict keys:

same = sorted(name_path_a.keys() & name_path_b.keys())
diff = sorted(name_path_b.keys() - set(same))

isame, idiff = iter(same), iter(diff)
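The substitution works because dictionary key views are set-like and support operators such as & and - directly. A small illustration, using two made-up dictionaries (d1 and d2 are hypothetical names, not part of the answer):

```python
d1 = {"t_AA": "foo/t_AA_a.jpg", "t_BB": "foo/t_BB_a.jpg"}
d2 = {"t_AA": "foo/t_AA_b.jpg", "t_CC": "foo/t_CC_b.jpg"}

# Set operations on key views return ordinary sets
shared = d1.keys() & d2.keys()
only_d2 = d2.keys() - d1.keys()

print(shared)                 # {'t_AA'}
print(only_d2)                # {'t_CC'}
print(type(shared).__name__)  # set
```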

Get the paths via names pulled from iterators:

[name_path_b[next(isame)] if x else name_path_b[next(idiff)] for x in key_list]

Output

['foo/t_AA_b.jpg',
 'foo/t_AB_b.jpg',
 'foo/t_CC_b.jpg',
 'foo/t_BC_b.jpg',
 'foo/t_DD_b.jpg']
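For the paths from the original question, the steps above might be wrapped into one function that also computes the indicator array. This is a sketch of a variant, not the exact method above: it looks up each shared name directly at its position in the first list instead of pulling from a sorted iterator, so the first list does not need to be sorted. The function name align_paths is hypothetical, and the sketch assumes equal-length lists with unique names.

```python
import pathlib


def get_name(s_path):
    """Return the shared 'name' from a string path, e.g. 'test_AA'."""
    return pathlib.Path(s_path).stem.rsplit("_", maxsplit=1)[0]


def align_paths(paths_a, paths_b):
    """Reorder paths_b to follow paths_a; return (aligned, indicator)."""
    name_path_b = {get_name(p): p for p in paths_b}
    names_a = [get_name(p) for p in paths_a]
    # 1 where the name in paths_a has a counterpart in paths_b, else 0
    key_list = [1 if n in name_path_b else 0 for n in names_a]
    # Names found only in paths_b fill the unmatched positions, in sorted order
    leftovers = iter(sorted(name_path_b.keys() - set(names_a)))
    aligned = [
        name_path_b[n] if flag else name_path_b[next(leftovers)]
        for n, flag in zip(names_a, key_list)
    ]
    return aligned, key_list


lst_A = ['/home/data_A/test_AA_123.jpg',
         '/home/data_A/test_AB_234.jpg',
         '/home/data_A/test_BB_321.jpg',
         '/home/data_A/test_BC_112.jpg']

lst_B = ['/home/data_B/test_AA_222.jpg',
         '/home/data_B/test_CC_444.jpg',
         '/home/data_B/test_AB_555.jpg',
         '/home/data_B/test_BC_777.jpg']

new_lst_B, array_same = align_paths(lst_A, lst_B)
print(new_lst_B)
print(array_same)  # [1, 1, 0, 1]
```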
