
How to sort one list of path strings based on another list in Python?

I have two lists that contain file paths:

lst_A =['/home/data_A/test_AA_123.jpg',
        '/home/data_A/test_AB_234.jpg',
        '/home/data_A/test_BB_321.jpg',
        '/home/data_A/test_BC_112.jpg',
       ]

lst_B =['/home/data_B/test_AA_222.jpg',
        '/home/data_B/test_CC_444.jpg',
        '/home/data_B/test_AB_555.jpg',
        '/home/data_B/test_BC_777.jpg',
       ]

Based on lst_A , I want to sort lst_B so that, at each position, the first and second parts of the basename (here test_xx ) are the same in both lists. The expected sorted lst_B is

lst_B =['/home/data_B/test_AA_222.jpg',
        '/home/data_B/test_AB_555.jpg',
        '/home/data_B/test_CC_444.jpg',
        '/home/data_B/test_BC_777.jpg',
       ]

In addition, I want to indicate at which positions the two lists share the same first and second parts of the basename (such as test_xx ), so the indicator array should be

array_same =[1,1,0,1]

How should I do this in Python? I have tried the .sort() method, but it returns an unexpected result. Thanks.

Update: This is my attempt so far

import os
lst_A =['/home/data_A/test_AA_123.jpg',
        '/home/data_A/test_AB_234.jpg',
        '/home/data_A/test_BB_321.jpg',
        '/home/data_A/test_BC_112.jpg',
       ]

lst_B =['/home/data_B/test_AA_222.jpg',
        '/home/data_B/test_CC_444.jpg',
        '/home/data_B/test_AB_555.jpg',
        '/home/data_B/test_BC_777.jpg']

lst_B_sort=[]
same_array=[]
for ind_a, a_name in enumerate(lst_A):
  for ind_b, b_name in enumerate(lst_B):
    print (os.path.basename(b_name).split('_')[1])
    if os.path.basename(b_name).split('_')[1] in os.path.basename(a_name):
        lst_B_sort.append(b_name)
        same_array.append(1)
print(lst_B_sort)
print(same_array)
Output:

['/home/data_B/test_AA_222.jpg', '/home/data_B/test_AB_555.jpg', '/home/data_B/test_BC_777.jpg']
[1, 1, 1]

The result is one element short because I never append the element whose name has no match.

Loop through lst_A , get the filename prefix, then append the element from lst_B with the same prefix to the result list.

Create a set of all the elements from lst_B , and when you add a path to the result, remove it from the set. Then at the end you can go through this set, filling in the blank spaces in the result where there were no matches.

import os

lst_A =['/home/data_A/test_AA_123.jpg',
        '/home/data_A/test_AB_234.jpg',
        '/home/data_A/test_BB_321.jpg',
        '/home/data_A/test_BC_112.jpg',
       ]

lst_B =['/home/data_B/test_AA_222.jpg',
        '/home/data_B/test_CC_444.jpg',
        '/home/data_B/test_AB_555.jpg',
        '/home/data_B/test_BC_777.jpg',
       ]

new_lst_B = []
same_array = []
set_B = set(lst_B)  # Paths from lst_B not yet placed in the result
for fn in lst_A:
    prefix = "_".join(os.path.basename(fn).split('_')[:-1])+'_' # This gets test_AA_
    try:
        found_B = next(x for x in lst_B if os.path.basename(x).startswith(prefix))
        new_lst_B.append(found_B)
        same_array.append(1)
        set_B.remove(found_B)
    except StopIteration: # No match found
        new_lst_B.append(None) # Placeholder to fill in
        same_array.append(0)
# Fill each placeholder with a path that was never matched
for missed in set_B:
    index = new_lst_B.index(None)
    new_lst_B[index] = missed
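
The same steps can be packaged as a function. This is only a sketch under a few assumptions: the name align_by_prefix is hypothetical, each prefix occurs at most once per list, and leftover paths fill the gaps in their original lst_B order (a list is used instead of a set to keep that order deterministic).

```python
import os


def align_by_prefix(lst_a, lst_b):
    """Reorder lst_b to line up with lst_a by basename prefix (hypothetical helper)."""

    def prefix(path):
        # 'test_AA_123.jpg' -> 'test_AA_'
        return os.path.basename(path).rsplit("_", 1)[0] + "_"

    remaining = list(lst_b)  # unmatched paths, kept in their original order
    result, same = [], []
    for path_a in lst_a:
        wanted = prefix(path_a)
        match = next(
            (b for b in remaining if os.path.basename(b).startswith(wanted)), None
        )
        if match is not None:
            remaining.remove(match)
        result.append(match)  # None marks a gap to fill later
        same.append(0 if match is None else 1)

    # Fill the gaps with the leftover paths; a gap stays None if none are left.
    leftovers = iter(remaining)
    result = [b if b is not None else next(leftovers, None) for b in result]
    return result, same


lst_A = ['/home/data_A/test_AA_123.jpg',
         '/home/data_A/test_AB_234.jpg',
         '/home/data_A/test_BB_321.jpg',
         '/home/data_A/test_BC_112.jpg']

lst_B = ['/home/data_B/test_AA_222.jpg',
         '/home/data_B/test_CC_444.jpg',
         '/home/data_B/test_AB_555.jpg',
         '/home/data_B/test_BC_777.jpg']

new_lst_B, same_array = align_by_prefix(lst_A, lst_B)
print(new_lst_B)
print(same_array)  # [1, 1, 0, 1]
```

With the lists from the question, test_CC_444.jpg ends up in the third slot, which is the expected ordering.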


We will discuss the issue with a SIMPLE technique followed by an APPLIED solution.

SIMPLE

We just focus on sorting the names given a key.

Given

Simple names and a key list:

lst_a = "AA AB BB BC EE".split()
lst_b = "AA DD CC AB BC".split()

key_list = [1, 1, 0, 1, 0]

Code

same = sorted(set(lst_a) & set(lst_b))
diff = sorted(set(lst_b) - set(same))

isame, idiff = iter(same), iter(diff)
[next(isame) if x else next(idiff) for x in key_list]
# ['AA', 'AB', 'CC', 'BC', 'DD']

lst_b is reordered so that the names shared with lst_a land at the positions marked 1 in key_list , and the remaining names fill the positions marked 0 .


Details

This problem mainly reduces to sorting the intersection of names from both lists. The intersection is the set of common elements, called same . The remaining elements of lst_b go into a set called diff . We sort same and diff , and here's what they look like:

same
# ['AA', 'AB', 'BC']
diff
# ['CC', 'DD']

Now we just want to pull a value from either list, in order, according to the key. We start by iterating the key_list . If 1 , pull from the isame iterator. Otherwise, pull from idiff .
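
The key_list above is given by hand. One way it might be derived is sketched below. It assumes lst_a is already in sorted order, names are unique within each list, and both lists have the same length. Otherwise the sorted same list would not line up with lst_a , or an iterator could run out.

```python
lst_a = "AA AB BB BC EE".split()
lst_b = "AA DD CC AB BC".split()

names_b = set(lst_b)

# 1 where the name at that position in lst_a also occurs in lst_b, else 0
key_list = [1 if name in names_b else 0 for name in lst_a]

same = sorted(set(lst_a) & names_b)
diff = sorted(names_b - set(same))

isame, idiff = iter(same), iter(diff)
result = [next(isame) if flag else next(idiff) for flag in key_list]

print(key_list)  # [1, 1, 0, 1, 0]
print(result)    # ['AA', 'AB', 'CC', 'BC', 'DD']
```

The derived key_list doubles as the indicator array the question asks for.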

Now that we have the basic technique, we can apply it to the more complicated path example.


APPLIED

Applying this idea to more complicated path-strings:

Given

import pathlib 


lst_a = "foo/t_AA_a.jpg foo/t_AB_a.jpg foo/t_BB_a.jpg foo/t_BC_a.jpg foo/t_EE_a.jpg".split()
lst_b = "foo/t_AA_b.jpg foo/t_DD_b.jpg foo/t_CC_b.jpg foo/t_AB_b.jpg foo/t_BC_b.jpg".split()

key_list = [1, 1, 0, 1, 0]

# Helper
def get_name(s_path):
    """Return the shared 'name' from a string path.

    Examples
    --------
    >>> get_name("foo/test_xx_a.jpg")
    'test_xx'

    """
    return pathlib.Path(s_path).stem.rsplit("_", maxsplit=1)[0]

Code

Map the names to paths:

name_path_a = {get_name(p): p for p in lst_a}
name_path_b = {get_name(p): p for p in lst_b}

Names are in dict keys, so directly substitute sets with dict keys:

same = sorted(name_path_a.keys() & name_path_b.keys())
diff = sorted(name_path_b.keys() - set(same))

isame, idiff = iter(same), iter(diff)

Get the paths via names pulled from iterators:

[name_path_b[next(isame)] if x else name_path_b[next(idiff)] for x in key_list]

Output

['foo/t_AA_b.jpg',
 'foo/t_AB_b.jpg',
 'foo/t_CC_b.jpg',
 'foo/t_BC_b.jpg',
 'foo/t_DD_b.jpg']
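
For completeness, here is a sketch that runs the same technique on the lists from the question and computes the indicator array instead of hard-coding it. The variable names array_same and sorted_B are illustrative. It relies on lst_A already being ordered by name, on names being unique within each list, and on both lists having the same length.

```python
import pathlib


def get_name(s_path):
    """Return the shared 'name' part, e.g. 'test_AA' from '.../test_AA_123.jpg'."""
    return pathlib.Path(s_path).stem.rsplit("_", maxsplit=1)[0]


lst_A = ['/home/data_A/test_AA_123.jpg',
         '/home/data_A/test_AB_234.jpg',
         '/home/data_A/test_BB_321.jpg',
         '/home/data_A/test_BC_112.jpg']

lst_B = ['/home/data_B/test_AA_222.jpg',
         '/home/data_B/test_CC_444.jpg',
         '/home/data_B/test_AB_555.jpg',
         '/home/data_B/test_BC_777.jpg']

names_a = [get_name(p) for p in lst_A]
name_path_b = {get_name(p): p for p in lst_B}

# 1 where the name from lst_A also exists in lst_B, else 0
array_same = [1 if name in name_path_b else 0 for name in names_a]

same = sorted(name_path_b.keys() & set(names_a))
diff = sorted(name_path_b.keys() - set(same))
isame, idiff = iter(same), iter(diff)

sorted_B = [name_path_b[next(isame) if flag else next(idiff)] for flag in array_same]

print(array_same)  # [1, 1, 0, 1]
print(sorted_B)
```

Here sorted_B comes out in the order the question expects: AA, AB, CC, BC.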

