
How to group in a list matching strings from elements from other string lists in python 3

I have 744 image files whose names follow the scheme 'mission_code_coord_date1_date2_01_T1/2_Bnumber.TIF', as in this list, for example:

files = [
'LM02_L1TP_028046_19760327_20180424_01_T2_B6.TIF', # these two
'LM02_L1TP_028047_19760327_20180424_01_T2_B6.TIF', # match except in the 'coord' part
'LT05_L1TP_026046_19951010_20170106_01_T1_B5.TIF',
'LT05_L1TP_026047_19951010_20170107_01_T1_B5.TIF',
'LC08_L1TP_026047_20150713_20170226_01_T1_B1.TIF']
#---------^-----^
#         9    15

The objective is to group into sublists the files whose 'mission_code' and 'date1_date2_01_T1/2_Bnumber.TIF' parts match, so the output would be an array like this:

ord_files=[
    ['LM02_L1TP_028046_19760327_20180424_01_T2_B6.TIF','LM02_L1TP_028047_19760327_20180424_01_T2_B6.TIF'],
    ['LT05_L1TP_026046_19951010_20170106_01_T1_B5.TIF','LT05_L1TP_026047_19951010_20170107_01_T1_B5.TIF'],
    ['LC08_L1TP_026047_20150713_20170226_01_T1_B1.TIF','']]

Some files come in pairs or triplets, and some are alone. My idea was to strip the coord part from each name into a new list, mo_files, so that it would be easy to filter and then build the output list, ord_files, with a conditional.

Along those lines, so far I have tried things like:

for k in range(len(files)):
    mo_files[k][:] = files[k][9] + files[k][15]

But I only get errors like IndexError: list index out of range. Is there a simpler or better method?

Thanks.
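A note on the IndexError in the attempt above: assuming mo_files was created as an empty list, mo_files[k] does not exist yet, so indexing into it fails. Also, files[k][9] + files[k][15] joins two single characters instead of cutting out the coord. A minimal sketch of the same idea using slicing, assuming the coord always sits at string positions 10 to 15 as in the sample names:

```python
# Sample names from the question
files = [
    'LM02_L1TP_028046_19760327_20180424_01_T2_B6.TIF',
    'LM02_L1TP_028047_19760327_20180424_01_T2_B6.TIF',
    'LT05_L1TP_026046_19951010_20170106_01_T1_B5.TIF',
    'LT05_L1TP_026047_19951010_20170107_01_T1_B5.TIF',
    'LC08_L1TP_026047_20150713_20170226_01_T1_B1.TIF',
]

# Build mo_files with a comprehension instead of assigning to
# indices that do not exist yet.
# f[:9] is 'LM02_L1TP'; f[16:] starts at the '_' right after the coord.
mo_files = [f[:9] + f[16:] for f in files]

# Group the original names by their coord-free version,
# keeping groups in order of first appearance.
ord_files = []
seen = {}
for key, name in zip(mo_files, files):
    if key not in seen:
        seen[key] = []
        ord_files.append(seen[key])  # same list object, filled below
    seen[key].append(name)

print(mo_files[0])  # LM02_L1TP_19760327_20180424_01_T2_B6.TIF
```

With the sample data, the two LT05 files end up in separate groups, because their date2 parts (20170106 and 20170107) differ.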

You can use:

d = {}  # you can also use collections.defaultdict

for f in files:
    # key: every '_'-separated field except index 2 (the 'coord' part)
    key = tuple(e for i, e in enumerate(f.split('_')) if i != 2)
    d.setdefault(key, []).append(f)
list(d.values())

Output:

[['LM02_L1TP_028046_19760327_20180424_01_T2_B6.TIF',
  'LM02_L1TP_028047_19760327_20180424_01_T2_B6.TIF'],
 ['LT05_L1TP_026046_19951010_20170106_01_T1_B5.TIF'],
 ['LT05_L1TP_026047_19951010_20170107_01_T1_B5.TIF'],
 ['LC08_L1TP_026047_20150713_20170226_01_T1_B1.TIF']]
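The question's expected output pads lone files with ''. Assuming you want every group padded to the length of the largest group (an assumption; the question only shows pairs), a sketch building on the same dictionary approach:

```python
files = [
    'LM02_L1TP_028046_19760327_20180424_01_T2_B6.TIF',
    'LM02_L1TP_028047_19760327_20180424_01_T2_B6.TIF',
    'LT05_L1TP_026046_19951010_20170106_01_T1_B5.TIF',
    'LT05_L1TP_026047_19951010_20170107_01_T1_B5.TIF',
    'LC08_L1TP_026047_20150713_20170226_01_T1_B1.TIF',
]

d = {}
for f in files:
    # key: every '_'-separated field except index 2 (the 'coord' part)
    key = tuple(e for i, e in enumerate(f.split('_')) if i != 2)
    d.setdefault(key, []).append(f)

groups = list(d.values())

# Pad each group with '' up to the size of the largest group
width = max(len(g) for g in groups)
ord_files = [g + [''] * (width - len(g)) for g in groups]
```

If the files include triplets, every group is padded to three entries; drop the padding step if ragged sublists are acceptable.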

Or you can use:

from collections import defaultdict

d = defaultdict(list) 
for f in files:
    d[tuple(e for i, e in enumerate(f.split('_')) if i != 2)].append(f)

list(d.values())

This version is a bit faster, since setdefault builds a new empty list on every call, even when the key already exists.
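One way to check the speed claim on your own data is a quick timeit comparison. This is a sketch: the helper names group_setdefault and group_defaultdict are hypothetical, the list is just the sample names repeated, and the timings will vary by machine. The difference is likely to be small, because building the key dominates the cost.

```python
import timeit
from collections import defaultdict

base = [
    'LM02_L1TP_028046_19760327_20180424_01_T2_B6.TIF',
    'LM02_L1TP_028047_19760327_20180424_01_T2_B6.TIF',
    'LT05_L1TP_026046_19951010_20170106_01_T1_B5.TIF',
    'LT05_L1TP_026047_19951010_20170107_01_T1_B5.TIF',
    'LC08_L1TP_026047_20150713_20170226_01_T1_B1.TIF',
]
files = base * 150  # 750 names, close to the 744 in the question

def key_of(f):
    # every '_'-separated field except index 2 (the 'coord' part)
    return tuple(e for i, e in enumerate(f.split('_')) if i != 2)

def group_setdefault(names):
    d = {}
    for f in names:
        d.setdefault(key_of(f), []).append(f)
    return list(d.values())

def group_defaultdict(names):
    d = defaultdict(list)
    for f in names:
        d[key_of(f)].append(f)
    return list(d.values())

# Both approaches must produce the same groups before timing them
assert group_setdefault(files) == group_defaultdict(files)

for fn in (group_setdefault, group_defaultdict):
    t = timeit.timeit(lambda: fn(files), number=100)
    print(f"{fn.__name__}: {t:.3f} s for 100 runs")
```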

If you're into pandas:

import pandas as pd
df = pd.DataFrame(files, columns=["filename"])                                                                                                                                 

# define a "key": the whole string without the 'coord' part
df["key"] = df.filename.apply(lambda s: s[:9] + s[16:])

Now all you have to do is groupby and aggregate using list:

>>> df.groupby("key").filename.apply(list).values                                                                                                                                  
array([list(['LC08_L1TP_026047_20150713_20170226_01_T1_B1.TIF']),
       list(['LM02_L1TP_028046_19760327_20180424_01_T2_B6.TIF', 'LM02_L1TP_028047_19760327_20180424_01_T2_B6.TIF']),
       list(['LT05_L1TP_026046_19951010_20170106_01_T1_B5.TIF']),
       list(['LT05_L1TP_026047_19951010_20170107_01_T1_B5.TIF'])],
      dtype=object)
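The result above is a NumPy object array of lists, ordered by sorted key. If you would rather have a plain list of lists in the files' original order, a possible variant is to pass sort=False to groupby and call .tolist() on the result (a sketch; the variable name ord_files is taken from the question):

```python
import pandas as pd

files = [
    'LM02_L1TP_028046_19760327_20180424_01_T2_B6.TIF',
    'LM02_L1TP_028047_19760327_20180424_01_T2_B6.TIF',
    'LT05_L1TP_026046_19951010_20170106_01_T1_B5.TIF',
    'LT05_L1TP_026047_19951010_20170107_01_T1_B5.TIF',
    'LC08_L1TP_026047_20150713_20170226_01_T1_B1.TIF',
]
df = pd.DataFrame(files, columns=["filename"])

# Key: the whole name without the 'coord' part (string positions 10-15)
df["key"] = df.filename.apply(lambda s: s[:9] + s[16:])

# sort=False keeps groups in order of first appearance;
# .tolist() turns the Series of lists into a plain Python list
ord_files = df.groupby("key", sort=False).filename.apply(list).tolist()
```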

By the way, if you're not sure whether the character positions stay the same across all 700+ files, a more robust solution is to build the key by splitting on "_":

df["key"] = df.filename.apply(
    lambda filename: "_".join([part for idx, part in enumerate(filename.split("_")) if idx != 2])
)  
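To illustrate why splitting is more robust, here is a sketch in plain Python with a hypothetical name whose first field is one character longer (not a real file name, just an example). The fixed slice s[:9] + s[16:] then cuts in the wrong place, while dropping field 2 after splitting on "_" still gives the intended key. The function names slice_key and split_key are illustrative.

```python
def slice_key(s):
    # fixed positions: assumes the first two fields take exactly 9 characters
    return s[:9] + s[16:]

def split_key(s):
    # drop the third '_'-separated field (the 'coord' part)
    return "_".join(part for idx, part in enumerate(s.split("_")) if idx != 2)

normal = 'LM02_L1TP_028046_19760327_20180424_01_T2_B6.TIF'
odd = 'LM02X_L1TP_028046_19760327_20180424_01_T2_B6.TIF'  # hypothetical name

print(slice_key(normal) == split_key(normal))  # True
print(slice_key(odd) == split_key(odd))        # False
print(split_key(odd))  # LM02X_L1TP_19760327_20180424_01_T2_B6.TIF
```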
