How to list an image sequence in an efficient way? Numerical sequence comparison in Python

Question

I have a directory of 9 images:

image_0001, image_0002, image_0003
image_0010, image_0011
image_0011-1, image_0011-2, image_0011-3
image_9999

I would like to be able to list them in an efficient way, like this (4 entries for 9 images):

(image_000[1-3], image_00[10-11], image_0011-[1-3], image_9999)

Is there a way in Python to return a directory of images in a short, clear way (without listing every file)?

So, possibly something like this:

List all images, sort them numerically, and build a list, counting each image in sequence from the start. When an image is missing, start a new list, and continue until the original file list is finished. I should then have a set of lists that each contain an unbroken sequence.
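The steps described above can be sketched in Python 3 roughly as follows. This is a minimal sketch, assuming every name ends in a number. The hypothetical helper `split_key` splits a name into its text prefix and trailing number, so that `image_0011-1` is grouped under the prefix `image_0011-` instead of being mixed in with the four-digit frames.

```python
import re

def split_key(name):
    # Assumption: every name ends in digits, e.g. 'image_0011' or 'image_0011-2'.
    m = re.search(r"(\d+)$", name)
    return name[:m.start()], int(m.group(1))

def split_into_runs(names):
    """Sort by (prefix, number) and start a new list at every gap."""
    runs = []
    for name in sorted(names, key=split_key):
        prefix, num = split_key(name)
        if runs:
            last_prefix, last_num = split_key(runs[-1][-1])
            if prefix == last_prefix and num == last_num + 1:
                runs[-1].append(name)  # still an unbroken sequence
                continue
        runs.append([name])  # first name, a missing image, or a new prefix
    return runs

images = ["image_0001", "image_0002", "image_0003", "image_0010", "image_0011",
          "image_0011-1", "image_0011-2", "image_0011-3", "image_9999"]
for run in split_into_runs(images):
    print(run)
```

For the nine images in the question, this yields four lists. That matches the four entries the question asks for.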

I'm trying to make a list of numbers easy to read and describe. If I had a sequence of 1000 consecutive files, it could be listed clearly as file[0001-1000] rather than file['0001', '0002', '0003', etc.].
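Once a run is known, describing it only needs its endpoints. The following is a minimal sketch using a hypothetical `describe_run` helper. It assumes the numbers are kept as zero-padded strings so that the padding survives.

```python
def describe_run(prefix, numbers):
    """Describe an unbroken run of zero-padded number strings by its endpoints."""
    first, last = numbers[0], numbers[-1]
    if first == last:
        return prefix + first  # a single file needs no brackets
    return "%s[%s-%s]" % (prefix, first, last)

frames = [str(n).zfill(4) for n in range(1, 1001)]  # '0001' ... '1000'
print(describe_run("file", frames))  # file[0001-1000]
```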

Edit 1 (based on a suggestion): given a flattened list, how would you derive the glob patterns?
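Here is one possible approach to Edit 1, as a sketch. Glob's `[...]` matches a single character, so a numeric range can only be expressed as a glob pattern one decade at a time. The hypothetical `glob_patterns` function below makes two assumptions. First, all names share one prefix. Second, each name ends in a fixed-width, zero-padded number (four digits by default). If the files carry extensions, append `*` to each pattern.

```python
from itertools import groupby

def glob_patterns(prefix, numbers, width=4):
    """Turn sorted ints into glob patterns that vary only the last digit.

    Glob brackets match a single character, so a run such as 8..12 has to be
    split into prefix + '000[8-9]' and prefix + '001[0-2]'. Assumes width >= 2.
    """
    patterns = []
    # Unbroken runs first: within a run, number - position stays constant.
    for _, run in groupby(enumerate(numbers), key=lambda p: p[1] - p[0]):
        run = [n for _, n in run]
        # Then split each run wherever the digits before the last one change.
        for head, block in groupby(run, key=lambda n: n // 10):
            block = list(block)
            stem = prefix + str(head).zfill(width - 1)
            lo, hi = block[0] % 10, block[-1] % 10
            patterns.append(stem + (str(lo) if lo == hi else "[%d-%d]" % (lo, hi)))
    return patterns

print(glob_patterns("image_", [1, 2, 3, 10, 11, 9999]))
# ['image_000[1-3]', 'image_001[0-1]', 'image_9999']
```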

Edit 2: I'm trying to break the problem down into smaller pieces. Here is an example of part of the solution: data1 works, data2 returns 0010 as 8 and 0100 as 64, and data3 (the real-world data) doesn't work:

# Find runs of consecutive numbers using groupby.  The key to the solution
# is differencing with a range so that consecutive numbers all appear in
# same group.
from operator import itemgetter
from itertools import groupby

data1=[01,02,03,10,11,100,9999]
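# Python 2 reads integer literals with a leading zero as octal, so 0010 == 8
# and 0100 == 64 (Python 3 rejects such literals as a SyntaxError).
data2=[0001,0002,0003,0010,0011,0100,9999]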
data3=['image_0001','image_0002','image_0003','image_0010','image_0011','image_0011-2','image_0011-3','image_0100','image_9999']

list1 = []
for k, g in groupby(enumerate(data1), lambda (i,x):i-x):
    list1.append(map(itemgetter(1), g))
print 'data1'
print list1

list2 = []
for k, g in groupby(enumerate(data2), lambda (i,x):i-x):
    list2.append(map(itemgetter(1), g))
print '\ndata2'
print list2

This prints:

data1
[[1, 2, 3], [10, 11], [100], [9999]]

data2
[[1, 2, 3], [8, 9], [64], [9999]]
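The `[8, 9]` and `[64]` in data2 come from Python 2 reading integer literals with a leading zero as octal. Python 3 rejects such literals outright. Below is a minimal Python 3 sketch of the same groupby trick. It keeps the zero-padded numbers as strings and converts them with `int()`, which parses them in base 10 by default.

```python
from itertools import groupby

# Kept as strings: no octal interpretation, and the zero padding survives.
data2 = ['0001', '0002', '0003', '0010', '0011', '0100', '9999']

runs = []
# Within a run, (position - number) stays constant, so groupby keeps it together.
for _, grp in groupby(enumerate(data2), key=lambda pair: pair[0] - int(pair[1])):
    runs.append([s for _, s in grp])
print(runs)  # [['0001', '0002', '0003'], ['0010', '0011'], ['0100'], ['9999']]
```

For data3, the same idea works once the trailing number has been extracted from each name. This is the approach Answer 1 takes.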

Answer 1

Here is a working implementation of what you want to achieve, using the code you added as a starting point:

#!/usr/bin/env python

import itertools
import re

# This algorithm only works if DATA is sorted.
DATA = ["image_0001", "image_0002", "image_0003",
        "image_0010", "image_0011",
        "image_0011-1", "image_0011-2", "image_0011-3",
        "image_0100", "image_9999"]

def extract_number(name):
    # Match the last number in the name and return it as a string,
    # including leading zeroes (that's important for formatting below).
    return re.findall(r"\d+$", name)[0]

def collapse_group(group):
    if len(group) == 1:
        return group[0][1]  # Unique names collapse to themselves.
    first = extract_number(group[0][1])  # Fetch range
    last = extract_number(group[-1][1])  # of this group.
    # Cheap way to compute the string length of the upper bound,
    # discarding leading zeroes.
    length = len(str(int(last)))
    # Now we have the length of the variable part of the names,
    # the rest is only formatting.
    return "%s[%s-%s]" % (group[0][1][:-length],
        first[-length:], last[-length:])

groups = [collapse_group(tuple(group))
          for key, group in itertools.groupby(
              enumerate(DATA),
              # Consecutive numbers share the same (index - number) value.
              lambda pair: pair[0] - int(extract_number(pair[1])))]

print(groups)

This prints ['image_000[1-3]', 'image_00[10-11]', 'image_0011-[1-3]', 'image_0100', 'image_9999'], which is what you want.

HISTORY: I initially answered the question backwards, as @Mark Ransom pointed out below. For the sake of history, my original answer was:

You're looking for glob. Try:

import glob
images = glob.glob("image_[0-9]*")

Or, using your example:

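# Caution: glob brackets match ONE character, so "[10-11]" is the set {0, 1};
# "image_00[10-11]*" also matches image_0001-0003 and image_0011-1..3, so the
# flattened list below contains duplicates.
images = [glob.glob(pattern) for pattern in ("image_000[1-3]*",
    "image_00[10-11]*", "image_0011-[1-3]*", "image_9999*")]
images = [image for seq in images for image in seq]  # flatten the list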
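The patterns in this original answer have a pitfall that is worth checking before relying on them. glob matches names with `fnmatch` rules, where `[...]` matches exactly one character. Here is a small check that runs `fnmatch.filter` on the question's file names:

```python
import fnmatch

names = ["image_0001", "image_0002", "image_0003", "image_0010", "image_0011",
         "image_0011-1", "image_0011-2", "image_0011-3", "image_9999"]

# "[10-11]" is one character class: '1', the range '0-1', and '1' again,
# i.e. {'0', '1'}. The pattern therefore means "image_000*" or "image_001*".
wide = fnmatch.filter(names, "image_00[10-11]*")
print(wide)

# Varying only the last digit gives the intended two files.
narrow = fnmatch.filter(names, "image_001[0-1]")
print(narrow)  # ['image_0010', 'image_0011']
```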

Answer 2

Okay, so I found your question to be a fascinating puzzle. I've left how to "compress" the numeric ranges up to you (marked as a TODO). There are different ways to accomplish that, depending on how you like it formatted and whether you want the minimum number of elements or the minimum string description length.

This solution uses a simple regular expression (digit strings) to classify each part of every string as either static or variable. After the data is classified, I use groupby to collect items with identical static content into the longest matching groups, which produces the summary effect. I mix integer index sentinels into the result (in matchGrouper) so I can re-select the varying parts from all elements (in unpack).
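The classification step described here can also be expressed more compactly with `re.split`. When the pattern contains a capturing group, `re.split` keeps the matched digit runs in its result at the odd indices. This is an alternative formulation rather than the answer's own `classify` function, and `group_key` is a hypothetical name.

```python
import re

DIGITS = re.compile(r"(\d+)")

def group_key(name):
    """Static text stays as-is; each digit run is replaced by its index."""
    # With a capturing group, split() keeps the digit runs at the odd indices.
    parts = DIGITS.split(name)
    return tuple(i if i % 2 else part for i, part in enumerate(parts))

print(group_key("images/image_0001.jpg"))    # ('images/image_', 1, '.jpg')
print(group_key("images/image_0011-2.jpg"))  # ('images/image_', 1, '-', 3, '.jpg')
```

Names that share a key differ only in their digit runs. This means `itertools.groupby(sorted_names, group_key)` gathers them into the same groups that `matchGrouper` produces.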

import re
import glob
from itertools import groupby
from operator import itemgetter

def classifyGroups(iterable, reObj=re.compile(r'\d+')):
    """Yields successive match lists, where each item in the list is either
    static text content, or a list of matching values.

     * `iterable` is a list of strings, such as glob('images/*')
     * `reObj` is a compiled regular expression that describes the
            variable section of the iterable you want to match and classify
    """
    def classify(text, pos=0):
        """Use a regular expression object to split the text into match and non-match sections"""
        r = []
        for m in reObj.finditer(text, pos):
            m0 = m.start()
            r.append((False, text[pos:m0]))
            pos = m.end()
            r.append((True, text[m0:pos]))
        r.append((False, text[pos:]))
        return r

    def matchGrouper(each):
        """Returns index of matches or original text for non-matches"""
        return [(i if t else v) for i,(t,v) in enumerate(each)]

    def unpack(k,matches):
        """If the key is an integer, unpack the value array from matches"""
        if isinstance(k, int):
            k = [m[k][1] for m in matches]
        return k

    # classify each item into matches
    matchLists = (classify(t) for t in iterable)

    # group the matches by their static content
    for key, matches in groupby(matchLists, matchGrouper):
        matches = list(matches)
        # Yield a list of content matches.  Each entry is either text
        # from static content, or a list of matches
        yield [unpack(k, matches) for k in key]

Finally, we add enough logic to pretty-print the output, and run an example.

def makeResultPretty(res):
    """Formats data somewhat like the question"""
    r = []
    for e in res:
        if isinstance(e, list):
            # TODO: collapse and simplify ranges as desired here
            if len(set(e))<=1:
                # it's a list of the same element
                e = e[0]
            else: 
                # prettify the list
                e = '['+' '.join(e)+']'
        r.append(e)
    return ''.join(r)

fnList = sorted(glob.glob('images/*'))
re_digits = re.compile(r'\d+')
for res in classifyGroups(fnList, re_digits):
    print(makeResultPretty(res))

My directory of images was created from your example. You can replace fnList with the following list for testing:

fnList = [
 'images/image_0001.jpg',
 'images/image_0002.jpg',
 'images/image_0003.jpg',
 'images/image_0010.jpg',
 'images/image_0011-1.jpg',
 'images/image_0011-2.jpg',
 'images/image_0011-3.jpg',
 'images/image_0011.jpg',
 'images/image_9999.jpg']

And when I run it against this directory, my output looks like this:

StackOverflow/3926936% python classify.py
images/image_[0001 0002 0003 0010].jpg
images/image_0011-[1 2 3].jpg
images/image_[0011 9999].jpg
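Here is one way to fill in the TODO, as a sketch. It assumes each list holds zero-padded digit strings such as `['0001', '0002', '0003', '0010']`. The values are sorted numerically, and consecutive values are merged into `first-last` spans. `collapse_values` is a hypothetical helper.

```python
def collapse_values(values):
    """Merge consecutive digit strings: ['0001','0002','0003','0010'] -> '0001-0003,0010'."""
    unique = sorted(set(values), key=int)
    spans = []
    start = prev = unique[0]
    for v in unique[1:]:
        if int(v) == int(prev) + 1:
            prev = v  # extend the current span
            continue
        spans.append(start if start == prev else start + "-" + prev)
        start = prev = v  # gap: begin a new span
    spans.append(start if start == prev else start + "-" + prev)
    return ",".join(spans)

print(collapse_values(["0001", "0002", "0003", "0010"]))  # 0001-0003,0010
print(collapse_values(["1", "2", "3"]))                    # 1-3
```

Inside makeResultPretty, the `else` branch could then use `e = '[' + collapse_values(e) + ']'`. That would turn the first output line above into `images/image_[0001-0003,0010].jpg`.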

Answer 3

def ranges(sorted_list):
    first = None
    for x in sorted_list:
        if first is None:
            first = last = x
        elif x == increment(last):
            last = x
        else:
            yield first, last
            first = last = x
    if first is not None:
        yield first, last

The increment function is left as an exercise for the reader.

Edit: here's an example of how it would be used with integers instead of strings as input.

def increment(x): return x+1

list(ranges([1,2,3,4,6,7,8,10]))
[(1, 4), (6, 8), (10, 10)]

For each contiguous range in the input, you get a pair indicating the start and end of the range. If an element isn't part of a range, the start and end values are identical.
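For string input such as the question's file names, here is one possible `increment`. It is a sketch that assumes each name ends in a zero-padded number, and it bumps that trailing number while keeping its width. The `ranges` generator from this answer is repeated so that the block runs on its own.

```python
import re

def increment(name):
    """Add 1 to the trailing number of a name, keeping its zero padding."""
    m = re.search(r"(\d+)$", name)
    if m is None:
        return None  # no trailing number: nothing can directly follow this name
    digits = m.group(1)
    return name[:m.start()] + str(int(digits) + 1).zfill(len(digits))

def ranges(sorted_list):
    # Same generator as above: yield (first, last) for each contiguous run.
    first = None
    for x in sorted_list:
        if first is None:
            first = last = x
        elif x == increment(last):
            last = x
        else:
            yield first, last
            first = last = x
    if first is not None:
        yield first, last

names = ["image_0001", "image_0002", "image_0003", "image_0010", "image_0011",
         "image_0011-1", "image_0011-2", "image_0011-3", "image_9999"]
print(list(ranges(names)))
```

Because `image_0011` and `image_0011-1` differ in their trailing number, `image_0011-1` starts a new range. This matches the grouping the question asks for.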


3 answers

Answer 1
Score 6, accepted, 2010-10-13 20:39:16

Answer 2
Score 3, 2010-10-13 23:03:56

Answer 3
Score 2, 2010-10-13 21:18:34
