How to list an image sequence in an efficient way? Numerical sequence comparison in Python
I have a directory of 9 images:
image_0001, image_0002, image_0003, image_0010, image_0011, image_0011-1, image_0011-2, image_0011-3, image_9999
I would like to be able to list them in an efficient way, like this (4 entries for 9 images):
(image_000[1-3], image_00[10-11], image_0011-[1-3], image_9999)
Is there a way in Python to return a directory of images in a short, clear way (without listing every file)?
So, possibly something like this: list all images, sort them numerically, and create a list (counting each image in sequence from the start). When an image is missing, create a new list, and continue until the original file list is finished. Now I should just have some lists that contain unbroken sequences.
I'm trying to make it easy to read/describe a list of numbers. If I had a sequence of 1000 consecutive files, it could be clearly listed as file[0001-1000] rather than file['0001','0002','0003' etc...]
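The procedure described above (sort numerically, start a new list whenever a number is missing, then describe each unbroken list compactly) could be sketched in Python 3 as follows. This is only an illustration: it assumes the numbers have already been pulled out of the file names as integers, and the helper names split_into_runs and format_run, as well as the 4-digit zero padding, are assumptions based on the example names.

```python
def split_into_runs(numbers):
    """Sort numerically and start a new run whenever a number is missing."""
    runs = []
    for n in sorted(numbers):
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)  # consecutive: extend the current run
        else:
            runs.append([n])  # first item or a gap: open a new run
    return runs


def format_run(prefix, run, width=4):
    """Describe a run compactly, e.g. file[0001-1000] instead of every name."""
    if len(run) == 1:
        return f"{prefix}{run[0]:0{width}d}"
    return f"{prefix}[{run[0]:0{width}d}-{run[-1]:0{width}d}]"


runs = split_into_runs([10, 1, 3, 2, 11, 9999])
print(runs)
print([format_run("image_", run) for run in runs])
```

Unlike the notation in the question, this puts the full zero-padded numbers inside the brackets, which keeps the formatting step trivial.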
Edit1 (based on suggestion): Given a flattened list, how would you derive the glob patterns?
Edit2: I'm trying to break the problem down into smaller pieces. Here is an example of part of the solution: data1 works, data2 returns 0010 as 8 and 0100 as 64, and data3 (the real-world data) doesn't work:
# Python 2 code (print statements, tuple-unpacking lambdas).
# Find runs of consecutive numbers using groupby. The key to the solution
# is differencing with a range so that consecutive numbers all appear in
# the same group.
from operator import itemgetter
from itertools import groupby
data1=[01,02,03,10,11,100,9999]
data2=[0001,0002,0003,0010,0011,0100,9999]
data3=['image_0001','image_0002','image_0003','image_0010','image_0011','image_0011-2','image_0011-3','image_0100','image_9999']
list1 = []
for k, g in groupby(enumerate(data1), lambda (i,x):i-x):
    list1.append(map(itemgetter(1), g))
print 'data1'
print list1
list2 = []
for k, g in groupby(enumerate(data2), lambda (i,x):i-x):
    list2.append(map(itemgetter(1), g))
print '\ndata2'
print list2
returns:
data1
[[1, 2, 3], [10, 11], [100], [9999]]
data2
[[1, 2, 3], [8, 9], [64], [9999]]
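The [8, 9] and [64] in the data2 output come from Python 2 reading integer literals with a leading zero as octal (0010 is 8, 0100 is 64); Python 3 rejects such literals with a SyntaxError. One way around it, sketched here in Python 3, is to keep the numbers as strings (as they appear in the file names) and convert each one with int() only inside the groupby key, since int() parses a digit string as decimal regardless of leading zeros:

```python
from itertools import groupby

data2 = ["0001", "0002", "0003", "0010", "0011", "0100", "9999"]

# index - value stays constant along a run of consecutive numbers;
# int("0010") == 10, so the leading zeros do no harm here.
runs = [
    [text for _, text in group]
    for _, group in groupby(enumerate(data2), key=lambda pair: pair[0] - int(pair[1]))
]
print(runs)
```

Keeping the original strings in each run also preserves the zero padding for display later.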
Here is a working implementation of what you want to achieve, using the code you added as a starting point:
#!/usr/bin/env python3
import itertools
import re

# This algorithm only works if DATA is sorted.
DATA = ["image_0001", "image_0002", "image_0003",
        "image_0010", "image_0011",
        "image_0011-1", "image_0011-2", "image_0011-3",
        "image_0100", "image_9999"]

def extract_number(name):
    # Match the last number in the name and return it as a string,
    # including leading zeroes (that's important for formatting below).
    return re.findall(r"\d+$", name)[0]

def collapse_group(group):
    if len(group) == 1:
        return group[0][1]  # Unique names collapse to themselves.
    first = extract_number(group[0][1])  # Fetch range
    last = extract_number(group[-1][1])  # of this group.
    # Cheap way to compute the string length of the upper bound,
    # discarding leading zeroes.
    length = len(str(int(last)))
    # Now we have the length of the variable part of the names,
    # the rest is only formatting.
    return "%s[%s-%s]" % (group[0][1][:-length],
                          first[-length:], last[-length:])

# Group on index - number: the difference is constant within a consecutive run.
groups = [collapse_group(tuple(group))
          for key, group in itertools.groupby(
              enumerate(DATA),
              lambda item: item[0] - int(extract_number(item[1])))]
print(groups)
This prints
['image_000[1-3]', 'image_00[10-11]', 'image_0011-[1-3]', 'image_0100', 'image_9999']
which is what you want.
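The answer's extract_number only matches digits at the very end of a name, so real file names such as image_0001.jpg would raise an IndexError. A possible variant, sketched below as an assumption rather than part of the answer, splits off the extension with os.path.splitext first and also puts the text before the number into the groupby key, so names with different prefixes are never merged. The helper names split_name, group_key and collapse are hypothetical, and the output keeps the whole zero-padded number inside the brackets, which differs slightly from the answer's format.

```python
import itertools
import os
import re

TRAILING_DIGITS = re.compile(r"\d+$")


def split_name(filename):
    """Split 'image_0011-2.jpg' into ('image_0011-', '2', '.jpg').

    The digits are None when the stem does not end in a number.
    """
    stem, ext = os.path.splitext(filename)
    match = TRAILING_DIGITS.search(stem)
    if match is None:
        return stem, None, ext
    return stem[:match.start()], match.group(), ext


def group_key(indexed):
    """The index-minus-number trick, plus the prefix and extension."""
    index, filename = indexed
    prefix, digits, ext = split_name(filename)
    if digits is None:
        # Nothing to count: give the name a key no neighbour can share.
        return (filename, "unique", index)
    return (prefix, ext, index - int(digits))


def collapse(filenames):
    """Collapse a sorted list of file names into range descriptions."""
    result = []
    for _, group in itertools.groupby(enumerate(filenames), key=group_key):
        names = [name for _, name in group]
        if len(names) == 1:
            result.append(names[0])
            continue
        prefix, first, ext = split_name(names[0])
        last = split_name(names[-1])[1]
        result.append(f"{prefix}[{first}-{last}]{ext}")
    return result


FILES = ["image_0001.jpg", "image_0002.jpg", "image_0003.jpg",
         "image_0010.jpg", "image_0011.jpg",
         "image_0011-1.jpg", "image_0011-2.jpg", "image_0011-3.jpg",
         "image_9999.jpg"]
print(collapse(FILES))
```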
HISTORY: I initially answered the question backwards, as @Mark Ransom pointed out below. For the sake of history, my original answer was:
You're looking for glob. Try:
import glob
images = glob.glob("image_[0-9]*")
Or, using your example:
images = [glob.glob(pattern) for pattern in ("image_000[1-3]*",
          "image_00[10-11]*", "image_0011-[1-3]*", "image_9999*")]
images = [image for seq in images for image in seq]  # flatten the list
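Note that in glob patterns [...] is a character class that matches exactly one character, so image_00[10-11]* does not mean "10 through 11": it matches any name whose character after image_00 is 0 or 1. The standard library's fnmatch.filter, which uses the same shell-style pattern syntax as glob, makes this easy to check without touching the disk (the sample names below are made up for illustration):

```python
import fnmatch

names = ["image_0001", "image_0005", "image_0010", "image_0011", "image_0020"]

# [10-11] is the set {'1', '0'-'1', '1'}: one character that is '0' or '1'.
matched = fnmatch.filter(names, "image_00[10-11]*")
print(matched)  # image_0001 and image_0005 match too; only image_0020 is excluded
```

Numeric ranges such as 10-11 therefore cannot be expressed directly as a single glob pattern.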
Okay, so I found your question to be a fascinating puzzle. I've left how to "compress" the numeric ranges up to you (marked as a TODO), since there are different ways to do it depending on how you want it formatted and whether you want the minimum number of elements or the shortest string description.
This solution uses a simple regular expression (runs of digits) to split each string into static and variable sections. After the data is classified, I use groupby to collect consecutive strings that share the same static content into the longest possible groups, which produces the summary effect. I mix integer index sentinels into the grouping key (in matchGrouper) so I can re-select the varying parts from all elements (in unpack).
import re
import glob
from itertools import groupby

def classifyGroups(iterable, reObj=re.compile(r'\d+')):
    """Yields successive match lists, where each item in the list is either
    static text content, or a list of matching values.
    * `iterable` is a list of strings, such as glob('images/*')
    * `reObj` is a compiled regular expression that describes the
      variable section of the iterable you want to match and classify
    """
    def classify(text, pos=0):
        """Use a regular expression object to split the text into match and non-match sections"""
        r = []
        for m in reObj.finditer(text, pos):
            m0 = m.start()
            r.append((False, text[pos:m0]))
            pos = m.end()
            r.append((True, text[m0:pos]))
        r.append((False, text[pos:]))
        return r

    def matchGrouper(each):
        """Returns index of matches or original text for non-matches"""
        return [(i if t else v) for i, (t, v) in enumerate(each)]

    def unpack(k, matches):
        """If the key is an integer, unpack the value array from matches"""
        if isinstance(k, int):
            k = [m[k][1] for m in matches]
        return k

    # classify each item into matches
    matchLists = (classify(t) for t in iterable)
    # group the matches by their static content
    for key, matches in groupby(matchLists, matchGrouper):
        matches = list(matches)
        # Yield a list of content matches. Each entry is either text
        # from static content, or a list of matches
        yield [unpack(k, matches) for k in key]
Finally, we add enough logic to perform pretty printing of the output, and run an example.
def makeResultPretty(res):
    """Formats data somewhat like the question"""
    r = []
    for e in res:
        if isinstance(e, list):
            # TODO: collapse and simplify ranges as desired here
            if len(set(e)) <= 1:
                # it's a list of the same element
                e = e[0]
            else:
                # prettify the list
                e = '[' + ' '.join(e) + ']'
        r.append(e)
    return ''.join(r)

fnList = sorted(glob.glob('images/*'))
re_digits = re.compile(r'\d+')
for res in classifyGroups(fnList, re_digits):
    print(makeResultPretty(res))
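One possible way to fill in the TODO in makeResultPretty, assuming the varying values are non-empty lists of zero-padded digit strings in ascending order, is to merge consecutive numbers into first-last spans before joining. collapse_values is a hypothetical helper, not part of the answer:

```python
def collapse_values(values):
    """Turn ['0001', '0002', '0003', '0010'] into '[0001-0003 0010]'.

    Assumes a non-empty list of decimal digit strings in ascending order.
    """
    spans = []
    start = prev = values[0]
    for text in values[1:]:
        if int(text) == int(prev) + 1:
            prev = text  # still consecutive: extend the current span
            continue
        spans.append(start if start == prev else f"{start}-{prev}")
        start = prev = text  # gap: begin a new span
    spans.append(start if start == prev else f"{start}-{prev}")
    return "[" + " ".join(spans) + "]"


print(collapse_values(["0001", "0002", "0003", "0010"]))
```

In makeResultPretty, the else branch would then become e = collapse_values(e); for the example directory, the first output line would change from images/image_[0001 0002 0003 0010].jpg to images/image_[0001-0003 0010].jpg.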
My directory of images was created from your example. You can replace fnList with the following list for testing:
fnList = [
    'images/image_0001.jpg',
    'images/image_0002.jpg',
    'images/image_0003.jpg',
    'images/image_0010.jpg',
    'images/image_0011-1.jpg',
    'images/image_0011-2.jpg',
    'images/image_0011-3.jpg',
    'images/image_0011.jpg',
    'images/image_9999.jpg']
And when I run against this directory, my output looks like:
StackOverflow/3926936% python classify.py
images/image_[0001 0002 0003 0010].jpg
images/image_0011-[1 2 3].jpg
images/image_[0011 9999].jpg
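The same static/variable split can also be sketched more compactly with re.split, whose result keeps the matched text when the pattern contains a capturing group, so static text lands at even positions and digit runs at odd positions. This is a swapped-in technique for illustration, not the answer's code; split_name and summarize are hypothetical names, and the sample data is the fnList test list given above. For that list it reproduces the three output lines shown.

```python
import re
from itertools import groupby

# With a capturing group, re.split keeps the digit runs in its result.
DIGITS = re.compile(r"(\d+)")


def split_name(name):
    """Return (static text parts, digit runs) for one file name."""
    parts = DIGITS.split(name)
    # Even positions hold static text, odd positions hold digit runs.
    return tuple(parts[0::2]), parts[1::2]


def summarize(names):
    """Collapse adjacent names that share the same static text."""
    lines = []
    for static, items in groupby(map(split_name, names), key=lambda item: item[0]):
        # Transpose: one tuple per digit slot, holding that slot's value for every name.
        columns = list(zip(*(digits for _, digits in items)))
        pieces = [static[0]]
        for column, text in zip(columns, static[1:]):
            if len(set(column)) == 1:
                pieces.append(column[0])  # the same value in every name
            else:
                pieces.append("[" + " ".join(column) + "]")
            pieces.append(text)
        lines.append("".join(pieces))
    return lines


test_names = [
    "images/image_0001.jpg", "images/image_0002.jpg", "images/image_0003.jpg",
    "images/image_0010.jpg", "images/image_0011-1.jpg", "images/image_0011-2.jpg",
    "images/image_0011-3.jpg", "images/image_0011.jpg", "images/image_9999.jpg",
]
for line in summarize(sorted(test_names)):
    print(line)
```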
def ranges(sorted_list):
    first = None
    for x in sorted_list:
        if first is None:
            first = last = x
        elif x == increment(last):
            last = x
        else:
            yield first, last
            first = last = x
    if first is not None:
        yield first, last
The increment function is left as an exercise for the reader.
Edit: here's an example of how it would be used with integers instead of strings as input.
>>> def increment(x):
...     return x + 1
...
>>> list(ranges([1, 2, 3, 4, 6, 7, 8, 10]))
[(1, 4), (6, 8), (10, 10)]
For each contiguous range in the input you get a pair indicating the start and end of the range. If an element isn't part of a range, the start and end values are identical.
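For zero-padded strings such as the digits in image_0010, one possible increment (an assumption, not part of the answer) converts to int, adds one, and pads back to the original width with str.zfill. The ranges generator is repeated unchanged so the block runs on its own:

```python
def ranges(sorted_list):
    # Copied from the answer above so that this block runs on its own.
    first = None
    for x in sorted_list:
        if first is None:
            first = last = x
        elif x == increment(last):
            last = x
        else:
            yield first, last
            first = last = x
    if first is not None:
        yield first, last


def increment(digits):
    """Hypothetical helper: '0009' -> '0010', keeping the zero padding."""
    return str(int(digits) + 1).zfill(len(digits))


print(list(ranges(["0001", "0002", "0003", "0010", "0011", "9999"])))
```

Because the comparison is on strings, this only treats numbers as consecutive when they share the same width: '11' would not be seen as following '0010'.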