Python regular expression to match a string pattern and return a substring

Question

I have many files with names like these:

<some name>_2536by1632.jpg
<some name1>_4800by2304.JPG
<some name2>_904by904.jpg

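So the name part varies, and the extension is always jpg, though it may be uppercase. The x and y in <x>by<y> can take only a limited set of values, which I have listed in this format: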

possible_sizes = [ (2536,1632), (4800,2304), ...]

I need to test whether a filename matches this pattern and, if it does, return the <x>by<y> part of the string.

So far I have been doing this without regular expressions, like this:

for item in possible_sizes:
    if "_{0}by{1}.jpg".format(item[0],item[1]) in filename.lower():
        dimension = "{0}by{1}".format(item[0],item[1])

But this is not a very clean solution, especially since more size values may be added in the future.

How can I do this with a regular expression?

Answer 1

You can just use Python's string methods:

import os

# O(1) lookup time
possible_sizes = frozenset([(2536, 1632), (4800, 2304), ...])

name, extension = os.path.splitext(filename)
# Split the extension-free name on its last underscore only
title, size = name.rsplit('_', 1)
width, height = map(int, size.split('by'))

if (width, height) in possible_sizes:
    print(width, height)
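
The snippet above raises an exception for any filename that does not fit the pattern. A minimal sketch of a more defensive variant follows; the helper name parse_size is hypothetical and not part of the original answer, and it assumes the size always follows the last underscore.

```python
import os

def parse_size(filename):
    """Return (width, height) parsed from '<name>_<x>by<y>.jpg', or None.

    Hypothetical helper for illustration; not part of the original answer.
    """
    name, extension = os.path.splitext(filename)
    # Accept .jpg in any letter case and require an underscore separator
    if extension.lower() != ".jpg" or "_" not in name:
        return None
    # Everything after the last underscore should look like '<x>by<y>'
    size = name.rsplit("_", 1)[1]
    width, sep, height = size.partition("by")
    if not (sep and width.isdecimal() and height.isdecimal()):
        return None
    return int(width), int(height)

possible_sizes = frozenset([(2536, 1632), (4800, 2304)])
size = parse_size("<some name>_2536by1632.jpg")
if size in possible_sizes:
    print("{0}by{1}".format(*size))
```

Returning None instead of raising lets the caller skip unrelated files in the same directory.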

Answer 2

Probably not the smartest option, but it should be easy to read.

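The string:

May start with anything: ^.*
Must contain an underscore: _
Followed by a number (at least one digit): \d+
Followed by the literal text "by": by
Followed by a number (at least one digit): \d+
Ends with .jpg or .JPG: \.(jpg|JPG)$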

(?P<X> ....) makes a match accessible by the name X.

This leads to the expression r"^.*_((?P<X>\d+)by(?P<Y>\d+))\.(jpg|JPG)$"

Sample program:

import re

possible_sizes = [ ( 2536, 1632 ), ( 4800, 2304 )]
names = ["<some name>_2536by1632.jpg", "<some name1>_4800by2304.JPG", "<some name2>_904by904.jpg"]
# Raw string so the backslashes reach the regex engine unchanged
pattern = r"^.*_((?P<X>\d+)by(?P<Y>\d+))\.(jpg|JPG)$"

for name in names:
    matchobj = re.match( pattern, name )
    if matchobj:
        if ( int( matchobj.group( "X" ) ), int( matchobj.group( "Y" ) ) ) in possible_sizes:
            print( matchobj.group( 1 ) )

Output

2536by1632

4800by2304
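
The same idea can be wrapped in a reusable function. This is only a sketch: the name extract_dimension and the use of re.IGNORECASE are assumptions, not part of the original answer. Note that re.IGNORECASE also accepts mixed-case extensions such as .Jpg and an uppercase BY, which is looser than the (jpg|JPG) alternation above.

```python
import re

# Hypothetical helper for illustration; not part of the original answer.
# The pattern is anchored only at the end, so re.search is used instead of re.match.
SIZE_PATTERN = re.compile(r"_((?P<X>\d+)by(?P<Y>\d+))\.jpg$", re.IGNORECASE)

def extract_dimension(filename, possible_sizes):
    """Return the '<x>by<y>' substring if its size is allowed, else None."""
    match = SIZE_PATTERN.search(filename)
    if match is None:
        return None
    size = (int(match.group("X")), int(match.group("Y")))
    return match.group(1) if size in possible_sizes else None

sizes = {(2536, 1632), (4800, 2304)}
for name in ["<some name1>_4800by2304.JPG", "<some name2>_904by904.jpg"]:
    print(name, extract_dimension(name, sizes))
```

Keeping the allowed sizes in a set means that new sizes only need to be added to the data, not to the pattern.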

Answer 3

This does not follow the spirit of your question, but I think it would actually work:

possible_sizes = { "_2536by1632.jpg" : (2536,1632), "_4800by2304.jpg" : (4800,2304)}
for filename in filenames:
    # Compare in lowercase so that ".JPG" also matches
    for suffix, size in possible_sizes.items():
        if filename.lower().endswith(suffix):
            print(size)

Answer 1 0 2013-09-18 15:27:16
Answer 2 0 2013-09-18 15:39:38
Answer 3 -1 2013-09-18 15:29:02
