
python regex gives empty string

First off, I am new to regex, but so far I am in love with it. I am using regex to extract info from the names of image files that I get from a render engine. So far this regex is working decently...

_([a-z]{2,8})_?(\d{1,2})?(\.|_)(\d{3,10})\.([a-z]{2,6})$

If I use the split() method on a file name such as...

image_file_name_ao.0001.exr

I get back a nice little list I can use:

['image_file_name', 'ao', None, '.', '0001', 'exr', '']
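Each capturing group in the pattern contributes one element to that list. As a reading aid, here is a sketch of the same pattern written with re.VERBOSE (Python 3) so each group can carry a comment. The labels (pass name, frame number, and so on) are assumptions about what the fields mean for this render engine's naming scheme; the pattern itself is unchanged.

```python
import re

# The question's pattern, rewritten with re.VERBOSE so each part can be commented.
# In verbose mode, whitespace and '#' comments outside character classes are ignored.
PATTERN = re.compile(r"""
    _([a-z]{2,8})     # group 1: assumed pass name, e.g. 'ao'
    _?(\d{1,2})?      # group 2: optional 1-2 digit number (None if absent)
    (\.|_)            # group 3: separator before the frame number
    (\d{3,10})        # group 4: assumed frame number, e.g. '0001'
    \.([a-z]{2,6})$   # group 5: file extension, anchored to the end of the string
""", re.VERBOSE)

print(PATTERN.split('image_file_name_ao.0001.exr'))
```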

My only concern is that it always returns an empty string last. No matter how I change or manipulate the regex, it always gives me an empty string at the end of the list. I am totally comfortable with ignoring it and moving on, but my question is: am I doing something wrong with my regex, or is there something I can do to make it not return that final empty string? Thank you for your time.

No wonder. The split method splits your string at occurrences of the regex (and also returns the contents of any capturing groups). And since your regex only matches substrings that reach all the way to the end of the line (indicated by the $ at its end), the only thing left after the match is an empty suffix (''), which split() returns as the last element.
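A minimal illustration of that behaviour, using a made-up file name 'frame.exr' (Python 3): re.split returns the text before the match, then the captured groups, then the text after the match, even when that text is empty.

```python
import re

# re.split returns: text before the match, the captured groups, then the text after it.
# A pattern anchored with $ always consumes the end of the string, so "after" is ''.
anchored = re.split(r'\.([a-z]+)$', 'frame.exr')
print(anchored)    # ['frame', 'exr', '']

# The same thing happens at the front: a match at position 0 leaves '' before it.
leading = re.split(r'^(frame)\.', 'frame.exr')
print(leading)     # ['', 'frame', 'exr']
```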

Given that you are already using groups "(...)" in your expression, you could just as well use re.match(regex, string). This gives you a match object, from which you can retrieve a tuple containing your groups via groups():

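import re

filename = 'image_file_name_ao.0001.exr'
# additional group up front, so the pattern matches from the first character
reg = r'(\S*)_([a-z]{2,8})_?(\d{1,2})?(\.|_)(\d{3,10})\.([a-z]{2,6})$'
print(re.match(reg, filename).groups())  # request tuple of group matches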
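One caveat with calling .groups() directly: re.match returns None when the name does not fit the pattern, and None has no groups() method. A small sketch guarding against that (Python 3; parse_name is a hypothetical helper name, not part of the re module):

```python
import re

PATTERN = re.compile(r'(\S*)_([a-z]{2,8})_?(\d{1,2})?(\.|_)(\d{3,10})\.([a-z]{2,6})$')

def parse_name(filename):
    """Hypothetical helper: return the group tuple, or None if the name doesn't fit."""
    m = PATTERN.match(filename)
    # re.match returns None when the pattern does not match at position 0,
    # so check before calling .groups() to avoid an AttributeError.
    return m.groups() if m else None

print(parse_name('image_file_name_ao.0001.exr'))  # ('image_file_name', 'ao', None, '.', '0001', 'exr')
print(parse_name('notes.txt'))                    # None
```

Note that the tuple from groups() has no trailing empty string, because match() never produces the "text after the match" piece that split() does.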

Edit: I'm really sorry, but I didn't realize that your pattern does not match the file name string from its first character on. I extended it in my answer. If you want to stick with your split() approach, you could also change your original pattern so that the last part of the file name is not matched and is therefore split off.

Interesting question.

I changed the regex pattern a little:

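import re

# Same groups as before, but the extension is now only checked by a
# lookahead (?=...), so it is not consumed and comes back as the last piece.
reg = re.compile(r'_([a-z]{2,8})'
                 r'_?(\d\d?)?'
                 r'([._])'
                 r'(\d{3,10})'
                 r'\.'
                 r'(?=[a-z]{2,6}$)')

for ss in ('image_file_name_ao.0001.exr',
           'image_file_name_45_ao.0001.exr',
           'image_file_name_ao_78.0001.exr',
           'image_file_name_ao78.0001.exr'):
    print('%s\n%r\n' % (ss, reg.split(ss)))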

Result:

image_file_name_ao.0001.exr
['image_file_name', 'ao', None, '.', '0001', 'exr']

image_file_name_45_ao.0001.exr
['image_file_name_45', 'ao', None, '.', '0001', 'exr']

image_file_name_ao_78.0001.exr
['image_file_name', 'ao', '78', '.', '0001', 'exr']

image_file_name_ao78.0001.exr
['image_file_name', 'ao', '78', '.', '0001', 'exr']
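The key change is the lookahead (?=...): it checks that an extension follows at the end of the string without consuming it, so the extension ends up as the trailing piece of split() instead of an empty string. A small comparison on a made-up name 'shot.0001.exr' (Python 3):

```python
import re

name = 'shot.0001.exr'

# A consuming pattern swallows the extension; the match ends at the end of the string.
consuming = re.search(r'\.([a-z]{2,6})$', name)
# A lookahead only checks for the extension; the match ends right after the dot.
lookahead = re.search(r'\.(?=[a-z]{2,6}$)', name)

print(consuming.group(0), consuming.end())   # .exr 13
print(lookahead.group(0), lookahead.end())   # . 10

# Because the extension is not consumed, split() returns it as the last element.
pieces = re.split(r'\.(?=[a-z]{2,6}$)', name)
print(pieces)                                # ['shot.0001', 'exr']
```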

You can use filter() to drop the empty string.

Given your example, this would work like this:

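import re

def f(x):
    # keep every element except the empty string (None is kept)
    return x != ''

parts = list(filter(    # list() needed in Python 3, where filter is lazy
    f,
    re.split(r'_([a-z]{2,8})_?(\d{1,2})?(\.|_)(\d{3,10})\.([a-z]{2,6})$',
             'image_file_name_ao.0001.exr')
))
print(parts)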
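A list comprehension does the same job and is often considered more idiomatic in Python 3. The sketch below also shows why the comparison with '' matters: filter(None, ...) would drop the None from the missing optional group as well, shifting the later fields to different positions.

```python
import re

PATTERN = r'_([a-z]{2,8})_?(\d{1,2})?(\.|_)(\d{3,10})\.([a-z]{2,6})$'
parts = re.split(PATTERN, 'image_file_name_ao.0001.exr')

# Drop only the empty strings, keeping None for the missing optional group,
# so every field stays at the same index.
kept = [p for p in parts if p != '']
print(kept)      # ['image_file_name', 'ao', None, '.', '0001', 'exr']

# Caution: filter(None, ...) drops every falsy item, including that None,
# which shifts the later fields one position to the left.
shifted = list(filter(None, parts))
print(shifted)   # ['image_file_name', 'ao', '.', '0001', 'exr']
```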
