Python: Removing the last character _ from a string using regex

I know there are a bunch of other regex questions, but I was hoping someone could point out what is wrong with my regex. I have done some research into it and it looks like it should work. I used Rubular to test it. Yes, I know that is a regex tester for Ruby, but from what the Python docs show, the same rules I used should apply to Python.

Currently I have:

import re
a = ["SDFSD_SFSDF234234","SDFSDF_SDFSDF_234324","TSFSD_SDF_213123"]
c = [re.sub(r'[^A-Z_]+', "", x) for x in a]  # delete runs of anything that is not A-Z or "_"

which returns:

['SDFSD_SFSDF', 'SDFSDF_SDFSDF_', 'TSFSD_SDF_']

But I want it to return:

['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']

I tried using this regex:

c = [re.sub(r'$?_[^A-Z_]+', "", x) for x in a]

but I am getting this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "/usr/lib64/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression

Can anyone help me figure out what I am doing wrong?

import re

a = ["SDFSD_SFSDF234234","SDFSDF_SDFSDF_234324","TSFSD_SDF_213123"]
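# re.match anchors at the start; the pattern must end on a letter,
# so a trailing "_" can never be part of the match
c = [re.match(r'[A-Z_]+[A-Z]', x).group() for x in a]

print(c)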

Results:

['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
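One caveat with the re.match approach: re.match returns None when the string does not start with an uppercase letter, and calling .group() on None raises AttributeError. Below is a minimal sketch of a guarded version. The helper name leading_name and the extra sample "123_ABC" are hypothetical, and returning the input unchanged when nothing matches is an assumed fallback, not part of the original answer.

```python
import re

# Uppercase letters/underscores that must end on an uppercase letter,
# so a trailing "_" is never part of the match.
PATTERN = re.compile(r'[A-Z_]+[A-Z]')


def leading_name(s):
    """Return the matched prefix of s, or s unchanged if nothing matches.

    Hypothetical helper; the "return s on no match" fallback is an assumption.
    """
    m = PATTERN.match(s)
    return m.group() if m else s


a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123", "123_ABC"]
c = [leading_name(x) for x in a]
print(c)  # ['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF', '123_ABC']
```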

Please note that "re.sub", which you use in your example, is a regex replace function, not a search: the pattern you give it has to describe the part you want to delete. Your regex seems to match what you're asking for, rather than what you're trying to get rid of to get what you're asking for. re.match, used above, works the other way around and returns the part the pattern describes.
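To make the distinction concrete, the sketch below applies both styles to the question's list: with re.sub the pattern describes what to remove, and with re.match it describes what to keep. Both patterns are taken from answers in this thread; putting them side by side is only an illustration.

```python
import re

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# re.sub: the pattern describes what to REMOVE (an optional "_" plus the
# trailing run of non-letters), and the empty replacement deletes it.
removed = [re.sub(r'_?[^A-Z_]+$', '', x) for x in a]

# re.match: the pattern describes what to KEEP, anchored at the start.
kept = [re.match(r'[A-Z_]+[A-Z]', x).group() for x in a]

print(removed)  # ['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
print(removed == kept)  # True
```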

You could insert a lookahead into your regex. Written as (?=...), it makes your regex match text only when that text is followed by whatever you put after the ?= inside the parentheses. So in your case you could ignore the underscore unless it is followed by [A-Z]. Your regex would look like this: r'[A-Z]+_(?=[A-Z])', so an underscore not followed by letters will be ignored.
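The answer shows only the lookahead fragment, not the full call. One way to put a positive lookahead to work, assuming re.match is used to pull out the wanted prefix: repeat "letters followed by an underscore" only while that underscore is followed by another uppercase letter, then finish on a run of letters. The exact pattern is an illustration, not the answerer's code.

```python
import re

# (?:[A-Z]+_(?=[A-Z]))*  zero or more "WORD_" chunks; the lookahead (?=[A-Z])
#                        accepts an underscore only if an uppercase letter
#                        follows it, without consuming that letter
# [A-Z]+                 the final run of letters
pattern = re.compile(r'(?:[A-Z]+_(?=[A-Z]))*[A-Z]+')

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]
c = [pattern.match(x).group() for x in a]
print(c)  # ['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
```

Because the lookahead is zero-width, an underscore followed by digits simply fails the repetition, so it is left out of the match rather than removed afterwards.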

Without regex, using rstrip:

a = ["ends_with_underscore_", "does_not", "multiple_____"]
b = [x.rstrip("_") for x in a]  # strips every trailing "_"
print(b)
# -> ['ends_with_underscore', 'does_not', 'multiple']

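On the question's own data, rstrip("_") alone is not enough, because those strings end in digits, not underscores. A sketch that stays regex-free by stripping trailing digits first and then trailing underscores. Keep in mind that rstrip removes any combination of the given characters from the right end, not one fixed suffix.

```python
a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# Strip trailing digits first, then any underscores left at the end.
b = [x.rstrip("0123456789").rstrip("_") for x in a]
print(b)  # ['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
```
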
>>> import re
>>> a = ["SDFSD_SFSDF234234","SDFSDF_SDFSDF_234324","TSFSD_SDF_213123"]
>>> c = [re.sub(r'_?\d+','',x) for x in a]
>>> c
['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
>>>

It's short and simple. Basically, it says: "replace every run of digits, together with the underscore directly before it if there is one, with an empty string".
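Because this pattern is not anchored, it also removes digit runs in the middle of a string. If only the trailing digits should go, adding $ restricts the match to the end. The second sample string "AB_12_CD99" is hypothetical, chosen only to show the difference.

```python
import re

samples = ["TSFSD_SDF_213123", "AB_12_CD99"]

# Unanchored: removes every digit run (plus a preceding "_"), anywhere.
anywhere = [re.sub(r'_?\d+', '', s) for s in samples]
print(anywhere)  # ['TSFSD_SDF', 'AB_CD']

# Anchored with $: removes only the run at the very end of the string.
at_end = [re.sub(r'_?\d+$', '', s) for s in samples]
print(at_end)  # ['TSFSD_SDF', 'AB_12_CD']
```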

The error in:

c = [re.sub(r'$?_[^A-Z_]+', "", x) for x in a]

is caused by the ?. The only thing in front of it is $, which is a zero-width anchor rather than a character, so there is nothing for ? to match 0 or 1 times (Python reports this as "nothing to repeat"). If you change it to:

>>> [re.sub(r'_?[^A-Z_]+$', "", x) for x in a]
['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']

It works as you expect.
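A small sketch that reproduces the failure and the fix side by side, assuming Python 3 (where the exception is available as re.error): compiling the original pattern raises re.error, while the corrected pattern compiles and gives the expected list.

```python
import re

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# Original pattern: "?" directly follows the zero-width "$", so there is
# nothing to repeat and compilation fails.
try:
    re.compile(r'$?_[^A-Z_]+')
    broken_compiles = True
except re.error as e:
    broken_compiles = False
    print("re.error:", e)

# Fixed pattern: optional "_", then the non-letter tail, anchored by "$".
fixed = re.compile(r'_?[^A-Z_]+$')
c = [fixed.sub('', x) for x in a]
print(c)  # ['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
```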

Another thing: $ is used to denote the end of the line (in Python, the end of the string or the position just before a trailing newline), so it belongs at the end of the pattern rather than as its first character.
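To see why the position of $ matters: it is zero-width and only matches at the end of the string, so a pattern that requires characters after it can never match. The two patterns below are illustrations, not taken from the thread.

```python
import re

s = "SDFSDF_SDFSDF_234324"

# "$" first: "_" and digits would have to come AFTER the end of the string,
# so this search never finds anything.
first = re.search(r'$_\d+', s)
print(first)  # None

# "$" last: the match must finish exactly where the string ends.
last = re.search(r'_\d+$', s)
print(last.group())  # _234324
```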

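Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.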

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM