Python: using a regular expression to remove the trailing _ character from a string

Question

I know there are plenty of other regex questions, but I was hoping someone could point out what is wrong with my regex. I have done some research and it looks like it should work. I tested it with Rubular; I know that targets Ruby regexes, but from what I can see in the Python docs the same rules should apply to Python.

Currently I have

import re

a = ["SDFSD_SFSDF234234","SDFSDF_SDFSDF_234324","TSFSD_SDF_213123"]
c = [re.sub(r'[^A-Z_]+', "", x) for x in a]

which returns

['SDFSD_SFSDF', 'SDFSDF_SDFSDF_', 'TSFSD_SDF_']

But I want it to return

['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']

I tried to use this regex

c = [re.sub(r'$?_[^A-Z_]+', "", x) for x in a]

but I am getting this error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "/usr/lib64/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression

Can anyone help me figure out what I am doing wrong?

Answer 1

import re

a = ["SDFSD_SFSDF234234","SDFSDF_SDFSDF_234324","TSFSD_SDF_213123"]
c = [re.match(r'[A-Z_]+[A-Z]', x).group() for x in a]

print(c)

Results:

['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']

Please note that re.sub, which you use in your example, is a regex replace command, not a search. Your regex seems to match what you want to keep, not what you need to remove in order to get it.
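The difference between keeping and removing can be sketched as follows. This is a minimal illustration, not part of the original answer; the removal pattern r'_?\d+$' is an assumption chosen only to contrast the two approaches on the question's data.

```python
import re

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# Keep: re.match returns the leading run of letters and underscores
# that ends in a letter; .group() extracts the matched text.
kept = [re.match(r'[A-Z_]+[A-Z]', x).group() for x in a]

# Remove: re.sub deletes an optional underscore plus the trailing digits
# (illustrative pattern, assumed for this comparison).
removed = [re.sub(r'_?\d+$', '', x) for x in a]

print(kept)
print(removed)
```

Note that re.match returns None when the string does not start with a match, so calling .group() on unexpected input would raise AttributeError, whereas re.sub simply returns the string unchanged.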

Answer 2

You could insert a lookahead into your regexp. Written as (?=...), it makes your regexp match only text that is followed by whatever you put in place of the ... . So in your case you could choose to ignore the underscore unless it is followed by [A-Z]. Your regexp will look like this: r'[A-Z]+_(?=[A-Z])', so an underscore that is not followed by a letter will be ignored.
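As a rough sketch of what that lookahead does on the question's data: the first pattern is the one from this answer, while the second, which wraps it in a repeated group to extract the whole prefix, is an assumption added here for illustration.

```python
import re

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# Pattern from the answer: letters plus an underscore, but only when
# the underscore is followed by another letter.
lookahead = re.compile(r'[A-Z]+_(?=[A-Z])')
pieces = [lookahead.findall(x) for x in a]
print(pieces)  # the underscore before the digits is never matched

# Hypothetical extension: repeat that unit, then take the final run of
# letters, so the whole prefix is returned without a trailing underscore.
full = re.compile(r'(?:[A-Z]+_(?=[A-Z]))*[A-Z]+')
c = [full.match(x).group() for x in a]
print(c)
```

The lookahead itself consumes no characters, which is why the letter after the underscore is still available to the rest of the pattern.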

Answer 3

Without regex, using rstrip:

a = ["ends_with_underscore_", "does_not", "multiple_____"]
b = [x.rstrip("_") for x in a]
print(b)

>> ['ends_with_underscore', 'does_not', 'multiple']
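Applied to the strings from the question, the trailing digits would have to go first. A minimal sketch, assuming the suffix is always made of ASCII digits (string.digits), could chain two rstrip calls:

```python
import string

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# rstrip treats its argument as a set of characters: first drop the
# trailing digits, then drop any underscore left at the end.
b = [x.rstrip(string.digits).rstrip("_") for x in a]
print(b)
```

Because rstrip only touches the right-hand end of the string, digits or underscores elsewhere in the string are left alone.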

Answer 4

>>> import re
>>> a = ["SDFSD_SFSDF234234","SDFSDF_SDFSDF_234324","TSFSD_SDF_213123"]
>>> c = [re.sub(r'_?\d+','',x) for x in a]
>>> c
['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
>>>

It's short and simple. Basically, it's saying "replace every run of digits, together with the _ in front of it if there is one, with the empty string".
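One thing to be aware of is that this pattern is not anchored, so it removes digit runs anywhere in the string. The sketch below uses a made-up input ("AB_12_CD_34", not from the question) to compare it with an assumed variant anchored by $:

```python
import re

s = "AB_12_CD_34"  # hypothetical input with digits in the middle

# Unanchored: every "_digits" run is removed, wherever it occurs.
everywhere = re.sub(r'_?\d+', '', s)

# Anchored with $: only the run at the very end of the string is removed.
end_only = re.sub(r'_?\d+$', '', s)

print(everywhere)
print(end_only)
```

For the data in the question both forms give the same result, since the digits only appear at the end.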

Answer 5

The error in:

c = [re.sub(r'$?_[^A-Z_]+', "", x) for x in a]

is caused by the ?. It directly follows $, which is a zero-width anchor rather than a character, so there is nothing for it to match 0 or 1 times. If you change it to:

>>> [re.sub(r'_?[^A-Z_]+$', "", x) for x in a]
['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']

It works as you expect.

Another thing: $ is used to denote the end of the line, so it probably shouldn't be the first character.
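To see both behaviours side by side, here is a small sketch that compiles the failing pattern inside a try block and then applies the corrected one. The exact wording of the error message is not taken from the question and may vary between Python versions.

```python
import re

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# The original pattern: "?" is applied to the anchor "$",
# which the regex compiler rejects.
try:
    re.compile(r'$?_[^A-Z_]+')
    failure = None
except re.error as exc:
    failure = str(exc)
print(failure)

# The corrected pattern: optional underscore, then characters that are
# neither uppercase letters nor underscores, anchored at the end.
fixed = [re.sub(r'_?[^A-Z_]+$', '', x) for x in a]
print(fixed)
```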


Answer 1
1 2013-07-17 22:04:49

Answer 2
1 2013-07-17 22:15:47

Answer 3
1 2013-07-17 22:25:51

Answer 4
1 2013-07-17 22:50:24

Answer 5
0 Accepted 2013-07-17 22:02:40
