
How to remove everything (except certain characters) after the last number in a string

This is a follow-up to this question.

There I learned how to remove all characters after the last number in a string, so I can turn

w = 'w123 o456 t789-- --'

into

w123 o456 t789

Now I might have strings like this:

w = 'w123 o456 (t789)'

In this case,

re.sub(r'\D+$', '', w)

would give me

w123 o456 (t789

So I actually have two closely related questions:

1) How can I modify the command re.sub(r'\D+$', '', w) so that certain characters (e.g. parentheses) are kept?

2) How can I modify the command re.sub(r'\D+$', '', w) so that only certain characters (e.g. dashes and white space) are removed?

EDIT

@Martin Bonner's answer gets very close, but e.g. for

w='w123 -o456 t789--) --'

the command

re.sub('[- ]+$', '', w)

gives me w123 -o456 t789--), but it should also get rid of the remaining dashes.

To keep certain characters, here ( and ), use:

re.sub('[^0-9()]+$', '', w)

To remove only certain characters from the end of the line:

re.sub('[- ]+$', '', w)

In square brackets, you list the characters you want to match. If the first character is ^, then everything except the listed characters is matched. The only minor niggle is that - usually specifies a range (so we can specify, e.g., all digits without having to list all 10 of them). That means that if we are going to specify - as one of the characters to match, it needs to go first (or last). (If you want to specify ^, then escape it with \ and go back to raw strings.)
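As a quick check of the two character classes above, here is a minimal sketch. The helper names keep_digits_and_parens and strip_dashes_and_spaces are made up for illustration; only the two patterns come from the answer.

```python
import re

def keep_digits_and_parens(text):
    # Negated class: strip the trailing run of anything that is
    # not a digit and not a parenthesis.
    return re.sub(r'[^0-9()]+$', '', text)

def strip_dashes_and_spaces(text):
    # '-' is listed first, so it is a literal dash rather than a range.
    return re.sub(r'[- ]+$', '', text)

print(keep_digits_and_parens('w123 o456 (t789)-- --'))   # w123 o456 (t789)
print(strip_dashes_and_spaces('w123 o456 t789-- --'))    # w123 o456 t789
print(strip_dashes_and_spaces('w123 -o456 t789--) --'))  # w123 -o456 t789--)
```

The last call shows the limitation raised in the question's edit: the match stops at the ), so the dashes in front of it survive.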

From the comment, I think you actually meant the second challenge to be "remove all the dashes and spaces from the string that lie between the last digit and the end of the line". That may be possible with a regular expression, but somebody who comes back to maintain the code in three months' time will hate you (and it may well be you). Just remember the Jamie Zawinski quote:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
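One way to do this without a regular expression might look like the sketch below. The function name strip_after_last_digit and its remove parameter are hypothetical, and it assumes that "digit" means ASCII 0-9 and that a string with no digit at all should be filtered in full.

```python
import string

def strip_after_last_digit(text, remove='- '):
    # Locate the position just past the last ASCII digit (0 if none).
    cut = 0
    for i, ch in enumerate(text):
        if ch in string.digits:
            cut = i + 1
    head, tail = text[:cut], text[cut:]
    # Leave the head untouched; drop unwanted characters from the tail only.
    return head + ''.join(ch for ch in tail if ch not in remove)

print(strip_after_last_digit('w123 -o456 t789--) --'))  # w123 -o456 t789)
```

The dash in front of o456 is kept because it lies before the last digit.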

You can pass a callback as the replacement argument and call another re.sub inside it.

re.sub(r'\D+$', lambda m: re.sub(r'[^()]+','',m.group(0)), s)

Here, you match all characters other than digits at the end of the string and pass that match to the callback, which removes every character other than ( and ) from it.
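A runnable version of the callback idea, as a sketch: clean_tail and its keep parameter are hypothetical names, and keep is assumed to be a non-empty string of characters that should survive after the last digit.

```python
import re

def clean_tail(text, keep='()'):
    # Inner pattern: any run of characters that are NOT in `keep`.
    # re.escape makes characters such as '-' or ']' safe inside the class.
    inner = re.compile('[^' + re.escape(keep) + ']+')
    # Outer pattern: the trailing run of non-digits; the callback
    # rewrites just that run.
    return re.sub(r'\D+$', lambda m: inner.sub('', m.group(0)), text)

print(clean_tail('w123 o456 (t789)'))       # w123 o456 (t789)
print(clean_tail('w123 -o456 t789--) --'))  # w123 -o456 t789)
```

Because only the trailing non-digit run is handed to the callback, the ( and the dash earlier in the string are never touched.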

If there are always 3 groups of characters, each group starts with a single letter followed by 3 digits, and only the last group might have parentheses, this expression might be just what you need:

import re

w = 'w123 o456 (t789)'
clean = re.sub(r'^.*(\w\d{3})[ -]+(\w\d{3})[ -]+(\(?\w\d{3}\)?).*$', r'\1 \2 \3', w)

clean is now 'w123 o456 (t789)', even if there are other characters at the beginning or end of the string.

This expression looks for 3 groups of characters, each consisting of a letter and 3 digits. For the last group there are optional parentheses: \(? and \)?. All characters before and after the 3 groups are matched with ^.* and .*$. Then we replace everything with just the 3 captured groups: \1 \2 \3.
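To see how that pattern behaves with junk around the three groups, here is a small sketch. The wrapper name normalize and the extra test strings are made up for illustration; the pattern is the one from the answer.

```python
import re

PATTERN = re.compile(r'^.*(\w\d{3})[ -]+(\w\d{3})[ -]+(\(?\w\d{3}\)?).*$')

def normalize(text):
    # Rebuild the string from the three captured groups only.
    # If the pattern does not match, re.sub returns text unchanged.
    return PATTERN.sub(r'\1 \2 \3', text)

print(normalize('w123 o456 (t789)'))          # w123 o456 (t789)
print(normalize('## w123 - o456 (t789) !!'))  # w123 o456 (t789)
print(normalize('w123 o456 t789-- --'))       # w123 o456 t789
```

Note that the separators between the groups are rewritten to single spaces, so this normalizes the whole string rather than only the part after the last digit.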

Instead of a regex, why not use a list comprehension? It automatically keeps letters and digits; if you don't want certain letters or digits, we can change that too:

w = 'w123 o456 t789-- --'
list_to_keep =[' ']
print(''.join([x for x in w if x.isalnum() or x in list_to_keep]))
>> w123 o456 t789 

w = 'w123 o456 (t789)'
list_to_keep =[' '] # add to me
print(''.join([x for x in w if x.isalnum() or x in list_to_keep]))
>> w123 o456 t789 

and for example:

w = 'w123 o456 (t789)'
list_to_keep =[' ', '('] # add to me (I added '(' to keep for example)
print(''.join([x for x in w if x.isalnum() or x in list_to_keep]))
>> w123 o456 (t789

It also handles the string from your edit, the one where you said Martin's answer falls short:

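w = 'w123 -o456 t789--) --'
list_to_keep = [' ']  # add to me
# every dash and the ')' are dropped, including the dash before o456;
# the space in front of the final '--' is kept, so one trailing space remains
print(''.join([x for x in w if x.isalnum() or x in list_to_keep]))
>> w123 o456 t789 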
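The comprehension can be wrapped in a small function so the kept characters become a parameter. This is only a sketch; keep_alnum and extra are hypothetical names. Unlike the regex answers, it filters the whole string, not just the part after the last digit.

```python
def keep_alnum(text, extra=' '):
    # Keep letters, digits, and anything listed in `extra`;
    # every other character is dropped wherever it appears.
    return ''.join(ch for ch in text if ch.isalnum() or ch in extra)

print(keep_alnum('w123 o456 (t789)', extra=' ()'))  # w123 o456 (t789)
print(keep_alnum('w123 -o456 t789--) --').rstrip()) # w123 o456 t789
```

The rstrip() call removes the trailing space that is left over once the final dashes are gone.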
