Python regex to remove alphanumeric characters without removing words at the end of the string

Question

I'm trying to clean some text by removing alphanumeric characters from the end of the string, but my regex also removes normal words, as shown in the output below. Can someone help me achieve the expected result?

re.sub(r'[a-zA-Z0-9/]{5,}$', '', text)

asus zenfone 3s max zc521tl
asus zenfone max plus (m1) zb570tl
asus zenfone max pro (m1) zb601kl/zb602k
nokia 3.1 c
nokia 3
asus zenfone 3 zoom ze553k
asus zenfone 3 deluxe zs570kl
blackberry keyone
htc explorer
lg tribute
acer liquid z520

Output:

asus zenfone 3s max 
asus zenfone max plus (m1) 
asus zenfone max pro (m1) 
nokia 3.1 c
nokia 3
asus zenfone 3 zoom 
asus zenfone 3 deluxe 
blackberry 
htc 
lg 
acer liquid z520

Expected output:

asus zenfone 3s max
asus zenfone max plus (m1) 
asus zenfone max pro (m1)
nokia 3.1 c
nokia 3
asus zenfone 3 zoom 
asus zenfone 3 deluxe 
**blackberry keyone**
**htc explorer**
**lg tribute**
acer liquid z520

Answer 1

You can add a positive look-ahead to the regex that requires the word at the end to contain at least one digit for it to be removed: (?=\D*\d). That will prevent it from removing normal words that don't contain digits.

The complete program:

#!/usr/bin/env python3
import re

texts = [
    'asus zenfone 3s max zc521tl',
    'asus zenfone max plus (m1) zb570tl',
    'asus zenfone max pro (m1) zb601kl/zb602k',
    'nokia 3.1 c',
    'nokia 3',
    'asus zenfone 3 zoom ze553k',
    'asus zenfone 3 deluxe zs570kl',
    'blackberry keyone',
    'htc explorer',
    'lg tribute',
    'acer liquid z520',
]

for text in texts:
    print(re.sub(r'(?=\D*\d)[a-zA-Z0-9/]{5,}$', '', text))

It outputs:

asus zenfone 3s max 
asus zenfone max plus (m1) 
asus zenfone max pro (m1) 
nokia 3.1 c
nokia 3
asus zenfone 3 zoom 
asus zenfone 3 deluxe 
blackberry keyone
htc explorer
lg tribute
acer liquid z520
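
The output above still carries the space that preceded the removed word. If that trailing whitespace is unwanted (an assumption; the expected output in the question is not consistent on this point), one possible sketch is to let the pattern consume it as well. The leading \s* and the helper name strip_trailing_code are illustrative additions, not part of the program above:

```python
import re

# Same pattern as above, with a leading \s* (an assumption added here)
# so the whitespace before the removed word is deleted too.
PATTERN = re.compile(r'\s*(?=\D*\d)[a-zA-Z0-9/]{5,}$')

def strip_trailing_code(text):
    """Drop a final run of 5+ listed chars that contains a digit, plus the whitespace before it."""
    return PATTERN.sub('', text)

for text in ['asus zenfone 3s max zc521tl', 'blackberry keyone', 'acer liquid z520']:
    print(repr(strip_trailing_code(text)))
```

Printing with repr() makes any leftover trailing space visible; plain words and short codes such as z520 are left untouched.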

Answer 2

If it should be the last word in a string and there are always multiple words, you might use:

[ \t]+(?=[a-zA-Z0-9/]{5})[a-zA-Z/]*[0-9][a-zA-Z0-9/]*[A-Za-z]$

[ \t]+ Match 1+ spaces or tabs
(?=[a-zA-Z0-9/]{5}) Positive lookahead, assert that at least 5 chars of any of the listed follow
[a-zA-Z/]* Match 0+ times any of the listed
[0-9] Match a digit
[a-zA-Z0-9/]* Match 0+ times any of the listed in the character class
[A-Za-z] Match a char a-zA-Z
$ End of string

In the replacement use an empty string.
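
As a rough sketch of how this pattern could be applied in Python to the sample strings from the question (the names TRAILING_CODE and strip_model_code are hypothetical, chosen only for this illustration):

```python
import re

# The pattern from this answer, compiled once for reuse.
TRAILING_CODE = re.compile(
    r'[ \t]+(?=[a-zA-Z0-9/]{5})[a-zA-Z/]*[0-9][a-zA-Z0-9/]*[A-Za-z]$'
)

def strip_model_code(text):
    """Replace the matched trailing word and the spaces/tabs before it with an empty string."""
    return TRAILING_CODE.sub('', text)

texts = [
    'asus zenfone 3s max zc521tl',
    'asus zenfone max pro (m1) zb601kl/zb602k',
    'nokia 3.1 c',
    'blackberry keyone',
    'acer liquid z520',
]

for text in texts:
    # repr() shows that no trailing space is left behind
    print(repr(strip_model_code(text)))
```

Because the pattern starts with [ \t]+, the whitespace before the removed word is deleted too. Note that it only matches a last word that contains a digit and ends in a letter, so z520 is kept.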
