
Python regex to remove alphanumeric characters without removing words at the end of the string

I'm trying to clean some text by removing alphanumeric characters from the end of the string, but my regex also removes normal words, as shown in the output below. Can someone help me achieve the expected result?

re.sub(r'[a-zA-Z0-9/]{5,}$', '', text)

asus zenfone 3s max zc521tl
asus zenfone max plus (m1) zb570tl
asus zenfone max pro (m1) zb601kl/zb602k
nokia 3.1 c
nokia 3
asus zenfone 3 zoom ze553k
asus zenfone 3 deluxe zs570kl
blackberry keyone
htc explorer
lg tribute
acer liquid z520

Output:

asus zenfone 3s max 
asus zenfone max plus (m1) 
asus zenfone max pro (m1) 
nokia 3.1 c
nokia 3
asus zenfone 3 zoom 
asus zenfone 3 deluxe 
blackberry 
htc 
lg 
acer liquid z520

Expected output:

asus zenfone 3s max
asus zenfone max plus (m1) 
asus zenfone max pro (m1)
nokia 3.1 c
nokia 3
asus zenfone 3 zoom 
asus zenfone 3 deluxe 
**blackberry keyone**
**htc explorer**
**lg tribute**
acer liquid z520

You can add a positive lookahead to the regex that requires the word at the end to contain at least one digit before it is removed: (?=\D*\d). That prevents it from removing normal words that don't contain digits.

The complete program:

#!/usr/bin/env python3
import re

texts = [
    'asus zenfone 3s max zc521tl',
    'asus zenfone max plus (m1) zb570tl',
    'asus zenfone max pro (m1) zb601kl/zb602k',
    'nokia 3.1 c',
    'nokia 3',
    'asus zenfone 3 zoom ze553k',
    'asus zenfone 3 deluxe zs570kl',
    'blackberry keyone',
    'htc explorer',
    'lg tribute',
    'acer liquid z520',
]

for text in texts:
    print(re.sub(r'(?=\D*\d)[a-zA-Z0-9/]{5,}$', '', text))

It outputs:

asus zenfone 3s max 
asus zenfone max plus (m1) 
asus zenfone max pro (m1) 
nokia 3.1 c
nokia 3
asus zenfone 3 zoom 
asus zenfone 3 deluxe 
blackberry keyone
htc explorer
lg tribute
acer liquid z520
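
Note that the program above leaves the space that preceded the removed word. If that trailing space is unwanted, a minimal sketch of one way to drop it is to put `\s*` in front of the lookahead. This is an assumption about the desired output rather than part of the original answer, and the helper name `strip_model_code` is hypothetical:

```python
import re

# Same pattern as above, with \s* added in front so that the whitespace
# before the removed word is dropped too (an assumption, see the lead-in).
TRAILING_CODE = re.compile(r'\s*(?=\D*\d)[a-zA-Z0-9/]{5,}$')


def strip_model_code(text):
    """Remove a trailing word of 5+ characters that contains a digit."""
    return TRAILING_CODE.sub('', text)


for sample in ['asus zenfone 3s max zc521tl', 'blackberry keyone', 'acer liquid z520']:
    # repr() makes any leftover trailing whitespace visible
    print(repr(strip_model_code(sample)))
```

Strings whose last word has no digit, or is shorter than five characters, come back unchanged.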

If the word to remove is always the last one in the string and is always preceded by at least one other word, you might use:

[ \t]+(?=[a-zA-Z0-9/]{5})[a-zA-Z/]*[0-9][a-zA-Z0-9/]*[A-Za-z]$
  • [ \t]+ Match 1+ spaces or tabs
  • (?=[a-zA-Z0-9/]{5}) Positive lookahead: assert that the next 5 characters are all from the listed set, so the word has at least 5 characters
  • [a-zA-Z/]* Match 0+ letters or slashes
  • [0-9] Match a digit
  • [a-zA-Z0-9/]* Match 0+ characters from the character class
  • [A-Za-z] Match a single letter a-z or A-Z
  • $ End of string


In the replacement use an empty string.
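
As a rough sketch of how this pattern could be applied with `re.sub`, using a few of the inputs from the question (the variable names `pattern`, `samples`, and `cleaned` are illustrative only):

```python
import re

# The pattern from this answer: leading spaces/tabs, then a final word of
# 5+ characters that contains a digit and ends in a letter.
pattern = re.compile(
    r'[ \t]+(?=[a-zA-Z0-9/]{5})[a-zA-Z/]*[0-9][a-zA-Z0-9/]*[A-Za-z]$'
)

samples = [
    'asus zenfone 3s max zc521tl',
    'asus zenfone max pro (m1) zb601kl/zb602k',
    'nokia 3.1 c',
    'blackberry keyone',
    'acer liquid z520',
]

# Replace the match (including the whitespace before it) with an empty string.
cleaned = [pattern.sub('', s) for s in samples]

for line in cleaned:
    print(line)
```

Because the match starts at the whitespace, no trailing space is left behind. Since the pattern ends in [A-Za-z], a final word that ends in a digit is kept even if it is five or more characters long.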
