How to replace a word suffix using re.sub() in Python?

If I have a body of text and want to replace "ion" or "s" with nothing but keep the rest of the word (so if the word is reflection, it should output reflect), how would I go about that? I have tried:

new_llw = re.sub(r'[a-z]+ion', "", llw)
print(new_llw)

which replaces the whole word, and I tried

if re.search(r'[a-z]+ion', "", llw) is True:
    re.sub('ion', '', llw)

print(llw)

which gives me an error:

TypeError: unsupported operand type(s) for &: 'str' and 'int'

For the ion replacement, you may use a positive lookbehind:

import re

inp = "reflection"
output = re.sub(r'(?<=\w)ion\b', '', inp)
print(output)  # reflect

The TypeError: unsupported operand type(s) for &: 'str' and 'int' error occurs because you are calling re.search(r'[a-z]+ion', "", llw) as if it were re.sub. The second argument to re.search is the input string, which is empty here, and the third argument is the flags. Flags are regex options such as re.A or re.I, which are combined as a bitwise mask (re.A | re.I), so passing the string llw in that position makes the bitwise operation fail.
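To make the cause concrete, here is a minimal sketch that reproduces the misuse and then shows one possible corrected form. The sample value of llw and the helper name strip_ion are assumptions for illustration, not part of the original question. Note also that re.search returns a match object or None, never True, so the `is True` comparison would not succeed even with correct arguments.

```python
import re

llw = "reflection"  # sample input, assumed for illustration

# Misuse: re.search(pattern, string, flags) receives "" as the string
# and llw (a str) as the flags, so handling the flags fails.
try:
    re.search(r'[a-z]+ion', "", llw)
    raised = False
except TypeError:
    raised = True

def strip_ion(word):
    """Hypothetical helper: drop a trailing 'ion' from a single word."""
    # re.search returns a match object or None, so test truthiness.
    if re.search(r'[a-z]+ion$', word):
        # re.sub returns a new string; it does not modify word in place.
        return re.sub(r'ion$', '', word)
    return word

result = strip_ion(llw)
print(raised, result)
```

The search step is redundant here, since re.sub simply returns the input unchanged when nothing matches; it is kept only to mirror the structure of the original attempt.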

Now, if you need to match ion as a suffix in a word, you can use

new_llw = re.sub(r'\Bion\b', '', llw)

Here, \B (a non-word boundary) placed before ion matches a location that is immediately preceded by a word char (a letter, digit or connector punctuation, like _), then ion matches ion, and \b matches a location that is either at the end of the string or immediately followed by a non-word char.
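A short sketch of how the two boundaries behave on a few inputs may help. The sample strings are assumptions chosen for illustration: a standalone ion has no word char before it, and ionize has no word boundary after the n, so both are left alone.

```python
import re

samples = ["reflection", "ion", "ionize", "a reflection, an ion"]

# Apply the suffix-only pattern to each sample string.
results = [re.sub(r'\Bion\b', '', s) for s in samples]

for before, after in zip(samples, results):
    print(repr(before), "->", repr(after))
```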

To also match an s suffix:

new_llw = re.sub(r'\B(?:ion|s)\b', '', llw)

The (?:...) is a non-capturing group.

See the regex demo.
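As a rough illustration of what the combined pattern does on running text, here is a sketch with a hypothetical wrapper strip_suffixes and made-up sample strings. Two things are worth noticing: re.sub makes a single pass, so only one suffix is removed per word, and short words ending in s are clipped as well.

```python
import re

SUFFIX = re.compile(r'\B(?:ion|s)\b')

def strip_suffixes(text):
    """Hypothetical wrapper: remove a trailing 'ion' or 's' from each word."""
    return SUFFIX.sub('', text)

one = strip_suffixes("reflection reflects")  # both suffixes removed
two = strip_suffixes("reflections")          # only the final 's' is removed
three = strip_suffixes("this is")            # short words are clipped too

print(one)
print(two)
print(three)
```

If words such as this or is must survive, a minimum stem length or a stop-word check would be needed on top of the pattern.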

Variations

If you consider words as letter sequences only, you can use

new_llw = re.sub(r'(?<=[a-zA-Z])(?:ion|s)\b', '', llw) # ASCII only version
new_llw = re.sub(r'(?<=[^\W\d_])(?:ion|s)\b', '', llw) # Any Unicode letters supported

Here, (?<=[a-zA-Z]) matches a location that is immediately preceded by an ASCII letter.
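The three variants differ only in which character may precede the suffix. The following sketch compares them on a few assumed sample strings; the dictionary keys are labels invented for this example.

```python
import re

patterns = {
    "word_char": r'\B(?:ion|s)\b',                    # letter, digit or _
    "ascii_letter": r'(?<=[a-zA-Z])(?:ion|s)\b',      # ASCII letters only
    "unicode_letter": r'(?<=[^\W\d_])(?:ion|s)\b',    # any Unicode letter
}

# "caf\u00e9s" is "cafes" with an accented e before the final s.
samples = ["cats", "3s", "item_s", "caf\u00e9s"]

table = {
    name: [re.sub(pattern, '', s) for s in samples]
    for name, pattern in patterns.items()
}

for name, row in table.items():
    print(name, row)
```

Only the \B version strips the s after a digit or underscore, and only the ASCII version leaves the accented word untouched.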
