Python regex to add a character to all words in a string except 'and'
I want to be able to generate 'foos, bars and bees' from 'foo, bar and bee' using re.sub.
I can't even get adding 's' to every word to work; I'll work on excluding 'and' once I have that part. I have tried substituting \b with "s", but \b matches at both the beginning and the end of each word. If I use '\w*\b', the whole word is replaced. I am trying to figure this out using the Python docs, and (?P) or (?<=...) lookbehind assertions seem like they might be what I'm after, but I am having trouble getting them to cooperate, and the examples are limited.
This works, because the replacement argument of re.sub can be a callable:
re.sub(r'(\w+)', lambda m: m.group(1) + 's' if m.group(1) != 'and' else 'and', 'foo, bar and bee')
It was inspired by an old bug report (second to last entry).
EDIT: A shorter and probably more readable solution:
re.sub(r'(and)|(\w+)', lambda m: m.group(1) or m.group(2) + 's', 'foo, bar and bee')
It also makes it easier to add other words to the exception list, as isedev suggested in a comment.
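As a minimal sketch of the exception-list idea, the callable can check each word against a set. The names `EXCEPTIONS` and `pluralise` are hypothetical, chosen only for illustration, and adding 'or' to the set is an assumption, not something from the original answer.

```python
import re

# Hypothetical exception list; only 'and' comes from the question.
EXCEPTIONS = {"and", "or"}

def pluralise(text, exceptions=EXCEPTIONS):
    """Append 's' to every word in text, except words listed in exceptions."""
    def add_s(match):
        word = match.group(0)
        # Whole-word comparison, so 'android' or 'band' are still pluralised.
        return word if word.lower() in exceptions else word + "s"
    return re.sub(r"\w+", add_s, text)

print(pluralise("foo, bar and bee"))
```

Because the comparison happens on the whole matched word, extending the list is a matter of adding strings to the set rather than editing the pattern.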
Without considering words to exclude, the following adds an 's' to the end of every word in the string:
re.sub('([a-zA-Z]+)','\\1s','foo, bar and bee')
-> 'foos, bars ands bees'
To pluralise words in a more generic and less error-prone way, you might want to take a look at the inflect package (for English at least).
The code below adds s to every word except the word and:
>>> import re
>>> s = "foo, bar and bee "
>>> m = re.sub(r'\b(?!and\b)(\w+)\b', r'\1s', s)
>>> m
'foos, bars and bees '
The negative lookahead makes the pattern match one or more word characters, but not the word and. Here \b means a word boundary, which matches between a word character and a non-word character.
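The lookbehind route mentioned in the question can also be made to work. The sketch below is one possible approach, not taken from any of the answers: it matches the zero-width position at the end of each word and inserts 's' there, then adds a negative lookbehind to skip the position right after the whole word and.

```python
import re

text = "foo, bar and bee"

# (?<=\w)\b is a zero-width match at the end of each word:
# a word boundary that is preceded by a word character.
all_words = re.sub(r"(?<=\w)\b", "s", text)

# (?<!\band) additionally rejects positions that come right after the
# standalone word 'and' (the \b inside the lookbehind rules out 'band').
except_and = re.sub(r"(?<=\w)\b(?<!\band)", "s", text)

print(all_words)
print(except_and)
```

Since nothing is consumed, no capture group or backreference is needed in the replacement; the trade-off is that the pattern is harder to read than the callable version.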