Python: using re.sub with a dict to replace quote and apostrophe
I'm trying to replace ' and " in a string. Here is the dict:
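# Values are assumed to be the usual HTML entities; the page rendering had decoded them.
char_replace_list = {
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}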
This is what I did:
import re

s = '\' " & < >'
pattern = re.compile(r'\b(' + '|'.join(char_replace_list.keys()) + r')\b')
pattern.sub(lambda x: char_replace_list[x.group()], s)
The result is:
' " & < >
What did I do wrong?
Interestingly, I get a different result: on my machine no substitutions are made at all.
Your issue is that the positions around those punctuation characters are not considered word boundaries (possibly in a platform-dependent way?). From the documentation:
\b
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals.
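A quick check with the re module makes the point concrete. The sample strings are made up for illustration; the idea is that \b needs a word character on one side, which a punctuation mark surrounded by spaces never has.

```python
import re

# \b only matches where a \w character meets a \W character or a string edge.
word = re.compile(r'\bfoo\b')
for text in ('foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3'):
    print(repr(text), bool(word.search(text)))   # True for the first four, False for the last two

amp = re.compile(r'\b&\b')
print(amp.search(' & '))   # None: the spaces and & are all \W, so there is no boundary
print(amp.search('a&b'))   # a match object: a word character sits on each side of &
```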
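Instead of \b...\b you could anchor on whitespace with lookarounds, for example (?<!\S)...(?!\S). Note that the look-behind in (?<= |^)...(?= |$) is rejected by Python's re module, which requires look-behind patterns to have a fixed width.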
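Here is a minimal sketch of a working substitution, assuming Python 3 and that the dict values are the usual HTML entities. The names CHAR_REPLACE, ANYWHERE, STANDALONE and escape_chars are illustrative, not from the question. Since the goal appears to be replacing every occurrence, the first pattern simply drops the boundary assertions. The second uses (?<!\S) and (?!\S), because Python's re only accepts fixed-width look-behind.

```python
import re

# Assumed mapping: each special character to its HTML entity.
CHAR_REPLACE = {
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}

# re.escape keeps the keys literal; no \b, because punctuation has no word edges.
ALTERNATION = '|'.join(re.escape(key) for key in CHAR_REPLACE)
ANYWHERE = re.compile(ALTERNATION)

# Variant: only match a key standing alone between whitespace or string edges.
STANDALONE = re.compile(r'(?<!\S)(?:' + ALTERNATION + r')(?!\S)')

def escape_chars(text, pattern=ANYWHERE):
    """Replace every match of pattern with its entity in a single pass."""
    return pattern.sub(lambda m: CHAR_REPLACE[m.group()], text)

s = '\' " & < >'
print(escape_chars(s))                     # &apos; &quot; &amp; &lt; &gt;
print(escape_chars('a&b &', STANDALONE))   # a&b &amp;
```

Because the substitution is done in one pass, the & inside an already inserted entity is not escaped a second time. For plain HTML escaping, the standard library's html.escape handles the same five characters (it emits &#x27; for the apostrophe).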