Python: replacing quotes and apostrophes with re.sub and a dict

Question

I'm trying to replace ' and " in a string. Here is the dict:

char_replace_list = {
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}

This is what I did:

import re
s = '\' " & < >'
# Wrap the alternation of dict keys in \b word boundaries
pattern = re.compile(r'\b(' + '|'.join(char_replace_list.keys()) + r')\b')
print(pattern.sub(lambda x: char_replace_list[x.group()], s))

The result is:

' " &amp; &lt; &gt;

Where did I go wrong?

Answer 1

Interestingly, I get a different result on my machine: no substitutions at all.

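Your issue is that the edges of those punctuation characters are not considered word boundaries (possibly in a platform-dependent way?): \b only matches next to a word character (\w), and in your string every key is surrounded by spaces or the ends of the string. From the re documentation: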

\b

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals.
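
A quick check of the quoted rule, assuming Python 3's standard re module: the documentation's own foo examples behave as described, and in the question's string \b finds no position at all, because none of ' " & < > is a \w character.

```python
import re

# The documentation's examples: \b needs a \w character on exactly one side.
examples = ['foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3']
for text in examples:
    # True for the first four, False for 'foobar' and 'foo3'
    print(repr(text), bool(re.search(r'\bfoo\b', text)))

# The question's input contains no word characters, so there is no
# \w/\W transition anywhere and \b never matches.
s = '\' " & < >'
print(re.findall(r'\b', s))  # []
```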

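Instead of \b...\b you could use (?:^|(?<= ))...(?= |$), which means "at the start of the string or after a space" on the left and "before a space or at the end of the string" on the right. The more compact (?<= |^) is rejected by Python's re module with "look-behind requires fixed-width pattern", because its two alternatives have different widths.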
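
A minimal sketch of that approach, assuming Python 3 and the question's dict with the semicolons added to &lt; and &gt;. The helper name escape_chars is hypothetical, and re.escape is an addition so that the keys are always matched literally.

```python
import re

char_replace_list = {
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}

def escape_chars(s, mapping):
    """Hypothetical helper: replace keys that stand alone between spaces/string edges."""
    # One alternation of the (escaped) keys, captured as group 1.
    alternatives = '|'.join(map(re.escape, mapping))
    # (?:^|(?<= )) : start of string, or preceded by a space (fixed-width look-behind)
    # (?= |$)      : followed by a space, or end of string
    pattern = re.compile(r'(?:^|(?<= ))(' + alternatives + r')(?= |$)')
    return pattern.sub(lambda m: mapping[m.group(1)], s)

s = '\' " & < >'
print(escape_chars(s, char_replace_list))  # &apos; &quot; &amp; &lt; &gt;
```

Because the look-arounds only accept a space or the string edges, a character is replaced only when it stands alone, as in the question's input; the apostrophe in "it's" is left untouched. If every occurrence should be replaced, the boundaries can simply be dropped, or str.translate used as in the next answer.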

Answer 2

In this case you can use the str.translate method:

char_replace_list = {
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}
s = '\' " & < >'
# build a translation table from the mapping (single characters -> replacement strings)
t = str.maketrans(char_replace_list)
print(s.translate(t))
# &apos; &quot; &amp; &lt; &gt;
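
Why a single pass suits this dict: a sketch, assuming Python 3 and the same dict with the semicolons added, that contrasts str.translate with a naive loop of chained str.replace calls in dict order. The chained loop is only an illustration of the pitfall, not part of the original answer.

```python
char_replace_list = {
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}
s = '\' " & < >'

# One pass: each original character is looked up exactly once, so the '&'
# inside an inserted entity is never escaped again.
single_pass = s.translate(str.maketrans(char_replace_list))

# Chained str.replace in dict order: '&' is handled after '"' and "'",
# so the entities produced earlier get their '&' escaped a second time.
chained = s
for char, entity in char_replace_list.items():
    chained = chained.replace(char, entity)

print(single_pass)  # &apos; &quot; &amp; &lt; &gt;
print(chained)      # &amp;apos; &amp;quot; &amp; &lt; &gt;
```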

2 answers

Answer 1 (score 0): 2016-03-14 15:17:37

Answer 2 (score 0): 2016-03-14 15:21:03
