
Python: using re.sub with a dict to replace quotes and apostrophes

I'm trying to replace ' and " in a string. Here is the dict:

char_replace_list = {
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}

This is what I did:

import re

s = '\' " & < >'
pattern = re.compile(r'\b(' + '|'.join(char_replace_list.keys()) + r')\b')
pattern.sub(lambda x: char_replace_list[x.group()], s)

The result is:

' " &amp; &lt; &gt;

What did I do wrong?

Interestingly, I get a different result: no substitutions at all on my machine.

Your issue is that the edges of those punctuation characters are not considered word boundaries (in a platform-dependent way!?). From the documentation:

\b

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals.
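As a quick check of that definition (a minimal sketch, not part of the original answer; the variable names are only illustrative), \b matches next to a quote only when a word character sits on the other side of it:

```python
import re

# \b needs a word character (\w) on one side of the boundary.
# A quote surrounded by spaces has \W on both sides, so \b cannot match there.
word_boundary_quote = re.compile(r'\b"\b')

no_match = word_boundary_quote.search('a " b')  # spaces around the quote
match = word_boundary_quote.search('a"b')       # word characters around the quote

print(no_match)       # None
print(match.group())  # "
```

This is why the pattern in the question finds nothing to substitute in a string made only of punctuation and spaces.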

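Instead of \b...\b you could use (?<!\S)...(?!\S), which matches only when the character has whitespace or a string edge on each side. (The more literal (?<= |^)...(?= |$) does not compile in Python's re module, because a look-behind must have a fixed width.)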
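A sketch of two ways to apply this, assuming the entity strings carry their trailing semicolons. The names `strict`, `loose` and `replace_entity` are hypothetical. The look-behind is written as (?<!\S) because Python's re rejects variable-width look-behind such as (?<= |^). The second pattern simply drops the boundary checks.

```python
import re

char_replace_list = {
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}

# re.escape guards against keys that happen to be regex metacharacters.
alternation = '|'.join(re.escape(key) for key in char_replace_list)

# Only replace characters that stand alone between whitespace or string edges.
strict = re.compile(r'(?<!\S)(?:' + alternation + r')(?!\S)')
# Replace the characters wherever they occur.
loose = re.compile(alternation)

def replace_entity(match):
    """Look up the matched character in the mapping."""
    return char_replace_list[match.group()]

s = '\' " & < >'
strict_result = strict.sub(replace_entity, s)
loose_result = loose.sub(replace_entity, s)
print(strict_result)  # &apos; &quot; &amp; &lt; &gt;
print(loose_result)   # &apos; &quot; &amp; &lt; &gt;

# The two differ when the character is embedded in other text.
print(strict.sub(replace_entity, 'a&b'))  # a&b
print(loose.sub(replace_entity, 'a&b'))   # a&amp;b
```

Because re.sub scans the input once, the & inside an already inserted entity is not escaped a second time.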

In this case you can use the translate method:

char_replace_list = {
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}
s = '\' " & < >'
# build a translation table from the mapping (keys must be single characters)
t = str.maketrans(char_replace_list)
print(s.translate(t))
# &apos; &quot; &amp; &lt; &gt;
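If the goal is HTML escaping specifically (an assumption; the question does not say so), the standard library's html.escape covers the same five characters. A minimal sketch for comparison; note that it writes the apostrophe as &#x27; rather than &apos;:

```python
import html

s = '\' " & < >'

# quote=True (the default) also escapes double and single quotes.
escaped = html.escape(s, quote=True)
print(escaped)  # &#x27; &quot; &amp; &lt; &gt;

# html.unescape reverses the transformation.
round_trip = html.unescape(escaped)
print(round_trip == s)  # True
```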
