Weird Python behavior while using Dictionary and escape characters

I'm new to Python, and I'm trying to perform simple tasks the way I used to, but I've run into an interesting... feature?

The code below works exactly as I want it to:

def cleanLDAP(search):
  escChars = {'(':r'\28', ')':r'\29' }
  for ch, val in escChars.items():
    if ch in search:
      search = search.replace(ch, val)
  return search

cleanLDAP('(123)')

The output is '\\28123\\29', as I expect, but when I change escChars as follows:

escChars = {'(':r'\28', ')':r'\29', '\\': '\5c' }

the output becomes a bit weird: '\x05c28123\x05c29'

I understand that I might be missing some implicit encoding change, but I'd still like to know why this is happening. Thank you in advance!

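5c (hex) is the code point of the backslash character \ in ASCII and UTF-8.

However, writing '\5c' in a normal (non-raw) string literal does not store a backslash followed by 5c. Python reads \5 as an octal escape sequence. The \ooo form accepts up to three octal digits, and c is not one of them. So \5 becomes the single control character chr(5), displayed as \x05, followed by a literal c. Your value therefore becomes '\x05c':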

'\5c'
#'\x05c'

'5c'
#'5c'

escChars
#{'(': '\\28', ')': '\\29', '\\': '\x05c'}
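Here is a quick check of how these literals are parsed. This is a sketch assuming Python 3, and the variable names are only for illustration. In a normal literal, '\5c' is two characters. The raw r'\5c' keeps all three. A single backslash (code point 0x5C) can be written as '\\', '\x5c' or '\134':

```python
# '\5c' in a normal literal: \5 is an octal escape (chr(5)), then a plain 'c'.
plain = '\5c'
raw = r'\5c'

assert plain == '\x05c'        # control char U+0005 followed by 'c'
assert len(plain) == 2
assert raw == '\\5c'           # backslash, '5', 'c'
assert len(raw) == 3

# Equivalent spellings of one backslash character (0x5C) in a normal literal.
assert '\\' == '\x5c' == '\134' == chr(0x5C)

print(repr(plain), repr(raw))  # '\x05c' '\\5c'
```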

When you iterate over the items, the third key ch is a single backslash \. The literal '\\' is not a raw string, so its escaped backslash collapses into one character (a raw r'\\' would be two backslashes). The value '\x05c' prints as an invisible control character followed by c:

for ch, value in escChars.items(): 
    print(ch, value)

#( \28
#) \29
#\ c

Finally, because search is modified every time a match is found during the iteration, the check for \ runs after replace() has already inserted backslashes into the string.

So the first two replacements insert \28 and \29. The backslash replacement then rewrites each of those inserted backslashes as '\x05c' (chr(5) followed by c), which produces exactly '\x05c28123\x05c29'.
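Here is a minimal trace of the question's loop. It assumes Python 3.7+, where dicts keep insertion order. The trace shows the third replacement acting on the backslashes that the first two inserted:

```python
esc_chars = {'(': r'\28', ')': r'\29', '\\': '\5c'}  # value is '\x05c', not \5c

search = '(123)'
for ch, val in esc_chars.items():
    if ch in search:
        search = search.replace(ch, val)
    print(repr(ch), '->', repr(search))

# '(' -> '\\28123)'
# ')' -> '\\28123\\29'
# '\\' -> '\x05c28123\x05c29'
```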

The simple fix here is to write the key with the r prefix and to write the value as r'\5c'. With the r prefix, r'\\' matches only two consecutive backslashes, not the single backslashes inserted by \28 and \29. Writing the value as r'\5c' prevents \5 from being turned into an octal escape. Note that with this key, a lone backslash in the input is no longer escaped at all.

def cleanLDAP(search):
    escChars = {'(':r'\28', ')':r'\29', r'\\': r'\5c' }
    for ch, val in escChars.items():
        if ch in search:
            search = search.replace(ch, val)
    return search

>>> cleanLDAP('(123)')

#'\\28123\\29'
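Another option is to escape every character in a single pass with str.translate. This is not part of the answer above. Because each input character is looked up exactly once, backslashes produced by one replacement are never escaped again, and a lone backslash in the input is still escaped. In this sketch, clean_ldap and _LDAP_ESCAPES are hypothetical names. The * and NUL entries come from the list of characters RFC 4515 requires escaping in filter values:

```python
# Single-pass escaping: str.translate maps each character independently,
# so output from one mapping is never fed back into another.
_LDAP_ESCAPES = str.maketrans({
    '\\': r'\5c',
    '(': r'\28',
    ')': r'\29',
    '*': r'\2a',   # also listed by RFC 4515
    '\0': r'\00',  # NUL, also listed by RFC 4515
})


def clean_ldap(search: str) -> str:
    """Escape LDAP filter special characters in one pass."""
    return search.translate(_LDAP_ESCAPES)


print(repr(clean_ldap('(123)')))    # '\\28123\\29'
print(repr(clean_ldap('a\\b(c)')))  # 'a\\5cb\\28c\\29'
```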

Change it to:

escChars = {'(':r'\28', ')':r'\29', '\\': r'\5c' }

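You missed the r prefix on the value: you wrote '\5c' instead of r'\5c'. Without the prefix, \5 is parsed as an octal escape for chr(5). Python displays that character in hexadecimal form as '\x05', followed by a literal c.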

To understand this with an example:

a='\5'
a
ord(a)

These return '\x05' and 5 respectively.
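With the r prefix, the octal escape is gone, but the loop still runs the backslash entry last when it is listed last. It therefore also rewrites the backslashes that \28 and \29 introduced. Here is a minimal sketch, assuming Python 3.7+ insertion-ordered dicts. clean_ldap_loop is a hypothetical helper wrapping the question's loop. Listing the backslash entry first avoids the double escaping:

```python
def clean_ldap_loop(search: str, esc_chars: dict) -> str:
    # Same loop as in the question: sequential str.replace calls.
    for ch, val in esc_chars.items():
        if ch in search:
            search = search.replace(ch, val)
    return search


# Backslash entry last: it also rewrites the backslashes from \28 and \29.
backslash_last = {'(': r'\28', ')': r'\29', '\\': r'\5c'}
# Backslash entry first: only backslashes already in the input get escaped.
backslash_first = {'\\': r'\5c', '(': r'\28', ')': r'\29'}

print(clean_ldap_loop('(123)', backslash_last))   # \5c28123\5c29
print(clean_ldap_loop('(123)', backslash_first))  # \28123\29
```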

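Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.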

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM