Replacing unicode brackets in python


When I try to use re.sub, I get an sre_constants.error:

>>> import re
>>> open_punct = ur'([{༺༼᚛‚„⁅⁽₍〈❨❪❬❮❰❲❴⟅⟦⟨⟪⟬⟮⦃⦅⦇⦉⦋⦍⦏⦑⦓⦕⦗⧘⧚⧼⸢⸤⸦⸨〈《「『【〔〖〘〚〝﴾︗︵︷︹︻︽︿﹁﹃﹇﹙﹛﹝([{⦅「'
>>> text = u'this is a weird ❴sentence ⟅with some crazy ⟦punctuations sprinkled⟨'
>>> re.sub(open_punct, ur'\1 ', text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: unexpected end of regular expression

Why is this happening? Why does the regular expression end unexpectedly?

When I use re.escape, no error is raised, but re.sub does not pad the punctuation with spaces:

>>> re.sub(re.escape(open_punct), ur'\1 ', text)
u'this is a weird \u2774sentence \u27c5with some crazy \u27e6punctuations sprinkled\u27e8'
>>> print re.sub(re.escape(open_punct), ur'\1 ', text)
this is a weird ❴sentence ⟅with some crazy ⟦punctuations sprinkled⟨

A plain str.replace loop over the same characters does add the spaces:


>>> for p in open_punct:
...     text = text.replace(p, p+' ')
>>> text
u'this is a weird \u2774 sentence \u27c5 with some crazy \u27e6 punctuations sprinkled\u27e8 '
>>> print text
this is a weird ❴ sentence ⟅ with some crazy ⟦ punctuations sprinkled⟨ 
>>> open_punct
>>> print open_punct



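If open_punct is meant to be a character class, enclose all of its characters in [...]; inside the class, ( and [ can be included without escaping. Used as a bare pattern, the leading ( and [ open a group and a character class that are never closed, which is why the regular expression ends unexpectedly. The re.escape() version compiles, but it only matches text in which all of those characters appear in exactly that order.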
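
The three behaviours described here can be reproduced in a few lines. This is only a sketch, written for Python 3 (where str is already Unicode); `sample` is a shortened, hypothetical stand-in for the full open_punct string.

```python
import re

# Shortened, hypothetical sample of opening brackets (not the full list).
sample = '([{❴⟅⟦⟨'
text = 'this is a weird ❴sentence ⟅with some crazy ⟦punctuations sprinkled⟨'

# As a bare pattern, "(" opens a group and "[" opens a character class;
# neither is closed, so compiling the pattern raises re.error.
try:
    re.compile(sample)
    compiled = True
except re.error:
    compiled = False

# Escaped, the pattern compiles, but it is one long literal: it matches only
# the whole run of characters in this exact order, which the text lacks.
literal_match = re.search(re.escape(sample), text)

# Wrapped in [...], every character becomes an alternative of its own.
char_class = re.compile('[' + sample + ']')
found = char_class.findall(text)

print(compiled, literal_match, found)
```
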

Since you also want to reference a capturing group (\1), add parentheses:

>>> re.sub(u'([{}])'.format(open_punct), ur'\1 ', text)
u'this is a weird \u2774 sentence \u27c5 with some crazy \u27e6 punctuations sprinkled\u27e8 '
>>> print re.sub(u'([{}])'.format(open_punct), ur'\1 ', text)
this is a weird ❴ sentence ⟅ with some crazy ⟦ punctuations sprinkled⟨
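
The session above is Python 2 (ur'' literals, print statement). A rough Python 3 equivalent might look like the following; it assumes a shortened sample of the bracket list rather than the full open_punct string, and it checks the result against the str.replace loop from the question.

```python
import re

# Shortened, illustrative subset of the brackets in open_punct (assumption).
open_punct = '([{❴⟅⟦⟨'
text = 'this is a weird ❴sentence ⟅with some crazy ⟦punctuations sprinkled⟨'

# Character class wrapped in a capturing group, built the same way as above.
pattern = re.compile('([{}])'.format(open_punct))

# \1 is the bracket that matched; a single space is appended after it.
padded = pattern.sub(r'\1 ', text)

# Reference result from the character-by-character str.replace loop.
looped = text
for p in open_punct:
    looped = looped.replace(p, p + ' ')

print(padded)
```

For a list without repeated characters the two approaches agree. If the same character appears twice in the list, the loop pads it twice, while the character class still pads it once.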

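Note that re.escape() is still a good idea if the characters you want to match include -, ] or a backslash sequence such as \d: - defines a range of characters (0-9 for all digits), ] ends the class, and \d, \w, \s and so on are predefined character classes: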

re.sub(u'([{}])'.format(re.escape(open_punct)), ur'\1 ', text)
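
To see why the escaping matters, here is a small sketch in Python 3. The character sets `'+-9'` and `']\\-)'` are made-up examples, not taken from the question.

```python
import re

# Unescaped, "+-9" inside [...] is a range from "+" to "9", which also
# covers ",", ".", "/" and every digit.
as_range = re.findall('[+-9]', 'a,5-')

# Escaped, the same three characters are matched literally: "+", "-", "9".
as_literals = re.findall('[' + re.escape('+-9') + ']', 'a,5-')

# "]" would close the class early and "\" would start an escape sequence;
# re.escape() neutralises both before the class is built.
chars = ']\\-)'
pattern = re.compile('([' + re.escape(chars) + '])')
padded = pattern.sub(r'\1 ', 'a]b\\c-d)e')

print(as_range, as_literals, padded)
```
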

