
How to match `“` with a regex in Python?

There is a symbol in some tweets:

“@BrownieSWP: High is s***????” you like 12 tf

The symbol is not " . I wrote this regex to match it:

re.sub('(“|”)', '"', tweet)

This regex (“|”) worked in Sublime Text, but it didn't work in Python.

The character you have copy/pasted is a U+201C "LEFT DOUBLE QUOTATION MARK". In the re.sub() you also have the corresponding right quotation mark U+201D . Perhaps the environment in which you tried to paste it wasn't set up to handle Unicode correctly, and converted it to some other encoding. (See also How do I see the current encoding of a file in Sublime Text 2? )
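One way to check which characters a string really contains is to list every non-ASCII code point together with its Unicode name. This is only a minimal sketch, assuming Python 3; the helper name `describe_non_ascii` is made up for illustration, and the tweet is the one from the question written with escapes so the source stays pure ASCII.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, 'U+XXXX', name) for every non-ASCII character in text."""
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# The tweet from the question, with the curly quotes written as escapes.
tweet = "\u201c@BrownieSWP: High is s***????\u201d you like 12 tf"

for ch, code, name in describe_non_ascii(tweet):
    print(code, name)
# U+201C LEFT DOUBLE QUOTATION MARK
# U+201D RIGHT DOUBLE QUOTATION MARK
```

If the output shows something other than U+201C and U+201D, the text was probably re-encoded somewhere between the editor and the interpreter.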

You can always use Python's escape codes to refer to a Unicode character unambiguously and in an ASCII-compatible way: re.sub(u'[\u201c\u201d]', '', tweet)
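The escape-based approach could look like the sketch below, which replaces both curly quotes with a straight double quote as the question intended. It assumes Python 3, where string literals are Unicode by default; the names `CURLY_DOUBLE_QUOTES` and `straighten_quotes` are illustrative only.

```python
import re

# Character class holding U+201C (left) and U+201D (right) double quotation marks.
CURLY_DOUBLE_QUOTES = re.compile("[\u201c\u201d]")


def straighten_quotes(text):
    """Replace curly double quotes with the ASCII double quote."""
    return CURLY_DOUBLE_QUOTES.sub('"', text)


tweet = "\u201c@BrownieSWP: High is s***????\u201d you like 12 tf"
print(straighten_quotes(tweet))
# "@BrownieSWP: High is s***????" you like 12 tf
```

On Python 2, both the pattern and the input would need to be unicode objects (for example a u'...' pattern and a decoded tweet) for the match to succeed.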

It works for me:

>>> s = r"“@BrownieSWP: High is s***????” you like 12 tf"
>>> m = re.sub(r'[”“]', r'', s)
>>> m
'@BrownieSWP: High is s***???? you like 12 tf'
