简体   繁体   中英

How to substitute '\N' in a string using python as it is leading to unicodeescape error?

import re

import string

a= """  Message-ID: <13505866.1075863688222.JavaMail.evans@thyme>
Date: Mon, 23 Oct 2000 06:13:00 -0700 (PDT)
From: phillip.allen@enron.com
To: randall.gay@enron.com
Subject: 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Phillip K Allen
X-To: Randall L Gay
X-cc: 
X-bcc: 
X-Folder: \Phillip_Allen_Dec2000\Notes Folders\'sent mail
X-Origin: Allen-P
X-FileName: pallen.nsf

Randy,

 Can you send me a schedule of the salary and level of everyone in the 
scheduling group.  Plus your thoughts on any changes that need to be made.  
(Patti S for example)

Phillip

""" <br>
s=re.sub('[\\\]+', ' yy', a)
print(s)

error message:unicodeescape' decode can't decode bytes in position 354-355:malformed error image \\N character space

I've already tried using different combinations of backslashes but its still showing the same error

To encode a literal backslash in a regex, you need four backlashes in a normal string (or two backslashes in a raw string), not three:

s = re.sub('\\\\+', ' yy', a)

or

s = re.sub(r'\\+', ' yy', a)

You don't need a character class for a single character (although it doesn't hurt much, either).

The problem that leads to your error message (which occurs during compile time, long before the also faulty regex is constructed (see my other answer)) is in this line:

X-Folder: \Phillip_Allen_Dec2000\Notes Folders\'sent mail

Here you have a \\N that Python tries to interpret as an escape sequence like "\\N{GREEK CAPITAL LETTER DELTA}" , and of course fails doing so.

You need two backslashes to correct that problem.

X-Folder: \\Phillip_Allen_Dec2000\\Notes Folders\\'sent mail

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM