
Python UTF-8 REGEX

I have a problem when trying to find text matched by a regex. Everything worked perfectly fine, but when I added "\£" to my regex it started causing problems. I get a SyntaxError: "Non-ASCII character '\xc2' in file (...) but no encoding declared..."

I've tried to solve this problem using

import sys
reload(sys)  # to enable `setdefaultencoding` again
sys.setdefaultencoding("UTF-8")

but it doesn't help. The re.UNICODE flag doesn't help, and making the pattern string (pat) a unicode string doesn't help either. Is there any way to fix this regex? I just want to build a regular expression and use the pound sign in it. Thanks for your help.

k = text.encode('utf-8')
pat = u'salar.{1,6}?([0-9\-,\. \tkFFRroOMmTtAanNuUMm\$\&\;\£]{2,})'
pattern = re.compile(pat, flags = re.DOTALL|re.I|re.UNICODE)
salary =  pattern.search(k).group(1)
print (salary)

The error is still there even if I comment out all of those lines (put "#" in front of them). Maybe it's not connected with the re library but with my settings?

The error message means Python cannot guess which character set you are using. It also tells you that you can fix it by telling it the encoding of your script.

# coding: utf-8
string = "£"

or equivalently

string = u"\u00a3"

Without an encoding declaration, Python sees a bunch of bytes which mean different things in different encodings. Rather than guess, it forces you to tell it what they mean. This is codified in PEP 263.
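As a minimal sketch of why guessing would be unsafe (the variable names here are illustrative, not from the question), the two bytes that a UTF-8 editor saves for the pound sign decode to different text depending on which encoding is assumed:

```python
# -*- coding: utf-8 -*-
# The two bytes a UTF-8 editor writes to disk for the pound sign.
raw = b"\xc2\xa3"

# Read as UTF-8: one character, the pound sign (U+00A3).
as_utf8 = raw.decode("utf-8")

# Read as Latin-1: two characters, A-circumflex (U+00C2) then the pound sign.
as_latin1 = raw.decode("latin-1")

# Print only the lengths so the output is safe on any terminal.
print(len(as_utf8), len(as_latin1))
```

The \xc2 in the error message is the first of those two bytes.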

(ASCII is unambiguous [except if your system is EBCDIC I guess] so it knows what you mean if you use a pure-ASCII representation for everything.)

The encoding settings you were fiddling with affect how files and streams are read, and program I/O generally, but not how the program source is interpreted.
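Applied to the pattern in the question, a sketch under a few assumptions might look like the following. The sample text is made up for illustration, the input is assumed to be a unicode string already (so it is not encoded to bytes before searching), and the pound sign is written as the escape \u00a3 so that the source stays pure ASCII. The character class is the one from the question with the duplicate letters removed.

```python
# -*- coding: utf-8 -*-
import re

POUND = u"\u00a3"  # pure-ASCII spelling of the pound sign

# Character class from the question, built from unicode strings.
pat = u"salar.{1,6}?([0-9\\-,. \\tkFRrOoMmTtAaNnUu$&;" + POUND + u"]{2,})"
pattern = re.compile(pat, flags=re.DOTALL | re.I | re.UNICODE)

# Hypothetical input; search the unicode text directly,
# not the result of text.encode('utf-8').
text = u"Salary: " + POUND + u"25,000 per annum"

match = pattern.search(text)
# The class also accepts spaces, so strip them from the captured group.
salary = match.group(1).strip() if match else None

# Compare instead of printing the value, to avoid terminal encoding issues.
print(salary == POUND + u"25,000")
```

Writing the literal £ in the pattern instead of the escape should work equally well, provided the file is saved as UTF-8 and the coding line is present.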

