How to set preferred encoding in WSGI to UTF-8

Feeling a bit crazy here. I've got Apache set up with mod_wsgi, but I can't get the encoding to work properly. I have:

  • tested that mod_wsgi is running in daemon mode
  • read Graham Dumpleton's blog post about setting up the lang and locale settings for the WSGIDaemonProcess directive.
  • created a minimal test that seems to demonstrate the issue
# I recompiled mod_wsgi so it uses the right Python version
sys.version = '3.8.6 (default, Sep 24 2020, 21:54:23) \n[GCC 8.3.0]'
sys.prefix = '/usr/local'
sys.path = ['/usr/local/lib/python38.zip', '/usr/local/lib/python3.8', '/usr/local/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/site-packages', '/usr/local/src/scorched']

# This seems to be a timing thing? Not sure, but possibly problematic
locale.getlocale() = (None, None)
# This was fixed by setting lang or locale (not sure which)
locale.getdefaultlocale() = ('en_US', 'UTF-8')
sys.getdefaultencoding() = 'utf-8'

# These seem like a problem...
sys.getfilesystemencoding() = 'ascii'
locale.getpreferredencoding(False) = 'ANSI_X3.4-1968'

# It's daemon mode
mod_wsgi.process_group = 'cl'
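
For anyone trying to reproduce this, a WSGI test script along the following lines prints the same values. It is a hypothetical sketch, not the actual python_version_test.py from this question: collect_encoding_report and safe_getlocale are made-up names, and the mod_wsgi module can only be imported inside a mod_wsgi process, so the sketch reports None everywhere else. It leaves out locale.getdefaultlocale(), which is deprecated in newer Python versions.

```python
import locale
import os
import sys

try:
    import mod_wsgi  # only meaningful inside an Apache/mod_wsgi process
except ImportError:
    mod_wsgi = None


def safe_getlocale():
    # getlocale() can raise ValueError for locale names it cannot parse.
    try:
        return locale.getlocale()
    except ValueError as exc:
        return f"error: {exc}"


def collect_encoding_report():
    """Gather the settings that decide how open() decodes text files."""
    return {
        "sys.version": sys.version,
        "LANG": os.environ.get("LANG"),
        "locale.getlocale()": safe_getlocale(),
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        # process_group only exists when running embedded in mod_wsgi.
        "mod_wsgi.process_group": getattr(mod_wsgi, "process_group", None),
    }


def application(environ, start_response):
    """WSGI entry point: return the report as plain text."""
    lines = [f"{key} = {value!r}" for key, value in collect_encoding_report().items()]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Pointing WSGIScriptAlias at a file like this and loading the page shows what the daemon process actually sees, which can differ from what an interactive shell on the same server reports.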

My WSGI configs look like this:

    WSGIScriptAlias / /opt/courtlistener/docker/apache/wsgi-configs/python_version_test.py
    WSGIDaemonProcess cl \
      threads=10 \
      processes=64 \
      python-path=/usr/local/lib/python3.8/site-packages/ \
      lang='en_US.UTF-8' \
      locale='en_US.UTF-8'
    WSGIProcessGroup cl
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On

When I log into the server and start Python in a terminal, this line works fine, but it fails when it runs via mod_wsgi:

from reporters_db import REPORTERS

All that line does is load a JSON file that contains some UTF-8 content. Here's the code behind that import:

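import json
import os
db_root = os.path.dirname(os.path.realpath(__file__))
# No encoding argument, so open() uses locale.getpreferredencoding(False).
with open(os.path.join(db_root, "data", "reporters.json")) as f:
    REPORTERS = json.load(f, object_hook=datetime_parser)  # datetime_parser is defined elsewhere in reporters_db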

Since the open() call above doesn't specify an encoding, Python falls back to the locale's preferred encoding, which is ASCII in the daemon process, and decoding fails:

 Traceback (most recent call last):
   File "/opt/courtlistener/docker/apache/wsgi-configs/python_version_test.py", line 6, in <module>
     from reporters_db import REPORTERS
   File "/usr/local/lib/python3.8/site-packages/reporters_db/__init__.py", line 22, in <module>
     REPORTERS = json.load(f, object_hook=datetime_parser)
   File "/usr/local/lib/python3.8/json/__init__.py", line 293, in load
     return loads(fp.read(),
   File "/usr/local/lib/python3.8/encodings/ascii.py", line 26, in decode
     return codecs.ascii_decode(input, self.errors)[0]
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 441720: ordinal not in range(128)
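
To see the same failure outside Apache, here is a small sketch. It writes a hypothetical sample file (not the real reporters.json) containing an em dash (U+2014), whose UTF-8 encoding starts with the same 0xe2 byte as in the traceback, and it forces the ASCII codec explicitly to stand in for the daemon's ASCII locale. Passing encoding="utf-8" makes the read independent of the locale, but that only helps in code you control; a third-party package like reporters_db still depends on the process locale.

```python
import json
import os
import tempfile

# Hypothetical sample data: U+2014 (em dash) encodes to b"\xe2\x80\x94" in
# UTF-8, so its first byte is the 0xe2 reported in the traceback.
SAMPLE = {"name": "Reporter \u2014 Example"}


def load_json(path, encoding=None):
    # With encoding=None, open() falls back to locale.getpreferredencoding(False),
    # which is what the reporters_db code does implicitly.
    with open(path, encoding=encoding) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reporters.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(SAMPLE, f, ensure_ascii=False)

    # Simulate the daemon's ASCII locale by forcing the codec explicitly.
    try:
        load_json(path, encoding="ascii")
        ascii_error = None
    except UnicodeDecodeError as exc:
        ascii_error = exc

    # Naming the encoding makes the result independent of the locale.
    data = load_json(path, encoding="utf-8")

print("ascii:", ascii_error.reason)  # prints: ascii: ordinal not in range(128)
print("utf-8 ok:", data == SAMPLE)   # prints: utf-8 ok: True
```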

How can I tell it (and the rest of my codebase) to use UTF-8 like sane adults?


Edit 1

Perhaps it is important to mention that I'm running Apache with the following command:

exec apache2ctl -D FOREGROUND "$@"

I thought that would source the /etc/apache2/envvars file, so I appended the following to that file:

export LANG="en_US.UTF-8"

And I tried tweaking my startup command to:

LANG="en_US.UTF-8" exec apache2ctl -D FOREGROUND "$@"

I was hopeful, but no. Still no progress.

Well, I finally figured this out by searching for every time Graham Dumpleton mentioned the word "lang" on the Internet. That eventually turned up this thread, which mentioned that the requested locale might not be installed at all. I checked that by running locale -a inside my Ubuntu Docker image, which revealed:

locale -a
C
C.UTF-8
POSIX

So that's the issue! mod_wsgi can't apply en_US.UTF-8 because that locale isn't installed, and it doesn't raise an error about it either. Switching my lang and locale settings to C.UTF-8 instead fixed this immediately.
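
The locale -a check can also be done from Python, which is handy inside a container where you want to know whether a name passed to lang or locale will actually work. This is a hypothetical helper (locale_is_available is a made-up name); it assumes that locale.setlocale() raising locale.Error is a good proxy for "not installed", which is how it behaves on glibc-based images like the one above.

```python
import locale


def locale_is_available(name, category=locale.LC_CTYPE):
    """Return True if setlocale() accepts `name` for `category` on this system.

    It briefly changes process-wide locale state, so call it from a startup or
    diagnostic script, not from request-handling code in a threaded server.
    """
    previous = locale.setlocale(category)  # query only: no second argument
    try:
        locale.setlocale(category, name)
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, previous)  # restore the original setting
    return True


for candidate in ("en_US.UTF-8", "C.UTF-8", "C"):
    print(candidate, locale_is_available(candidate))
```

On an image whose locale -a output matches the one above, en_US.UTF-8 would be expected to report False and C.UTF-8 True.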

I'm running a slim Docker image, which is presumably why those locales aren't installed. I also don't have the /etc/default/locale file that many other answers on this topic refer to.

I've filed this as a bug.

I had a similar UnicodeDecodeError when parsing a YAML file containing Unicode characters on Debian 11 with Apache 2 and mod_wsgi.

Setting the WSGIDaemonProcess locale option to C.UTF-8 was enough to make the error go away. This was the only line I changed in my /etc/apache2/sites-available/000-default.conf:

     WSGIDaemonProcess my_app locale='C.UTF-8'

In the question, mlissner mentions several other settings they tried, but I didn't need any of them.
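
Because a missing locale fails silently, one option is a fail-fast check at the top of the WSGI script, before importing anything like reporters_db. This is a hypothetical sketch (require_utf8_locale is a made-up name, not part of mod_wsgi). It uses codecs.lookup() to normalize encoding aliases, so "UTF8", "utf_8" and "UTF-8" compare equal and "ANSI_X3.4-1968" resolves to ASCII.

```python
import codecs
import locale


def require_utf8_locale():
    """Raise at startup if open() would not default to UTF-8.

    Turns a missing locale into one clear error instead of a UnicodeDecodeError
    deep inside some library.
    """
    encoding = locale.getpreferredencoding(False)
    # codecs.lookup() maps aliases such as "UTF8" or "utf_8" to "utf-8".
    if codecs.lookup(encoding).name != "utf-8":
        raise RuntimeError(
            f"Preferred encoding is {encoding!r}, not UTF-8; check the lang/locale "
            "options of WSGIDaemonProcess and that the locale is installed."
        )
    return encoding
```

Calling require_utf8_locale() as the first statement of the WSGI script means a misconfigured daemon shows a single RuntimeError in the Apache error log.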
