Apache, LDAP and WSGI encoding issue

I am using Apache 2.4.7 with mod_wsgi 3.4 on Ubuntu 14.04.2 (x86_64) and python 3.4.0. My python app relies on apache to perform user authentication against our company's LDAP server (MS Active Directory 2008). It also passes some additional LDAP data to the python app using the OS environment. In the apache config, I query the LDAP like so:

…
AuthLDAPURL "ldap://server:389/DC=company,DC=lokal?sAMAccountName,sn,givenName,mail,memberOf?sub?(objectClass=*)"
AuthLDAPBindDN …
AuthLDAPBindPassword …
AuthLDAPRemoteUserAttribute sAMAccountName
AuthLDAPAuthorizePrefix AUTHENTICATE_
…

This passes some user data to my WSGI script where I handle the info as follows:

# Make sure the packages from the virtualenv are found
import site
site.addsitedir('/home/user/.virtualenvs/ispot-cons/lib/python3.4/site-packages')

# Patch path for app (so that libispot can be found)
import sys
sys.path.insert(0, '/var/www/my-app/')

import os
from libispot.web import app as _application

def application(environ, start_response):
    os.environ['REMOTE_USER'] = environ.get('REMOTE_USER', "")
    os.environ['REMOTE_USER_FIRST_NAME'] = environ.get('AUTHENTICATE_GIVENNAME', "")
    os.environ['REMOTE_USER_LAST_NAME'] = environ.get('AUTHENTICATE_SN', "")
    os.environ['REMOTE_USER_EMAIL'] = environ.get('AUTHENTICATE_MAIL', "")
    os.environ['REMOTE_USER_GROUPS'] = environ.get('AUTHENTICATE_MEMBEROF', "")
    return _application(environ, start_response)

I can then access this info in my python app using os.environ.get(…). (BTW: If you have a more elegant solution, please let me know!)

The problem is that some of the user names contain special characters (German umlauts, e.g., äöüÄÖÜ) that are not encoded correctly. So, for example, the name Tölle arrives in my python app as TÃ¶lle.

Obviously, this is an encoding problem, because

$ echo "TÃ¶lle" | iconv --from utf-8 --to latin1

gives me the correct Tölle.

Another observation that might help: in my apache logs I found the character ü represented as \xc3\x83\xc2\xbc.
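That byte sequence is what you get when UTF-8 text is misread as Latin-1 and then encoded as UTF-8 a second time. A minimal sketch that reproduces it (the variable names are only illustrative):

```python
# Reproduce the double encoding of "ü" step by step.
original = "\u00fc"                         # "ü"
utf8_bytes = original.encode("utf-8")       # the two bytes a UTF-8 source sends
misread = utf8_bytes.decode("latin-1")      # each byte wrongly read as one character
double_encoded = misread.encode("utf-8")    # the misread text encoded again

print(utf8_bytes)       # b'\xc3\xbc'
print(double_encoded)   # b'\xc3\x83\xc2\xbc'
```

So the data appears to leave LDAP as valid UTF-8 and is decoded as Latin-1 somewhere before it reaches the app.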

I told my Apache in /etc/apache2/envvars to use LANG=de_DE.UTF-8 and python 3 is utf-8 aware as well. I can't seem to specify anything about my LDAP server. So my question is: where is the encoding getting mixed up and how do I mend it?

It is bad practice to copy the values to os.environ on each request, as this will fail miserably if the WSGI server is running in a multithreaded configuration: concurrent requests will interfere with each other. Look at thread locals instead.
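A minimal sketch of the thread-local approach, assuming each request is handled from start to finish on a single thread. The names `_request_user`, `current_user` and `make_application` are hypothetical and not part of mod_wsgi or of the asker's app:

```python
import threading

# One independent "data" attribute per thread, so concurrent requests
# cannot overwrite each other's values.
_request_user = threading.local()


def current_user():
    """Return the user info stored for the request on this thread."""
    return getattr(_request_user, "data", {})


def make_application(wrapped_app):
    """Wrap a WSGI app so the LDAP values are stored per thread."""
    def application(environ, start_response):
        _request_user.data = {
            "user": environ.get("REMOTE_USER", ""),
            "first_name": environ.get("AUTHENTICATE_GIVENNAME", ""),
            "last_name": environ.get("AUTHENTICATE_SN", ""),
            "email": environ.get("AUTHENTICATE_MAIL", ""),
            "groups": environ.get("AUTHENTICATE_MEMBEROF", ""),
        }
        return wrapped_app(environ, start_response)
    return application
```

In the WSGI script this would be used as `application = make_application(_application)`, with the app calling `current_user()` instead of `os.environ.get(…)`. If the app can reach the WSGI `environ` directly, reading the values from there avoids the extra state altogether.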

As to the issue of encoded data from LDAP, if I understand the problem correctly, you would need to do:

"TÃ¶lle".encode('latin-1').decode('utf-8')
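This works because, under PEP 3333, the values in the WSGI environ on Python 3 are str objects whose bytes were decoded as ISO-8859-1, so the round trip recovers the original UTF-8 bytes. A small sketch of a helper that applies the fix defensively (the name `fix_mojibake` is hypothetical):

```python
def fix_mojibake(value):
    """Undo a Latin-1 misreading of UTF-8 data, if there is one.

    Returns the value unchanged when it cannot be a misread UTF-8 string.
    """
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside Latin-1, or its
        # bytes are not valid UTF-8, so it was probably decoded correctly.
        return value


print(fix_mojibake("T\u00c3\u00b6lle"))  # the garbled form of "Tölle"
```

It could be applied to each value read from the environ, e.g. `fix_mojibake(environ.get('AUTHENTICATE_SN', ""))`. One caveat: a correctly decoded string that happens to look like misread UTF-8 would be altered as well, so this is best applied only to values known to come from LDAP.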
