
subprocess.run() argument encoding

I have a Flask application (Linux, Apache with mod_wsgi, Python 3) which calls a shell script with some arguments. When any of the subprocess.run() command arguments contains non-ASCII characters, the following error occurs in the application:

'ascii' codec can't encode characters in position 5-6: ordinal not in range(128)

I spent a lot of time trying to fix it.

The problem does not occur when I run the same code from the command line, only inside the application.

The entire application's output is in Unicode and there are no problems with it. After some research I came to the conclusion the problem is with the "filesystem encoding".

I added some logging statements to my run.wsgi script. The FS encoding was indeed 'ascii' under Apache (and 'utf-8' on the command line).
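For reference, the kind of logging I mean can be sketched like this. It is only a minimal sketch: the helper name encoding_report is made up for illustration, and the set of environment variables it reads is my assumption about what is worth looking at.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that influence how str arguments get encoded.

    The function name and the chosen keys are illustrative only.
    """
    report = {
        # Encoding Python uses for file names and command-line arguments.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding derived from the process locale (without changing it).
        "preferred_encoding": locale.getpreferredencoding(False),
    }
    # Locale-related variables as seen by this process.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        report[var] = os.environ.get(var)
    return report


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value!r}")
```

Calling the same function from run.wsgi and from an interactive shell makes the difference between the two environments visible.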

Next, I found the question "How to change file system encoding via python?"

The Apache httpd server was started with LANG=C in its environment. I changed it to C.UTF-8 despite the warnings in /etc/sysconfig/httpd. That did not help; the FS encoding was still 'ascii'. I then even monkey-patched sys.getfilesystemencoding() to lambda: 'utf-8', but the error is still there.

I have properly restarted the httpd service after each change.

I'm at my wits' end.

  1. Is my problem really caused by the FS encoding?
  2. If yes, why did my attempts to change it to utf-8 fail?
  3. Most importantly: How can I solve this issue?

UPDATE1:

code snippet:

    import subprocess as sub
    cmdresult = sub.run(
        [SCRIPT, tid, days, name],
        stdin=sub.DEVNULL, stdout=sub.PIPE, stderr=sub.DEVNULL,
        encoding='ascii', # 'utf-8' will not help, this affects stdin, stdout I/O only
        check=True)

In the context of mod_wsgi, you should ensure you are using mod_wsgi daemon mode and set the lang/locale for the mod_wsgi daemon process group. For a much more detailed explanation which is too much to repeat here, see:

(Answering my own question in the hope that it will be helpful to others.)

I made a short test program. This is what I have found:

  1. File system encoding is the key point.
  2. Monkey patching does not work. Well, that's OK. It is not acceptable as a solution anyway.
  3. LANG=C.UTF-8 requires that locale to be installed, and it was not installed on my system (checked with locale -a). On a second system where it was available, it worked.
  4. I can do the encoding explicitly and pass bytes as one of the args:

     cmdresult = sub.run( [SCRIPT, tid, days, name.encode('utf-8')], ... 
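A self-contained sketch of point 4, assuming a POSIX system. The helper names encode_args and echo_back are hypothetical, and a small Python child process stands in for my shell script so the example can be run anywhere.

```python
import subprocess
import sys


def encode_args(args, encoding="utf-8"):
    """Return a copy of args with every str encoded to bytes.

    Items that are already bytes (or anything else) pass through untouched.
    """
    return [a.encode(encoding) if isinstance(a, str) else a for a in args]


# Stand-in for the real script: writes its first argument back as raw bytes.
CHILD = "import os, sys; sys.stdout.buffer.write(os.fsencode(sys.argv[1]))"


def echo_back(name):
    """Pass name to a child process as UTF-8 bytes and return what it echoes."""
    cmd = encode_args([sys.executable, "-c", CHILD, name])
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,
    )
    # No encoding= was given, so stdout is bytes; decode it explicitly.
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    print(echo_back("\u017elu\u0165ou\u010dk\u00fd"))
```

Because the arguments are already bytes, the parent process never has to encode them with its own filesystem encoding, which is the step that failed under 'ascii'.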

This works, but one question remained:

Does it comply with the docs?

All I could find is:

args should be a sequence of program arguments or else a single string

I understood that as one string or a list of strings, but it does not actually specify what types the list may contain. I also passed an int to see what would happen, and got this error:

expected str, bytes or os.PathLike object

So my solution seems to be fine.
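The type check can be reproduced with a small probe. This is a sketch under the assumption of a POSIX system; the helper name accepts_arg is made up, and the child process is a no-op Python invocation rather than my script.

```python
import subprocess
import sys


def accepts_arg(value):
    """Return True if subprocess.run() accepts value as a list argument."""
    try:
        subprocess.run(
            [sys.executable, "-c", "pass", value],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
    except TypeError:
        # Raised before the child starts when the type is not supported.
        return False
    return True


if __name__ == "__main__":
    for candidate in ("text", b"bytes", 42):
        print(type(candidate).__name__, accepts_arg(candidate))
```

On my understanding, str and bytes items are accepted and an int is rejected with the TypeError quoted above.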
