Why use Python's os module methods instead of executing shell commands directly?

I am trying to understand the motivation for using Python's library functions for OS-specific tasks such as creating files and directories or changing file attributes, instead of just running the equivalent commands via os.system() or subprocess.call().

For example, why would I want to use os.chmod instead of calling os.system("chmod...")?

I understand that it is more "pythonic" to use Python's library methods wherever possible instead of executing shell commands directly. But is there any other motivation for doing this from a functionality point of view?

I am only talking about simple one-line shell commands here. When we need more control over how a task is executed, I understand that using the subprocess module makes more sense, for example.

  1. It's faster. os.system and subprocess.call create new processes, which is unnecessary for something this simple. In fact, os.system and subprocess.call with the shell argument usually create at least two new processes: the first is the shell, and the second is the command you're running (unless it's a shell built-in like test).

  2. Some commands are useless in a separate process. For example, if you run os.system("cd dir/"), it will change the current working directory of the child process, but not of the Python process. You need to use os.chdir for that.

  3. You don't have to worry about special characters interpreted by the shell. os.chmod(path, mode) will work no matter what the filename is, whereas os.system("chmod 777 " + path) will fail horribly if the filename is something like ; rm -rf ~. (Note that you can work around this if you use subprocess.call without the shell argument.)

  4. You don't have to worry about filenames that begin with a dash. os.chmod("--quiet", mode) will change the permissions of the file named --quiet, but os.system("chmod 777 --quiet") will fail, because --quiet is interpreted as an option. This is true even for subprocess.call(["chmod", "777", "--quiet"]).

  5. You have fewer cross-platform and cross-shell concerns, as Python's standard library is supposed to deal with that for you. Does your system have a chmod command? Is it installed? Does it support the parameters that you expect it to support? The os module tries to be as cross-platform as possible and documents the cases where that is not possible.

  6. If the command you're running has output that you care about, you need to parse it, which is trickier than it sounds, because it is easy to forget corner cases (filenames with spaces, tabs and newlines in them), even when you don't care about portability.
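Points 3 and 4 can be tried out with a small sketch. It assumes a POSIX-like system (on Windows, os.chmod only toggles the read-only flag), and the function name and file names are made up for illustration.

```python
import os
import stat
import tempfile


def chmod_awkward_names():
    """Create files with awkward names and chmod them via os.chmod.

    Returns a dict mapping each name to the permission bits found afterwards.
    """
    # Hypothetical names: one looks like a command-line option, the other
    # contains characters a shell would treat specially.
    awkward_names = ["--quiet", "notes; echo oops"]
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        previous = os.getcwd()
        os.chdir(tmp)  # work with bare relative names, as in the list above
        try:
            for name in awkward_names:
                with open(name, "w") as handle:
                    handle.write("demo\n")
                # The name goes straight to the chmod system call; no shell
                # and no option parser ever looks at it.
                os.chmod(name, 0o600)
                results[name] = stat.S_IMODE(os.stat(name).st_mode)
        finally:
            os.chdir(previous)  # leave the directory before it is removed
    return results


modes = chmod_awkward_names()
```

Both files end up with the requested mode because each name is treated purely as data.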

It is safer. To give you an idea, here is an example script:

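import os

# Deliberately unsafe: the raw input is pasted into a shell command line.
filename = input("Please enter a file: ")  # raw_input() on Python 2
os.system("chmod 777 " + filename)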

If the user entered test; rm -rf ~, this would delete the home directory.

This is why it is safer to use the built-in function.

It is also why you should prefer subprocess to os.system.
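For comparison, here is a minimal sketch of the same task done with os.chmod. The helper name make_world_accessible is hypothetical, the example uses a throwaway file rather than real user input, and it assumes a POSIX-like system where that filename is legal.

```python
import os
import stat
import tempfile


def make_world_accessible(filename):
    """Give everyone read, write and execute permission on filename.

    The name is handed to the chmod system call as plain data, so
    characters such as ';' or '~' have no special meaning.
    """
    os.chmod(filename, 0o777)


# Usage sketch: a file whose name is the hostile input from the example.
with tempfile.TemporaryDirectory() as tmp:
    hostile = os.path.join(tmp, "test; rm -rf ~")
    open(hostile, "w").close()
    make_world_accessible(hostile)
    mode_after = stat.S_IMODE(os.stat(hostile).st_mode)
```

Nothing is executed here. If the file does not exist, os.chmod simply raises FileNotFoundError.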

There are four strong cases for preferring the more specific methods in Python's os module over os.system or the subprocess module when executing a command:

  • Redundancy - Spawning another process is redundant and wastes time and resources.
  • Portability - Many of the methods in the os module are available on multiple platforms, while many shell commands are OS-specific.
  • Understanding the results - Spawning a process to execute arbitrary commands forces you to parse the results from its output and work out whether and why a command went wrong.
  • Safety - A process can potentially execute any command it is given. This is a weak design, and it can be avoided by using the specific methods in the os module.

Redundancy (see redundant code):

You're actually executing a redundant "middle-man" on your way to the eventual system calls (chmod in your example). This middle man is a new process or sub-shell.

From os.system:

Execute the command (a string) in a subshell ...

And subprocess is just a module to spawn new processes.

You can do what you need without spawning these processes.

Portability (see source code portability):

The os module's aim is to provide generic operating-system services, and its description starts with:

This module provides a portable way of using operating system dependent functionality.

You can use os.listdir on both Windows and Unix. Trying to use os.system / subprocess for this functionality will force you to maintain two calls (for ls / dir) and to check which operating system you're on. This is not as portable and will cause even more frustration later on (see the section on understanding the command's results).
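A small sketch of the difference follows. The function names are made up for illustration, and listing_command only shows the kind of branching you would have to maintain yourself; it is never run here.

```python
import os
import tempfile


def list_names(directory):
    """Return the entry names in directory, sorted, on any platform."""
    return sorted(os.listdir(directory))


def listing_command(directory):
    """The shell route: pick a platform-specific command line by hand."""
    if os.name == "nt":
        return ["cmd", "/c", "dir", "/b", directory]
    return ["ls", directory]


# Usage sketch with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt"):
        open(os.path.join(tmp, name), "w").close()
    names = list_names(tmp)
```

The portable version is a single call, while the shell version still has to be executed and its output parsed.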

Understanding the command's results:

Suppose you want to list the files in a directory.

If you're using os.system("ls") / subprocess.call(['ls']), all you can work with is the process's output, which is basically one big string of file names.

How can you tell a file with a space in its name from two files?

What if you have no permission to list the files?

How should you map the data to Python objects?

These are only off the top of my head, and while there are solutions to these problems, why solve again a problem that has already been solved for you?

This is an example of following the Don't Repeat Yourself principle (often referred to as "DRY") by not repeating an implementation that already exists and is freely available to you.
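As a rough sketch of what "already solved" looks like in practice, os.scandir hands back one object per entry and reports failures as exceptions. The helper name file_sizes and the file names below are made up for illustration.

```python
import os
import tempfile


def file_sizes(directory):
    """Map each regular file's name in directory to its size in bytes."""
    with os.scandir(directory) as entries:
        return {e.name: e.stat().st_size for e in entries if e.is_file()}


with tempfile.TemporaryDirectory() as tmp:
    # One file with a space in its name, one whose name is a prefix of it.
    with open(os.path.join(tmp, "two words.txt"), "w") as handle:
        handle.write("hello")
    open(os.path.join(tmp, "two"), "w").close()
    sizes = file_sizes(tmp)

# Failures arrive as exceptions instead of text on stderr. The directory no
# longer exists here; a permission problem would raise PermissionError,
# which is also a subclass of OSError.
try:
    file_sizes(os.path.join(tmp, "missing"))
except OSError as error:
    failure = type(error).__name__
```

Each name stays a separate dictionary key, so a space in a filename cannot be mistaken for a separator.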

Safety:

os.system and subprocess are powerful. That is good when you need this power, but dangerous when you don't. When you use os.listdir, you know it cannot do anything other than list files or raise an error. When you use os.system or subprocess to achieve the same behaviour, you can potentially end up doing something you did not mean to do.

Injection Safety (see shell injection examples):

If you use input from the user as a new command, you've basically given them a shell. This is much like SQL injection, which gives the user a shell in the database.

An example would be a command of the form:

# ... read some user input
os.system(user_input + " some continuation")

This can easily be exploited to run arbitrary code by entering NASTY COMMAND;# which produces:

os.system("NASTY COMMAND; # some continuation")

There are many such commands that can put your system at risk.
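The sketch below shows how the string is built and two common ways to keep input from being interpreted. It uses a harmless child Python process as a stand-in for a real command, and the variable names are made up for illustration.

```python
import shlex
import subprocess
import sys

user_input = "NASTY COMMAND;#"

# The vulnerable pattern: the input becomes part of the command line itself.
vulnerable = user_input + " some continuation"

# shlex.quote wraps the input so a POSIX shell reads it as one literal word.
quoted = shlex.quote(user_input) + " some continuation"

# Passing an argument list (no shell) delivers the input to the child as a
# single argument, exactly as typed. The child just echoes it back.
echoed = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
    capture_output=True,
    text=True,
    check=True,
).stdout
```

Neither technique helps when the user input is meant to be the command itself, as in the form above. In that case the safer design is to avoid running user-supplied commands at all.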

For a simple reason: when you call a shell function, it creates a sub-shell which is destroyed after your command exits, so if you change directory in that shell, it does not affect your environment in Python.

Besides, creating a sub-shell is time-consuming, so running OS commands through a shell will hurt your performance.
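This can be checked with a short sketch. To stay portable it uses a child Python process as a stand-in for the sub-shell; the point is the same, since any child process has its own working directory.

```python
import os
import subprocess
import sys
import tempfile

start = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    # The child changes *its own* working directory and then exits.
    subprocess.run(
        [sys.executable, "-c", "import os, sys; os.chdir(sys.argv[1])", tmp],
        check=True,
    )
    after_child = os.getcwd()  # the parent has not moved

    # os.chdir changes the directory of the Python process itself.
    os.chdir(tmp)
    moved = os.path.realpath(os.getcwd()) == os.path.realpath(tmp)
    os.chdir(start)  # go back before the temporary directory is removed
```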

EDIT

I ran some timing tests:

In [379]: %timeit os.chmod('Documents/recipes.txt', 0755)
10000 loops, best of 3: 215 us per loop

In [380]: %timeit os.system('chmod 0755 Documents/recipes.txt')
100 loops, best of 3: 2.47 ms per loop

In [382]: %timeit call(['chmod', '0755', 'Documents/recipes.txt'])
100 loops, best of 3: 2.93 ms per loop

The internal function runs more than 10 times faster.
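A comparable measurement can be scripted with the timeit module. This is only a sketch: it assumes a POSIX-like system with a chmod executable on the PATH, the run count is arbitrary, and the numbers will differ from machine to machine.

```python
import os
import subprocess
import tempfile
import timeit

runs = 50  # arbitrary; kept small because each spawn is comparatively slow

with tempfile.NamedTemporaryFile() as target:
    # In-process: a single system call per iteration.
    in_process = timeit.timeit(
        lambda: os.chmod(target.name, 0o755), number=runs
    )
    # Out-of-process: start the chmod executable on every iteration.
    spawned = timeit.timeit(
        lambda: subprocess.call(["chmod", "0755", target.name]), number=runs
    )

print("os.chmod:        %.1f us per call" % (in_process / runs * 1e6))
print("subprocess.call: %.1f us per call" % (spawned / runs * 1e6))
```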

EDIT2

There may be cases where invoking an external executable yields better results than a Python package. I just remembered a mail from a colleague of mine reporting that gzip called through subprocess performed much better than the Python package he used. But that is certainly not the case when we are talking about standard OS packages emulating standard OS commands.

Shell calls are OS-specific, whereas Python os module functions are not, in most cases. And using them avoids spawning a subprocess.

It's far more efficient. The "shell" is just another OS binary which contains a lot of system calls. Why incur the overhead of creating the whole shell process just for that single system call?

The situation is even worse when you use os.system for something that's not a shell built-in. You start a shell process, which in turn starts an executable, which then (two processes away) makes the system call. At least subprocess would have removed the need for an intermediary shell process.

This is not specific to Python. systemd is such an improvement to Linux startup times for the same reason: it makes the necessary system calls itself instead of spawning a thousand shells.

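Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.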