
How to pipe many bash commands from python?

Hi, I'm trying to call the following command from Python:

comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v "#" | sed "s/\t//g"

How can I make this call when the inputs to the comm command are themselves piped?

Is there an easy and straightforward way to do it?

I tried the subprocess module:

subprocess.call("comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'")

Without success; it says: OSError: [Errno 2] No such file or directory

Or do I have to create the different calls individually and then connect them using PIPE, as described in the subprocess documentation:

p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

Process substitution ( <() ) is bash-only functionality. Thus, you need a shell, but it can't be just any shell (like /bin/sh, as used by shell=True on non-Windows platforms): it needs to be bash.

subprocess.call(['bash', '-c', "comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'"])
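For reference, the OSError in the question comes from passing a single string to subprocess.call without shell=True: the whole string is then treated as the name of one executable. A minimal sketch (echo/tr is just a stand-in command):

```python
import subprocess

# With a plain string and no shell=True, Python looks for an executable
# literally named "echo hi | tr a-z A-Z" -- hence "No such file or directory".
try:
    subprocess.call("echo hi | tr a-z A-Z")
    failed = False
except OSError as exc:
    print(exc)
    failed = True

# shell=True runs the string under /bin/sh, so ordinary pipes work --
# but process substitution still needs bash, as explained above.
rc = subprocess.call("echo hi | tr a-z A-Z", shell=True)
```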

By the way, if you're going to go this route with arbitrary filenames, pass them out-of-band (as below: passing _ as $0, File1.txt as $1, and File2.txt as $2):

subprocess.call(['bash', '-c',
  '''comm -3 <(awk '{print $1}' "$1" | sort | uniq) '''
  '''        <(awk '{print $1}' "$2" | sort | uniq) '''
  '''        | grep -v '#' | tr -d "\t"''',
  '_', "File1.txt", "File2.txt"])
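To see what the out-of-band passing buys you: a filename full of shell metacharacters arrives in $1 completely untouched. A small sketch (the printf script and the sample filename are just for illustration):

```python
import subprocess

# A filename that would be mangled (or worse) if interpolated into the script.
tricky = 'name with spaces; $(date).txt'

# '_' fills $0; the filename lands in $1 and never touches bash's parser,
# so the $(date) inside it is never executed.
out = subprocess.check_output(
    ['bash', '-c', '''printf '%s\\n' "$1"''', '_', tricky])
print(out)
```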

That said, the best-practices approach is indeed to set up the pipeline yourself. The below is tested with Python 3.6 (note the pass_fds argument to subprocess.Popen, needed to make the file descriptors referred to via the /dev/fd/## links available to the child):

awk_filter='''! /#/ && !seen[$1]++ { print $1 }'''

p1 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File1.txt', 'r'),
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen(['sort', '-u'],
                      stdin=p1.stdout,
                      stdout=subprocess.PIPE)
p3 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File2.txt', 'r'),
                      stdout=subprocess.PIPE)
p4 = subprocess.Popen(['sort', '-u'],
                      stdin=p3.stdout,
                      stdout=subprocess.PIPE)
p5 = subprocess.Popen(['comm', '-3',
                       ('/dev/fd/%d' % (p2.stdout.fileno(),)),
                       ('/dev/fd/%d' % (p4.stdout.fileno(),))],
                      pass_fds=(p2.stdout.fileno(), p4.stdout.fileno()),
                      stdout=subprocess.PIPE)
p6 = subprocess.Popen(['tr', '-d', '\t'],
                      stdin=p5.stdout,
                      stdout=subprocess.PIPE)
result = p6.communicate()
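The /dev/fd/## trick above can be seen in isolation in a smaller sketch: the child reads an inherited pipe descriptor by path, which is exactly what bash's <() arranges under the hood (printf and cat stand in for the real pipeline stages):

```python
import subprocess

# Producer writes two lines into a pipe.
producer = subprocess.Popen(['printf', 'a\\nb\\n'], stdout=subprocess.PIPE)
fd = producer.stdout.fileno()

# Consumer reads the pipe *by path*; pass_fds keeps the descriptor open
# across the exec (close_fds defaults to True on Python 3).
consumer = subprocess.Popen(['cat', '/dev/fd/%d' % fd],
                            pass_fds=(fd,),
                            stdout=subprocess.PIPE)
producer.stdout.close()   # parent's copy; lets SIGPIPE reach the producer
out = consumer.communicate()[0]
print(out)
```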

This is a lot more code, but (assuming that the filenames are parameterized in the real world) it's also safer code: you aren't vulnerable to bugs like Shellshock that are triggered by the simple act of starting a shell, and you don't need to worry about passing variables out-of-band to avoid injection attacks (except in the context of arguments to commands, like awk, that are scripting-language interpreters themselves).


That said, another thing to consider is implementing the whole thing in native Python.

lines_1 = set(line.split()[0] for line in open('File1.txt') if line.strip() and '#' not in line)
lines_2 = set(line.split()[0] for line in open('File2.txt') if line.strip() and '#' not in line)
not_common = (lines_1 - lines_2) | (lines_2 - lines_1)  # symmetric difference
for line in sorted(not_common):
    print(line)

Also check out plumbum. It makes life easier.

http://plumbum.readthedocs.io/en/latest/

Pipelining

This may be wrong, but you can try this:

from plumbum.cmd import grep, comm, awk, sort, uniq, sed 
_c1 = awk['{print $1}', 'File1.txt'] | sort | uniq
_c2 = awk['{print $1}', 'File2.txt'] | sort | uniq
chain = comm['-3', _c1(), _c2() ] | grep['-v', '#'] | sed['s/\t//g']
chain()

Let me know if this goes wrong; I will try to fix it.

Edit: As pointed out, I missed the process substitution part; I think it has to be done explicitly, by redirecting each command's output to a temporary file and then using that file as the argument to comm.

So the above would now actually become:

from plumbum.cmd import grep, comm, awk, sort, uniq, sed 
_c1 = awk['{print $1}', 'File1.txt'] | sort | uniq
_c2 = awk['{print $1}', 'File2.txt'] | sort | uniq
(_c1 > "/tmp/File1.txt")(), (_c2 > "/tmp/File2.txt")()
chain = comm['-3', "/tmp/File1.txt", "/tmp/File2.txt" ] | grep['-v', '#'] | sed['s/\t//g']
chain()

Alternatively, you can use the method described by @charles, making use of mkfifo.
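A hedged sketch of that mkfifo variant (the pipe names and sample data are made up): each named pipe is fed from its own thread, so nothing is written to disk, and comm reads the pipes by path:

```python
import os
import subprocess
import tempfile
import threading

workdir = tempfile.mkdtemp()
fifo1 = os.path.join(workdir, 'left')
fifo2 = os.path.join(workdir, 'right')
os.mkfifo(fifo1)
os.mkfifo(fifo2)

def feed(path, data):
    # open() blocks until comm opens the pipe for reading, so each
    # writer runs in its own thread to avoid deadlocking the parent.
    with open(path, 'w') as f:
        f.write(data)

t1 = threading.Thread(target=feed, args=(fifo1, 'a\nb\n'))
t2 = threading.Thread(target=feed, args=(fifo2, 'b\nc\n'))
t1.start()
t2.start()
out = subprocess.check_output(['comm', '-3', fifo1, fifo2])
t1.join()
t2.join()
print(out)   # lines unique to each input; column 2 is tab-indented

os.unlink(fifo1)
os.unlink(fifo2)
os.rmdir(workdir)
```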
