How to pipe many bash commands from Python?
Hi, I'm trying to call the following command from Python:
comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v "#" | sed "s/\t//g"
How can I make this call when the inputs to the comm command are themselves piped?
Is there an easy and straightforward way to do it?
I tried the subprocess module:
subprocess.call("comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'")
Without success; it says: OSError: [Errno 2] No such file or directory
Or do I have to create the different calls individually and then chain them using PIPE, as described in the subprocess documentation:
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
Process substitution (<(...)) is bash-only functionality. Thus, you need a shell, but it can't be just any shell (like /bin/sh, as used by shell=True on non-Windows platforms) -- it needs to be bash.
subprocess.call(['bash', '-c', "comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'"])
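The same idea can also be expressed with shell=True plus the executable argument, which swaps the default /bin/sh for bash. A minimal self-contained sketch (the sample files a.txt and b.txt and their contents are assumptions for illustration):

```python
import pathlib
import subprocess

# Throwaway sample data standing in for File1.txt / File2.txt.
pathlib.Path('a.txt').write_text('x 1\ny 2\n')
pathlib.Path('b.txt').write_text('y 3\nz 4\n')

# shell=True normally runs /bin/sh; executable='/bin/bash' makes the
# command run under bash, which understands process substitution.
cmd = ("comm -3 <(awk '{print $1}' a.txt | sort -u) "
       "<(awk '{print $1}' b.txt | sort -u) | tr -d '\\t'")
result = subprocess.run(cmd, shell=True, executable='/bin/bash',
                        capture_output=True, text=True)
print(result.stdout)  # lines whose first column is unique to one file
```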
By the way, if you're going to go this route with arbitrary filenames, pass them out-of-band (as below: passing _ as $0, File1.txt as $1, and File2.txt as $2):
subprocess.call(['bash', '-c',
    '''comm -3 <(awk '{print $1}' "$1" | sort | uniq) '''
    '''        <(awk '{print $1}' "$2" | sort | uniq) '''
    ''' | grep -v '#' | tr -d "\t"''',
    '_', "File1.txt", "File2.txt"])
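To see the positional-argument convention in isolation: after bash -c SCRIPT, the next argument becomes $0 and the remaining ones become $1, $2, and so on. A tiny sketch (echo used purely for demonstration):

```python
import subprocess

# '_' fills $0 (conventionally the program name); 'one' and 'two'
# arrive in the script as $1 and $2, untouched by any shell parsing.
r = subprocess.run(['bash', '-c', 'echo "$0:$1:$2"', '_', 'one', 'two'],
                   capture_output=True, text=True)
print(r.stdout)  # _:one:two
```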
That said, the best-practices approach is indeed to set up the chain yourself. The below is tested with Python 3.6 (note the need for the pass_fds argument to subprocess.Popen to make the file descriptors referred to via the /dev/fd/## links available):
awk_filter='''! /#/ && !seen[$1]++ { print $1 }'''
p1 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File1.txt', 'r'),
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen(['sort', '-u'],
                      stdin=p1.stdout,
                      stdout=subprocess.PIPE)
p3 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File2.txt', 'r'),
                      stdout=subprocess.PIPE)
p4 = subprocess.Popen(['sort', '-u'],
                      stdin=p3.stdout,
                      stdout=subprocess.PIPE)
p5 = subprocess.Popen(['comm', '-3',
                       ('/dev/fd/%d' % (p2.stdout.fileno(),)),
                       ('/dev/fd/%d' % (p4.stdout.fileno(),))],
                      pass_fds=(p2.stdout.fileno(), p4.stdout.fileno()),
                      stdout=subprocess.PIPE)
p6 = subprocess.Popen(['tr', '-d', '\t'],
                      stdin=p5.stdout,
                      stdout=subprocess.PIPE)
result = p6.communicate()
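When finishing a chain like this, remember that communicate() returns bytes and that the parent should close its copies of intermediate pipe ends so upstream processes can receive SIGPIPE. A minimal self-contained sketch of that pattern with dummy data:

```python
import subprocess

# A two-stage chain standing in for the longer pipeline above.
p1 = subprocess.Popen(['printf', 'b\na\nb\n'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['sort', '-u'], stdin=p1.stdout,
                      stdout=subprocess.PIPE)
p1.stdout.close()        # let p1 receive SIGPIPE if p2 exits early
out, _ = p2.communicate()
text = out.decode()      # communicate() yields bytes; decode for str
print(text)
```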
This is a lot more code, but (assuming that the filenames are parameterized in the real world) it's also safer code -- you aren't vulnerable to bugs like ShellShock that are triggered by the simple act of starting a shell, and you don't need to worry about passing variables out-of-band to avoid injection attacks (except in the context of arguments to commands -- like awk -- that are scripting language interpreters themselves).
That said, another thing to consider is implementing the whole thing in native Python.
lines_1 = set(line.split()[0] for line in open('File1.txt', 'r') if '#' not in line)
lines_2 = set(line.split()[0] for line in open('File2.txt', 'r') if '#' not in line)
not_common = (lines_1 - lines_2) | (lines_2 - lines_1)
for line in sorted(not_common):
    print(line)
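The union of the two one-sided differences is just the symmetric difference of the sets, so the same result can be written with the ^ operator. A sketch using inline sample data in place of the files:

```python
# Sets standing in for the first columns read from the two files.
lines_1 = {'x', 'y'}
lines_2 = {'y', 'z'}

# Symmetric difference: elements present in exactly one of the sets.
not_common = lines_1 ^ lines_2
print(sorted(not_common))  # ['x', 'z']
```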
Also check out plumbum; it makes life easier: http://plumbum.readthedocs.io/en/latest/
This may be wrong, but you can try this:
from plumbum.cmd import grep, comm, awk, sort, uniq, sed
_c1 = awk['{print $1}', 'File1.txt'] | sort | uniq
_c2 = awk['{print $1}', 'File2.txt'] | sort | uniq
chain = comm['-3', _c1(), _c2() ] | grep['-v', '#'] | sed['s/\t//g']
chain()
Let me know if this goes wrong; I will try to fix it.
Edit: As pointed out, I missed the process substitution, and I think it would have to be done explicitly by redirecting each command's output to a temporary file and then using that file as the argument to comm.
So the above would actually become:
from plumbum.cmd import grep, comm, awk, sort, uniq, sed
_c1 = awk['{print $1}', 'File1.txt'] | sort | uniq
_c2 = awk['{print $1}', 'File2.txt'] | sort | uniq
(_c1 > "/tmp/File1.txt")(), (_c2 > "/tmp/File2.txt")()
chain = comm['-3', "/tmp/File1.txt", "/tmp/File2.txt" ] | grep['-v', '#'] | sed['s/\t//g']
chain()
Alternatively, you can use the method described by @charles, making use of mkfifo.