
Need to run AWK command in python script using subprocess module

c1 = "'BEGIN{FS = OFS = \",\"}{if(toupper($11) ~ \"DVT\"){$(NF+1) = NR==1 ? \"indication\" : \"DVT\"}else if(toupper($11) ~ \"AFIB\"){$(NF+1) = NR==1 ? \"indication\" : \"AFIB\"}else{$(NF+1) = NR==1 ? \"indication\" : \"TESTING\"}} 1'"

print(c1)

p1=subprocess.Popen(["awk",c1,"abc.csv"],stdout=outfile)

p1.communicate()

This command runs fine in a shell script, so the command parameters seem fine. But when I run it through Python, I keep getting the error: "invalid char ''' in expression."

The outer quotes are unnecessary. When you have a shell, the single quotes are necessary to protect the Awk script from the shell; but here, you don't have a shell, so the quotes are passed to Awk as part of the script, which is what triggers the error. Then you can and should also get rid of all those backslashes.
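To see where the stray quote comes from, it can help to compare what a shell would hand to awk with what the list form of subprocess passes along. This is only a sketch: the Awk script is a shortened stand-in for the real one, and shlex.split is used as an approximation of POSIX shell word splitting.

```python
import shlex

# What you would type at a shell prompt (shortened, illustrative script).
shell_line = """awk 'BEGIN{FS = OFS = ","} 1' abc.csv"""

# The shell strips the single quotes before awk ever sees the script.
shell_argv = shlex.split(shell_line)
print(shell_argv)

# With a list, subprocess passes each element verbatim: no quote removal.
# Wrapping the script in extra quotes makes them part of the Awk program.
wrong_argv = ["awk", "'BEGIN{FS = OFS = \",\"} 1'", "abc.csv"]
right_argv = ["awk", 'BEGIN{FS = OFS = ","} 1', "abc.csv"]

print(shell_argv == right_argv)   # the shell's view matches the unquoted form
print(wrong_argv[1][0])           # the first character awk would try to parse
```

The second element of wrong_argv begins with a literal single quote, which is the character Awk complains about.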

You should prefer subprocess.run() over bare Popen if you just need the subprocess to run, and then continue your program when it's done.

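import subprocess

# No shell is involved, so the Awk script needs no outer quotes or backslashes.
c1 = '''BEGIN{FS = OFS = ","}
{if(toupper($11) ~ "DVT"){$(NF+1) = NR==1 ? "indication" : "DVT"}
 else if(toupper($11) ~ "AFIB"){$(NF+1) = NR==1 ? "indication" : "AFIB"}
 else{$(NF+1) = NR==1 ? "indication" : "TESTING"}}
1'''

print(c1)

# outfile is the already-open output file object from the question.
result = subprocess.run(["awk", c1, "abc.csv"],
    stdout=outfile)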
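The question never shows how outfile is opened, so here is one possible way to wire that up, offered as a sketch rather than the original poster's setup. The helper name run_to_file is hypothetical, and the demo uses the Python interpreter as a stand-in child process so that it runs even where awk is not installed. Passing check=True is an optional addition that makes a failing command raise an exception instead of failing silently.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_to_file(argv, out_path):
    """Run argv without a shell, sending its stdout to out_path (hypothetical helper)."""
    with open(out_path, "w") as outfile:
        # check=True raises CalledProcessError on a non-zero exit status.
        return subprocess.run(argv, stdout=outfile, check=True)

# Stand-in child process so the sketch runs without awk installed.
demo_argv = [sys.executable, "-c", "print('a,b,indication')"]

with tempfile.TemporaryDirectory() as tmpdir:
    out_path = Path(tmpdir) / "out.csv"
    result = run_to_file(demo_argv, out_path)
    written = out_path.read_text()

print(result.returncode, repr(written))
```

With awk available, the equivalent call would be run_to_file(["awk", c1, "abc.csv"], "out.csv"), where "out.csv" is a placeholder file name.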

However, running Awk as a subprocess of Python is almost always unnecessary. You should be able to refactor this into roughly the same amount of Python code.

with open("abc.csv") as infile:
    firstline = True
    for line in infile:
        fields = line.rstrip("\n").split(",")
        if firstline:
            added = "indication"
        else:
            ind = fields[10].upper()
            if "DVT" in ind:
                added = "DVT"
            elif "AFIB" in ind:
                added = "AFIB"
            else:
                added = "TESTING"
        fields.append(added)
        outfile.write(",".join(fields) + "\n")
        firstline = False

This is somewhat more verbose, but that's mainly because I used more descriptive variable names. If your input file is really a CSV file, the csv module from the Python standard library could conveniently replace the splitting logic etc., though it then introduces some idiosyncrasies of its own. If you have to deal with quoted commas etc. in CSV, that's where it really adds value.
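For comparison, a csv-based variant might look roughly like the following. Treat it as a sketch under assumptions: the function name add_indication and the sample data are made up for illustration, the demo passes column=1 only to keep the sample rows short (the real file would use index 10, i.e. Awk's $11), and a missing field is treated as empty to mimic Awk.

```python
import csv
import io

def add_indication(infile, outfile, column=10):
    """Append an "indication" column based on the text in the given column."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for rownum, fields in enumerate(reader):
        if rownum == 0:
            added = "indication"          # header row
        else:
            # Awk treats a missing field as empty; do the same here.
            ind = fields[column].upper() if len(fields) > column else ""
            if "DVT" in ind:
                added = "DVT"
            elif "AFIB" in ind:
                added = "AFIB"
            else:
                added = "TESTING"
        writer.writerow(fields + [added])

# Illustrative data with a quoted comma that a plain split(",") would break.
sample = 'id,note\n1,"dvt, left leg"\n2,afib\n3,other\n'
out = io.StringIO()
add_indication(io.StringIO(sample), out, column=1)
print(out.getvalue())
```

When reading and writing real files, the csv documentation recommends opening them with newline="" so the module controls line endings itself.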

All things considered, Awk is more terse, but that's because it's more specialized. The main drawback of embedding it in your Python code is that the reader has to understand both languages in order to understand the code (though avoiding an external process is also always nice). I'll venture a guess that you were handed this Awk code without any explanation as to how it works or how to fix it if it breaks...?
