How to combine multiple sed and awk commands?

I have a folder with about 2 million files in it. I need to run the following commands:

sed -i 's/<title>/<item><title>/g;s/rel="nofollow"//g;s/<\/a> &bull;/]]><\/wp:meta_value><\/wp:postmeta><content:encoded><![CDATA[/g;s/By <a href="http:\/\/www.website.com\/authors.*itemprop="author">/<wp:postmeta><wp:meta_key><![CDATA[custom_author]]><\/wp:meta_key><wp:meta_value><![CDATA[/g' /home/testing/*

sed -i '$a]]></content:encoded><wp:status><![CDATA[draft]]></wp:status><wp:post_type><![CDATA[post]]></wp:post_type><dc:creator><![CDATA[Database]]></dc:creator></item>\' /home/testing/*

awk -i inplace 1 ORS=' ' /home/testing/*

The problem I'm having is that when I run the first command, it cycles through all 2 million files, then I move on to the second command, and so on. In total, I'm opening files 6 million times.

I'd prefer that when each file is opened, all 3 commands are run on it, and then processing moves on to the next file. Hopefully that makes sense.

You can do everything in one awk command, something like:

awk -i inplace -v ORS=' ' '{
    # Each gsub() mirrors one s///g expression from the first sed command.
    gsub(/<title>/,"<item><title>")
    gsub(/rel="nofollow"/,"")
    gsub(/<\/a> &bull;/,"]]></wp:meta_value></wp:postmeta><content:encoded><![CDATA[")
    gsub(/By <a href="http:\/\/www.website.com\/authors.*itemprop="author">/,"<wp:postmeta><wp:meta_key><![CDATA[custom_author]]></wp:meta_key><wp:meta_value><![CDATA[")
    # Note: the trailer below is appended to every input line, whereas the
    # sed $a command adds it once, after the last line of each file.
    print $0 "]]></content:encoded><wp:status><![CDATA[draft]]></wp:status><wp:post_type><![CDATA[post]]></wp:post_type><dc:creator><![CDATA[Database]]></dc:creator></item>"
}' /home/testing/*

but that doesn't mean it's necessarily the best way to do what you want.

The above relies on my correctly interpreting what your commands are doing, and it is untested since you didn't provide any sample input and expected output. It also still relies on GNU awk for -i inplace, as your original script did.
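Since the command is untested, one cautious way to check it is to preview the same kind of gsub() chain on stdout before editing anything in place. The following is only a sketch: the sample text and the variable names are made up for illustration, and just two of the substitutions are included. It does not need -i inplace, so any POSIX awk should do.

```shell
# Hypothetical two-line sample standing in for one input file.
sample='<title>T</title>
<p rel="nofollow">one</p>'

# Same style of gsub() chain, written to stdout instead of in place.
# ORS=' ' joins the output lines with a space and leaves a trailing space.
preview=$(printf '%s\n' "$sample" | awk -v ORS=' ' '{
    gsub(/<title>/, "<item><title>")
    gsub(/rel="nofollow"/, "")
    print
}')
printf '[%s]\n' "$preview"
```

The brackets in the last line only make the trailing space visible.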

Assuming that each file is small enough to fit into memory as a whole (and assuming GNU sed, which your use of -i without an option-argument implies):

sed -i -e ':a;$!{N;ba}; s/.../.../g; ...; $a...' -e 's/\n/ /g' /home/testing/*

s/.../.../g; ...; and $a... in the command above stand for your actual substitution and append commands.

:a;$!{N;ba}; reads each input file as a whole; the remaining commands then perform the desired substitutions, append the closing text, and replace each newline with a single space. [1]

This allows you to make do with a single sed command per input file.
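To see the single-pass idea end to end, here is a small sketch run against a throwaway directory, assuming GNU sed. The sample files, the shortened `</item>` trailer, and the variable names are hypothetical, and only two of the original substitutions are used.

```shell
# Throwaway directory with two made-up sample files.
dir=$(mktemp -d)
printf '<title>T</title>\n<p rel="nofollow">one</p>\n' > "$dir/a.html"
printf '<title>U</title>\n' > "$dir/b.html"

# One GNU sed invocation: slurp each file, substitute, queue the trailer,
# then turn the embedded newlines into spaces.
sed -i -e ':a;$!{N;ba}; s/<title>/<item><title>/g; s/rel="nofollow"//g; $a</item>' \
       -e 's/\n/ /g' "$dir"/*

result_a=$(cat "$dir/a.html")
result_b=$(cat "$dir/b.html")
printf '%s\n' "$result_a" "$result_b"
rm -rf "$dir"
```

Note that text queued by the a command is written after the pattern space, so in this sketch the trailer ends up on its own line rather than being joined with a space.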


[1] Your awk 1 ORS=' ' command actually creates output with a trailing space instead of a newline. By contrast, s/\n/ /g applied to the whole input file only places a space between lines and terminates the overall file with a newline (assuming the input file ended in one).
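The difference described in the footnote can be checked on a two-line sample. This is an illustrative sketch assuming GNU sed; the variable names and the "x" sentinel are not part of the original commands.

```shell
# The trailing "x" sentinel keeps $(...) from stripping a final newline,
# so the last byte produced by each command stays visible.
awk_out=$(printf 'a\nb\n' | awk 1 ORS=' '; printf x)
sed_out=$(printf 'a\nb\n' | sed -e ':a;$!{N;ba}' -e 's/\n/ /g'; printf x)

printf '%s' "$awk_out" | od -c   # bytes: a, space, b, space, x
printf '%s' "$sed_out" | od -c   # bytes: a, space, b, newline, x
```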
