How to combine multiple sed and awk commands?

I have a folder with about 2 million files in it. I need to run the following commands:

sed -i 's/<title>/<item><title>/g;s/rel="nofollow"//g;s/<\/a> &bull;/]]><\/wp:meta_value><\/wp:postmeta><content:encoded><![CDATA[/g;s/By <a href="http:\/\/www.website.com\/authors.*itemprop="author">/<wp:postmeta><wp:meta_key><![CDATA[custom_author]]><\/wp:meta_key><wp:meta_value><![CDATA[/g' /home/testing/*

sed -i '$a]]></content:encoded><wp:status><![CDATA[draft]]></wp:status><wp:post_type><![CDATA[post]]></wp:post_type><dc:creator><![CDATA[Database]]></dc:creator></item>\' /home/testing/*

awk -i inplace 1 ORS=' ' /home/testing/*

The problem is that each command cycles through all 2 million files before I can move on to the next one, so in total I end up opening files 6 million times.

I'd prefer that when each file is opened, all 3 commands are run on it and then it moves on to the next. Hopefully that makes sense.

You can do everything in one awk command as something like:

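awk -i inplace -v RS='^$' -v ORS=' ' '{
    # RS="^$" (GNU awk) reads each file as one record, so the trailer is added once per file
    gsub(/<title>/,"<item><title>")
    gsub(/rel="nofollow"/,"")
    gsub(/<\/a> &bull;/,"]]></wp:meta_value></wp:postmeta><content:encoded><![CDATA[")
    # [^\n]* instead of .* keeps the match on one line, as in the line-by-line sed version
    gsub(/By <a href="http:\/\/www.website.com\/authors[^\n]*itemprop="author">/,"<wp:postmeta><wp:meta_key><![CDATA[custom_author]]></wp:meta_key><wp:meta_value><![CDATA[")
    sub(/\n$/,"")     # drop the final newline
    gsub(/\n/," ")    # join the remaining lines with single spaces
    print $0 " ]]></content:encoded><wp:status><![CDATA[draft]]></wp:status><wp:post_type><![CDATA[post]]></wp:post_type><dc:creator><![CDATA[Database]]></dc:creator></item>"
}' /home/testing/*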

but that doesn't mean it's necessarily the best way to do what you want.

The above relies on my correctly interpreting what your commands are doing and is obviously untested since you didn't provide any sample input and expected output. It also still relies on GNU awk for -i inplace like your original script did.
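
If GNU awk isn't available, the same one-pass-per-file idea can be sketched with a plain shell loop and a temporary file. This is only an illustration: the scratch directory, the sample file, the single stand-in substitution and the shortened </item> trailer are all made up for the demo and are not the real commands. Note that mv replaces the original file, so its permissions and ownership may change.

```shell
# Hypothetical scratch directory standing in for /home/testing.
dir=$(mktemp -d)
printf '<title>A</title>\nbody a\n' > "$dir/a.txt"

for f in "$dir"/*; do
    tmp=$(mktemp) || exit 1
    # One awk run per file: a stand-in substitution on every line,
    # ORS=" " joins the lines, END adds the stand-in trailer once
    # (and only if the file had any lines at all).
    awk -v ORS=' ' '
        { gsub(/<title>/, "<item><title>"); print }
        END { if (NR) print "</item>" }
    ' "$f" > "$tmp" && mv "$tmp" "$f"
done

# The X sentinel keeps trailing whitespace from being stripped by $(...).
out=$(cat "$dir/a.txt"; printf 'X')
printf '[%s]\n' "${out%X}"
rm -rf "$dir"
```

Each file is still read and written exactly once, at the cost of starting one awk process per file.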

Assuming that your files are small enough for a single file to fit into memory as a whole (and assuming GNU sed, which your use of -i without an option-argument implies):

sed -i -e ':a;$!{N;ba}; s/.../.../g; ...; $a...' -e 's/\n/ /g' /home/testing/*

s/.../.../g; ...; and $a... in the command above represent your actual substitution and append commands.

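:a;$!{N;ba}; reads each input file into the pattern space as a whole; the commands that follow then perform the desired substitutions, append the trailer, and replace each newline with a single space. [1] Keep in mind that once the whole file is in the pattern space, .* can match across line boundaries, so a pattern such as authors.*itemprop may need [^\n]* in place of .* to stay within one line.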

This allows you to make do with a single sed command per input file.
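
To make the shape of such a command concrete, here is a small sketch on throwaway files, assuming GNU sed. The scratch directory, the file contents, the single stand-in substitution and the shortened </item> trailer are hypothetical. As a variation on the command above, the trailer is added with s/$/.../ rather than $a, because text added with $a is written on a line of its own after the pattern space.

```shell
# Hypothetical scratch directory with two small sample files.
dir=$(mktemp -d)
printf '<title>A</title>\nbody a\n' > "$dir/a.txt"
printf '<title>B</title>\nbody b\n' > "$dir/b.txt"

# One GNU sed invocation; each file is read, edited and rewritten once:
#   :a;$!{N;ba}               slurp the whole file into the pattern space
#   s/<title>/.../g           stand-in for the real substitutions
#   s/$/ <\/item>/            stand-in trailer, added at the end of the pattern space
#   s/\n/ /g                  join all lines with single spaces
sed -i -e ':a;$!{N;ba}' \
       -e 's/<title>/<item><title>/g' \
       -e 's/$/ <\/item>/' \
       -e 's/\n/ /g' "$dir"/*

out_a=$(cat "$dir/a.txt")
out_b=$(cat "$dir/b.txt")
printf '%s\n%s\n' "$out_a" "$out_b"
rm -rf "$dir"
```

Because -i makes GNU sed treat every file separately, $ and the slurp loop apply per file, not to the concatenation of all files.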


[1] Your awk 1 ORS=' ' command actually creates output with a trailing space instead of a newline. By contrast, 's/\n/ /g' applied to the whole input file only places a space between lines and terminates the overall file with a newline (assuming the input file ended in one). Also note that text added with $a is written after the pattern space, so it ends up on a line of its own rather than being joined with a space.
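
The difference described in this footnote can be checked with a quick experiment, sketched here on a made-up two-line sample file and assuming GNU sed. The X sentinel is only there because command substitution would otherwise strip trailing newlines.

```shell
# Hypothetical two-line sample input that ends in a newline.
sample=$(mktemp)
printf 'one\ntwo\n' > "$sample"

# awk replaces every record terminator, including the last one, with a space.
awk_out=$(awk 1 ORS=' ' "$sample"; printf 'X')
# sed joins the lines inside the pattern space and still ends the output with a newline.
sed_out=$(sed -e ':a;$!{N;ba}' -e 's/\n/ /g' "$sample"; printf 'X')

# The brackets make the trailing character visible.
printf 'awk: [%s]\n' "${awk_out%X}"
printf 'sed: [%s]\n' "${sed_out%X}"
rm -f "$sample"
```

The awk result ends in a space right before the closing bracket, while the sed result ends in a newline, so its closing bracket appears on the next line.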
