
Works in mawk, but not in GNU Awk

This could be a difficult fix, or it could be a dead simple one that's staring me in the face and I just can't see it.

I'm trying to run this awk command on the file piece1.txt:

    awk 'BEGIN { RS = "href=\""; ORS = ""; FS = OFS = "\"" } NR > 1 {  gsub("~", "", $1); gsub("!", "", $1); gsub("%20", "_", $1); gsub("#", "", $1); gsub("$", "", $1); gsub("%", "", $1); gsub("^", "", $1); gsub("&", "_", $1); gsub("@", "", $1); gsub("*", "", $1); gsub("\(", "", $1); gsub("\)", "", $1); gsub(/ /, "_", $1); gsub("____", "_", $1); gsub("___", "_", $1); gsub("__", "_", $1); print RS } 1' piece1.txt

Output error:

    awk: cmd. line:1: (FILENAME=piece1.txt FNR=2) fatal: Unmatched ( or \(: /(/

It seems to run the command up to the first instance of "href=", as specified, and then it wipes the rest of the text file.

I'm led to believe there's just a problem in my code that I'm overlooking. The strange thing is that this code works perfectly on a Debian/Ubuntu distro (where mawk is the default awk). I only get this error with GNU Awk on a Mint KDE distro.

If it's relevant:

    $ awk --version
    GNU Awk 4.0.1

Any help?

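You need one more level of escaping in your regex strings, because you wrote them with "" instead of //. A string is parsed twice, first as a string and then as a regex, so "\(" reaches the regex engine as a bare (, which is the unmatched parenthesis gawk complains about: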

    awk 'BEGIN { RS = "href=\""; ORS = ""; FS = OFS = "\"" } NR > 1 {  gsub("~", "", $1); gsub("!", "", $1); gsub("%20", "_", $1); gsub("#", "", $1); gsub("$", "", $1); gsub("%", "", $1); gsub("^", "", $1); gsub("&", "_", $1); gsub("@", "", $1); gsub("*", "", $1); gsub("\\(", "", $1); gsub("\\)", "", $1); gsub(/ /, "_", $1); gsub("____", "_", $1); gsub("___", "_", $1); gsub("__", "_", $1); print RS } 1' piece1.txt

This is the part that was changed: gsub("\\(", "", $1); gsub("\\)", "", $1);
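To see the two levels of parsing side by side, here is a minimal sketch. The function names strip_parens_string and strip_parens_literal are made up for illustration, and the sample input is not from your file:

```shell
# Dynamic regex (string): the string parser turns "\\(" into \( ,
# and the regex engine then reads \( as a literal parenthesis.
strip_parens_string() {
    printf '%s\n' "$1" | awk '{ gsub("\\(", ""); gsub("\\)", ""); print }'
}

# Regex literal: parsed only once, so a single backslash is enough.
strip_parens_literal() {
    printf '%s\n' "$1" | awk '{ gsub(/\(/, ""); gsub(/\)/, ""); print }'
}

strip_parens_string 'a(b)c'    # prints abc
strip_parens_literal 'a(b)c'   # prints abc
```

Both forms should behave the same way in gawk and mawk, which makes them a safer choice than the single-backslash string form.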

I suggest writing your patterns as regex literals with // instead. It's also more efficient, because awk does not have to convert a string into a regex first.

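You might find this simpler as well. Inside a bracket expression, characters such as $, *, ( and ) are literal and need no escaping, and so is ^ when it is not the first character: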

    awk 'BEGIN { RS = "href=\""; ORS = ""; FS = OFS = "\"" } NR > 1 { gsub(/(%20|_)+/, "_", $1); gsub(/[~!#$%^&*()@]/, "", $1); print RS } 1' piece1.txt

Or

    awk 'BEGIN { RS = "href=\""; ORS = ""; FS = OFS = "\"" } NR > 1 { gsub(/%20/, "_", $1); gsub(/[~!#$%^&*()@]/, "", $1); gsub(/_+/, "_", $1); print RS } 1' piece1.txt
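If you want to check the three substitutions on a single name before running them over piece1.txt, something like this sketch may help. The function name clean_name and the sample strings are hypothetical, and it works on the whole input line rather than on $1 of an href record:

```shell
# Apply the same three substitutions to one line of input:
#   1. %20 becomes an underscore
#   2. the listed punctuation characters are removed
#   3. runs of underscores collapse to a single one
clean_name() {
    printf '%s\n' "$1" |
        awk '{ gsub(/%20/, "_"); gsub(/[~!#$%^&*()@]/, ""); gsub(/_+/, "_"); print }'
}

clean_name 'My%20File%20(v2)!.txt'   # prints My_File_v2.txt
clean_name 'a%20%20b'                # prints a_b
```

Collapsing the underscores last means doubled %20 sequences and underscores left next to each other by removed punctuation are handled in one pass.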
