
How to prevent shell expansion of filenames in an awk command

I have an awk script in which I need to compute hashes of files whose names appear in the first field of the file I am processing. I am currently using:

command="sha1sum "$1
command | getline hash

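Unfortunately, awk hands the command string to the shell (much like popen(), which runs sh -c). The filename is therefore subject to word splitting, pathname expansion and the shell's other parsing before sha1sum runs and its output is piped to getline. This breaks for filenames that contain spaces or other special characters. How can I do this in a way that works for filenames containing arbitrary characters?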

Edit: Some example filenames might include foo(2).txt or x&y.mp3
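To see the failure concretely, here is a small demonstration (a sketch; the function name hash_unquoted and the temporary directory are made up for the illustration). It runs the unquoted command from the question on the two example names. The shell rejects foo(2).txt as a syntax error, and it splits x&y.mp3 at &, running sha1sum x in the background and then trying to run a command named y.mp3, so getline receives nothing for either file.

```shell
#!/bin/sh
# Demonstration of the problem: the unquoted command from the question.
hash_unquoted() {
    awk '{
        hash = ""                  # stays empty if the command prints nothing
        command = "sha1sum " $1    # the shell re-parses whatever $1 contains
        command | getline hash
        close(command)
        print $1 " -> [" hash "]"
    }' 2>/dev/null                 # hide the shell error messages
}

# Create two empty files with the example names in a scratch directory.
cd "$(mktemp -d)" || exit 1
: > 'foo(2).txt'
: > 'x&y.mp3'
printf '%s\n' 'foo(2).txt' 'x&y.mp3' | hash_unquoted
```

Both lines come back with an empty hash, for example x&y.mp3 -> [], because the shell never ran sha1sum on the real file.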

I will also include the entire program here, since it isn't too long. The purpose is to take a list of filenames from a text file and search for duplicate files.

#take a list of filenames and compute sha1sums to look for duplicates
BEGIN {storage[0]=0}
{
    command="sha1sum "$1
    command | getline hash
    split(hash, line)
    #storage array has the sha1sum hash as a key and the filename as a value
    #check each hash in storage, and report the duplicate if the current
    #sum matches any encountered before
    hash_exists=0
    for (x in storage) {
        if (x == line[1]) {
            hash_exists=1
            print("Duplicate found: " line[2])
        }
    }

    if (hash_exists == 0) {
        storage[line[1]]=line[2]
    }

    close(command)
}
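One answer is to put double quotes around the filename inside the command string, and to use -F: so that a name containing spaces stays whole in $1 (fields are then split at colons rather than at whitespace). This handles spaces, ( and &, but inside double quotes the shell still performs $ and backquote expansion and backslash escapes, and a name that contains a double quote ends the quoting early.

$ ll file\ with\ spaces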
-rw-rw-r-- 1 foo foo 0 Mar  5 16:49 file with spaces

$ echo "file with spaces" | awk -F: '{
    command="sha1sum \"" $1 "\"";
    command | getline line
    print line
}'
da39a3ee5e6b4b0d3255bfef95601890afd80709  file with spaces
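Single quotes are a safer wrapper, because the shell interprets nothing between them. Only a single quote inside the name needs care, and it can be written as '"'"' (end the quoted run, add a double-quoted quote, reopen). A minimal sketch of that idea follows, assuming a POSIX sh and GNU sha1sum; shquote and hash_names are illustrative names, not awk or coreutils features. Passing q="'" with -v keeps the awk program itself free of single quotes, so it can sit inside the usual '...' on the shell command line, and -- stops sha1sum from reading a name that starts with - as an option.

```shell
#!/bin/sh
# Sketch: quote each filename for the shell with single quotes.
# shquote() is a helper defined here, not an awk built-in; q holds a
# single quote so the awk program itself needs none.
hash_names() {
    awk -v q="'" '
    function shquote(s) {
        # Turn every q in s into q "\"" q "\"" q: end the quoted run,
        # add a double-quoted single quote, start a new quoted run.
        gsub(q, q "\"" q "\"" q, s)
        return q s q
    }
    {
        hash = ""
        command = "sha1sum -- " shquote($0)   # $0 keeps embedded spaces
        command | getline hash
        close(command)
        print hash
    }'
}

# Usage: empty files whose names contain spaces, $, quotes and metacharacters.
cd "$(mktemp -d)" || exit 1
for name in 'a b' 'foo(2).txt' 'x&y.mp3' '$HOME' "it's" 'say "hi"'; do
    : > "$name"
done
printf '%s\n' 'a b' 'foo(2).txt' 'x&y.mp3' '$HOME' "it's" 'say "hi"' | hash_names
```

Each name should come back as the empty-file hash da39a3ee5e6b4b0d3255bfef95601890afd80709 followed by two spaces and the name exactly as written, including $HOME and it's.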

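Another answer: prefix the sha1sum command with set -f; inside the command string. This makes the shell skip pathname expansion (globbing), so a wildcard such as * in a filename reaches sha1sum literally. It covers names like f*, but set -f does not prevent word splitting at spaces or the special meaning of characters such as ( and &.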

Example

$ touch f\*
$ nawk 'BEGIN {
  command="set -f;sha1sum f*"
  command | getline hash
  print hash
}'
da39a3ee5e6b4b0d3255bfef95601890afd80709  f*
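A complete version of the duplicate finder from the question, using single quotes instead of set -f, might look like the sketch below. It is an illustration under stated assumptions (POSIX sh, GNU sha1sum, one filename per input line), not the asker's code, and find_dups and shquote are illustrative names. It takes the whole input line ($0) as the filename, because awk's default field splitting would cut a name at its first space. It looks hashes up with awk's in operator instead of looping over the whole array. It also checks getline's return value: in the original program a failed sha1sum leaves hash holding the previous file's output, which can produce a false duplicate report.

```shell
#!/bin/sh
# Sketch: the question's duplicate finder with each filename single-quoted.
# Assumes GNU sha1sum and one filename per input line; find_dups and
# shquote are illustrative names, not standard tools.
find_dups() {
    awk -v q="'" '
    function shquote(s) {
        gsub(q, q "\"" q "\"" q, s)   # each q becomes q "\"" q "\"" q
        return q s q
    }
    {
        name = $0                     # the whole line, so spaces survive
        line = ""
        command = "sha1sum -- " shquote(name)
        status = (command | getline line)
        close(command)
        if (status <= 0) {
            print "Could not hash: " name > "/dev/stderr"
            next
        }
        sub(/^\\/, "", line)          # GNU marks escaped names with a leading backslash
        hash = substr(line, 1, index(line, " ") - 1)
        if (hash in seen)
            print "Duplicate found: " name " (same as " seen[hash] ")"
        else
            seen[hash] = name         # first file seen with this hash
    }' "$@"
}

# Usage: two empty files (same hash) and one with different content.
cd "$(mktemp -d)" || exit 1
: > 'a b'
: > 'x&y.mp3'
echo hi > 'foo(2).txt'
printf '%s\n' 'a b' 'foo(2).txt' 'x&y.mp3' | find_dups
```

For these sample files the only output line should be Duplicate found: x&y.mp3 (same as a b), since both of those files are empty. Reporting the first name alongside the duplicate is a small change from the original, which printed only the later file.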

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.