I have an awk script in which I need to compute hashes of files whose names appear in the first field of the file I am processing. I am currently using:
command="sha1sum "$1
command | getline hash
Unfortunately, the command undergoes shell expansion before being piped to getline. This is problematic for filenames that have spaces or other special characters in them. How could I accomplish the task in a way that allows for filenames with arbitrary characters?
Edit: Some example filenames might include foo(2).txt or x&y.mp3
I will also include the entire program here, since it isn't too long. The purpose is to take a list of filenames from a text file and search for duplicate files.
#take a list of filenames and compute sha1sums to look for duplicates
BEGIN {storage[0]=0}
{
    command="sha1sum "$1
    command | getline hash
    split(hash, line)
    #storage array has the sha1sum hash as a key and the filename as a value
    #check each hash in storage, and report the duplicate if the current
    #sum matches any encountered before
    hash_exists=0
    for (x in storage) {
        if (x == line[1]) {
            hash_exists=1
            print("Duplicate found: " line[2])
        }
    }
    if (hash_exists == 0) {
        storage[line[1]]=line[2]
    }
    close(command)
}
Wrap the filename in escaped double quotes when building the command, so the shell treats it as a single word:
$ ll file\ with\ spaces
-rw-rw-r-- 1 foo foo 0 Mar 5 16:49 file with spaces
$ echo "file with spaces" | awk -F: '{
command="sha1sum \"" $1 "\"";
command | getline line
print line
}'
da39a3ee5e6b4b0d3255bfef95601890afd80709 file with spaces
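Double quotes keep spaces, parentheses and & literal, but the shell still expands $ and backticks inside them, and a double quote in the filename ends the quoting early. A minimal sketch of a stricter variant is to single-quote the name and escape any embedded single quotes. Here find_dups and shell_quote are hypothetical names chosen for illustration, the list is assumed to hold one filename per line (so $0 is used instead of $1), and sha1sum is assumed to be the GNU coreutils version:

```shell
# find_dups LISTFILE: report files named in LISTFILE (one name per line)
# whose SHA-1 matches an earlier entry. Assumes GNU coreutils sha1sum.
find_dups() {
    awk -v q="'" '
    # shell_quote is a hypothetical helper: wrap s in single quotes and
    # rewrite each embedded single quote as the four characters q \ q q,
    # so that sh sees the whole filename as one literal word.
    function shell_quote(s) {
        gsub(q, q "\\" q q, s)
        return q s q
    }
    {
        # "--" stops option parsing, so names starting with "-" are safe
        command = "sha1sum -- " shell_quote($0)
        if ((command | getline result) > 0) {
            # GNU sha1sum prefixes the line with a backslash when it had
            # to escape characters in the filename; drop that marker
            sub(/^\\/, "", result)
            hash = substr(result, 1, 40)    # a SHA-1 is 40 hex digits
            if (hash in seen)
                print "Duplicate found: " $0 " (same as " seen[hash] ")"
            else
                seen[hash] = $0
        }
        close(command)
    }' "$@"
}
```

Calling find_dups list.txt prints one "Duplicate found" line for each file whose contents were already seen. The quote character is passed in with -v because a literal single quote cannot appear inside the single-quoted awk program.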
Prefix the sha1sum command with set -f; to turn off filename expansion (globbing):
$ touch f\*
$ nawk 'BEGIN {
command="set -f;sha1sum f*"
command | getline hash
print hash
}'
da39a3ee5e6b4b0d3255bfef95601890afd80709 f*
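Note that set -f only switches off globbing; spaces, & and parentheses in a name are still interpreted by the shell. One way to sidestep quoting inside awk altogether, sketched here as an alternative rather than as part of the answer above, is to let the shell read the names and run sha1sum, and have awk only compare the hashes. The function name find_dups_pipeline is hypothetical, the list is assumed to hold one filename per line, and the output format assumed is that of GNU sha1sum (hash, a space, a mode character, then the name):

```shell
# find_dups_pipeline LISTFILE: hash every file named in LISTFILE (one name
# per line) in the shell, then let awk look for repeated hashes.
find_dups_pipeline() {
    # IFS= and -r keep leading blanks and backslashes in each name intact;
    # "$name" is passed to sha1sum as a single argument, with no re-parsing
    while IFS= read -r name; do
        sha1sum -- "$name"
    done < "$1" |
    awk '{
        hash = $1
        sub(/^\\/, "", hash)    # GNU marker for an escaped filename
        # the name starts after the hash, one space and the mode character
        name = substr($0, index($0, " ") + 2)
        if (hash in seen)
            print "Duplicate found: " name " (same as " seen[hash] ")"
        else
            seen[hash] = name
    }'
}
```

Because awk never builds a command string here, no filename is ever re-read by a shell. The trade-off is that names containing backslashes are printed in sha1sum's escaped form.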