
Append output of a command to each line in large file

I need to append a random GUID to each line of a large text file, and the GUID must be different for each line.

This works except that the guid is the same for every line:

sed -e "s/$/$(uuidgen -r)/" text1.log > text2.log

Here is a way to do it using awk:

awk -v cmd='uuidgen' 'NF{cmd | getline u; print $0, u > "test2.log"; close(cmd)}' test1.log
  • The condition NF (or NF > 0) ensures we only do it for non-empty lines.
  • Since we call close(cmd) each time, a new uuidgen process runs for every record.

However, since uuidgen is spawned for every non-empty line, this can be slow for huge files.

That's because the command substitution is evaluated before the command starts.

The shell first executes uuidgen -r and replaces the command substitution with its result, say 0e4e5a48-82d1-43ea-94b6-c5de7573bdf8. The shell then runs sed like this:

sed -e "s/$/0e4e5a48-82d1-43ea-94b6-c5de7573bdf8/" text1.log > text2.log
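The single-evaluation pitfall is easy to reproduce on a two-line file (file names here are just for demonstration):

```shell
printf 'a\nb\n' > demo.txt

# $(uuidgen) is expanded once by the shell before sed ever runs,
# so every line receives the identical UUID.
sed -e "s/$/ $(uuidgen)/" demo.txt > demo_out.txt

# Both lines of demo_out.txt end with the same UUID.
```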

You can use a while loop in the shell to achieve your goal:

while IFS= read -r line; do    # IFS= preserves leading/trailing whitespace
    echo "$line $(uuidgen -r)"
done < file > file_out

Rather than running a whole new uuidgen process for each and every line, I generated a new UUID for each line in Perl, where it is just a function call:

#!/usr/bin/perl
use strict;
use warnings;
use UUID::Tiny ':std';

my $filename = 'data.txt';
open(my $fh,'<',$filename)
  or die "Could not open file '$filename' $!";

while (my $row = <$fh>) {
  chomp $row;
  my $uuid = create_uuid(UUID_V4);   # version-4 (random) UUID
  my $str  = uuid_to_string($uuid);
  print "$row $str\n";
}
close($fh);

To test, I generated a 1,000,000-line CSV as shown here.

It takes 10 seconds to append the UUID to the end of each line of the 1,000,000-record file on my iMac.
