How to extract unique fields from a CSV file using a Perl script

Question

I have a CSV file with data that looks similar to this:

alpha,a,foo,bar
alpha,b,foo,bar
alpha,c,foo,bar
beta,d,foo,bar
beta,e,foo,bar

I'm able to use the following code to successfully create two new files using the data:

use strict;
use warnings;

open(my $FH, '<', '/home/<username>/inputs.csv') or die "ERROR Cannot read file: $!\n";
while (my $line = <$FH>) {
    chomp $line;

    my @fields = split /,/, $line;
    my $file = "ziggy.$fields[0]";
    open(my $FH2, '>>', $file) or die "ERROR Cannot open $file: $!\n";
    print $FH2 "$fields[1]\n";
    print $FH2 "$fields[2]\n";
    print $FH2 "$fields[3]\n\n";
    close $FH2;
}
close $FH;

Basically, this code reads the rows of the CSV file and appends to files named after the first field. So the "ziggy.alpha" file ends up with nine lines of data and the "ziggy.beta" file with six, with a blank separator line after each record. Note that I'm appending data to these files as the rows are read in the "while" loop.

My challenge:

Using the same example data, I need to create a second set of files that follow the same first-field naming convention (something like "zaggy.alpha" and "zaggy.beta"). These files will be created only once, with static content, and no data from the CSV file will be appended to them.

My question:

Is there a way to identify the unique values in the first field ("alpha" and "beta"), store them in a hash, then reference them in a "while" loop in order to create my second set of files while the inputs.csv file is open?

Thanks in advance for any insight that can be provided!

Answer 1

In Perl you can get a list of the keys of a hash (associative array) like this:

my @keys = keys %hash;

So something like this will work. Declare the hash before the loop:

my %unique_first_values;

Then, inside the loop, record each first-field value as a key:

$unique_first_values{$fields[0]} = 1;

Because a hash key can only appear once, repeated values simply overwrite the same entry. After the loop, call keys on the hash to get the unique values:

my @unique = keys %unique_first_values;
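Putting those pieces together, a minimal sketch of this approach might look like the following. It assumes the sample rows from the question, read from an in-memory string so the example runs on its own (a real script would open /home/<username>/inputs.csv as the question does), and the helper name unique_first_fields is hypothetical. Sorting the keys is optional; it only makes the order predictable, since Perl returns hash keys in no particular order.

```perl
use strict;
use warnings;

# Sample rows from the question; a real script would open inputs.csv instead.
my $csv = <<'END_CSV';
alpha,a,foo,bar
alpha,b,foo,bar
alpha,c,foo,bar
beta,d,foo,bar
beta,e,foo,bar
END_CSV

# Hypothetical helper: collect each distinct first field as a hash key.
sub unique_first_fields {
    my ($fh) = @_;
    my %unique_first_values;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /,/, $line;
        $unique_first_values{$fields[0]} = 1;    # the value itself is irrelevant
    }
    # Hash keys come back in no particular order, so sort them for stable output.
    return sort keys %unique_first_values;
}

# Open the string as an in-memory filehandle.
open(my $in, '<', \$csv) or die "Cannot read CSV data: $!\n";
my @unique = unique_first_fields($in);
close $in;

print "$_\n" for @unique;    # prints "alpha" then "beta"
```

With @unique in hand, the zaggy.* files can be created in a simple foreach loop over those names.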

Answer 2

In order to "create my second set of files while the inputs.csv file is open", you need to know whether you've seen a value before.

The conventional way to do this in Perl is to create a hash to store previously-seen values, and check-then-set in order to determine whether you've seen it, record that it has been seen, and go on.

if (exists($seen_before{$key})) {
    # seen it
} 
else {
    # new key!
    $seen_before{$key} = 1;
}
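Applied to the question, the check-then-set pattern lets the zaggy.* files be created inside the same while loop, the first time each first-field value appears. This is a minimal sketch under a few assumptions: the CSV rows come from an in-memory string, the files are written to a temporary directory (via the core File::Temp and File::Spec modules) only so the example can run anywhere, and the static zaggy text is a hypothetical placeholder.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Sample rows from the question; a real script would open inputs.csv instead.
my $csv = <<'END_CSV';
alpha,a,foo,bar
alpha,b,foo,bar
alpha,c,foo,bar
beta,d,foo,bar
beta,e,foo,bar
END_CSV

# Hypothetical output location, used only so the sketch is self-contained.
my $out_dir = tempdir(CLEANUP => 1);

my %seen_before;    # first-field value => 1 once its zaggy file exists

open(my $FH, '<', \$csv) or die "ERROR Cannot read CSV data: $!\n";
while (my $line = <$FH>) {
    chomp $line;
    my @fields = split /,/, $line;
    my $key    = $fields[0];

    # Same per-row append as in the question.
    my $ziggy = File::Spec->catfile($out_dir, "ziggy.$key");
    open(my $FH2, '>>', $ziggy) or die "ERROR Cannot open $ziggy: $!\n";
    print $FH2 "$fields[1]\n$fields[2]\n$fields[3]\n\n";
    close $FH2;

    # Check-then-set: only the first row with this key creates zaggy.$key.
    next if exists $seen_before{$key};
    $seen_before{$key} = 1;

    my $zaggy = File::Spec->catfile($out_dir, "zaggy.$key");
    open(my $FH3, '>', $zaggy) or die "ERROR Cannot create $zaggy: $!\n";
    print $FH3 "static content for $key\n";    # placeholder static text
    close $FH3;
}
close $FH;
```

Opening zaggy.$key with '>' rather than '>>' means it is written exactly once per key, while the ziggy.* files keep growing row by row as before.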

Given that you're going to be opening files and appending data, it might make sense to store the file handle in the hash instead of a 1. That way, your "# new key!" branch just opens the file, and the "# seen it" case becomes the default (fall-through) path that writes the fields out. Something like this:

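unless (exists($file_handle{$key})) {
    # open returns true/false, not the handle, so store the handle separately
    open(my $fh, ...) or die ...;
    $file_handle{$key} = $fh;
}

# now we know it's in the hash, write the data
# (a hash element used as a filehandle must be wrapped in a block):
print { $file_handle{$key} } ...;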
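A runnable version of the handle-caching idea, under the same assumptions as before (in-memory sample rows, a temporary output directory, and the question's ziggy.* naming), might look like this. Each ziggy.* file is opened once, when its first-field value is first seen, and the hash of handles then doubles as the set of unique first fields.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Sample rows from the question; a real script would open inputs.csv instead.
my $csv = <<'END_CSV';
alpha,a,foo,bar
alpha,b,foo,bar
alpha,c,foo,bar
beta,d,foo,bar
beta,e,foo,bar
END_CSV

my $out_dir = tempdir(CLEANUP => 1);    # hypothetical output location
my %file_handle;                        # first-field value => open append handle

open(my $FH, '<', \$csv) or die "ERROR Cannot read CSV data: $!\n";
while (my $line = <$FH>) {
    chomp $line;
    my ($key, @rest) = split /,/, $line;

    # New key: open its file once and remember the handle.
    unless (exists $file_handle{$key}) {
        my $path = File::Spec->catfile($out_dir, "ziggy.$key");
        open(my $out, '>>', $path) or die "ERROR Cannot open $path: $!\n";
        $file_handle{$key} = $out;
    }

    # Fall-through for every row: the handle is now guaranteed to exist.
    print { $file_handle{$key} } "$_\n" for @rest;
    print { $file_handle{$key} } "\n";
}
close $FH;

# The hash keys are the unique first fields; close every cached handle.
my @unique = sort keys %file_handle;
close $_ for values %file_handle;
```

Keeping handles open avoids reopening a file for every row, but it holds one open file descriptor per distinct first-field value, which could run into operating-system limits if the CSV has a very large number of distinct values.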

2 answers

solution1
1 ACCEPTED 2017-02-27 17:09:19

solution2
0 2017-02-27 17:18:41
