
Best way to compare two hash of hashes?

Right now I have two hashes of hashes: one that I created by parsing a log file, and one that I grabbed from SQL. I need to compare them to find out whether each record from the log file already exists in the database. Right now I am iterating through each element to compare them:

foreach my $i(@record)
{
    foreach my $a(@{$data})
    {
        if ($i->{port} eq $a->{port} and $i->{name} eq $a->{name})
        {
            print "match found $i->{name}, updating record in table\n";
        }
        else
        {
            print "no match found for $tableDate $i->{port} $i->{owner} $i->{name} adding record to table\n";
            executeStatement("INSERT INTO client_usage (date, port, owner, name, emailed) VALUES (\'$tableDate\', \'$i->{port}\', \'$i->{owner}\', \'$i->{name}\', '0')");

        }
    }

}

Naturally, this takes a long time to run through as the database gets bigger. Is there a more efficient way of doing this? Can I compare the keys directly?

You have more than a hash of hashes. You have two lists, and each element of each list is a hash. Thus, you have to compare each item in one list against each item in the other list. Your algorithm's efficiency is O(n²) -- not because it's a hash of hashes, but because you're comparing each row in one list with each row in the other list.

Is it possible to go through your lists and turn them into a hash that is keyed by the port and name? That way, you go through each list once to create the indexing hash, then go through the hash once to do the comparison.

For example, to create the hash from the record:

my %record_hash;
foreach my $record_item (@record) {
   my $name = $record_item->{name};
   my $port = $record_item->{port};
   $record_hash{"$name:$port"} = $record_item;  # Or something like this...
}

Next, you'd do the same for your data list:

my %data_hash;
foreach my $data_item (@{$data}) {
   my $name = $data_item->{name};
   my $port = $data_item->{port};
   $data_hash{"$name:$port"} = $data_item;  # Or something like this...
}

Now you can go through your newly created hash just once:

foreach my $key (keys %record_hash) {
   my $rec = $record_hash{$key};
   if (exists $data_hash{$key}) {
       print "match found $rec->{name}, updating record in table\n";
   }
   else {
      print "no match found for $tableDate $rec->{port} $rec->{owner} $rec->{name} adding record to table\n";
      executeStatement("INSERT INTO client_usage (date, port, owner, name, emailed) VALUES ('$tableDate', '$rec->{port}', '$rec->{owner}', '$rec->{name}', '0')");
   }
}

Let's say you have 1000 elements in one list and 500 elements in the other. Your original algorithm would have to loop 500 * 1000 times (half a million times). By creating the index hashes, you loop once through each list to build them (1500 iterations) and once through the keys to compare (at most 1000), about 2500 iterations in all.
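Putting the two indexing loops and the comparison together, here is a minimal self-contained sketch of the approach. The sample records and service names are made up for illustration:

```perl
use strict;
use warnings;

# Made-up sample data standing in for the parsed log (@record)
# and the rows fetched from SQL ($data).
my @record = (
    { port => 8080, owner => 'alice', name => 'svc-a' },
    { port => 9090, owner => 'bob',   name => 'svc-b' },
);
my $data = [
    { port => 8080, name => 'svc-a' },
];

# Build the index hash once, keyed on the two fields being compared.
my %data_hash = map { "$_->{name}:$_->{port}" => $_ } @{$data};

# Now each log record needs only a single hash lookup.
for my $rec (@record) {
    my $key = "$rec->{name}:$rec->{port}";
    if (exists $data_hash{$key}) {
        print "match found $rec->{name}, updating record in table\n";
    }
    else {
        print "no match found for $rec->{port} $rec->{owner} $rec->{name}\n";
    }
}
```

Running this prints a match for svc-a and a miss for svc-b: two lookups instead of a nested loop over both lists.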

Another possibility: since you're already using a SQL database, why not do the whole thing as SQL queries? That is, don't fetch all the records up front. Instead, go through your data and, for each item, query for the matching record. If the record exists, update it; if not, insert a new one. That may even be faster, because you're not pulling the whole table into a list just to turn it into a hash.
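A hedged sketch of that per-record round trip using DBI with prepared statements (the DSN, environment variables, and sample data here are assumptions, and placeholders replace the string interpolation in the original `executeStatement` call, which is also safer against quoting bugs):

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the parsed log file.
my $tableDate = '2024-01-01';
my @record = ( { port => 8080, owner => 'alice', name => 'svc-a' } );

# Placeholders (?) let the driver quote values safely instead of
# interpolating them into the SQL string.
my $check_sql  = 'SELECT COUNT(*) FROM client_usage WHERE port = ? AND name = ?';
my $insert_sql = 'INSERT INTO client_usage (date, port, owner, name, emailed) VALUES (?, ?, ?, ?, 0)';

# Guarded so the sketch only touches a real database when a DSN is supplied.
if (my $dsn = $ENV{CLIENT_USAGE_DSN}) {
    require DBI;
    my $dbh    = DBI->connect($dsn, $ENV{DB_USER}, $ENV{DB_PASS},
                              { RaiseError => 1 });
    my $check  = $dbh->prepare($check_sql);
    my $insert = $dbh->prepare($insert_sql);

    for my $rec (@record) {
        $check->execute($rec->{port}, $rec->{name});
        my ($count) = $check->fetchrow_array;
        if ($count) {
            print "match found $rec->{name}, updating record in table\n";
        }
        else {
            $insert->execute($tableDate, $rec->{port},
                             $rec->{owner}, $rec->{name});
        }
    }
}
```

Preparing the two statements once and executing them per record keeps the query-planning cost out of the loop.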

There's a way to tie SQL databases directly to hashes. That might be a good way to go too.
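One module that does this is Tie::DBI (naming it is my assumption; the answer above doesn't say which module it means). Once tied, `exists $row{$key}` issues a lookup against the table, so the database itself behaves like the index hash. A guarded sketch, with a hypothetical DSN:

```perl
use strict;
use warnings;

# Tie arguments for Tie::DBI: the DSN fragment and key column are
# hypothetical -- adjust them for your own schema.
my %tie_args = (
    db    => 'mysql:mydb',     # hypothetical database DSN fragment
    table => 'client_usage',
    key   => 'name',           # column that serves as the hash key
);

# Guarded so the sketch only runs against a real database when asked to.
if ($ENV{CLIENT_USAGE_DB}) {
    require Tie::DBI;
    tie my %row, 'Tie::DBI',
        { %tie_args, user => $ENV{DB_USER}, password => $ENV{DB_PASS} };
    print "svc-a already in table\n" if exists $row{'svc-a'};
}
```

The trade-off is that each hash access becomes a query, so this is convenient for spot checks rather than bulk comparison.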

Are you using Perl DBI?

How about using Data::Difference:

use Data::Difference qw(data_diff);

my @diff = data_diff(\%hash_a, \%hash_b);

@diff = (
    { 'a' => 'value', 'path' => [ 'data' ] }, # exists in 'a' but not in 'b'
    { 'b' => 'value', 'path' => [ 'data' ] }, # exists in 'b' but not in 'a'
);
