
String comparison

I have two files generated by IntexCalc and the Intex API, and I want to compare their contents. I do not want to compare them line by line. The example below gives more detail.

File 1

LOSS_UNITS[\"GRPY\"]==CDR  
LOSS_USERCURVE_TYPE[\"GRPY\"]==PCT_MULTIPLY  
LOSS_USERCURVE_INDEX_OFFSET[\"GRPY\"]==BY_LOAN_AGE  
LOSS_RATE[\"GRPY\"]==100  
LOSS_NONPERF_ADV_PCT_P[\"GRPY\"]==0  
LOSS_NONPERF_ADV_PCT_I[\"GRPY\"]==0  
SEVERITY_USERCURVE_TYPE[\"GRPY\"]==NONE  

File 2

LOSS_USERCURVE_TYPE[\"GRPY\"]=PCT_MULTIPLY  
LOSS_NONPERF_ADV_PCT_P[\"GRPY\"]=0  
LOSS_UNITS[\"GRPY\"]=CDR  
LOSS_NONPERF_ADV_PCT_I[\"GRPY\"]=0  
SEVERITY_USERCURVE_TYPE[\"GRPY\"]=NONE  
LOSS_SEVERITY[\"GRPY\"]=31.73  
LOSS_USERCURVE_INDEX_OFFSET[\"GRPY\"]=BY_DEAL_AGE  

  1. I want to compare the value of the LOSS_UNITS[\"GRPY\"] flag in both files. The value after =/== is the same in both files, regardless of where the line appears, so this flag is equal.

  2. The value of LOSS_USERCURVE_INDEX_OFFSET[\"GRPY\"] is BY_LOAN_AGE in File 1 and BY_DEAL_AGE in File 2, so this flag is different.

  3. The flag LOSS_RATE[\"GRPY\"] is present only in File 1, so this is a difference.

  4. The flag LOSS_SEVERITY[\"GRPY\"] is present only in File 2, so this is also a difference.

What is the best way or tool to compare this kind of file structure?

I suggest you make use of the Data::Diff module.

It returns a reference to a hash containing a summary of the differences between the parameters. The keys are

  • same — elements that are the same in both cases
  • diff — elements that have a different value for a given key
  • uniq_a and uniq_b — elements that appear in only one structure or the other


use strict;
use warnings 'all';
use autodie;

use Data::Dump;
use Data::Diff 'Diff';

my %f1 = do {
    open my $fh, '<', 'file1.txt';
    map { s/\s+\z//; split /=+/, $_, 2 } <$fh>;
};

my %f2 = do {
    open my $fh, '<', 'file2.txt';
    map { s/\s+\z//; split /=+/, $_, 2 } <$fh>;
};

my $diff = Diff(\(%f1, %f2));
dd $diff;

Output

{
  diff   => {
              "LOSS_USERCURVE_INDEX_OFFSET[\\\"GRPY\\\"]" => { diff_a => "BY_LOAN_AGE", diff_b => "BY_DEAL_AGE", type => "" },
            },
  same   => {
              "LOSS_NONPERF_ADV_PCT_I[\\\"GRPY\\\"]"  => { same => 0, type => "" },
              "LOSS_NONPERF_ADV_PCT_P[\\\"GRPY\\\"]"  => { same => 0, type => "" },
              "LOSS_UNITS[\\\"GRPY\\\"]"              => { same => "CDR", type => "" },
              "LOSS_USERCURVE_TYPE[\\\"GRPY\\\"]"     => { same => "PCT_MULTIPLY", type => "" },
              "SEVERITY_USERCURVE_TYPE[\\\"GRPY\\\"]" => { same => "NONE", type => "" },
            },
  type   => "HASH",
  uniq_a => { "LOSS_RATE[\\\"GRPY\\\"]" => 100 },
  uniq_b => { "LOSS_SEVERITY[\\\"GRPY\\\"]" => 31.73 },
}
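
To turn that result into a report covering the cases listed in the question, one could walk the hash as in the sketch below. It assumes the value returned by Diff has exactly the shape shown in the output above. The hard-coded `$diff` (with the `[\"GRPY\"]` suffix dropped from the keys for readability) and the helper name `report_lines` are illustrative only and are not part of Data::Diff.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the value returned by Diff(), copied from the
# shape of the output above (keys shortened for readability).
my $diff = {
    type   => 'HASH',
    same   => { 'LOSS_UNITS' => { same => 'CDR', type => '' } },
    diff   => {
        'LOSS_USERCURVE_INDEX_OFFSET' =>
            { diff_a => 'BY_LOAN_AGE', diff_b => 'BY_DEAL_AGE', type => '' },
    },
    uniq_a => { 'LOSS_RATE'     => 100 },
    uniq_b => { 'LOSS_SEVERITY' => 31.73 },
};

# Build one report line per flag that is not identical in both files.
sub report_lines {
    my ($d) = @_;
    my @lines;
    # Flags present in both files but with different values.
    for my $key ( sort keys %{ $d->{diff} || {} } ) {
        my $e = $d->{diff}{$key};
        push @lines, "$key differs: $e->{diff_a} vs $e->{diff_b}";
    }
    # Flags that appear in only one of the two files.
    push @lines, "$_ only in file 1" for sort keys %{ $d->{uniq_a} || {} };
    push @lines, "$_ only in file 2" for sort keys %{ $d->{uniq_b} || {} };
    return @lines;
}

print "$_\n" for report_lines($diff);
```

Flags listed under `same` are deliberately skipped, so the report contains only the differences.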

An uninspired solution: put keys and values into two hashes and compare them.

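use strict;
use warnings;

# Read KEY=VALUE or KEY==VALUE lines from $path into the hash referenced by $hr.
sub f2h {
  my( $hr, $path ) = @_;
  open my $fh, '<', $path or die "$path: couldn't open: $!";
  while( my $line = <$fh> ){
    $line =~ s/\s+$//;   # there are trailing spaces in your data
    next unless $line =~ /=/;   # skip blank lines and lines without a separator
    my( $key, $val ) = split( /==?/, $line, 2 );   # limit 2 keeps any later '=' in the value
    $hr->{$key} = $val;
  }
  close $fh;
}

my %h1;
my %h2;
f2h( \%h1, "file1.dat" );
f2h( \%h2, "file2.dat" );
# Flags in file 1: report changed values and flags absent from file 2.
while( my( $k, $v ) = each %h1 ){
  if( exists( $h2{$k} ) ){
    print "different $k\n" if $h2{$k} ne $v;
  } else {
    print "$k missing in 2\n";
  }
}
# Flags that exist only in file 2.
while( my( $k, $v ) = each %h2 ){
  print "$k missing in 1\n" unless exists $h1{$k};
}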
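
Because `each` walks a hash in no particular order, the messages above can come out in a different order from run to run. Here is a minimal sketch of a variant that returns the differences sorted by flag name. The names `parse_lines` and `compare_flags` are hypothetical, and the sample lines are taken from the question with the `[\"GRPY\"]` part dropped and passed in as strings rather than read from files.

```perl
use strict;
use warnings;

# Turn "KEY=VALUE" or "KEY==VALUE" lines into a hash reference.
sub parse_lines {
    my @lines = @_;                        # copy, so the caller's data is not modified
    my %h;
    for my $line (@lines) {
        $line =~ s/\s+\z//;                # drop trailing spaces and newline
        next unless $line =~ /=/;          # skip blank or malformed lines
        my ( $key, $val ) = split /=+/, $line, 2;
        $h{$key} = $val;
    }
    return \%h;
}

# Compare two parsed hashes and return messages sorted by flag name.
sub compare_flags {
    my ( $h1, $h2 ) = @_;
    my %seen = map { $_ => 1 } ( keys %$h1, keys %$h2 );
    my @report;
    for my $k ( sort keys %seen ) {
        if    ( !exists $h2->{$k} )      { push @report, "$k missing in 2" }
        elsif ( !exists $h1->{$k} )      { push @report, "$k missing in 1" }
        elsif ( $h1->{$k} ne $h2->{$k} ) { push @report, "different $k" }
    }
    return @report;
}

my $f1 = parse_lines(
    'LOSS_UNITS==CDR  ',
    'LOSS_RATE==100  ',
    'LOSS_USERCURVE_INDEX_OFFSET==BY_LOAN_AGE  ',
);
my $f2 = parse_lines(
    'LOSS_USERCURVE_INDEX_OFFSET=BY_DEAL_AGE  ',
    'LOSS_SEVERITY=31.73  ',
    'LOSS_UNITS=CDR  ',
);

print "$_\n" for compare_flags( $f1, $f2 );
```

Returning the messages as a list instead of printing them directly also makes the comparison easier to reuse, for example to count the differences or write them to a file.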


 