I have two CSV files that I want to compare with Perl.
I have code that loads the files with Text::CSV::Slurp, which gives me a nice array of hash references for each file.
Dumping them with Data::Dumper::Concise shows that all my data imports correctly.
use strict;
use warnings;
use Text::CSV::Slurp;
use Data::Dumper::Concise;
my $file1_src = "IPB-CSV.csv";
my $file2_src = "SRM-CSV.csv";
my $IPB = Text::CSV::Slurp->load(file => $file1_src);
my $SRM = Text::CSV::Slurp->load(file => $file2_src);
print Dumper($IPB);
print Dumper($SRM);
The results of the dump look something like this
$IPB
[
{
Drawing => "1001"
},
{
Drawing => "1002"
},
{
Drawing => "1003"
}
]
$SRM
[
{
Drawing => "1001",
Figure => "Figure 2-8",
Index => 2,
Nomenclature => "Some Part"
},
{
Drawing => "1002",
Figure => "Figure 2-8",
Index => 2,
Nomenclature => "Some Part"
},
{
Drawing => "2001",
Figure => "Figure 2-8",
Index => 2,
Nomenclature => "Some Part"
},
{
Drawing => "2002",
Figure => "Figure 2-8",
Index => 2,
Nomenclature => "Some Part"
}
]
I want to compare the two arrays based on each hash's Drawing key, and create two CSV files as follows:
One containing the items that are in $IPB but not in $SRM, with only the data in the Drawing column.
Another containing the items that are in $SRM but not in $IPB, with all the fields related to the Drawing column.
I have found lots of information to compare files to see if they match, or to compare hashes or arrays for single pieces of data, but I can't find something specific to what I need.
Since Drawing is the comparison key, why not index the data into something more convenient: a hash where the Drawing value is the key and the value is the list of records that share it?
my %ipb;
for my $record ( @$IPB ) {
my $index = $record->{Drawing};
push @{ $ipb{$index} }, $record;
}
my %srm;
for my $record ( @$SRM ) {
my $index = $record->{Drawing};
push @{ $srm{$index} }, $record;
}
Now it should be a breeze to figure out the indexes unique to $IPB and $SRM:
use List::MoreUtils 'uniq';
my @unique_ipb = uniq( grep { $ipb{$_} and not $srm{$_} } keys( %ipb ), keys( %srm ) );
my @unique_srm = uniq( grep { $srm{$_} and not $ipb{$_} } keys( %ipb ), keys( %srm ) );
What's common to both?
my @intersect = uniq( grep { $srm{$_} and $ipb{$_} } keys( %ipb ), keys( %srm ) );
What are all the figure number(s) for Drawing index 1002?
print $_->{Figure}, "\n" for @{ $ipb{1002} // [] }, @{ $srm{1002} // [] };
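The indexing above stops short of producing the two files the question asks for. The following is a minimal sketch of that last step, not a definitive implementation. It rebuilds the indexes from the sample data so that it runs on its own. The column order in @columns and the file names ipb_only.csv and srm_only.csv are assumptions, and joining fields with a comma is only safe if no field contains commas or quotes (otherwise a CSV module should write the rows).

```perl
use strict;
use warnings;

# Sample records taken from the question's dump.
my $IPB = [ map { +{ Drawing => $_ } } qw(1001 1002 1003) ];
my $SRM = [
    map { +{ Drawing => $_, Figure => 'Figure 2-8', Index => 2, Nomenclature => 'Some Part' } }
        qw(1001 1002 2001 2002)
];

# Index each list by Drawing, as in the answer above.
my (%ipb, %srm);
push @{ $ipb{ $_->{Drawing} } }, $_ for @$IPB;
push @{ $srm{ $_->{Drawing} } }, $_ for @$SRM;

# Drawings present on one side only; sorted for stable output.
my @unique_ipb = sort grep { !$srm{$_} } keys %ipb;
my @unique_srm = sort grep { !$ipb{$_} } keys %srm;

# Assumed column order for the SRM report.
my @columns = qw(Drawing Figure Index Nomenclature);

# Build the rows as plain strings (no quoting or escaping is done here).
my @ipb_lines = ( 'Drawing', @unique_ipb );
my @srm_lines = (
    join( ',', @columns ),
    map { join ',', @{$_}{@columns} } map { @{ $srm{$_} } } @unique_srm,
);

# Write each report; the file names are placeholders.
for my $out ( [ 'ipb_only.csv', \@ipb_lines ], [ 'srm_only.csv', \@srm_lines ] ) {
    my ( $name, $lines ) = @$out;
    open my $fh, '>', $name or die "Cannot open $name: $!";
    print {$fh} "$_\n" for @$lines;
    close $fh or die "Cannot close $name: $!";
}
```

Fixing the column order in a list sidesteps the fact that a hash does not remember the order of the fields in the input file.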
This short program uses your example values for $ipb and $srm and creates the output that I think you want. (Please don't use capital letters for anything but global identifiers like package names.)
There are a couple of problems:
Using Text::CSV::Slurp leaves you with two arrays of hashes that are of no use for this task without further indexing. You would be much better off creating appropriate data structures from scratch by processing the file line by line.
You say that your second file must contain all of the information related to each Drawing key but, because Perl hashes are inherently unordered, Text::CSV::Slurp has lost the order of the field names. The best that can be done is to print the data in whatever order it is found, preceded by a header line showing the field names. This is another reason for avoiding Text::CSV::Slurp.
use strict;
use warnings;
use autodie;
# The original data
my $ipb = [{ Drawing => 1001 }, { Drawing => 1002 }, { Drawing => 1003 }];
my $srm = [
{
Drawing => "1001",
Figure => "Figure 2-8",
Index => 2,
Nomenclature => "Some Part"
},
{
Drawing => "1002",
Figure => "Figure 2-8",
Index => 2,
Nomenclature => "Some Part"
},
{
Drawing => "2001",
Figure => "Figure 2-8",
Index => 2,
Nomenclature => "Some Part"
},
{
Drawing => "2002",
Figure => "Figure 2-8",
Index => 2,
Nomenclature => "Some Part"
}
];
# Index the data
my %srm;
for my $item (@$srm) {
my $drawing = $item->{Drawing};
$srm{$drawing} = $item;
}
my %ipb;
for my $item (@$ipb) {
my $drawing = $item->{Drawing};
$ipb{$drawing} = 1;
}
# Create the output files
open my $csv1, '>', 'file1.csv';
for my $id (sort keys %ipb) {
next if $srm{$id};
print $csv1 $id, "\n";
}
close $csv1;
open my $csv2, '>', 'file2.csv';
my @keys = keys %{ $srm->[0] };
print $csv2 join(',', @keys), "\n";
for my $id (sort keys %srm) {
next if $ipb{$id};
print $csv2 join(',', @{$srm{$id}}{@keys}), "\n";
}
close $csv2;
output
file1.csv
1003
file2.csv
Drawing,Nomenclature,Index,Figure
2001,Some Part,2,Figure 2-8
2002,Some Part,2,Figure 2-8
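The line-by-line reading recommended above is not shown in the program, so here is a minimal sketch of one way it might look with Text::CSV. It assumes the first row of the file is a header row; the in-memory string $srm_data is a hypothetical stand-in for SRM-CSV.csv so that the block runs on its own. Because the header row is kept in @fields, the original column order is available when writing the output.

```perl
use strict;
use warnings;
use Text::CSV;

# Stand-in for the contents of SRM-CSV.csv; a real script would open the file instead.
my $srm_data = <<'END';
Drawing,Figure,Index,Nomenclature
1001,Figure 2-8,2,Some Part
2001,Figure 2-8,2,Some Part
END

my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );

open my $fh, '<', \$srm_data or die "Cannot open in-memory data: $!";

# The first row holds the field names, in file order.
my @fields = @{ $csv->getline($fh) };
$csv->column_names(@fields);

# Index each remaining row by its Drawing value.
my %srm;
while ( my $row = $csv->getline_hr($fh) ) {
    $srm{ $row->{Drawing} } = $row;
}
close $fh;

# Rows can now be printed in the original column order.
for my $id ( sort keys %srm ) {
    print join( ',', @{ $srm{$id} }{@fields} ), "\n";
}
```

The same loop can build the %ipb index from the other file, storing a true value instead of the whole row.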
This is a bit complicated, because your data structures are less than ideal for comparing. You have references to arrays of hash references, and you care about the data in one of the keys of each hashref. My first step would be to flatten IPB to a plain array of Drawing values (since its records hold nothing else), and convert SRM to a single hashref keyed by Drawing.
my @ipbarray = map { $_->{Drawing} } @$IPB; # Flatten IPB to a list of Drawing values.
my $srmhash = {};
for my $hash (@$SRM) {
    # Keep the first record seen for each Drawing; don't overwrite it.
    $srmhash->{ $hash->{Drawing} } //= $hash;
}
Now we have 2 more workable data structures.
Next step is to contrast these values:
my @ipbonly = ();
my @srmonly = ();
for my $ipbitem (@ipbarray) {
    push @ipbonly, { Drawing => $ipbitem } unless defined $srmhash->{$ipbitem};
}
for my $srmitem (keys %$srmhash) {
    # Compare as strings: drawing identifiers are not guaranteed to be numeric.
    push @srmonly, $srmhash->{$srmitem} unless grep { $_ eq $srmitem } @ipbarray;
}
At this point, @ipbonly and @srmonly will contain the data you want.