How to merge columns from two different files in Perl
I have written the following Perl code to read a text file (a1.txt) and average the timestamps. I want to read two files simultaneously (a1.txt and a2.txt) and combine all columns from both files.
The code below can only read one file at a time. Please help me modify it so that it gives output in the following format.
a1.txt:
PERFORMANCE TESTING
-------------------------------------------------------------------
PERF_SMK_OCUS_50 Version P-20-17
-------------------------------------------------------------------
300_wireframe_view_redraws_(GR) 00:01:56
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:51
3_hidden_view_redraws_(GR) 00:01:35
6_Fast_HLR_activations_(CP) 00:01:10
120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:42
2_shaded_mouse_spins_(GR) 00:00:21
270_shaded_view_redraws_(GR) 00:01:39
-------------------------------------------------------------------
****************************************************
****************************************************
-------------------------------------------------------------------
PERF_SMK_OCUS_50 Version P-20-17
-------------------------------------------------------------------
300_wireframe_view_redraws_(GR) 00:01:56
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:51
3_hidden_view_redraws_(GR) 00:01:35
6_Fast_HLR_activations_(CP) 00:01:09
120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:42
2_shaded_mouse_spins_(GR) 00:00:20
270_shaded_view_redraws_(GR) 00:01:39
-------------------------------------------------------------------
****************************************************
****************************************************
-------------------------------------------------------------------
PERF_SMK_OCUS_50 Version P-20-17
-------------------------------------------------------------------
300_wireframe_view_redraws_(GR) 00:01:55
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50
3_hidden_view_redraws_(GR) 00:01:34
6_Fast_HLR_activations_(CP) 00:01:09
120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:40
2_shaded_mouse_spins_(GR) 00:00:21
270_shaded_view_redraws_(GR) 00:01:35
-------------------------------------------------------------------
****************************************************
****************************************************
a2.txt:
PERFORMANCE TESTING
-------------------------------------------------------------------
PERF_SMK_OCUS_50 Version P-20-17
-------------------------------------------------------------------
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50
3_hidden_view_redraws_(GR) 00:01:37
6_Fast_HLR_activations_(CP) 00:01:12
120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:43
2_shaded_mouse_spins_(GR) 00:00:21
270_shaded_view_redraws_(GR) 00:01:35
240_realtime_rendered_redraws_(GR)_1 00:13:16
-------------------------------------------------------------------
****************************************************
****************************************************
-------------------------------------------------------------------
PERF_SMK_OCUS_50 Version P-20-17
-------------------------------------------------------------------
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50
3_hidden_view_redraws_(GR) 00:01:37
6_Fast_HLR_activations_(CP) 00:01:12
120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:42
2_shaded_mouse_spins_(GR) 00:00:20
270_shaded_view_redraws_(GR) 00:01:40
240_realtime_rendered_redraws_(GR)_1 00:13:14
-------------------------------------------------------------------
****************************************************
****************************************************
-------------------------------------------------------------------
PERF_SMK_OCUS_50 Version P-20-17
-------------------------------------------------------------------
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50
3_hidden_view_redraws_(GR) 00:01:37
6_Fast_HLR_activations_(CP) 00:01:12
120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:44
2_shaded_mouse_spins_(GR) 00:00:20
270_shaded_view_redraws_(GR) 00:01:40
240_realtime_rendered_redraws_(GR)_1 00:13:24
-------------------------------------------------------------------
****************************************************
****************************************************
Desired output:
> Test Cases a1.txt timestamp (hh:mm:ss) a2.txt(hh:mm:ss) delta (a1 -a2)(hh:mm:ss)
>----------------------------------------------------------------------------------------------------------------
>240_realtime_rendered_redraws_(GR)_1 N/A 00:13:18 N/A
> 3_hidden_view_redraws_(GR) 00:01:34 00:01:37 -00:00:03
> 270_shaded_view_redraws_(GR) 00:01:37 00:01:38 -00:00:01
> 120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:41 00:00:43 -00:00:02
> 300_wireframe_view_redraws_(GR) 00:01:55 N/A N/A
> 2_shaded_mouse_spins_(GR) 00:00:20 00:00:20 00:00:00
> 6_Fast_HLR_activations_(CP) 00:01:09 00:01:12 -00:00:03
> 80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50 00:00:50 00:00:00
My code:
my %retrieve;
my $count = 0;
my $file1 = 'a1.txt';
open (R, $file1) or die ("Could not open $file1!");
while (<R>) {
next unless /^*Retrieve_generic_/ ||
/^*Retrieve_assembly_1_/ ||
/^*Retrieve_assembly_2_/ ||
/^*300_wireframe_view_/ ||
/^*80_wireframe_view_/ ||
/^*3_hidden_view_/ ||
/^*Fast_HLR_/ ||
/^*120_hidden_view_/ ||
/^*shaded_view_/ ||
/^*shaded_mouse_/ ||
/^*realtime_rendered_/;
$count++;
my ( $retrieve, $time ) = split;
my ( $h, $m, $s ) = split ':', $time;
$retrieve{$retrieve} += $h * 3600 + $m * 60 + $s;
}
close(R);
for my $retrieve ( keys %retrieve ) {
my $hms = secondsToHMS($retrieve{$retrieve} / ( 3));
print "$retrieve\t$hms\n" if defined $hms;
}
# For seconds < 86400, else undef returned
sub secondsToHMS {
my $seconds = $_[0];
return undef if $seconds >= 86400;
my $h = int $seconds / 3600;
my $m = int( $seconds - $h * 3600 ) / 60;
my $s = $seconds % 60;
return sprintf( '%02d:%02d:%02d', $h, $m, $s );
}
Here's how I'd go about doing that.
#!/usr/bin/perl -Tw
use strict;
use warnings;
use English qw( -no_match_vars $OS_ERROR );
die 'expecting two filenames as arguments'
if @ARGV != 2;
my @ids;
my %time_for;
for my $filename (@ARGV) {
my $id;
if ( $filename =~ m{\A ( .+? / )?( [^/.]+? )( [.] \w+ ) \z}xms ) {
my $path = $1 || "";
my $name = $2;
my $ext = $3 || "";
$id = $name;
$filename = "$path$name$ext";
push @ids, $id;
}
die "can't parse file ID from $filename"
if !$id;
die "can't find $filename"
if !stat $filename;
open my $fh, '<', "$filename"
or die "open $filename: $OS_ERROR";
while ( my $line = <$fh> ) {
if ( $line =~ m{\A ( \w+ \( \w+ \) \w* ) \s+ ( \d+:\d+:\d+ ) }xms ) {
my ( $subject, $hms ) = ( $1, $2 );
my $seconds = hms_to_sec( $hms );
$time_for{$subject}->{$id} ||= $seconds;
$time_for{$subject}->{$id}
= ( $seconds + $time_for{$subject}->{$id} ) / 2;
}
}
close $fh
or die "close $filename: $OS_ERROR";
}
print <<"HEAD";
> Test Cases $ids[0] timestamp (hh:mm:ss) $ids[1] (hh:mm:ss) delta ($ids[0]-$ids[1])(hh:mm:ss)
> ------------------------------------------------------------------------------------------------------------------------------
HEAD
for my $subject (sort keys %time_for) {
my ( $a1, $a2 ) = @{ $time_for{$subject} }{@ids};
my $delta = defined $a1 && defined $a2 ? $a1 - $a2 : undef;
printf "> % -46s % -32s % -21s %s\n\n",
$subject,
sec_to_hms( $a1 ),
sec_to_hms( $a2 ),
sec_to_hms( $delta );
}
sub hms_to_sec {
my ( $h, $m, $s ) = map { int $_ } map { $_ ? $_ : 0 } split /:/, $_[0];
return $h * 3_600 + $m * 60 + $s;
}
sub sec_to_hms {
my ( $s ) = @_;
return 'N/A'
if !defined $s || $s > 86_400;
my $sign = ' ';
if ( $s < 0 ) {
$sign = '-';
$s *= -1;
}
my $h = int $s / 3_600;
my $m = int ( $s - $h * 3_600 ) / 60;
return sprintf '%s%02d:%02d:%02d', $sign, $h, $m, $s % 60;
}
The output comes out like this.
> Test Cases a1.txt timestamp (hh:mm:ss) a2.txt(hh:mm:ss) delta (a1 -a2)(hh:mm:ss)
> ------------------------------------------------------------------------------------------------------------------------------
> 120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:41 00:00:43 -00:00:02
> 240_realtime_rendered_redraws_(GR)_1 N/A 00:13:19 N/A
> 270_shaded_view_redraws_(GR) 00:01:37 00:01:38 -00:00:01
> 2_shaded_mouse_spins_(GR) 00:00:20 00:00:20 00:00:00
> 300_wireframe_view_redraws_(GR) 00:01:55 N/A N/A
> 3_hidden_view_redraws_(GR) 00:01:34 00:01:37 -00:00:02
> 6_Fast_HLR_activations_(CP) 00:01:09 00:01:12 -00:00:02
> 80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50 00:00:50 00:00:00
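One detail about the averaging in the read loop: the update `( $seconds + $time_for{$subject}->{$id} ) / 2` folds each new reading into the previous result, so later blocks weigh more than earlier ones (for three blocks the weights are 1/4, 1/4 and 1/2). For the three a2.txt readings of 240_realtime_rendered_redraws_(GR)_1 that gives 799.5 seconds (00:13:19), while the plain mean is 798 seconds (00:13:18). A minimal sketch of a plain mean, keeping a sum and a count per test case and file; `add_sample` and `mean_seconds` are hypothetical names, not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: accumulate a sum and a count per test case and source,
# so the mean can be computed after all blocks have been read.
sub add_sample {
    my ( $stats, $subject, $id, $seconds ) = @_;
    $stats->{$subject}{$id}{sum}   += $seconds;
    $stats->{$subject}{$id}{count} += 1;
    return;
}

# Arithmetic mean in seconds, or undef when the source never reported the test.
# The exists checks avoid autovivifying entries for unknown test cases.
sub mean_seconds {
    my ( $stats, $subject, $id ) = @_;
    return if !exists $stats->{$subject};
    return if !exists $stats->{$subject}{$id};
    my $cell = $stats->{$subject}{$id};
    return $cell->{sum} / $cell->{count};
}

my %stats;
my $case = '240_realtime_rendered_redraws_(GR)_1';

# The three a2.txt readings 00:13:16, 00:13:14 and 00:13:24 in seconds.
add_sample( \%stats, $case, 'a2', $_ ) for 796, 794, 804;

my $mean = mean_seconds( \%stats, $case, 'a2' );
print "mean: $mean seconds\n";    # mean: 798 seconds
```

An undefined result can then be passed straight to a formatter that prints N/A for missing values, as `sec_to_hms` does.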
The filenames are assumed to use / as the path separator. (A proper portable implementation might be a topic for another question.)
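As a rough sketch of what a more portable variant could look like, the core File::Basename module can split a path according to the rules of the platform the script runs on. `id_from_path` is a hypothetical helper name; it only replaces the hand-written path regex and is not a drop-in for the rest of the script:

```perl
use strict;
use warnings;
use File::Basename qw( fileparse );

# Hypothetical helper: derive a short column ID from a path by dropping the
# directory part and the final extension, using the current platform's rules.
sub id_from_path {
    my ($path) = @_;
    # fileparse returns (name, directories, suffix); the pattern matches
    # the last dot and whatever follows it.
    my ( $name, $dirs, $suffix ) = fileparse( $path, qr/\.[^.]*/ );
    return $name;
}

print id_from_path('/some/path/a1.txt'), "\n";    # prints "a1" on a Unix-like system
```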
You can call it like this:
./merge_columns.pl /some/path/a1.txt /another/path/a2.txt
I hope that's helpful.
Try this...
#!/usr/bin/perl -w
use strict;
sub t2i {
my @v=split(":",$_[0]);
return $v[0]*3600+$v[1]*60+$v[2];
};
sub i2t {
# Split off the sign first: % on a negative value would wrap (-5 % 60 is 55).
my $sec=abs($_[0]);
return sprintf "%s%02d:%02d:%02d", ($_[0]<0 ? '-' : ''),
$sec/3600,$sec/60%60,$sec%60;
};
my %hash;
foreach my $file (qw|a1 a2|) {
open my $fh,"<".$file.".txt" or die;
while (<$fh>) {
$hash{$1}{$file}=t2i($2) if
/^(\d+_\S+_\S+_\S+)\s(\d+:\d+:\d+)/;
};
close $fh;
};
map {
printf "%-50s %s %s %s\n", $_,
i2t($hash{$_}{'a1'}), i2t($hash{$_}{'a2'}),
i2t($hash{$_}{'a1'} - $hash{$_}{'a2'}) if
defined($hash{$_}{'a1'}) && defined($hash{$_}{'a2'});
} keys %hash;
That gives:
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50 00:00:50 00:00:00
2_shaded_mouse_spins_(GR) 00:00:21 00:00:20 00:00:01
270_shaded_view_redraws_(GR) 00:01:35 00:01:40 -00:00:05
3_hidden_view_redraws_(GR) 00:01:34 00:01:37 -00:00:03
120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:40 00:00:44 -00:00:04
6_Fast_HLR_activations_(CP) 00:01:09 00:01:12 -00:00:03
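A note on deltas that can be negative: in Perl the result of `%` takes the sign of its right operand, so `-5 % 60` is 55 rather than -5, and a negative difference formatted directly with `/` and `%` comes out as a wrapped positive time. A minimal sketch of a formatter that splits off the sign first; `signed_hms` is a hypothetical name used only for this illustration:

```perl
use strict;
use warnings;

# Format a signed number of seconds as [-]hh:mm:ss.
# The sign is split off first so / and % only see a non-negative integer.
sub signed_hms {
    my ($seconds) = @_;
    my $sign = $seconds < 0 ? '-' : '';
    my $abs  = abs int $seconds;
    return sprintf '%s%02d:%02d:%02d',
        $sign, int( $abs / 3600 ), int( $abs / 60 ) % 60, $abs % 60;
}

my $wrapped = -5 % 60;
print "$wrapped\n";             # 55: % follows the sign of the right operand
print signed_hms(-5),  "\n";    # -00:00:05
print signed_hms(115), "\n";    # 00:01:55
```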
Or sorted and better structured:
#!/usr/bin/perl -w
use strict;
my %joinHash;
my %files=('a'=>'a1.txt','b'=>'a2.txt');
sub readFile {
open my $fh,"<".$files{$_[0]} or die;
while (my $line=<$fh>) {
$joinHash{$1}{$_[0]}=timeToInteger($2) if
$line =~ /^(\d+_\S+_\S+_\S+)\s(\d+:\d+:\d+)/;
};
close $fh;
};
sub timeToInteger {
my ($hour,$mins,$secs)=split(":",$_[0]);
return $hour*3600+$mins*60+$secs;
};
sub integerToTime {
# Split off the sign first: % on a negative value would wrap (-5 % 60 is 55).
my $secs=abs($_[0]);
return sprintf "%s%02d:%02d:%02d", ($_[0]<0 ? '-' : ''),
$secs/3600,$secs/60%60,$secs%60;
};
foreach my $fileKey (keys %files) { readFile $fileKey };
map {
my ($aVal,$bVal)=(0,0);
$aVal=$joinHash{$_}{'a'} if defined $joinHash{$_}{'a'};
$bVal=$joinHash{$_}{'b'} if defined $joinHash{$_}{'b'};
printf "%-50s %s %s %s\n", $_,
integerToTime($aVal), integerToTime($bVal),
integerToTime($aVal-$bVal);
} sort {
(my $x=$a)=~s/_.*$//g;
(my $y=$b)=~s/_.*$//g;
$x<=>$y
} keys %joinHash;
This gives numerically sorted output (missing values are filled with zeros):
2_shaded_mouse_spins_(GR) 00:00:21 00:00:20 00:00:01
3_hidden_view_redraws_(GR) 00:01:34 00:01:37 -00:00:03
6_Fast_HLR_activations_(CP) 00:01:09 00:01:12 -00:00:03
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50 00:00:50 00:00:00
120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:40 00:00:44 -00:00:04
240_realtime_rendered_redraws_(GR)_1 00:00:00 00:13:24 -00:13:24
270_shaded_view_redraws_(GR) 00:01:35 00:01:40 -00:00:05
300_wireframe_view_redraws_(GR) 00:01:55 00:00:00 00:01:55
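The sort block above runs a substitution on both names for every comparison. As a sketch of an alternative, the leading case number can be extracted once per name and carried along during the sort (a Schwartzian transform). `case_number` and `sort_by_case_number` are hypothetical names, and breaking ties by full name is an assumption of this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper: the leading case number of a test name, or 0 if absent.
sub case_number {
    my ($name) = @_;
    return $name =~ /\A(\d+)_/ ? $1 : 0;
}

# Sort names by case number; equal numbers fall back to a string comparison.
sub sort_by_case_number {
    my @names = @_;
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ case_number($_), $_ ] } @names;
}

my @sorted = sort_by_case_number(
    '300_wireframe_view_redraws_(GR)',
    '2_shaded_mouse_spins_(GR)',
    '80_wireframe_view_redraws_with_DATUMS_on_(GR)',
);
print "$_\n" for @sorted;
```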
Edit 3: a fully usable tool!
Here is a tool that can be run with the files as arguments and some switches for sort control:
#!/usr/bin/perl -w
# Demo of parsing via hash variable
# using Getopt and different sort methods
# (C) 2012 F-Hauri.ch - Use, copy , distribute or modify via License LGPL V3.
use strict;
use Getopt::Std;
my $formatString="> %-45s%-20s%-20s%s\n";
my @files=qw|a1.txt a2.txt|;
my %opt;
my %joinHash;
sub usage {
print <<eousage ;
Usage: $0 [-a|-b|-r|-c|-n] [file1] [file2]
-a Sort by file A times
-b Sort by file B times
-r Sort by result times
-c Sort alphabetically by case name
-C Sort alphabetically by case name (Case insensitive)
-n Sort numerically by case num (default)
-R Reverse sort order
file1 and file2 are by default: '$files[0]' and '$files[1]'.
eousage
exit 0;
}
sub mydie {
printf STDERR "Error: %s\n",$_[0];
usage();
}
sub readFile {
open my $fh,"<".$files[$_[0]] or mydie "Can't open '$files[$_[0]]'.";
while (my $line=<$fh>) {
$joinHash{$1}[$_[0]]=timeToInt($2) if
$line =~ /^(\d+_\S+_\S+_\S+)\s(\d+:\d+:\d+)/;
};
close $fh;
};
sub timeToInt {
my ($hour,$mins,$secs)=split(":",$_[0]);
return $hour*3600+$mins*60+$secs;
};
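sub intToTime {
my $sign=' ';
$sign='-' if $_[0] < 0;
# Format the absolute value: % on a negative number would wrap (-1 % 60 is 59).
my $secs=abs($_[0]);
return sprintf "%s%02d:%02d:%02d", $sign, $secs/3600,$secs/60%60,$secs%60;
};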
sub getJoined {
# $_0 = caseName, $_1 = filenr ( 0,1 ) or result (2), $_2 = flag: toNumber
my $asNumber=$_[2];
my $default=do{$asNumber ? 9e9 : ' N/A' };
return map { getJoined($_[0],$_,$asNumber) } (0..2) unless defined $_[1];
my $index =$_[1];
my @crtLine=@{$joinHash{$_[0]}};
return do { defined $crtLine[$index] ?
do { $asNumber ?
$crtLine[$index] : intToTime($crtLine[$index] ) }
: $default } if $index < 2;
return $default unless defined($crtLine[0]) && defined($crtLine[1]);
return do { $asNumber ? $crtLine[0] - $crtLine[1] :
intToTime($crtLine[0] - $crtLine[1]) };
}
sub sortByOpt {
my ($x,$y)=@_;
if ($opt{'c'} || $opt{'C'}) { # sort by Case name
$x =~ s/^\d+_//g; $y =~ s/^\d+_//g;
if ($opt{'C'}) {
$x=~tr|a-z|A-Z|;
$y=~tr|a-z|A-Z|;
};
($y,$x)=($x,$y) if $opt{'R'};
return $x cmp $y;
} elsif ($opt{'a'}||$opt{'b'}||$opt{'r'}) { # sort by times
my $abr=0; # default to `a`
$abr=1 if $opt{'b'};
$abr=2 if $opt{'r'};
$x = getJoined($x,$abr,1);
$y = getJoined($y,$abr,1);
} else { # sort numericaly by case number
$x =~ s/_.*$//g; $y =~ s/_.*$//g;
};
($y,$x)=($x,$y) if $opt{'R'};
return $x<=>$y;
}
getopts('abCchnRr',\%opt) or mydie 'Unknown option.';
usage if ($opt{'h'});
foreach my $fileKey (0..1) {
if (defined($ARGV[$fileKey])) {
mydie 'Arg "'.$ARGV[$fileKey].'" is not a file.' unless
-f $ARGV[$fileKey];
$files[$fileKey]=$ARGV[$fileKey];
};
readFile $fileKey
};
my @fileNames=map { my $name=$_; $name=~s/\.txt$//; $name } @files;
my $headLine=sprintf $formatString, 'Test Cases',
map {' '.$_.'(hh:mm:ss)'} @fileNames, 'delta ('.join("-",@fileNames).')';
print $headLine.('-' x ( length($headLine) - 1) )."\n";
map {
printf $formatString, $_, getJoined($_);
} sort { sortByOpt($a,$b) } keys %joinHash;
Where:
Usage: ./mycode.pl [-a|-b|-r|-c|-n] [file1] [file2]
-a Sort by file A times
-b Sort by file B times
-r Sort by result times
-c Sort alphabetically by case name
-C Sort alphabetically by case name (Case insensitive)
-n Sort numerically by case num (default)
-R Reverse sort order
file1 and file2 are by default: 'a1.txt' and 'a2.txt'.
So:
./mycode.pl -RC d1.txt d2.txt
> Test Cases d1(hh:mm:ss) d2(hh:mm:ss) delta (d1-d2)(hh:mm:ss)
---------------------------------------------------------------------------------------------------------------
> 80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50 00:00:50 00:00:00
> 300_wireframe_view_redraws_(GR) N/A 00:01:55 N/A
> 270_shaded_view_redraws_(GR) 00:01:40 00:01:35 00:00:05
> 2_shaded_mouse_spins_(GR) 00:00:20 00:00:21 -00:00:01
> 240_realtime_rendered_redraws_(GR)_1 00:13:24 N/A N/A
> 6_Last_HLR_activations_(CP) 00:01:12 00:01:09 00:00:03
> 120_hidden_view_redraws_with_Last_HLR_(GR) 00:00:44 00:00:40 00:00:04
> 3_hidden_view_redraws_(GR) 00:01:37 00:01:34 00:00:03
Note: I copied a1.txt to d2.txt and a2.txt to d1.txt, and changed Fast to Last (with sed: s/Fast/Last/) in order to have a name whose first letter is uppercase and comes later in the alphabet than the lowercase first letters of other names.
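The bundled switches in `-RC` are split up by Getopt::Std, which sets one hash entry per letter and leaves the remaining arguments in `@ARGV`. A small sketch of that behaviour, using the same switch letters as the tool; `parse_args` is a hypothetical wrapper added here only so the parsing can be shown without touching the real command line:

```perl
use strict;
use warnings;
use Getopt::Std qw( getopts );

# Hypothetical wrapper: parse a list of arguments with the tool's switch
# letters and return the options hash plus the remaining file names.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;    # getopts reads and consumes @ARGV
    my %opt;
    getopts( 'abCchnRr', \%opt ) or return;
    return ( \%opt, [@ARGV] );
}

my ( $opt, $files ) = parse_args(qw( -RC d1.txt d2.txt ));
print join( ',', sort keys %$opt ), "\n";    # C,R
print "@$files\n";                           # d1.txt d2.txt
```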