[英]Perl - sorting complex data structure
我有一個舊的 perl 項目,一個事件日志的文本解析器,並收到了按事件 ID 對輸出進行排序和刪除重復事件的請求。 因此解析器讀取一個文本文件並將每個事件放入一個數組中。 數組上的每個字段都包含一個帶有多個鍵 -> 值對的散列。 一個鍵稱為序列,它包含事件的編號。 我現在想根據每個數組字段的序列值對數組進行排序。 其次,我想從數組中刪除重復的相同序列號。
下面是我如何創建數組和散列的一些代碼,以便您了解數據結構:
open (my $mel, "<", $in_filename) or die "\nFile '$in_filename' does not exist or is not readable.\n";
my $i=0;
my $eventcount = 0;
while (<$mel>) {
# Separate events by "Date/Time" :
if (/^$/) {
next;
}
if (/^Date\/Time:\s(.*)$/) {
if ($eventcount >0) {
$i++;
}
$eventcount++; # eventcount initialized with ‘0’
}
# Gathering information of the MEL event :
if (/^Date\/Time:\s(.*)$/) {$MEL[$i]{date} = $1; next;}
if (/^Sequence number:\s(\d+)$/) {$MEL[$i]{sequence} = $1; next;}
if (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {$MEL[$i]{type} = lc $1; next;}
if (/^Event category:\s(\w+)$/) {$MEL[$i]{category} = $1; next;}
if (/^Priority:\s(\w+)/) {$MEL[$i]{priority} = $1; next;}
if (/^Description:\s(.*)$/) {$MEL[$i]{description} = $1; next;}
if (/^Event specific codes:\s(.*)$/) {$MEL[$i]{code} = $1; next;}
if (/^Component location:\s(.*)$/) {$MEL[$i]{location} = $1; next;}
if (/^Logged by:\s.*(.)$/) {$MEL[$i]{logged_by} = $1; next;}
if (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {$MEL[$i]{version} = hex $1;}
}
文本文件中的事件示例:
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200 <==============
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
所以基本上我想根據散列中鍵的值對包含散列引用的數組進行排序。
其次,當鍵的值也存在於不同的數組字段中時,我想從數組中刪除一個字段。
我希望有人明白我需要什么:-)
這可能嗎?
您可以使用自定義排序塊對數組進行排序:
my @sorted = sort { $a->{sequence} <=> $b->{sequence} } @MEL;
但是使用散列的散列而不是散列的數組要容易得多。
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $in_filename = ... ;
open my $mel, '<', $in_filename or die $!;
my %event;
my ($current, $id);
while (<$mel>) {
next if /^$/;
if (m{^Date/Time:\s(.*)$}) {
if (defined $id) {
$event{$id} = $current;
}
$current = { date => $1 };
} elsif (/^Sequence number:\s(\d+)$/) {
$id = $1;
} elsif (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {
$current->{type} = lc $1;
} elsif (/^Event category:\s(\w+)$/) {
$current->{category} = $1;
} elsif (/^Priority:\s(\w+)/) {
$current->{priority} = $1;
} elsif (/^Description:\s(.*)$/) {
$current->{description} = $1;
} elsif (/^Event specific codes:\s(.*)$/) {
$current->{code} = $1;
} elsif (/^Component location:\s(.*)$/) {
$current->{location} = $1;
} elsif (/^Logged by:\s.*(.)$/) {
$current->{logged_by} = $1;
} elsif (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {
$current->{version} = hex $1;
}
}
for my $e (sort { $a <=> $b } keys %event) {
say 'Sequence number:', $e;
for my $k (sort keys %{ $event{$e} }) {
say "$k: $event{$e}{$k}";
}
}
通過構建一個大型正則表達式來匹配大部分細節,可以進一步簡化它:
my $regex = qr/
Event\ type:\s(?<type>[0-9|a-f|A-F]{1,6})$
|Event\ category:\s(?<category>\w+)$
|Priority:\s(?<priority>\w+)
|Description:\s(?<description>.*)$
|Event\ specific\ codes:\s(?<code>.*)$
|Component\ location:\s(?<location>.*)$
|Logged\ by:\s.*(?<logged>.)$
|4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
/x;
while (<$mel>) {
next if /^$/;
if (m{^Date/Time:\s(.*)$}) {
if (defined $id) {
$current->{type} = lc $current->{type}
if exists $current->{type};
$current->{version} = hex $current->{version}
if exists $current->{version};
$event{$id} = $current;
}
$current = { date => $1 };
} elsif (/^Sequence number:\s(\d+)$/) {
$id = $1;
} elsif (/^$regex/) {
$current->{ (keys %+)[0] } = (values %+)[0];
} else {
warn "Skipping: $_";
}
}
問題描述不完整。 不清楚這些記錄是否同質(所有類型都相同)。
好吧,如果上面的假設是正確的,那么任務就很簡單了。
將文件拆分成記錄,然后用事件號作為鍵填充哈希,並記錄為值,跳過重復項。
然后對鍵哈希進行排序並輸出記錄。
use strict;
use warnings;
use feature 'say';
my %events;
my %seen;
my $data = do { local $/; <DATA> };
$data =~ s!\n(Date/Time)!\n\n$1!g;
my @data = split '\n\n', $data;
for my $record (@data) {
my $event = get_event_n( $record );
next if $seen{$event};
$seen{$event} = 1;
$events{$event} = $record;
}
say '----- Sorted Events -----';
for my $event (sort keys %events) {
say $events{$event};
say '-' x 45; # record separator as visual indicator
}
sub get_event_n {
my $record = shift;
my $sequence;
$record =~ /Sequence number:\s+(\d+)/;
$sequence = $1;
return $sequence;
}
__DATA__
Date/Time: 2/3/20, 12:19:20 PM
Sequence number: 230
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:28 PM
Sequence number: 209
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 92, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B
我的回答基於@choroba 的好正則表達式,但我認為這個更簡單:
my $key = 'sequence'; #or other fields
my $keep = 'first'; #or 'last' record with identical $key
my $regex = qr{
Date/Time: \s* (?<date>.*)
|Sequence\ number: \s* (?<sequence>\d+)
|Event\ type: \s* (?<type>[0-9|a-f|A-F]{1,6})
|Event\ category: \s* (?<category>\w+)
|Priority: \s* (?<priority>\w+)
|Description: \s* (?<description>.*)
|Event\ specific\ codes: \s* (?<code>.*)
|Component\ location: \s* (?<location>.*)
|Logged\ by: \s* (?<logged_by>.*)
|4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
}x;
my @event=();
while (<>) {
m{^Date/Time:} and push @event, {};
m{^$regex} and @{$event[-1]}{keys %+} = values %+;
}
#special treatment for type and version: hex and lc
exists $$_{type} and $$_{type} = hex $$_{type} for @event;
exists $$_{version} and $$_{version} = lc $$_{version} for @event;
#mark for deletion
my %exists; $exists{$$_{$key}}++ and $$_{delete}=1
for $keep eq 'first' ? @event
: $keep eq 'last' ? reverse(@event)
: die "keep must be first or last";
#delete those marked
@event = grep !$$_{delete}, @event;
#sort by $key
@event = sort { $$a{$key} <=> $$b{$key} } @event;
我猜那個類型應該是hex
,版本應該是lc
ed,而不是像問題中的相反。
像這樣運行:
perl script.pl input_file
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.