In Perl, how to process multiple lines

Question

Say I have a file in which each line contains a timestamp and a name:

10:00:00 Bob
11:00:00 Tom
11:00:20 Fred
11:00:40 George
12:00:00 Bill

I want to read this file, group the names that occur within each hour onto a single line, and then write the revised lines to a file. For example:

10:00:00 Bob
11:00:00 Tom, Fred, George
12:00:00 Bill

Answer 1

Read the file line by line in a block like this:

while(<>) {
    # ... do something with the line in $_
    # specifically, collect the hour and name
    # ignoring malformed lines
    if (/(\d\d):\d\d:\d\d\s+(\w+)/) {
        my $hour = $1;
        my $name = $2;
    }
}

and build a hash keyed by the hour by inserting the following in the inner if block:

# append with a separator only when the hour already has a name
$people{$hour} = defined $people{$hour} ? "$people{$hour}, $name" : $name;

Finally, outside the loop, print the hash:

# sort the keys, because a hash has no defined order
foreach my $hour (sort keys %people) {
    print "$hour:00:00 $people{$hour}\n";
}

(This is untested, but this is the basic approach I would take.)
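The pieces above can be assembled into one runnable script. This is only a sketch of the same approach, with two assumptions that are not in the answer: the logic is wrapped in a hypothetical helper named group_lines, and names are collected in an array per hour and joined at the end rather than concatenated into a string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): takes "HH:MM:SS Name" lines
# and returns one "HH:00:00 Name, Name, ..." line per hour.
sub group_lines {
    my @lines = @_;
    my %people;    # hour => array ref of names, in input order

    for (@lines) {
        # Skip malformed lines, as in the loop above.
        next unless /(\d\d):\d\d:\d\d\s+(\w+)/;
        push @{ $people{$1} }, $2;
    }

    # Two-digit hours sort correctly with the default string sort.
    return map { "$_:00:00 " . join(', ', @{ $people{$_} }) } sort keys %people;
}

print "$_\n" for group_lines(
    '10:00:00 Bob', '11:00:00 Tom', '11:00:20 Fred',
    '11:00:40 George', '12:00:00 Bill',
);
```

Joining an array at the end avoids the stray leading separator that string concatenation produces for the first name in each hour.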

Answer 2

In grouped_by_hour below, for each line from the filehandle, if it has a timestamp and a name, we push that name onto an array associated with the timestamp's hour, using sprintf to normalize the hour in case one timestamp is 03:04:05 and another is 3:9:18.

sub grouped_by_hour {
  my($fh) = @_;

  local $_;
  my %hour_names;

  while (<$fh>) {
    push @{ $hour_names{sprintf "%02d", $1} } => $2
      if /^(\d+):\d+:\d+\s+(.+?)\s*$/;
  }

  wantarray ? %hour_names : \%hour_names;
}

The normalized hours also allow us to sort with the default comparison. The code below places the input in the special DATA filehandle by having it after the __DATA__ token, but in real code, you might call grouped_by_hour $fh.

my %hour_names = grouped_by_hour \*DATA;
foreach my $hour (sort keys %hour_names) {
  print "$hour:00:00 ", join(", " => @{ $hour_names{$hour} }), "\n";
}

__DATA__
10:00:00 Bob
11:00:00 Tom
11:00:20 Fred
11:00:40 George
12:00:00 Bill

Output:

10:00:00 Bob
11:00:00 Tom, Fred, George
12:00:00 Bill
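To see why Answer 2 normalizes the hour with sprintf before using it as a hash key, here is a small sketch comparing the default string sort on unpadded and zero-padded hours. The hour values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical hour values, as they might be captured from unpadded timestamps.
my @raw = (9, 10, 3, 11);

# The default sort compares as strings, so "10" and "11" land before "3" and "9".
my @raw_sorted = sort @raw;

# Zero-padding to two digits makes string order agree with numeric order.
my @padded_sorted = sort map { sprintf "%02d", $_ } @raw;

print "raw:    @raw_sorted\n";       # raw:    10 11 3 9
print "padded: @padded_sorted\n";    # padded: 03 09 10 11
```

Without the padding, the same ordering would need an explicit numeric comparison such as sort { $a <=> $b }.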

Answer 3

Given that, per comments on the original question, all entries for the same hour are contiguous and the file is too large to fit into memory, I would dispense with the hash entirely - if the raw file is too big to fit in memory, then a hash containing all of its data will likely also be too large. (Yes, it's compressing the data a bit, but the hash itself adds substantial overhead.)

My solution, then:

#!/usr/bin/env perl

use strict;
use warnings;

my $current_hour = -1;
my @names;

while (my $line = <DATA>) {
  my ($hour, $name) = $line =~ /(\d{2}):\d{2}:\d{2} (.*)/;
  next unless $hour;

  if ($hour != $current_hour) {
    print_hour($current_hour, @names);
    @names = ();
    $current_hour = $hour;
  }

  push @names, $name;
}

print_hour($current_hour, @names);

exit;

sub print_hour {
  my ($hour, @names) = @_;
  return unless @names;

  print $hour, ':00:00 ', (join ', ', @names), "\n";
}

__DATA__
10:00:00 Bob
11:00:00 Tom
11:00:20 Fred
11:00:40 George
12:00:00 Bill
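The question asks for the result to be written to a file, so the same streaming idea can be sketched with explicit input and output filehandles. The helper name group_stream and the command-line file arguments are assumptions for illustration, not part of the answer. Like the answer, it relies on lines for the same hour being contiguous.

```perl
use strict;
use warnings;

# Hypothetical helper: stream grouped lines from $in to $out, holding only
# one hour's names in memory. Assumes same-hour lines are contiguous.
sub group_stream {
    my ($in, $out) = @_;
    my ($current_hour, @names);

    while (my $line = <$in>) {
        # A failed match yields an empty list, so malformed lines are skipped.
        my ($hour, $name) = $line =~ /^(\d{2}):\d{2}:\d{2}\s+(.*\S)/
            or next;

        # The hour changed: flush the group collected so far.
        if (defined $current_hour && $hour ne $current_hour) {
            print {$out} "$current_hour:00:00 ", join(', ', @names), "\n";
            @names = ();
        }
        $current_hour = $hour;
        push @names, $name;
    }

    # Flush the final group.
    print {$out} "$current_hour:00:00 ", join(', ', @names), "\n" if @names;
}

# Usage with hypothetical file names given on the command line:
#   perl group.pl input.txt output.txt
if (@ARGV == 2) {
    open my $in,  '<', $ARGV[0] or die "Cannot read $ARGV[0]: $!";
    open my $out, '>', $ARGV[1] or die "Cannot write $ARGV[1]: $!";
    group_stream($in, $out);
    close $out or die "Cannot close $ARGV[1]: $!";
}
```

Comparing hours as strings with ne and tracking the first group with defined avoids needing a sentinel value such as -1.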

Answer 4

Here's a full solution.

my @readings = (
    "10:00:00 Bob",
    "11:00:00 Tom",
    "11:00:20 Fred",
    "11:00:40 George",
    "12:00:00 Bill",
);

my %hours;

for my $line (@readings) {
    # skip lines that don't match, so stale $1 and $2 are never reused
    next unless $line =~ /^(\d{2}).*?([a-zA-Z]+)/;
    push(@{$hours{$1}}, $2);
}

for my $hour (sort keys %hours) {
    print "$hour:00:00 ";
    print join ", ", @{$hours{$hour}};
    print "\n";
}

This results in:

10:00:00 Bob
11:00:00 Tom, Fred, George
12:00:00 Bill

4 answers

Answer 1
2 2010-07-10 11:28:04

Answer 2
2 2010-07-10 14:59:43

Answer 3
2 Accepted 2010-07-11 10:01:53

Answer 4
0 2010-07-10 11:47:33
