Modifying multiple elements in an array in Perl

Say I have a file with a name field and three date fields, and I want to reformat the dates. I could do it like this:

while (<DATA>) {
  my @lines = split(/\|/); ##splitting DATA by '|'

  my @dates = split( /\/|[-]/, $lines[0] ); #splitting only the first element of array and performing modifications below.
  if ( $dates[2] =~ /^[0-1][0-9]$/gi ) { $dates[2] = $dates[2] + 2000 }
  elsif ( $dates[2] =~ /^[2-9][0-9]$/gi ) {
    $dates[2] = $dates[2] + 1900;
  }
  if ( $dates[1] =~ /^\d$/gi ) { $dates[1] = "0" . $dates[1] }
  if ( $dates[0] =~ /^\d$/gi ) { $dates[0] = "0" . $dates[0] }
  my $date = join "-", @dates[ 2, 0, 1 ]; #joining the dates to be in yyyy-mm-dd format.
  print $date, "\n"; #double check
  print $date, ",", ( join ",", @lines[ 1 .. $#lines ] ), "\n"; #prepending the date to the join of the remaining @lines.
}

Instead of having to split and join each of $lines[0] through $lines[2], is there a way to perform the modifications on all of the desired fields at once?

__DATA__
12/23/2014|2/20/1995|3/25/1905|josh

Your script gives nasty output with the input you provided. The output of the following seems much more logical to me:

#!/usr/bin/env perl

use strict;
use warnings;

while (my $line = <DATA>) {
    next unless $line =~ /\S/;   # skip blank lines
    chomp $line;                 # keep the trailing newline out of the name field
    my ($name, @dates) = reverse split qr{\|}, $line;
    @dates = reverse map sprintf('%04d-%02d-%02d', (split qr{/})[2,0,1]), @dates;
    print join(',', @dates, $name), "\n";
}
__DATA__
12/23/2014|2/20/1995|3/25/1905|josh

Output:

2014-12-23,1995-02-20,1905-03-25,josh

If this is not the output you want, then describe the exact output you are trying to get.

A few points:

  • while (<DATA>) reads a single line from DATA into $_. Assign it to a meaningful variable instead to make your code easier to read.

  • Skip processing on empty lines.

  • Don't fall victim to LTS (leaning toothpick syndrome): /\|/ is much harder to read than qr{\|} or even qr{ \| }x.

  • The reverses make the code easier to read, but if you have a ton of fields, they may become a real bottleneck. In that case, use pop and push instead.
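The pop and push suggestion from the last point could look roughly like the sketch below. The helper name reformat_record is made up for illustration, and it assumes the name is always the last field and every other field is an m/d/yyyy date.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): reformat one
# pipe-delimited record whose last field is the name.
sub reformat_record {
    my ($line) = @_;
    chomp $line;
    my @fields = split qr{\|}, $line;
    my $name   = pop @fields;    # take the name off the end, no reverse needed
    # Reorder each m/d/yyyy date into yyyy-mm-dd.
    my @dates = map { sprintf '%04d-%02d-%02d', ( split qr{/}, $_ )[ 2, 0, 1 ] } @fields;
    push @dates, $name;          # put the name back at the end
    return join ',', @dates;
}

print reformat_record('12/23/2014|2/20/1995|3/25/1905|josh'), "\n";
# prints 2014-12-23,1995-02-20,1905-03-25,josh
```

Whether this is measurably faster than the two reverses depends on the number of fields; for a handful of columns the difference is unlikely to matter.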

You can use a regular expression to match the various bits and pieces of your data line and do away with the multiple splits. A regular expression can also verify your line format. Do you have a 3 digit month? Do you have three dates? It's always a good idea to verify your input:

#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);

my $date_re = qr(
        ^(?<month1>\d{1,2})/
        (?<day1>\d{1,2})/
        (?<year1>\d{2,4})
        \|                    # Separator between date1 and date2
        (?<month2>\d{1,2})/
        (?<day2>\d{1,2})/
        (?<year2>\d{2,4})
        \|                    # Separator between date2 and date3
        (?<month3>\d{1,2})/
        (?<day3>\d{1,2})/
        (?<year3>\d{2,4})
        \|                    # Separator between date3 and name
        (?<name>.*)
    )x;
while ( my $line = <DATA> ) {
    if ( $line !~ $date_re ) {
        say "Something's wrong";
    }
    else {
        say "First Date: Year = $+{year1}  Month = $+{month1}  Day = $+{day1}";
        say "Second Date: Year = $+{year2}  Month = $+{month2}  Day = $+{day2}";
        say "Third Date: Year = $+{year3}  Month = $+{month3}  Day = $+{day3}";
        say "Name = $+{name}";
    }
}

__DATA__
12/23/2014|2/20/1995|3/25/1905|josh

Running this program prints out:

First Date: Year = 2014  Month = 12  Day = 23
Second Date: Year = 1995  Month = 2  Day = 20
Third Date: Year = 1905  Month = 3  Day = 25
Name = josh

This uses some advanced features of regular expressions:

  • qr/.../ can be used to define regular expressions. Since you have slashes in the regular expression, I decided to use parentheses to delimit my regular expression, so it's qr(...).

  • The )x at the end means I can use white space to make my regular expression easier to understand. For example, I broke each of the dates out onto three lines (month, day, year).

  • (?<name>...) names your capture group, which makes it easier to refer back to a particular capture group. I can use the %+ hash to recall my capture groups. For example, (?<month1>\d{1,2}) says that I expect a one- or two-digit month. I store this in the capture group month1, and I can refer back to it by using $+{month1}.

    One of the nice things about using named capture groups is that it documents what you're attempting to capture.

  • {M,N} is a repeat count. I expect the previous regular expression to match from M to N times. \d{1,2} means I'm expecting one or two digits.
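The same features (qr with the x flag, named captures, %+ and {M,N}) can also be applied to one field at a time, which avoids writing the date pattern out three times. This is only a sketch: the names $one_date_re and to_iso are invented here, and it assumes four-digit years.

```perl
use strict;
use warnings;

# Pattern for a single m/d/yyyy date (assumption: years always have 4 digits).
my $one_date_re = qr{
    \A
    (?<month> \d{1,2} ) /
    (?<day>   \d{1,2} ) /
    (?<year>  \d{4}   )
    \z
}x;

# Hypothetical helper: returns yyyy-mm-dd, or nothing (undef in scalar
# context) when the field does not look like m/d/yyyy.
sub to_iso {
    my ($field) = @_;
    return unless $field =~ $one_date_re;
    return sprintf '%04d-%02d-%02d', $+{year}, $+{month}, $+{day};
}

print to_iso('2/20/1995'), "\n";                                       # prints 1995-02-20
print defined( to_iso('20 Feb 1995') ) ? "accepted\n" : "rejected\n";  # prints rejected
```

Applying to_iso to each field after a split gives both the validation and the reformatting, at the cost of losing the single whole-line check.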

You have to keep the splitting of the fields and the join, but you can reduce the modifications to these substitutions:

$dates[2] =~ s/^([01]\d)$/20$1/;
$dates[2] =~ s/^([2-9]\d)$/19$1/;
$dates[1] =~ s/^(\d)$/0$1/;
$dates[0] =~ s/^(\d)$/0$1/;
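To run those substitutions over all three date fields rather than only the first, they can be wrapped in a small sub and applied to an array slice. This is a sketch under assumptions: normalize_date is a made-up name, and the first three fields are taken to be dates.

```perl
use strict;
use warnings;

# Hypothetical helper: split one date, expand a 2-digit year, zero-pad
# month and day, and rejoin as yyyy-mm-dd.
sub normalize_date {
    my ($field) = @_;
    my @dates = split /[\/-]/, $field;
    $dates[2] =~ s/^([01]\d)$/20$1/;      # 00-19 becomes 2000-2019
    $dates[2] =~ s/^([2-9]\d)$/19$1/;     # 20-99 becomes 1920-1999
    $dates[$_] =~ s/^(\d)$/0$1/ for 0, 1; # pad single-digit month and day
    return join '-', @dates[ 2, 0, 1 ];
}

my @fields = split /\|/, '1/23/14|2/20/95|3/25/1905|josh';
# foreach aliases the slice elements, so assigning to $_ updates @fields.
$_ = normalize_date($_) for @fields[ 0 .. 2 ];
print join( ',', @fields ), "\n";
# prints 2014-01-23,1995-02-20,1905-03-25,josh
```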

It's ugly, but it does all the fields in one pass, as you requested.

while (<DATA>) {
    s/
      (?:^|\|)\K # start after a leading start-of-line or pipe
      (\d{1,2})
      [\/-]
      (\d{1,2})
      [\/-]
      (\d\d(?:\d\d)?)
      (?=\||\z) # look-ahead to see trailing pipe or end-of-string
     /
        sprintf('%04d-%02d-%02d',
            $3 <  20 ? $3 + 2000
          : $3 < 100 ? $3 + 1900
          : $3,
            $1,
            $2
        )
     /gex;
    print;

}

__DATA__
1/23/14|2/20/95|3/25/1905|josh
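If the original line needs to stay intact, the same single-pass substitution can be written non-destructively with the r flag, which returns the modified copy (Perl 5.14 or later). This is a sketch rather than part of the original answer; reformat_dates is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $line with every date field
# rewritten as yyyy-mm-dd; the r flag leaves $line itself untouched.
sub reformat_dates {
    my ($line) = @_;
    return $line =~ s{
        (?:^|\|)\K              # start of line, or just after a pipe
        (\d{1,2}) [/-] (\d{1,2}) [/-] (\d\d(?:\d\d)?)
        (?=\||\z)               # followed by a pipe or the end of the string
    }{
        my $year = $3 < 20 ? $3 + 2000 : $3 < 100 ? $3 + 1900 : $3;
        sprintf '%04d-%02d-%02d', $year, $1, $2;
    }gexr;
}

print reformat_dates('1/23/14|2/20/95|3/25/1905|josh'), "\n";
# prints 2014-01-23|1995-02-20|1905-03-25|josh
```

Note that the field separators stay as pipes; only the dates themselves are rewritten.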
