Parse a text file using Perl
I want to parse a file that contains data like the following:
05\/26\/2013 06:09:47 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=radash@abc.com\,ou=People\,o=zeb.com - 06:09:47 - http - uizweb_zam - - 2uid=bolched@abc.com
05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam - - 2uid=bolchedssd@abc.com
05\/26\/2013 06:09:49 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.43 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=sjhsjdh@abc.com\,ou=People\,o=zeb.com - 06:09:49 - http - uizweb_zam - - 2uid=kjsdsdjhjsh@abc.com
and get:
05/26/2013 06:09:47 and uid=radash@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:48 and uid=rad-ash2s@abc.com,ou=People,o=zeb.com
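I tried split('-'), but it doesn't work, because some values contain a '-' themselves, such as rad-ash2s@abc.com in the second line above. Other parts of the data sometimes contain '-' too.
Please help.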
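You'd be better off using a regular expression. With a regular expression I can use capturing parentheses (...) to quickly pull out the parts of the string I want. See the perldoc on regular expressions (perlre) for what the various regex metacharacters mean.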
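#! /usr/bin/env perl
use 5.12.0;    # also enables strict and say()
use warnings;
use autodie;
while ( my $line = <DATA> ) {
    chomp $line;
    $line =~ s/\\//g;    # Remove all backslashes
    # $1 = everything before the first " -" (the date and time)
    # $2 = the first "uid=..." token, up to the next whitespace
    my ( $date, $uid ) = $line =~ /^(.+?) -.+?(uid=\S+)/
        or next;         # skip lines that don't match
    say qq($date and $uid);
}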
__DATA__
05\/26\/2013 06:09:47 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=radash@abc.com\,ou=People\,o=zeb.com - 06:09:47 - http - uizweb_zam - - 2uid=bolched@abc.com
05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam - - 2uid=bolchedssd@abc.com
05\/26\/2013 06:09:49 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.43 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=sjhsjdh@abc.com\,ou=People\,o=zeb.com - 06:09:49 - http - uizweb_zam - - 2uid=kjsdsdjhjsh@abc.com
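To see why split('-') falls apart while the regex doesn't, here is a minimal sketch that runs both on the second sample line (backslashes already removed). The helper name date_and_uid is hypothetical; it just wraps the regex above and returns an empty list when a line doesn't match.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Second sample line, with the backslashes already removed.
my $line = '05/26/2013 06:09:48 -0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - '
         . '200.12.33.44 - abcweb.eegeserv.com/abcweb/abcwebInitialize.do?PORT=SPQ - '
         . 'uid=rad-ash2s@abc.com,ou=People,o=zeb.com - 06:09:48 - http - '
         . 'uizweb_zam - - 2uid=bolchedssd@abc.com';

# Hypothetical helper: returns (date, uid), or an empty list if the line doesn't match.
sub date_and_uid {
    my ($text) = @_;
    my ( $date, $uid ) = $text =~ /^(.+?) -.+?(uid=\S+)/ or return;
    return ( $date, $uid );
}

# Splitting on a bare '-' also cuts "-0700" and "rad-ash2s" apart.
my @naive = split /-/, $line;
my ($broken) = grep { /uid=/ } @naive;    # first piece that mentions uid=
print "split on '-': ", scalar(@naive), " pieces, uid piece is [$broken]\n";

# The regex only anchors on the first " -" and the first "uid=", so it is unaffected.
my ( $date, $uid ) = date_and_uid($line);
print "$date and $uid\n";
```

The naive split yields a uid piece that stops at "rad", whereas the regex still returns the full uid.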
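This program does what you want. It looks like the field separator is ' - ', i.e. a hyphen with a space on either side, which makes the uid the seventh field (index 6). The '- -' near the end of each line looks like an empty penultimate field, but because the two separators share a space, split leaves it attached to the last field as '- 2uid=...'; that doesn't matter here, since only fields 0 and 6 are used.
The program expects the name of the input file as an argument on the command line.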
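use strict;
use warnings;

# Reads the file(s) named on the command line (or STDIN).
while (<>) {
    chomp;
    tr/\\//d;                            # delete all backslashes
    my @fields = split /\x20-\x20/;     # split on " - " (space, hyphen, space)
    printf "%s and %s\n", @fields[0,6];  # date/time and the uid field
}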
With your data, this produces:
05/26/2013 06:09:47 -0700 and uid=radash@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:48 -0700 and uid=rad-ash2s@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:49 -0700 and uid=sjhsjdh@abc.com,ou=People,o=zeb.com
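Note that this output keeps the "-0700" UTC offset, while the question asked for just the date and time. Here is a minimal sketch of the same ' - ' split that also strips a trailing offset; the helper name format_record and the inline @sample array are hypothetical, used only so the example runs without an input file, and the offset is assumed to always look like a sign followed by four digits.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample lines as they appear in the file (single quotes keep the backslashes).
my @sample = (
    '05\/26\/2013 06:09:47 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=radash@abc.com\,ou=People\,o=zeb.com - 06:09:47 - http - uizweb_zam - - 2uid=bolched@abc.com',
    '05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam - - 2uid=bolchedssd@abc.com',
    'junk line without separators',
);

# Hypothetical helper: returns "DATE TIME and UID" without the UTC offset,
# or undef if the line has too few fields to be a log record.
sub format_record {
    my ($line) = @_;
    $line =~ tr/\\//d;                    # drop the escaping backslashes
    my @fields = split / - /, $line;     # hyphens inside values are untouched
    return unless @fields > 6;
    ( my $when = $fields[0] ) =~ s/\s+[-+]\d{4}\z//;    # strip " -0700"
    return "$when and $fields[6]";
}

for my $line (@sample) {
    my $out = format_record($line);
    print "$out\n" if defined $out;
}
```

Doing the offset removal on field 0 after the split, rather than in the split pattern, keeps the field-separator logic exactly as in the answer above.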