Unix pattern datetime match
I want to edit this line:
1987,4,12,31,4,1987-12-31 00:00:00.0000000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.0000000,519494350
I want the output to be:
1987,4,12,31,4,1987-12-31 00:00:00.000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.000,519494350
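I want to find every occurrence of the pattern ****-**-** **:**:**.0000000
and remove the last 4 digits (0000), so that I get ****-**-** **:**:**.000.
If it helps, this date format appears in column 6 and in column n-1.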
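To get the value of the sixth column and remove its last four digits, you can use:
awk -F, '{print substr($6, 1, length($6)-4) }'
(awk counts string positions from 1; a start position of 0 is handled differently across awk implementations and can drop one character too many.)
Similarly, the n-1 column can be accessed with:
awk -F, '{print substr( $(NF-1), 1, length($(NF-1))-4) }'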
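The substr() call above can be tried on a short made-up line before running it on the real file. This is only a sketch: `drop_last4` is a hypothetical helper name, and the three-field sample line is invented for illustration.

```shell
# Hypothetical helper: print the given field of each comma-separated line
# on stdin without its last 4 characters.
drop_last4() {
    # col is passed in as an awk variable; $col is then that field.
    awk -F, -v col="$1" '{ print substr($col, 1, length($col) - 4) }'
}

# Made-up sample line; the timestamp is field 2 here.
printf '%s\n' 'x,1987-12-31 00:00:00.0000000,y' | drop_last4 2
# prints: 1987-12-31 00:00:00.000
```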
Edit:
To replace only the values in those columns but still print the whole line, use:
awk 'BEGIN{ FS=","; OFS=","}
{ $6=substr($6, 1, length($6)-4);
$(NF-1)=substr( $(NF-1), 1, length($(NF-1))-4);
print $0}'
A nicely formatted, portable script:
#!/usr/bin/awk -f
BEGIN {
    FS = ","     # input: fields are separated by ,
    OFS = ","    # output: fields are separated by ,
}
{
    sub(/[0-9][0-9][0-9][0-9]$/, "", $6)        # remove last 4 digits from the 6th column
    sub(/[0-9][0-9][0-9][0-9]$/, "", $(NF-1))   # remove last 4 digits from the n-1 column
    print
}
A one-line, less portable version using gawk:
gawk --re-interval -F , -v OFS=, '{sub("[0-9]{4}$", "", $6); sub("[0-9]{4}$", "", $(NF-1)); print}'
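Note: the regular expression engine of traditional awk does not support the {n} repetition operator, so gawk version 3 or earlier has to be run with --re-interval. For other awk flavors, such as nawk, you need to repeat the expression explicitly, as in the longer portable script above.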
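Since the question asks for every field of the form ****-**-** **:**:**.0000000, a variant that checks each field against the pattern instead of relying on column positions may also be useful. The following is a minimal sketch under that assumption; `trim_timestamps` is a hypothetical name, the sample line is made up, and the digits are repeated explicitly so the pattern does not depend on {n} support.

```shell
# Hypothetical helper: in every comma-separated field that looks like
# YYYY-MM-DD HH:MM:SS.fffffff, drop the last 4 digits of the fraction.
trim_timestamps() {
    awk 'BEGIN { FS = ","; OFS = "," }
    {
        for (i = 1; i <= NF; i++) {
            # Digits are spelled out one by one, so no interval operator is needed.
            if ($i ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/)
                $i = substr($i, 1, length($i) - 4)
        }
        print
    }' "$@"
}

# Made-up sample line with two timestamps and an empty field.
printf '%s\n' 'a,1987-12-31 00:00:00.0000000,b,,1987-12-31 08:09:12.0000000,42' | trim_timestamps
# prints: a,1987-12-31 00:00:00.000,b,,1987-12-31 08:09:12.000,42
```

Fields that do not match the full pattern are left unchanged, so the column numbers do not need to be known in advance.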
sed -r 's/^(([^,]*,){5})([^,]+)[0-9]{4},(([^,]*,)*)([^,]+)[0-9]{4}(,[^,]*)$/\1\3,\4\6\7/'
(Tested with GNU sed 4.2.2-6)
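Here is a solution in Perl.
Update: edited to output the complete CSV line, with the timestamps replaced by their truncated versions.
Update 2: now updates both timestamp columns, not just the first one.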
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Text::CSV;
my $CSV = Text::CSV->new();
while (my $line = readline(STDIN)) {
    $CSV->parse($line) or die "Unable to parse line '$line'";
    my @fields = $CSV->fields();
    for my $f (@fields) {
        $f =~ s/
            ^            # start of string
            (            # start capture to $1
            \d{4} -      # year
            \d{2} -      # month
            \d{2} \s+    # day
            \d{2} :      # hour
            \d{2} :      # minute
            \d{2} [.]    # second
            \d{3}        # millisecond
            )            # end capture to $1
            \d{4}        # unwanted sub-second precision
            $            # end of string
        /$1/gmsx;
    }
    $CSV->combine(@fields);
    say $CSV->string();
}
For example:
alex@yuzu:~$ cat input.txt
1987,4,12,31,4,1987-12-31 00:00:00.0000000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.0000000,519494350
alex@yuzu:~$ ./csv.pl < input.txt
1987,4,12,31,4,"1987-12-31 00:00:00.000",UA,19977,UA,,631,12197,1219701,31703,HPN,"White Plains"," NY",NY,36,"New York",22,13930,1393001,30977,ORD,Chicago\," IL",IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,"1987-12-31 08:09:12.000",519494350
On a Debian-based system you should already have Perl, and you can install Text::CSV with the following command:
$ sudo apt-get install libtext-csv-perl
You can also try this GNU sed command,
$ sed -r 's/^.*,([^,]*)....,.*$/\1/g' file
1987-12-31 08:09:12.000
If you only want to do the replacement in place (this changes only the second-to-last column), try this,
$ sed -r 's/^(.*,)([^,]*)....(,.*)$/\1\2\3/g' file
1987,4,12,31,4,1987-12-31 00:00:00.0000000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.000,519494350
I think you want the output to look like this,
$ grep -oP '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\....' file
1987-12-31 00:00:00.000
1987-12-31 08:09:12.000
Update:
$ echo '1987,4,12,31,4,1987-12-31 00:00:00.0000000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.0000000,519494350' | sed -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\....)..../\1/g'
1987,4,12,31,4,1987-12-31 00:00:00.000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.000,519494350
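One caveat with the command above: the dots match any character, so a timestamp that already has only three fractional digits would lose the four characters that follow it. A stricter sketch that matches digits only is shown below; `trim_fraction` is a hypothetical name, the sample line is made up, and `-E` is assumed to be available (it is in GNU and BSD sed).

```shell
# Hypothetical helper: trim 7-digit fractional seconds to 3 digits,
# matching digits only so nothing outside the timestamp is consumed.
trim_fraction() {
    sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3})[0-9]{4}/\1/g' "$@"
}

# The second timestamp already has 3 fractional digits and stays untouched.
printf '%s\n' '1987-12-31 00:00:00.0000000,UA,1987-12-31 08:09:12.000,519494350' | trim_fraction
# prints: 1987-12-31 00:00:00.000,UA,1987-12-31 08:09:12.000,519494350
```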