
Unix pattern datetime match

I want to edit this line:

1987,4,12,31,4,1987-12-31 00:00:00.0000000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.0000000,519494350

and I want the output to be:

1987,4,12,31,4,1987-12-31 00:00:00.000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.000,519494350

I want to find each occurrence of the pattern ****-**-** **:**:**.0000000

and erase the last 4 digits (0000) so that I get ****-**-** **:**:**.000.

If it helps, this date format appears in the 6th column and in the (n-1)th column.

To get the value of the 6th column and erase the last four digits you can use (awk strings are 1-indexed, so substr starts at position 1):

awk -F, '{print substr($6, 1, length($6)-4) }'

Similarly, the (n-1)th column can be reached with:

awk -F, '{print substr( $(NF-1), 1, length($(NF-1))-4) }'

Edit:

To replace only the values in those columns but still print the whole line, use:

awk 'BEGIN{ FS=","; OFS=","} 
{ $6=substr($6, 1, length($6)-4); 
  $(NF-1)=substr( $(NF-1), 1, length($(NF-1))-4); 
  print $0}'
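
A quick way to check the field handling is to run the command on a shortened line first. The sketch below wraps it in a shell function; the name trim_ts and the sample line are made up for illustration, with the timestamps placed in the 6th and second-to-last fields as in the real data.

```shell
# trim_ts: hypothetical helper name. Trims the last 4 characters of
# field 6 and of the second-to-last field on every comma-separated line.
trim_ts() {
    awk 'BEGIN { FS = ","; OFS = "," }
    {
        # substr counts from 1, so this keeps everything but the last 4 characters
        $6      = substr($6, 1, length($6) - 4)
        $(NF-1) = substr($(NF-1), 1, length($(NF-1)) - 4)
        print
    }' "$@"
}

# Shortened, made-up sample line with the same layout as the real data.
printf '%s\n' 'a,b,c,d,e,1987-12-31 00:00:00.0000000,x,,1987-12-31 08:09:12.0000000,z' | trim_ts
```

Because the function passes its arguments on to awk, it can also be called as trim_ts file instead of reading standard input.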

Awk based solution

Nicely formatted, portable script:

#!/usr/bin/awk -f
BEGIN {
    FS = ","  # input:  fields are separated by ,
    OFS = "," # output: fields are separated by ,
}

{
    sub(/[0-9][0-9][0-9][0-9]$/, "", $6)      # remove last 4 digits from the 6th column
    sub(/[0-9][0-9][0-9][0-9]$/, "", $(NF-1)) # remove last 4 digits from the n-1 column
    print
}

One-line, less portable version using gawk:

gawk --re-interval -F , -v OFS=, '{sub("[0-9]{4}$", "", $6); sub("[0-9]{4}$", "", $(NF-1)); print}'

NB: The regular expression engine of traditional awk doesn't support the {n} repetition operator, and gawk version 3 or older needs to be run with --re-interval to enable it. For other flavors of awk, e.g. nawk, you need to repeat the regular expression explicitly, as in the longer portable script above.
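
The sub() calls trim any field that happens to end in four digits. If that is too loose for your data, a stricter variant could trim only fields that end in a dot followed by seven digits. This is a sketch under that assumption, with the repetition written out so it does not depend on {n} support; the names trim_ts_strict and trim are hypothetical.

```shell
# trim_ts_strict: hypothetical helper name. Trims 4 characters only from
# fields that end in a dot followed by seven digits; others pass through.
trim_ts_strict() {
    awk 'BEGIN { FS = ","; OFS = "," }
    function trim(s) {
        # repetition spelled out, so no interval-expression support is needed
        if (s ~ /\.[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/)
            return substr(s, 1, length(s) - 4)
        return s
    }
    {
        $6      = trim($6)
        $(NF-1) = trim($(NF-1))
        print
    }' "$@"
}

# Made-up sample lines: the first has timestamps, the second does not.
printf '%s\n' \
    'a,b,c,d,e,1987-12-31 00:00:00.0000000,x,,1987-12-31 08:09:12.0000000,z' \
    'a,b,c,d,e,12345,x,y,z' | trim_ts_strict
```

The second sample line is printed unchanged, because neither of the two fields looks like a timestamp with a seven-digit fraction.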

sed based solution

sed -r 's/^(([^,]*,){5})([^,]+)[0-9]{4},(([^,]*,)*)([^,]+)[0-9]{4}(,[^,]*)$/\1\3,\4\6\7/'

(tested with GNU sed-4.2.2-6)

Here's a solution in Perl.

Update - Edited to output the full CSV line with each timestamp replaced by the truncated one

Update 2 - Update both timestamp columns, not just the first one

#!/usr/bin/env perl

use strict;
use warnings;
use feature 'say';

use Text::CSV;

my $CSV = Text::CSV->new();

while (my $line = readline(STDIN)) {
    $CSV->parse($line) or die "Unable to parse line '$line'";

    my @fields = $CSV->fields();

    for my $f (@fields) {
        $f =~ s/
            ^               # start of string
            (               # start capture to $1
                \d{4} -     # year
                \d{2} -     # month
                \d{2} \s+   # day
                \d{2} :     # hour
                \d{2} :     # minute
                \d{2} [.]   # second
                \d{3}       # millisecond
            )               # end capture to $1
            \d{4}           # unwanted sub-second precision
            $               # end of string
        /$1/gmsx;
    }

    $CSV->combine(@fields);
    say $CSV->string();
}

For example:

alex@yuzu:~$ cat input.txt 
1987,4,12,31,4,1987-12-31 00:00:00.0000000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.0000000,519494350

alex@yuzu:~$ ./csv.pl < input.txt
1987,4,12,31,4,"1987-12-31 00:00:00.000",UA,19977,UA,,631,12197,1219701,31703,HPN,"White Plains"," NY",NY,36,"New York",22,13930,1393001,30977,ORD,Chicago\," IL",IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,"1987-12-31 08:09:12.000",519494350

On a Debian-like system such as Ubuntu you should already have Perl, and you can install Text::CSV with:

$ sudo apt-get install libtext-csv-perl

You could also try this GNU sed command:

$ sed -r 's/^.*,([^,]*)....,.*$/\1/g' file
1987-12-31 08:09:12.000

If you just want to do the replacement within the line, try this (it trims only the second-to-last column):

$ sed -r 's/^(.*,)([^,]*)....(,.*)$/\1\2\3/g' file
1987,4,12,31,4,1987-12-31 00:00:00.0000000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.000,519494350

I think you want the output to be like this:

$ grep -oP '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\....' file
1987-12-31 00:00:00.000
1987-12-31 08:09:12.000

Update:

$ echo '1987,4,12,31,4,1987-12-31 00:00:00.0000000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.0000000,519494350' | sed -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\....)..../\1/g'
1987,4,12,31,4,1987-12-31 00:00:00.000,UA,19977,UA,,631,12197,1219701,31703,HPN,White Plains, NY,NY,36,New York,22,13930,1393001,30977,ORD,Chicago\, IL,IL,17,Illinois,41,756,802,483.2,6,6,0,0,0700-0759,,,,,914,938,600.8,24,24,1,1,0900-0959,0,,0,138,156,,1,738,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,US1NJBG0005,US1ILCK0027,,,,,,,,,,,,,1987-12-31 08:09:12.000,519494350
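
Note that .... matches any four characters, not only digits. A slightly stricter sketch spells out both the kept fraction and the dropped part as digits. It assumes a sed that accepts -E for extended regular expressions (current GNU and BSD sed do); the function name trim_ts_sed and the sample line are made up for illustration.

```shell
# trim_ts_sed: hypothetical helper name. Truncates every timestamp of the
# form YYYY-MM-DD HH:MM:SS.fffffff to three fractional digits.
trim_ts_sed() {
    sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3})[0-9]{4}/\1/g' "$@"
}

# Shortened, made-up sample line containing two timestamps.
printf '%s\n' 'x,1987-12-31 00:00:00.0000000,y,1987-12-31 08:09:12.0000000,z' | trim_ts_sed
```

Since this matches on the timestamp shape rather than on column positions, it does not depend on how many commas appear inside other fields.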
