Using AWK or sed, how can I remove any line where the timestamp (first column) is not exactly 13 numeric characters, while ignoring the first line?
Before:
timestamp,pageNo,description
1451317591621,01,Home Page Request
14513,Home Page Request
1451317591623,03,Home Page Request
1451317,04,Home Page Request
1451317591625,05,Home Page Request
After:
timestamp,pageNo,description
1451317591621,01,Home Page Request
1451317591623,03,Home Page Request
1451317591625,05,Home Page Request
Using sed, pass if the line number is one or the first field consists of exactly thirteen digits; else, delete.
sed -r -e '1b' -e '/^[0-9]{13},/b' -e d file
Using Awk, similarly, print if line number is one or the first field is thirteen characters and all numbers.
awk -F , 'NR == 1 || (length($1) == 13 && $1 ~ /^[0-9]*$/)' file
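To try this against the sample data from the question, a minimal sketch like the following could be used. The file name sample.csv and the variable name result are assumptions made for illustration; the awk program is the length-plus-digits check described above.

```shell
# Recreate the sample input from the question (sample.csv is a hypothetical name).
cat > sample.csv <<'EOF'
timestamp,pageNo,description
1451317591621,01,Home Page Request
14513,Home Page Request
1451317591623,03,Home Page Request
1451317,04,Home Page Request
1451317591625,05,Home Page Request
EOF

# Keep the header (NR == 1) and rows whose first field is exactly 13 digits.
result=$(awk -F, 'NR == 1 || (length($1) == 13 && $1 ~ /^[0-9]+$/)' sample.csv)
printf '%s\n' "$result"
```

The header is kept by the NR == 1 test alone, so it does not need to satisfy the timestamp check.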
Using awk (requires gawk 4+, or gawk 3+ with the --re-interval option)
awk -F, '$1~/^[0-9]{13}$/||NR==1' file
Using sed
sed '/^[0-9]\{13\},/p;1p;d' file
awk -F, 'NR==1 || (length($1) == 13 && $1+0 == $1)' file
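One thing to keep in mind with the numeric comparison $1+0 == $1 is that it tests whether the field behaves like a number, not whether it is made up only of digits. As a sketch (the file edge.csv and its second data row are made up for illustration), a 13-character field in exponent notation may pass the numeric test on awk implementations that treat it as a numeric string, while a digits-only check rejects it:

```shell
# Hypothetical input: the second data row has a 13-character, non-digit timestamp.
cat > edge.csv <<'EOF'
timestamp,pageNo,description
1451317591621,01,Home Page Request
1.45131759e+5,02,Home Page Request
EOF

# Numeric comparison: whether the exponent row survives depends on the awk in use.
loose=$(awk -F, 'NR==1 || (length($1) == 13 && $1+0 == $1)' edge.csv)

# Digits-only check: reject any first field containing a non-digit character.
strict=$(awk -F, 'NR==1 || (length($1) == 13 && $1 !~ /[^0-9]/)' edge.csv)

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

If the input is known to contain only digits or obviously truncated values in the first column, as in the question, both forms should give the same result.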
If Perl is an option:
perl -F, -ane 'print if $F[0] =~ /^[0-9]{13}$/ or $. == 1' file
These command-line options are used:
-n  loop around each line of the input file
-a  autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-e  execute the perl code
-F  autosplit modifier, in this case splits on ,
$.  is the line number
@F  is the array of words in each line, indexed starting with $F[0]
output:
timestamp,pageNo,description
1451317591621,01,Home Page Request
1451317591623,03,Home Page Request
1451317591625,05,Home Page Request