How to speed up this log parser?

I have a log file that is several gigabytes in size, in this format:

2016-02-26 08:06:45 Blah blah blah

I have a log parser that splits the single log file into separate files by date, trimming the date from each line.

I do want some form of tee so that I can see how far along the process is.

The problem is that this method is mind-numbingly slow. Is there no way to do this quickly in bash, or will I have to whip up a little C program to do it?

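log_file=server.log
log_folder=logs

mkdir -p "$log_folder"

# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r a; do
   date=${a:0:10}    # first 10 characters: the YYYY-MM-DD date

   # Everything after the date and its space, shown and appended to that day's file.
   echo "${a:11}" | tee -a "$log_folder/$date"
done < "$log_file"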

read in bash is absurdly slow, and this loop also starts a new tee process for every line. You can make it faster, but you will probably get a bigger speed-up with awk:

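#!/bin/bash

log_file=input
log_directory=${1-logs}

mkdir -p "$log_directory"

# Lines with at least two fields go to a file named after the first field (the date).
# sub() strips the date and its space without rebuilding $0, so the rest of the line keeps its spacing.
awk 'NF>1{d=l"/"$1; sub(/^[^ ]+ /, ""); print > d}' l="$log_directory" "$log_file"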
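The answer says the bash loop can be made faster but does not show how. One possible way, sketched under stated assumptions: keep the loop, but append with a plain redirection instead of piping every line into a new tee process. The function name split_log_bash is hypothetical, lines whose first ten characters are not a YYYY-MM-DD date are skipped (the question's loop does not do this), and nothing is echoed, so there is no progress output. Each line still reopens its output file, so this will likely remain slower than awk.

```shell
#!/bin/bash
# Sketch: a faster pure-bash variant of the question's loop.
# split_log_bash LOGFILE OUTDIR   (hypothetical name)
split_log_bash() {
  local log_file=$1 log_folder=$2 line date
  mkdir -p "$log_folder"
  # IFS= and -r keep whitespace and backslashes intact; the extra test
  # also handles a last line that has no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    date=${line:0:10}
    # Assumption: lines that do not start with YYYY-MM-DD are skipped.
    [[ $date == [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] ]] || continue
    # A plain redirection appends without starting a tee process per line;
    # the file is still opened and closed once per line.
    printf '%s\n' "${line:11}" >> "$log_folder/$date"
  done < "$log_file"
}
```

For example: split_log_bash server.log logs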

If you really want to print to stdout as well, you can, but if stdout is a tty it will slow things down a lot. Just use:

awk 'NF>1{d=l"/"$1; sub(/^[^ ]+ /, ""); print > d}1' l="$log_directory" "$log_file"

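(Note the 1 after the closing brace: it is an always-true pattern with no action, so awk also prints every line, with the date already stripped, to stdout.)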
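If tee is only there to show how far along the process is, printing a counter every N lines to stderr avoids the cost of writing every line to a terminal. A minimal sketch, assuming the same input format as above; split_log_progress and its default interval of 1,000,000 lines are hypothetical choices, and it relies on awk accepting "/dev/stderr" as an output name (gawk and mawk handle it specially, and Linux and macOS also provide it as a device file).

```shell
#!/bin/bash
# Sketch: split by date and report progress every N lines on stderr
# instead of echoing every line.
# split_log_progress LOGFILE OUTDIR [EVERY]   (hypothetical name)
split_log_progress() {
  local log_file=$1 log_directory=$2 every=${3:-1000000}
  mkdir -p "$log_directory"
  awk -v l="$log_directory" -v every="$every" '
    NF > 1 {
      d = l "/" $1
      sub(/^[^ ]+ /, "")        # drop the leading date and its space
      print > d
    }
    NR % every == 0 {
      # "/dev/stderr" is understood by gawk and mawk (and exists as a
      # device file on Linux and macOS).
      printf "%d lines processed\n", NR > "/dev/stderr"
      fflush("/dev/stderr")
    }
  ' "$log_file"
}
```

The split files are written exactly as in the one-liner; only a short counter line reaches the terminal, so tty speed no longer limits the run.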

Try this awk solution. It should be pretty fast, it shows progress, and only one output file is kept open at a time. Lines that don't start with a date are written to the current date's file so they are not lost, and a default initial date of "0000-00-00" is used in case the log starts with lines that have no date.

Any timing comparison would be much appreciated.

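#!/bin/bash
dir=$1
if [[ -z $dir ]]; then
  echo >&2 "Usage: $0 outdir <logfile"
  echo >&2 "outdir: directory where output files are created"
  echo >&2 "logfile: input on stdin to split into output files"
  exit 1
fi
mkdir -p "$dir"
echo "output directory \"$dir\""
awk -v dir="$dir" '
BEGIN {
  # Spelled out instead of [0-9]{4}: some awks (older mawk, for one) lack
  # interval expressions. Anchored so only a whole YYYY-MM-DD field matches.
  datepat="^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$"
  date="0000-00-00"
  file=dir"/"date
}
# A new date starts: close the previous file so only one stays open.
date != $1 && $1 ~ datepat {
  if(file) {
    close(file)
    print ""
  }
  print $1 ":"
  date=$1
  file=dir"/"date
}
{
  if($1 ~ datepat)
    line=substr($0,12)
  else
    line=$0
  print line
  # ">>" rather than ">": if a date reappears later, reopening its closed
  # file with ">" would truncate it (so start with an empty outdir).
  print line >>file
}
'
head -6 "$dir"/*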

Sample input log:

first line without date
2016-02-26 08:06:45 0 Blah blah blah
2016-02-26 09:06:45 1 Blah blah blah
2016-02-27 07:06:45 2 Blah blah blah
2016-02-27 08:06:45 3 Blah blah blah
no date line
blank lines

another no date line
2016-02-28 07:06:45 4 Blah blah blah
2016-02-28 08:06:45 5 Blah blah blah

Output (what awk echoes to stdout, followed by head of each output file in tmpd):

first line without date

2016-02-26:
08:06:45 0 Blah blah blah
09:06:45 1 Blah blah blah

2016-02-27:
07:06:45 2 Blah blah blah
08:06:45 3 Blah blah blah
no date line
blank lines

another no date line

2016-02-28:
07:06:45 4 Blah blah blah
08:06:45 5 Blah blah blah

==> tmpd/0000-00-00 <==
first line without date

==> tmpd/2016-02-26 <==
08:06:45 0 Blah blah blah
09:06:45 1 Blah blah blah

==> tmpd/2016-02-27 <==
07:06:45 2 Blah blah blah
08:06:45 3 Blah blah blah
no date line
blank lines

another no date line

==> tmpd/2016-02-28 <==
07:06:45 4 Blah blah blah
08:06:45 5 Blah blah blah
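For the timing comparison the answer asks for, one rough approach is to generate a synthetic log in the question's format and time each approach on it. A minimal sketch; make_test_log, time_splitter, the 28 dates in February 2016 and any line counts are hypothetical choices, and the measurement uses bash's time keyword with a TIMEFORMAT string. Results will depend on the awk implementation and the disk.

```shell
#!/bin/bash
# Hypothetical helpers for timing the log splitters on synthetic input.

# make_test_log LINES OUTFILE: write LINES lines in the question's format,
# spread evenly over the dates 2016-02-01 .. 2016-02-28.
make_test_log() {
  local lines=$1 out=$2
  awk -v n="$lines" 'BEGIN {
    for (i = 0; i < n; i++)
      printf "2016-02-%02d 08:06:45 %d Blah blah blah\n", 1 + int(i * 28 / n), i
  }' > "$out"
}

# time_splitter LABEL COMMAND [ARGS...]: run the command and report its
# wall-clock time on stderr as "LABEL: <seconds> s elapsed".
time_splitter() {
  local label=$1
  shift
  local TIMEFORMAT="$label: %R s elapsed"
  time "$@"
}
```

For example, after make_test_log 5000000 big.log, a script saved as split.sh (hypothetical name) that reads stdin could be timed with time_splitter split-script ./split.sh out < big.log > /dev/null, which keeps its progress output off the terminal so only the splitting is measured.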
