
How to process files oldest to newest in bash?

Overview

I have a bunch of log files which roll over when they reach a certain size. Each line in the log file starts with some logger formatting, followed by the interesting information. I want to take those files, remove the formatting from the beginning of each line, and put the combined output into a single file. I will then eventually take that one file and load it into another application manually.

Details

The file structure looks something like this:

logs
 |-- modules
 |    +-- ...
 |-- application.log
 |-- gc.log
 |-- gc.log.1
 |-- ...
 +-- gc.log.10

So logs contains subdirectories and multiple log files. The ones I am interested in are gc.log*.

Each gc.log* file rolls over to a new file when it gets full. gc.log is always the newest, and the numbers go up to gc.log.10, which is the oldest (by default there are only 10, max version 9, but this is configurable).

A typical gc.log* contains thousands of entries like:

INFO   | jvm 1    | 2015/05/28 04:40:58 | 1164752.977: [GC pause (young), 0.06583700 secs]
INFO   | jvm 1    | 2015/05/28 04:40:58 |    [Parallel Time:  45.2 ms]
INFO   | jvm 1    | 2015/05/28 04:40:58 |       [GC Worker Start (ms):  1164752977.7  1164752977.7  1164752977.7  1164752977.9
INFO   | jvm 1    | 2015/05/28 04:40:58 |        Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff:   0.2]
...

(Yes, these are G1 GC logs from the Oracle JVM. I need them in a separate file so I can graph them with GCViewer.)

Once I have stripped out the formatting I need it to look like:

1164752.977: [GC pause (young), 0.06583700 secs]
   [Parallel Time:  45.2 ms]
      [GC Worker Start (ms):  1164752977.7  1164752977.7  1164752977.7  1164752977.9
       Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff:   0.2]

What I have so far

So far I have learnt that I shouldn't be using ls to get the files. I found this through another SO question (sorry, I forgot which one): Why you shouldn't parse the output of ls(1).

I am using the following to list the files and then sort them from oldest to newest:

find "$logDir" -maxdepth 1 -type f -name 'gc.log*' | sort -Vr

Which gives me the following:

./gc.log.10
./gc.log.9
./gc.log.8
./gc.log.7
./gc.log.6
./gc.log.5
./gc.log.4
./gc.log.3
./gc.log.2
./gc.log.1
./gc.log

The command I have to remove the formatting is:

sed -e 's/^.\{7\}[|].\{10\}[|].\{21\}[|] //g'

(I may just use cut -c43-)
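
The two commands can be compared on one of the sample lines. This is only a quick check, assuming the prefix is always exactly 42 characters wide as in the excerpt above; the variable names are made up for the illustration.

```shell
#!/usr/bin/env bash
# Sample line copied from the log excerpt above.
line='INFO   | jvm 1    | 2015/05/28 04:40:58 |    [Parallel Time:  45.2 ms]'

# Strip the prefix by pattern, using the sed expression from the question.
via_sed=$(printf '%s\n' "$line" | sed -e 's/^.\{7\}[|].\{10\}[|].\{21\}[|] //')

# Strip the same 42 characters by position.
via_cut=$(printf '%s\n' "$line" | cut -c43-)

# Brackets make the preserved leading indentation visible.
printf '[%s]\n' "$via_sed" "$via_cut"
```

The sed version leaves lines that do not match the pattern untouched, whereas cut removes the first 42 characters of every line regardless of content.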

Problem

I'm not sure how to get the output from sort into sed.

The following doesn't work when the file name (or $logDir) has spaces:

find "$logDir" -maxdepth 1 -type f -name 'gc.log*' | sort -Vr | xargs sed -e "s/^.\{7\}[|].\{10\}[|].\{21\}[|] //g"

I'm also going to need to take the output from sed and then concatenate that all together into a single file.
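
For reference, the pipeline above breaks because xargs splits its input on whitespace by default. A minimal sketch of a NUL-delimited variant follows, assuming GNU find, sort and xargs (for -print0, -z and -0). The temporary directory, the sample file contents and the outFile variable are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Demo directory whose name contains a space (hypothetical, for illustration).
logDir="$(mktemp -d)/my logs"
outFile="$(mktemp)"            # stands in for the single combined file
mkdir -p "$logDir"
prefix='INFO   | jvm 1    | 2015/05/28 04:40:58 | '
printf '%s\n' "${prefix}newest" > "$logDir/gc.log"
printf '%s\n' "${prefix}middle" > "$logDir/gc.log.2"
printf '%s\n' "${prefix}oldest" > "$logDir/gc.log.10"

# NUL-separated names survive spaces: find emits them with -print0,
# sort -z keeps the NUL separator, and xargs -0 hands each name to sed intact.
find "$logDir" -maxdepth 1 -type f -name 'gc.log*' -print0 |
  sort -zVr |
  xargs -0 sed -e 's/^.\{7\}[|].\{10\}[|].\{21\}[|] //' > "$outFile"

cat "$outFile"
```

Because sed receives all the files in sorted order and writes to one redirected stdout, the concatenation into a single file comes for free.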

Question

Finally the question:

  • How can I list certain files in a directory in reverse natural number sort order, remove a pattern from the beginning of each line in those files and lastly concatenate the results into a single file (in bash)?

Since your filenames are fixed, you can simply use brace expansion:

for wrapper in wrapper.log{.{9..1},}; do
    echo "$wrapper"
    # do whatever you want to do...
done
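
To see what such a loop iterates over, the expansion can be printed directly. This is a small sketch using the gc.log names from the question and a shorter range for brevity. Brace expansion is a bash feature that only generates strings, so it does not check that the files exist.

```shell
#!/usr/bin/env bash
# The outer braces have two alternatives: ".{3..1}" and the empty string,
# so the numbered names come first and the bare name comes last.
names=(gc.log{.{3..1},})
printf '%s\n' "${names[@]}"
# prints:
# gc.log.3
# gc.log.2
# gc.log.1
# gc.log
```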

For your purpose, I guess, this could work too:

$ cat wrapper.log{.{9..1},} | sed ...

A bit more generic version:

$ logfile="wrapper.log" # may contain spaces in filename
$ cat "$logfile"{.{9..1},} | sed ...
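
The quoted form works because bash performs brace expansion before parameter expansion, so each generated word still contains the quoted variable. A brief sketch with a hypothetical file name containing a space:

```shell
#!/usr/bin/env bash
logfile="my wrapper.log"   # hypothetical name with a space
# Expands to "$logfile".2 "$logfile".1 "$logfile" first; the variable is then
# substituted inside its quotes, so no word splitting occurs.
expanded=("$logfile"{.{2..1},})
printf '<%s>\n' "${expanded[@]}"
```

Each name is printed as a single bracketed word, which shows that the space did not split it.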

In this case your file names are so simple, and you're doing so little with them, that I'd be tempted to just use the ls output. Assuming your files have the intuitive progressive modification times, all you'd need is:

ls -rt gc.? gc | xargs awk -F' [|] ' '{print $NF}' > newfile

For example:

$ cat gc
INFO   | jvm 1    | 2015/05/28 04:40:58 | 1164752.977: [GC pause (young), 0.06583700 secs]
INFO   | jvm 1    | 2015/05/28 04:40:58 |    [Parallel Time:  45.2 ms]
INFO   | jvm 1    | 2015/05/28 04:40:58 |       [GC Worker Start (ms):  1164752977.7  1164752977.7  1164752977.7  1164752977.9
INFO   | jvm 1    | 2015/05/28 04:40:58 |        Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff:   0.2]
$
$ cat gc.1
INFO   | jvm 1    | 2015/05/28 04:40:58 | 1234567.977: [GC pause (young), 0.06583700 secs]
INFO   | jvm 1    | 2015/05/28 04:40:58 |    [Parallel Time:  45.2 ms]
INFO   | jvm 1    | 2015/05/28 04:40:58 |       [GC Worker Start (ms):  1164752977.7  1164752977.7  1164752977.7  1164752977.9
INFO   | jvm 1    | 2015/05/28 04:40:58 |        Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff:   0.2]
$
$ cat gc.2
INFO   | jvm 1    | 2015/05/28 04:40:58 | 8889996.977: [GC pause (young), 0.06583700 secs]
INFO   | jvm 1    | 2015/05/28 04:40:58 |    [Parallel Time:  45.2 ms]
INFO   | jvm 1    | 2015/05/28 04:40:58 |       [GC Worker Start (ms):  1164752977.7  1164752977.7  1164752977.7  1164752977.9
INFO   | jvm 1    | 2015/05/28 04:40:58 |        Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff:   0.2]

$ ls -rt gc.? gc | xargs awk -F' [|] ' '{print $NF}'
8889996.977: [GC pause (young), 0.06583700 secs]
   [Parallel Time:  45.2 ms]
      [GC Worker Start (ms):  1164752977.7  1164752977.7  1164752977.7  1164752977.9
       Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff:   0.2]
1234567.977: [GC pause (young), 0.06583700 secs]
   [Parallel Time:  45.2 ms]
      [GC Worker Start (ms):  1164752977.7  1164752977.7  1164752977.7  1164752977.9
       Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff:   0.2]
1164752.977: [GC pause (young), 0.06583700 secs]
   [Parallel Time:  45.2 ms]
      [GC Worker Start (ms):  1164752977.7  1164752977.7  1164752977.7  1164752977.9
       Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff:   0.2]

If you want to do it right (and have GNU find and sort), tell find to write the filenames preceded by mtime and separated by NUL characters (the only character which can't exist in a file's fully qualified path on UNIX); use sort to sort by mtime (rather than trying to mess with names); and then read both pieces of data in:

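# Each record from find is "<mtime> <path>" terminated by NUL.
while IFS= read -r -d ' ' mtime && IFS= read -r -d '' filename; do
  # strip the fixed-width logger prefix from every line of this file
  sed -e 's/^.\{7\}[|].\{10\}[|].\{21\}[|] //g' <"$filename"
done < <(find "$logDir" -maxdepth 1 -type f -name 'gc.log*' -printf '%T@ %p\0' | sort -nz)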

This will process files in order from oldest to newest.
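
A self-contained run of this approach might look like the following. It is a sketch that assumes GNU find, sort and touch; the temporary directory, the file contents, the timestamps and the outFile variable are all invented for the demonstration. It prints the full path with %p so that the files can be opened from any working directory.

```shell
#!/usr/bin/env bash
logDir="$(mktemp -d)/gc logs"      # hypothetical directory with a space
outFile="$(mktemp)"                # stands in for the single output file
mkdir -p "$logDir"
prefix='INFO   | jvm 1    | 2015/05/28 04:40:58 | '

# Create three logs and give them increasing modification times.
printf '%s\n' "${prefix}oldest" > "$logDir/gc.log.2"
printf '%s\n' "${prefix}middle" > "$logDir/gc.log.1"
printf '%s\n' "${prefix}newest" > "$logDir/gc.log"
touch -d '2015-05-26 00:00:00' "$logDir/gc.log.2"
touch -d '2015-05-27 00:00:00' "$logDir/gc.log.1"
touch -d '2015-05-28 00:00:00' "$logDir/gc.log"

# Read the mtime up to the first space, then the path up to the NUL,
# and strip the prefix from that file; all output lands in one file.
while IFS= read -r -d ' ' mtime && IFS= read -r -d '' filename; do
  sed -e 's/^.\{7\}[|].\{10\}[|].\{21\}[|] //' < "$filename"
done < <(find "$logDir" -maxdepth 1 -type f -name 'gc.log*' -printf '%T@ %p\0' |
           sort -nz) > "$outFile"

cat "$outFile"
```

Redirecting the whole loop, rather than each sed call, means the output file is opened once and the results are appended in processing order.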
