How to process files oldest to newest in bash?
Overview
I have a bunch of log files which roll over when they reach a certain size. Each line in the log file has a bunch of logger formatting and then some interesting information.
I want to take those files, remove the formatting from the beginning of each line, and then put the output of all of that into a single file.
I will then eventually take that one file and load it into another application manually.
Details
The file structure looks something like this:
logs
|-- modules
| +-- ...
|-- application.log
|-- gc.log
|-- gc.log.1
|-- ...
+-- gc.log.10
So logs contains subdirectories and multiple log files. The ones I am interested in are gc.log*.
Each gc.log* file rolls over to a new file when it gets full. gc.log is always the newest, and the numbering goes up to gc.log.10, the oldest (by default there are only 10 files, so the highest suffix is 9, but this is configurable).
A typical gc.log* contains thousands of entries like:
INFO | jvm 1 | 2015/05/28 04:40:58 | 1164752.977: [GC pause (young), 0.06583700 secs]
INFO | jvm 1 | 2015/05/28 04:40:58 | [Parallel Time: 45.2 ms]
INFO | jvm 1 | 2015/05/28 04:40:58 | [GC Worker Start (ms): 1164752977.7 1164752977.7 1164752977.7 1164752977.9
INFO | jvm 1 | 2015/05/28 04:40:58 | Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff: 0.2]
...
(Yes, these are G1 GC logs from the Oracle JVM. I need them in a separate file so I can graph them with GCViewer.)
Once I have stripped out the formatting, I need it to look like:
1164752.977: [GC pause (young), 0.06583700 secs]
[Parallel Time: 45.2 ms]
[GC Worker Start (ms): 1164752977.7 1164752977.7 1164752977.7 1164752977.9
Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff: 0.2]
What I have so far
So far I have learnt that I shouldn't be using ls to get the files. I found this on another SO question (sorry, I forgot which one): Why you shouldn't parse the output of ls(1).
I am using the following to list the files and then sort them from oldest to newest:
find "$logDir" -maxdepth 1 -type f -name 'gc.log*' | sort -Vr
Which gives me the following:
./gc.log.10
./gc.log.9
./gc.log.8
./gc.log.7
./gc.log.6
./gc.log.5
./gc.log.4
./gc.log.3
./gc.log.2
./gc.log.1
./gc.log
The command I have to remove the formatting is:
sed -e 's/^.\{7\}[|].\{10\}[|].\{21\}[|] //g'
(I may just use cut -c43- instead.)
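Both the sed expression and cut -c43- assume the prefix is always exactly 42 characters wide (7 + 1 + 10 + 1 + 21 + 1 + 1). A minimal sketch of a width-independent alternative, assuming the first three | characters on every line belong to the logger prefix, strips by delimiter instead of by column position; strip_gc_prefix is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip the "LEVEL | jvm N | DATE TIME | " logger prefix
# by counting | delimiters rather than by fixed column widths.
strip_gc_prefix() {
  # [^|]*[|] matches one column plus its closing |; three of those, then the
  # single space after the last |. Lines without that shape pass through as-is.
  sed -e 's/^[^|]*[|][^|]*[|][^|]*[|] //' "$@"
}

# Usage: pass file names as arguments, or pipe lines in on stdin.
printf '%s\n' 'INFO   | jvm 1    | 2015/05/28 04:40:58 | 1164752.977: [GC pause (young), 0.06583700 secs]' \
  | strip_gc_prefix
```

Because only the first three | characters are consumed, indentation inside the message (and any | later in the line) is left untouched.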
Problem
I'm not sure how to get the output from sort into sed.
The following doesn't work when the file name (or $logDir) has spaces:
find "$logDir" -maxdepth 1 -type f -name 'gc.log*' | sort -Vr | xargs sed -e "s/^.\{7\}[|].\{10\}[|].\{21\}[|] //g"
I'm also going to need to take the output from sed and concatenate it all into a single file.
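One way to handle both problems, sketched under the assumption that GNU find, sort and xargs are available (for find -print0, sort -z/-V and xargs -0/-r): keep every path NUL-terminated from find through sort to xargs, so spaces in $logDir or in file names survive, and redirect the output of the whole pipeline once to get a single file. The function name combine_gc_logs and the output file name in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the de-prefixed contents of every gc.log* file in
# directory $1, highest rotation number (oldest) first.
# Assumes GNU find (-print0), GNU sort (-z, -V) and xargs supporting -0 and -r.
combine_gc_logs() {
  local logDir=$1
  # NUL terminators survive spaces (even newlines) in paths; -r stops xargs
  # from running sed with no file arguments when nothing matches.
  find "$logDir" -maxdepth 1 -type f -name 'gc.log*' -print0 \
    | sort -zVr \
    | xargs -0 -r sed -e 's/^.\{7\}[|].\{10\}[|].\{21\}[|] //g'
}

# Usage: one redirection collects everything into a single file, e.g.
#   combine_gc_logs "/path/to/my logs" > gc-combined.log
```

The sed expression is the question's own fixed-width one, so it relies on the same 42-character prefix.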
Question
Finally, the question:
Since your filenames are fixed, you can simply use brace expansion:
for wrapper in wrapper.log{.{9..1},}; do
    echo "$wrapper"
    # do whatever you want to do...
done
For your purpose, I guess this could work too:
$ cat wrapper.log{.{9..1},} | sed ...
A slightly more generic version:
$ logfile="wrapper.log" # may contain spaces in filename
$ cat "$logfile"{.{9..1},} | sed ...
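The brace expansion hard-codes the highest rotation number, and cat reports an error for any rotation that doesn't exist yet. Since the question notes that the number of files is configurable, here is a minimal sketch, assuming rotations are named base.N with N counting up from 1 and the un-suffixed base file being the newest, that walks a counter down from a caller-supplied maximum and skips missing files. cat_rotations and its default maximum of 10 are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print rotated logs oldest-first.
#   $1 = base name (e.g. "gc.log"; may contain spaces)
#   $2 = highest rotation number to look for (default 10, an assumption)
cat_rotations() {
  local base=$1 max=${2:-10} i
  for (( i = max; i >= 1; i-- )); do
    # Skip rotations that have not been created yet instead of erroring.
    if [[ -f "$base.$i" ]]; then cat -- "$base.$i"; fi
  done
  # The un-suffixed file is always the newest, so it goes last.
  if [[ -f "$base" ]]; then cat -- "$base"; fi
}
```

It drops into the answer's pipeline in place of the plain cat, e.g. cat_rotations "$logfile" 10 | sed ...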
In this case your file names are so simple, and you're doing so little with them, that I'd be tempted to just use the ls output. Assuming your files have the intuitive progressive modification times (the oldest rotation was modified first), all you'd need is:
ls -rt gc.? gc | xargs awk -F' [|] ' '{print $NF}' > newfile
For example:
$ cat gc
INFO | jvm 1 | 2015/05/28 04:40:58 | 1164752.977: [GC pause (young), 0.06583700 secs]
INFO | jvm 1 | 2015/05/28 04:40:58 | [Parallel Time: 45.2 ms]
INFO | jvm 1 | 2015/05/28 04:40:58 | [GC Worker Start (ms): 1164752977.7 1164752977.7 1164752977.7 1164752977.9
INFO | jvm 1 | 2015/05/28 04:40:58 | Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff: 0.2]
$
$ cat gc.1
INFO | jvm 1 | 2015/05/28 04:40:58 | 1234567.977: [GC pause (young), 0.06583700 secs]
INFO | jvm 1 | 2015/05/28 04:40:58 | [Parallel Time: 45.2 ms]
INFO | jvm 1 | 2015/05/28 04:40:58 | [GC Worker Start (ms): 1164752977.7 1164752977.7 1164752977.7 1164752977.9
INFO | jvm 1 | 2015/05/28 04:40:58 | Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff: 0.2]
$
$ cat gc.2
INFO | jvm 1 | 2015/05/28 04:40:58 | 8889996.977: [GC pause (young), 0.06583700 secs]
INFO | jvm 1 | 2015/05/28 04:40:58 | [Parallel Time: 45.2 ms]
INFO | jvm 1 | 2015/05/28 04:40:58 | [GC Worker Start (ms): 1164752977.7 1164752977.7 1164752977.7 1164752977.9
INFO | jvm 1 | 2015/05/28 04:40:58 | Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff: 0.2]
$ ls -rt gc.? gc | xargs awk -F' [|] ' '{print $NF}'
8889996.977: [GC pause (young), 0.06583700 secs]
[Parallel Time: 45.2 ms]
[GC Worker Start (ms): 1164752977.7 1164752977.7 1164752977.7 1164752977.9
Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff: 0.2]
1234567.977: [GC pause (young), 0.06583700 secs]
[Parallel Time: 45.2 ms]
[GC Worker Start (ms): 1164752977.7 1164752977.7 1164752977.7 1164752977.9
Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff: 0.2]
1164752.977: [GC pause (young), 0.06583700 secs]
[Parallel Time: 45.2 ms]
[GC Worker Start (ms): 1164752977.7 1164752977.7 1164752977.7 1164752977.9
Avg: 1164752977.8, Min: 1164752977.7, Max: 1164752977.9, Diff: 0.2]
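One caveat with '{print $NF}': it keeps only the text after the last " | ", so a message that itself contains " | " would be cut short. A sketch of an alternative, assuming the logger prefix is always exactly three " | "-separated columns, rejoins every field from the 4th onward with the same separator; strip_prefix_awk is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop the first three " | "-separated columns and keep
# the rest of the line verbatim, even if the message contains " | " itself.
strip_prefix_awk() {
  # FS " [|] " is a regex matching a literal " | "; OFS is the same three
  # characters, so rejoining fields 4..NF restores the message exactly.
  awk -F ' [|] ' -v OFS=' | ' '
    NF < 4 { print; next }          # not a prefixed line: pass it through
    {
      out = $4
      for (i = 5; i <= NF; i++) out = out OFS $i
      print out
    }' "$@"
}
```

It accepts file names, so for the three example files strip_prefix_awk gc.2 gc.1 gc > newfile should give the same output as the run above (none of those messages contain " | ").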
If you want to do it right (and have GNU find and sort), tell find to write the filenames preceded by their mtime and separated by NUL characters (the only character which can't exist in a file's fully qualified path on UNIX), use sort to order them by mtime (rather than trying to mess with names), and then read both pieces of data back in:
# %T@ = mtime in seconds since the epoch, %p = full path, \0 = NUL terminator
while IFS= read -r -d ' ' mtime && IFS= read -r -d '' filename; do
  sed -e 's/^.\{7\}[|].\{10\}[|].\{21\}[|] //g' <"$filename"
done < <(find "$logDir" -maxdepth 1 -type f -name 'gc.log*' -printf '%T@ %p\0' | sort -nz)
This will process files in order from oldest to newest.
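The loop writes to standard output, so a single redirection (after done, or after a call to a wrapping function) puts everything in one file. A sketch packaging the same mtime-ordered approach as a function, assuming GNU find (-printf) and GNU sort (-z); it uses %p (the full path, so each file can be opened from any working directory) and restricts the search to gc.log* so application.log is skipped. combine_by_mtime and the output file name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip the logger prefix from every gc.log* file in $1,
# processing files from oldest to newest by modification time.
# Assumes GNU find (-printf) and GNU sort (-z).
combine_by_mtime() {
  local logDir=$1 mtime filename
  # Each record is "<mtime in epoch seconds> <full path>\0"; sort -n orders on
  # the leading number, so the oldest file is read first.
  while IFS= read -r -d ' ' mtime && IFS= read -r -d '' filename; do
    sed -e 's/^.\{7\}[|].\{10\}[|].\{21\}[|] //g' <"$filename"
  done < <(find "$logDir" -maxdepth 1 -type f -name 'gc.log*' -printf '%T@ %p\0' | sort -nz)
}

# Usage: redirect once to collect everything in a single file, e.g.
#   combine_by_mtime "/path/to/my logs" > gc-combined.log
```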