Create a CSV file from a specific log file using shell script

I am trying to convert a specific log file into a CSV file using the sed, awk, and paste commands in Linux, so that I can plot it with gnuplot or MS Excel. However, I am not able to do it the way I want. Here is the sample log file:

Feb 15 13:57:08 Program1: The pool size: 100 [High: 80 Norm: 20 Low: 0]
Feb 15 13:58:53 Program1: The pool size: 100 [High: 0 Norm: 100 Low: 0]
Feb 15 13:58:54 Program3: The pool size: 200 [High: 0 Norm: 200 Low: 0]
Feb 15 13:58:56 Program4: The pool size: 100 [High: 0 Norm: 100 Low: 0]
Feb 15 13:58:58 Program1: The pool size: 200 [High: 0 Norm: 200 Low: 0]
Feb 15 13:58:59 Program5: The pool size: 300 [High: 100 Norm: 200 Low: 0]
Feb 15 13:59:05 Program1: The pool size: 100 [High: 0 Norm: 100 Low: 0]
Feb 15 14:00:11 Program2: The pool size: 100 [High: 0 Norm: 100 Low: 0]
Feb 15 14:00:12 Program2: The pool size: 100 [High: 0 Norm: 100 Low: 0]
Feb 15 14:00:13 Program1: The pool size: 200 [High: 0 Norm: 200 Low: 0]
Feb 15 14:00:16 Program4: The pool size: 100 [High: 0 Norm: 100 Low: 0]
Feb 15 14:00:17 Program2: The pool size: 100 [High: 50 Norm: 50 Low: 0]
Feb 15 14:02:28 Program5: The pool size: 100 [High: 0 Norm: 100 Low: 0]
Feb 15 14:02:31 Program1: The pool size: 100 [High: 0 Norm: 100 Low: 0]
Feb 15 14:11:01 Program1: The pool size: 100 [High: 0 Norm: 100 Low: 0]

I am trying to convert the above data into a CSV file so that I have the data for each specific point in time. The output CSV I expect should be in the following format:

TimeStamp,Program1_Total,Program1_High,Program1_Norm,Program1_Low,Program2_Total,Program2_High,Program2_Norm,Program2_Low,Program3_Total,Program3_High,Program3_Norm,Program3_Low,Program4_Total,Program4_High,Program4_Norm,Program4_Low

Feb 15 13:57:08,100,80,20,0,0,0,0,0,0,0,0,0,0,0,0,0
Feb 15 13:58:53,100,0,100,0,0,0,0,0,0,0,0,0,0,0,0,0
...
...

What did I try?

I tried grepping for each program and creating separate, smaller files specific to that program, as follows:

grep "Program1" sample.log > Program1.log
grep "Program2" sample.log > Program2.log

I tried using the paste command to join them. However, what I am not able to figure out is how to handle the timestamps properly.

Any help will be highly appreciated. Thanks in advance.

Use cut with a space as the delimiter and keep only the fields you need. Then pipe the result to sed to replace the spaces with commas.

cut -d ' ' -f 1,2,3,8,10,12,14 sample.log | sed 's/ /,/g'

By wrapping this in a while ... read loop you can process the file one line at a time.
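A minimal sketch of such a loop, assuming the log layout shown in the question. The function name log_to_csv is hypothetical, and the sketch relies on read's own field splitting instead of cut, which also lets it keep the timestamp in a single column and drop the trailing "]":

```shell
#!/bin/bash
# log_to_csv (hypothetical name): read log lines on stdin and print
# "timestamp,total,high,norm,low" for each one.
log_to_csv() {
    # Field layout assumed from the sample log:
    # Feb 15 13:57:08 Program1: The pool size: 100 [High: 80 Norm: 20 Low: 0]
    # Fields that are not needed are read into the throwaway variable "_".
    while read -r mon day time _ _ _ _ total _ high _ norm _ low; do
        low=${low%]}    # drop the trailing "]" from the last field
        printf '%s %s %s,%s,%s,%s,%s\n' \
            "$mon" "$day" "$time" "$total" "$high" "$norm" "$low"
    done
}

# Small demonstration on one line taken from the question.
log_to_csv <<'EOF'
Feb 15 13:57:08 Program1: The pool size: 100 [High: 80 Norm: 20 Low: 0]
EOF
```

For the real file this would be called as log_to_csv < sample.log. Note that it does not yet spread the values over per-program columns.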

I think I found a one-liner solution for your task that uses only the shell and awk. Be advised, though, that it's not pretty at all, and you need to add the header to your output file beforehand:

echo "TimeStamp,P1_Total,P1_High,P1_Norm,P1_Low,P2_Total,P2_High,P2_Norm,P2_Low,P3_Total,P3_High,P3_Norm,P3_Low,P4_Total,P4_High,P4_Norm,P4_Low,P5_Total,P5_High,P5_Norm,P5_Low" >> final_output.txt

for i in $(seq 1 5)
do
l=$((i-1))
r=$((5-i))
awk -v left_padd="${l}" -v right_padd="${r}" -v nb="${i}" '{gsub(/]/, "", $14)} {if ($4 ~ "Program" nb) {printf "%s %s %s, ", $1, $2, $3; for(a=0;a<left_padd;a++) printf "0,\t 0,\t 0,\t 0,\t "; printf "%s,\t %s,\t %s,\t %s,\t ", $8, $10, $12, $14; for(b=0;b<right_padd;b++) printf "0,\t 0,\t 0,\t 0,\t "; print ""} }' sample.log
done >> final_output.txt

*** Please note: you must change the 5 in seq 1 5 to the number of Program# entries you want in your output file; I used 5 because that is what your example contains. You also need to change the 5 in r=$((5-i)) to the same value.
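To avoid editing that number in two places, the count could be derived from the log itself. The following is only a sketch under the assumption that program names always look like Program<number>: in the fourth field; max_program is a hypothetical helper name, not part of the answer above:

```shell
#!/bin/bash
# max_program (hypothetical name): print the highest Program number
# found in the fourth field of the log lines read from stdin.
max_program() {
    awk '
    $4 ~ /^Program[0-9]+:$/ {
        nb = $4
        gsub(/[^0-9]/, "", nb)          # "Program5:" -> "5"
        if (nb + 0 > max) max = nb + 0
    }
    END { print max + 0 }               # prints 0 when nothing matched
    '
}

# Small demonstration on two lines taken from the question.
max_program <<'EOF'
Feb 15 13:58:54 Program3: The pool size: 200 [High: 0 Norm: 200 Low: 0]
Feb 15 13:58:59 Program5: The pool size: 300 [High: 100 Norm: 200 Low: 0]
EOF
```

With this, the loop could start with n=$(max_program < sample.log) and then use seq 1 "$n" and r=$((n-i)), so both values always agree. The header line would still have to be adjusted to the same number of programs.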

Explanation:

  • The for loop passes over the file once per iteration, searching for one Program# entry with awk.
  • The l variable counts how many groups of 0 values to add on the left of your table.
  • The r variable does the same as l, except that it adds the 0 values on the right.
  • The nb variable stores the Program number so that the awk part knows which lines to look for in the input file.
  • The awk part merely prints the values you asked for from the input file for each Program# entry, along with the preceding and trailing 0 values (four 0s per Program#) for the other entries in the table.

Edit:

I used \t to delimit the values in awk so the output is easier to read, but you can remove it so that you have plain comma-separated values. For the same reason I also shortened the header convention from your question, from Program#_Total to P#_Total.

*I do realize this is not optimal at all: the file is parsed once for each Program# entry, and you also need to add the header to the output file yourself, yet it's the best I could come up with.
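For comparison, here is a sketch of a single-pass variant. It is a different approach from the loop above, not the same method: one awk invocation prints the header and then one row per log line, so rows stay in the log's original order. The function name pool_log_to_csv and its optional argument (the number of programs, defaulting to 5) are assumptions made for illustration:

```shell
#!/bin/bash
# pool_log_to_csv (hypothetical name): single awk pass over a log on stdin.
# $1 = number of Program# column groups to emit (assumed default: 5).
pool_log_to_csv() {
    awk -v n="${1:-5}" '
    BEGIN {
        # Header row: TimeStamp plus four columns per program.
        printf "TimeStamp"
        for (p = 1; p <= n; p++)
            printf ",P%d_Total,P%d_High,P%d_Norm,P%d_Low", p, p, p, p
        print ""
    }
    $4 ~ /^Program[0-9]+:$/ {
        nb = $4
        gsub(/[^0-9]/, "", nb)        # "Program3:" -> "3"
        nb += 0
        if (nb < 1 || nb > n) next    # skip programs outside the requested range
        sub(/\]$/, "", $14)           # strip the trailing "]" from the Low value
        printf "%s %s %s", $1, $2, $3
        for (p = 1; p <= n; p++) {
            if (p == nb) printf ",%s,%s,%s,%s", $8, $10, $12, $14
            else         printf ",0,0,0,0"
        }
        print ""
    }'
}

# Small demonstration with two programs and two lines from the question.
pool_log_to_csv 2 <<'EOF'
Feb 15 13:57:08 Program1: The pool size: 100 [High: 80 Norm: 20 Low: 0]
Feb 15 14:00:11 Program2: The pool size: 100 [High: 0 Norm: 100 Low: 0]
EOF
```

For the full file this would be run as pool_log_to_csv 5 < sample.log > final_output.txt. Like the loop, it fills the columns of all other programs with 0 on each row rather than carrying earlier values forward.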

If Perl is an option, how about:

#!/bin/bash

perl -e '
while (<>) {
    if (/^(.{15}) Program(\d+): The pool size: (\d+) \[High: (\d+) Norm: (\d+) Low: (\d+)\]$/) {
        $timestamp = $1;
        $program = $2;
        $size = $3;
        $high = $4;
        $norm = $5;
        $low = $6;
        if (! defined $array{$timestamp}) {
            # it takes care of duplicate timestamps
            push(@timestamps, $timestamp);
        }
        $i = ($program - 1) * 4;
        @{$array{$timestamp}}[$i .. $i + 3] = ($size, $high, $norm, $low);
    }
}
foreach (@timestamps) {
    print "$_,", join(",", map {$_ + 0} @{$array{$_}}[0 .. 15]), "\n";
}' logfile

BTW, it looks like Program5 is excluded from your desired result. If you want to include it, just change the number 15 in the second-to-last line to 19.
