How to use grep commands to find a specific value in a text file

I need to grep a file called daily_fail_counts.csv, but only to find the number of failures. Inside that file is this, on a smaller scale:

January,1,0,0
January,1,1,0
January,1,2,0
January,1,3,0
January,1,4,0
January,1,5,0
January,1,6,0
January,1,7,0
January,1,8,0

Its format is "month,day,hour,failures". It goes through all months. The last value is the number of failures found at that time. I know it all says 0 here, but that's because no failures were found there; other dates have failures.

I'm not very good with grep commands in Linux scripts, so my question is this: how do I grep to find just the last value on each line of the file?

I'm writing this script in a file called make_accum_fail_counts.sh, and I will run it like this:

bash make_accum_fail_counts.sh daily_fail_counts.csv > accum_fail_counts.csv

So I'm using daily_fail_counts.csv as the input for the new script. Here's my script so far:

#!/bin/bash

if [ $# == 1 ]
then
    logFile=$1
fi

cat $logFile > tmpFile

hour=0
failure=0

while [ $hour -le 23 ]
do
    if [ $hour -le 23 ]
    then
        failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`
    fi
    echo "$hour,$failure"
    hour=$((hour+1))
    failure=0
done
rm -rf tmpFile

I just need help with my grep command:

failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`

The goal is to find, across all the days, the failures for each hour. So its output would be:

0,1000
1,1040
2,2888

Where there were 1000 failures between 0:00-1:00, 1040 failures between 1:00-2:00, and so on. Thanks in advance.

If I understood your question correctly, could you please try the following. This gives the total of the failures (the last, i.e. 4th, field) for each hour value, irrespective of month.

awk '
BEGIN{
  FS=OFS=","
}
!b[$3]++{
  c[++count]=$3
}
{
  a[$3]+=$4
}
END{
  for(i=1;i<=count;i++){
    print c[i],a[c[i]]
  }
}
'  Input_file

One more thing: this approach prints the output in the same order in which the $3 values first appear in Input_file.
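
To make the behaviour concrete, here is a minimal sketch that runs the same logic on a few made-up rows. The sample values, the temporary file, and the array names seen, order and total are assumptions for illustration only; they stand in for b, c and a above.

```shell
# Hypothetical sample data in the month,day,hour,failures layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,2
January,1,1,0
January,1,2,5
January,2,0,3
January,2,1,4
January,2,2,1
EOF

hourly_totals=$(awk '
BEGIN { FS = OFS = "," }
!seen[$3]++ { order[++n] = $3 }   # remember each hour the first time it appears
{ total[$3] += $4 }               # add the failures of this row to its hour
END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }
' "$sample")

echo "$hourly_totals"
rm -f "$sample"
```

With these rows it prints 0,5 then 1,4 then 2,6, one pair per line, in the order the hours first appear.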

Explanation: adding an explanation of the above code here.

awk '                          ## Start the awk program.
BEGIN{                         ## Start the BEGIN section.
  FS=OFS=","                   ## Set both the input (FS) and output (OFS) field separators to a comma.
}                              ## Close the BEGIN block.
!b[$3]++{                      ## If $3 has not been seen before (not yet in array b), run this block; either way, count $3 in b.
  c[++count]=$3                ## Store $3 in array c at index count, incrementing count by 1 first.
}                              ## Close the block for the array b condition.
{
  a[$3]+=$4                    ## Add $4 to the running total stored in array a under index $3.
}
END{                           ## Start the END section of the program.
  for(i=1;i<=count;i++){       ## Loop from i=1 up to the value of count.
    print c[i],a[c[i]]         ## Print the i-th hour from array c and its total from array a.
  }                            ## Close the for loop block.
}                              ## Close the END block.
'  Input_file                  ## Name of the input file.

cat yourfile.csv | cut -d',' -f 4 | paste -s -d+ - | bc

To sum all the failures. Use cut -d',' -f 4 yourfile.csv to split each line on the commas and get the 4th value; that gives you a list of numbers. Then use a shell command to sum the list of numbers (here, paste joins them with + and bc evaluates the result).
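
As a cross-check, the same total can be computed with awk alone, which avoids depending on bc and still prints 0 for an empty file. This is a swapped-in technique rather than the pipeline above, and the sample rows are made up for illustration.

```shell
# Hypothetical sample data in the month,day,hour,failures layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,2
January,1,1,0
January,1,2,5
EOF

# Add up field 4 across all rows; "+ 0" forces a numeric 0 when no rows were read
total_failures=$(awk -F, '{ sum += $4 } END { print sum + 0 }' "$sample")

echo "$total_failures"
rm -f "$sample"
```

For these three rows the result is 7.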

You can grep to filter it down to the hour, something like

cat yourfile.csv | cut -d',' -f 3,4 | grep ^0, | cut -d',' -f 2

To get all the 0th-hour failure counts.
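
The filter can also be written as a single grep on the whole line. grep patterns are regular expressions, not shell wildcards: * repeats the preceding item and ^ anchors the match at the start of the line, which is why a pattern such as "*,*,*,^[0-10]" does not select anything useful. A minimal sketch, assuming the four-field layout from the question and made-up sample rows:

```shell
# Hypothetical sample data in the month,day,hour,failures layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,2
January,1,10,9
January,2,0,3
EOF

# [^,]* matches one whole field, so the pattern reads:
# any month, any day, an hour that is exactly 0, then the failures field
hour0_counts=$(grep -E '^[^,]*,[^,]*,0,' "$sample" | cut -d',' -f 4)

echo "$hour0_counts"
rm -f "$sample"
```

The row for hour 10 is skipped because the third field must be exactly 0, so only 2 and 3 are printed.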

for hour in {0..23}; do
    cat yourfile.csv | cut -d',' -f 3,4 | grep ^$hour, | cut -d',' -f 2 | paste -s -d+ - | bc
done

To get the totals for each hour.
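
The question asks for output lines of the form hour,total, and an hour with no matching rows gives the pipeline above nothing to add up. Here is a sketch of one way to cover both points, using a POSIX while loop and awk for the sum instead of paste and bc. The sample rows and the variable names are assumptions for illustration.

```shell
# Hypothetical sample data in the month,day,hour,failures layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,2
January,1,1,0
January,1,23,7
February,1,0,3
February,1,1,4
EOF

accum=$(
  hour=0
  while [ "$hour" -le 23 ]; do
    # Sum field 4 on rows whose field 3 equals this hour;
    # "+ 0" prints 0 when no row matched
    total=$(awk -F, -v h="$hour" '$3 == h { sum += $4 } END { print sum + 0 }' "$sample")
    echo "$hour,$total"
    hour=$((hour + 1))
  done
)

echo "$accum"
rm -f "$sample"
```

This prints 24 lines, starting with 0,5 and 1,4 and ending with 23,7; hours that never occur in the sample show a total of 0. It rereads the file once per hour, which is fine for a small CSV but slower than a single awk pass.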

If you want them grouped by day, you can read about the date command, figure out how to get it to output strings like January,1, and add an outer for loop to the above command that passes each line through a grep with the output of that date command.
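
If "grouped by day" means one total per month and day, a single awk pass keyed on the first two fields is an alternative to the date-driven loop. This is a sketch of that reading only, not the approach described above; the sample rows and the names key, seen, order and total are assumptions.

```shell
# Hypothetical sample data in the month,day,hour,failures layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,2
January,1,1,1
January,2,0,3
February,1,0,4
EOF

daily_totals=$(awk '
BEGIN { FS = OFS = "," }
{ key = $1 OFS $2 }                 # month,day identifies one day
!seen[key]++ { order[++n] = key }   # keep days in first-seen order
{ total[key] += $4 }                # add the failures of this row to its day
END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }
' "$sample")

echo "$daily_totals"
rm -f "$sample"
```

For this sample it prints January,1,3 then January,2,3 then February,1,4.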

Personally, at this point I would start writing Python instead of bash. The pandas library is better suited for this.
