Bash script to count the IDs in square brackets for every ten minutes

Question

Given this log file:

20180917084726:-
20180917085418:[111783178, 111557953, 111646835, 111413356, 111412662, 105618372, 111413557]
20180917115418:[111413432, 111633904, 111783198, 111792767, 111557948, 111413225, 111413281]
20180917105419:[111413432, 111633904, 111783198, 111792767, 111557948, 111413225, 111413281]
20180917085522:[111344871, 111394583, 111295547, 111379566, 111352520]
20180917090022:[111344871, 111394583, 111295547, 111379566, 111352520]

The format of the input log is: each line holds a timestamp in the format YYYYMMDDhhmmss, followed by a colon and then either a bracketed, comma-separated list of IDs or a single "-".

I would like to know how to write a script that outputs one line for each ten-minute slice of the day, giving the count of unique IDs that were returned in that slice.

The expected result looks like this:

20180917084:0
20180917085:12
20180917115:7
20180917105:7

Answer 1

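awk: use a colon or comma as the field separator. Note that PROCINFO["sorted_in"] is specific to GNU awk (gawk); other awks ignore it and print the keys in arbitrary order.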

awk -F '[,:]' '
    {
        key = substr($1,1,11)"0"
        count[key] += ($2 == "-" ? 0 : NF-1)
    } 
    END {
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (key in count) print key, count[key]
    }
' file

201809170840 0
201809170850 12
201809170900 5
201809171050 7
201809171150 7
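
Note that the code above adds up every ID on every line, so an ID that shows up twice within the same ten-minute slice is counted twice. If unique IDs are wanted, as the question asks, one possible variant remembers each (slice, ID) pair the first time it is seen. This is only a sketch: the seen array, the scratch file, and the last sample line (made up to show the de-duplication) are assumptions rather than part of the original answer, and it pipes to sort instead of relying on gawk's PROCINFO.

```shell
# Sample data from the question, plus one made-up line that repeats an ID.
log=$(mktemp)
cat > "$log" <<'EOF'
20180917084726:-
20180917085418:[111783178, 111557953, 111646835, 111413356, 111412662, 105618372, 111413557]
20180917115418:[111413432, 111633904, 111783198, 111792767, 111557948, 111413225, 111413281]
20180917105419:[111413432, 111633904, 111783198, 111792767, 111557948, 111413225, 111413281]
20180917085522:[111344871, 111394583, 111295547, 111379566, 111352520]
20180917090022:[111344871, 111394583, 111295547, 111379566, 111352520]
20180917085900:[111783178, 111783178]
EOF

unique_counts=$(awk -F '[,:]' '
    {
        key = substr($1, 1, 11) "0"            # ten-minute slice, e.g. 201809170850
        if (!(key in count)) count[key] = 0    # slices without IDs are still printed
        for (i = 2; i <= NF; i++) {
            id = $i
            gsub(/[^0-9]/, "", id)             # drop brackets, spaces and the "-" placeholder
            if (id != "" && !seen[key, id]++)  # count each ID once per slice
                count[key]++
        }
    }
    END { for (key in count) print key, count[key] }
' "$log" | sort)

printf '%s\n' "$unique_counts"
rm -f "$log"
```

On this sample the 201809170850 slice stays at 12 even though the made-up last line repeats an ID that was already seen in that slice.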

To filter on today's date, you could say:

gawk -F '[,:]' '
    BEGIN {today = strftime("%Y%m%d", systime())}
    $0 ~ "^"today { key = ...

or

awk -F '[,:]' -v "today=$(date "+%Y%m%d")" '
    $0 ~ "^"today { key = ...

or pipe the output of the existing awk code to grep "^$(date +%Y%m%d)"
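
Put together, the -v variant might look like the sketch below. The wrapper function name count_for_day and its two arguments are hypothetical, and the two-day log is made up for illustration. The date is passed in as a parameter so the filter can be tried against old data, where today's date would match nothing.

```shell
# count_for_day DAY FILE: hypothetical helper, prints "slice count" for one day only.
count_for_day() {
    awk -F '[,:]' -v day="$1" '
        index($0, day) == 1 {                  # line starts with the requested YYYYMMDD
            key = substr($1, 1, 11) "0"
            count[key] += ($2 == "-" ? 0 : NF - 1)
        }
        END { for (key in count) print key, count[key] }
    ' "$2" | sort
}

# Made-up log spanning two days, for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
20180916235959:[1, 2, 3]
20180917084726:-
20180917085418:[4, 5]
20180917085522:[6]
EOF

day_counts=$(count_for_day 20180917 "$log")
printf '%s\n' "$day_counts"
rm -f "$log"
```

For the current day, the call would be count_for_day "$(date +%Y%m%d)" file.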

Answer 2

Could you please try the following? It prints the output in the same order in which the timestamps first occur in Input_file.

awk '
{
  val=substr($0,1,11)
}
!a[val]++{
  b[++count]=val
}
match($0,/\[.*\]/){
  num=split(substr($0,RSTART,RLENGTH),array,",")
  c[val]+=num
}
END{
  for(i=1;i<=count;i++){
    print b[i],c[b[i]]+0
  }
}'   Input_file

Output will be as follows.

20180917084 0
20180917085 12
20180917115 7
20180917105 7
20180917090 5

EDIT: Adding a solution in case any of the fields has a NULL (empty) value; the code below adds a check for that.

awk '
{
  val=substr($0,1,11)
}
!a[val]++{
  b[++count]=val
}
match($0,/\[.*\]/){
  count1=""
  num=split(substr($0,RSTART,RLENGTH),array,",")
  for(j=1;j<=num;j++){
    if(array[j]){
      count1++
    }
  }
  c[val]+=count1
}
END{
  for(i=1;i<=count;i++){
    print b[i],c[b[i]]+0
  }
}'  Input_file
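
One caveat with the check above: if(array[j]) treats any non-empty string as present, so an entry that contains only a blank, or an empty pair of brackets, would still be counted. A stricter sketch, assuming every real ID is a run of digits, could strip everything except digits before testing. The helper name count_ids and the sample lines are hypothetical.

```shell
# count_ids FILE: hypothetical helper; prints "slice count" in first-seen order.
count_ids() {
    awk '
        {
            val = substr($0, 1, 11)
            if (!(val in c)) { b[++count] = val; c[val] = 0 }   # remember first-seen order
        }
        match($0, /\[.*\]/) {
            num = split(substr($0, RSTART, RLENGTH), array, ",")
            for (j = 1; j <= num; j++) {
                id = array[j]
                gsub(/[^0-9]/, "", id)     # assumption: a real ID consists of digits only
                if (id != "") c[val]++
            }
        }
        END { for (i = 1; i <= count; i++) print b[i], c[b[i]] }
    ' "$1"
}

# Made-up lines with empty brackets, a blank entry and a trailing comma.
log=$(mktemp)
cat > "$log" <<'EOF'
20180917084726:[]
20180917085418:[111783178, , 111557953,]
20180917085522:[111344871]
EOF

robust_counts=$(count_ids "$log")
printf '%s\n' "$robust_counts"
rm -f "$log"
```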

Answer 3

Your input and output are not consistent, but I guess you want something like this:

awk -F: '{k=sprintf("%10d",$1/1000); n=gsub(",",",",$2); a[k]+=($2 ~ /[0-9]/ ? n+1 : 0)}
        END {for(k in a) print k":"a[k] | "sort" }' file

20180917084:0
20180917085:12
20180917090:5
20180917105:7
20180917115:7

Answer 4

Perl to the rescue!

perl -ne '
    ($timestamp, @ids) = /([0-9]+)/g;
    substr $timestamp, -3, 3, "";
    @{ $seen{$timestamp} }{@ids} = ();
    END {
        for my $timestamp (sort keys %seen) {
            print "$timestamp:", scalar keys %{ $seen{$timestamp} }, "\n";
        }
    }' < file.log

-n reads the input line by line.
substr here replaces the last three characters of the timestamp with an empty string, leaving the ten-minute slice.
%seen is a hash of hashes: for each truncated timestamp, the inner hash records which IDs were seen.
keys in scalar context returns the number of keys, in this case the number of unique IDs per ten-minute slice.

4 answers

Answer 1: score 1, accepted, 2018-09-18 19:26:40
Answer 2: score 1, 2018-09-18 19:27:07
Answer 3: score 1, 2018-09-18 20:28:26
Answer 4: score 0, 2018-09-18 19:19:13