
Sorting log file into groups with bash and awk

I'm trying to sort a log file in a specific way, but I'm not sure how to perform the last step.

My logfile has entries like this:

Feb 15 17:00:34 server sshd[13879]: Invalid user test from 200.242.94.133
Feb 15 17:00:35 server sshd[13780]: Invalid user ftpuser from 200.242.94.133
Feb 15 17:01:34 server sshd[13890]: Invalid user test from 200.242.94.133
Feb 15 17:01:35 server sshd[13791]: Invalid user vnc from 200.242.94.133
Feb 15 17:01:35 server sshd[13794]: Invalid user test from 50.63.172.108
Feb 15 17:01:36 server sshd[13798]: Invalid user vnc from 50.63.172.108

I use the command:

cat logfile | grep "Invalid user" | awk '{print $8 ", " $10 }' | sort -t":" -k2,2 | uniq -c

Which outputs:

 1 ftpuser, 200.242.94.133
 2 test, 200.242.94.133
 1 test, 50.63.172.108 
 1 vnc, 200.242.94.133
 1 vnc, 50.63.172.108

I'd like to get:

1 ftpuser, (1) 200.242.94.133
3 test, (2) 200.242.94.133, (1) 50.63.172.108
2 vnc, (1) 200.242.94.133, (1) 50.63.172.108

I'm not sure how to sum the counts for each user name while keeping the count for each IP address separate, and then list those per-IP counts on the same line.

Attempt with the answer's command:

# awk '/Invalid user/{user[$8]++;ip[$8][$10]++} END{for (u in user){printf "%s %s",user[u],u;for (i in ip[u])printf ", (%s) %s",ip[u][i],i;print""}}' logfile | sort -k2
awk: /Invalid user/{user[$8]++;ip[$8][$10]++} END{for (u in user){printf "%s %s",user[u],u;for (i in ip[u])printf ", (%s) %s",ip[u][i],i;print""}}
awk:                                 ^ syntax error
awk: /Invalid user/{user[$8]++;ip[$8][$10]++} END{for (u in user){printf "%s %s",user[u],u;for (i in ip[u])printf ", (%s) %s",ip[u][i],i;print""}}
awk:                                                                                                   ^ syntax error
awk: /Invalid user/{user[$8]++;ip[$8][$10]++} END{for (u in user){printf "%s %s",user[u],u;for (i in ip[u])printf ", (%s) %s",ip[u][i],i;print""}}
awk:                                                                                                                               ^ syntax error
$ awk '/Invalid user/{user[$8]++;ip[$8][$10]++} END{for (u in user){printf "%s %s",user[u],u;for (i in ip[u])printf ", (%s) %s",ip[u][i],i;print""}}' logfile
2 vnc, (1) 50.63.172.108, (1) 200.242.94.133
1 ftpuser, (1) 200.242.94.133
3 test, (1) 50.63.172.108, (2) 200.242.94.133

If you want it sorted alphabetically by user:

$ awk '/Invalid user/{user[$8]++;ip[$8][$10]++} END{for (u in user){printf "%s %s",user[u],u;for (i in ip[u])printf ", (%s) %s",ip[u][i],i;print""}}' logfile | sort -k2
1 ftpuser, (1) 200.242.94.133
3 test, (1) 50.63.172.108, (2) 200.242.94.133
2 vnc, (1) 50.63.172.108, (1) 200.242.94.133

The above works with GNU awk (version 4.0 or later, since it relies on arrays of arrays such as ip[$8][$10]). I haven't tested with BSD awk.

How it works

  • /Invalid user/{user[$8]++;ip[$8][$10]++}

    For any line in logfile that contains "Invalid user", this increments a counter for the user name (field 8) and a counter for that user's IP address (field 10).

  • END{for (u in user){printf "%s %s",user[u],u;for (i in ip[u])printf ", (%s) %s",ip[u][i],i;print""}}

    When we have finished reading logfile, this loops through every user that we have seen. For each one it prints the number of times we saw that user, then the user's name, and then, for each IP address seen with that user, the count for that IP followed by the IP itself.
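
The same counting idea can also be sketched in portable awk, without arrays of arrays. This is only an illustration, not part of the answer above: it assumes composite subscripts (ip[u, a]) as a stand-in for ip[u][a], and the helper arrays names, ips and nip are made-up names used to remember first-seen order. The sample data is copied from the question into a temporary file.

```shell
# Sample data copied from the question, written to a temporary file.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
Feb 15 17:00:34 server sshd[13879]: Invalid user test from 200.242.94.133
Feb 15 17:00:35 server sshd[13780]: Invalid user ftpuser from 200.242.94.133
Feb 15 17:01:34 server sshd[13890]: Invalid user test from 200.242.94.133
Feb 15 17:01:35 server sshd[13791]: Invalid user vnc from 200.242.94.133
Feb 15 17:01:35 server sshd[13794]: Invalid user test from 50.63.172.108
Feb 15 17:01:36 server sshd[13798]: Invalid user vnc from 50.63.172.108
EOF

result=$(awk '
/Invalid user/ {
    u = $8; a = $10
    # Remember each user the first time it is seen (hypothetical helper array).
    if (!(u in user)) names[++nu] = u
    user[u]++
    # Remember the IPs of each user in first-seen order (hypothetical helper arrays).
    if (!((u, a) in ip)) { n = ++nip[u]; ips[u, n] = a }
    # Composite subscript: one flat array keyed by user and IP together.
    ip[u, a]++
}
END {
    for (i = 1; i <= nu; i++) {
        u = names[i]
        line = user[u] " " u
        for (j = 1; j <= nip[u]; j++) {
            a = ips[u, j]
            line = line ", (" ip[u, a] ") " a
        }
        print line
    }
}' "$logfile" | LC_ALL=C sort -k2)

printf '%s\n' "$result"
rm -f "$logfile"
```

On the sample data this should print the three lines requested in the question, with users sorted alphabetically by the final sort and each user's IP addresses in the order they first appear in the log.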

John1024's answer is a very concise and presumably fast solution, and it is an option if:

  • you're using GNU awk (the solution uses non-POSIX features that won't work with BSD awk (also used on OS X) or mawk, for instance).
  • you don't mind a seemingly random order of IP addresses (due to unsorted key enumeration of an associative array; however, in GNU awk 4.0+, you can use PROCINFO["sorted_in"] to control the enumeration order).
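
As a rough sketch of the PROCINFO["sorted_in"] option mentioned above, the following variant of the GNU awk command sets it in a BEGIN block. The value "@ind_str_asc" (ascending string order of the array indices) is one of gawk's predefined orderings; choosing it here is an assumption about the desired order, not something the answers specify. The guard around the command and the temporary sample file are only there to make the example self-contained.

```shell
# Sample data copied from the question, written to a temporary file.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
Feb 15 17:00:34 server sshd[13879]: Invalid user test from 200.242.94.133
Feb 15 17:00:35 server sshd[13780]: Invalid user ftpuser from 200.242.94.133
Feb 15 17:01:34 server sshd[13890]: Invalid user test from 200.242.94.133
Feb 15 17:01:35 server sshd[13791]: Invalid user vnc from 200.242.94.133
Feb 15 17:01:35 server sshd[13794]: Invalid user test from 50.63.172.108
Feb 15 17:01:36 server sshd[13798]: Invalid user vnc from 50.63.172.108
EOF

if command -v gawk >/dev/null 2>&1; then
    sorted=$(gawk '
    # Enumerate every "for (k in array)" loop in ascending string order of the keys.
    BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }
    /Invalid user/ { user[$8]++; ip[$8][$10]++ }
    END {
        for (u in user) {
            printf "%s %s", user[u], u
            for (i in ip[u]) printf ", (%s) %s", ip[u][i], i
            print ""
        }
    }' "$logfile")
    printf '%s\n' "$sorted"
else
    echo "gawk not found; this sketch needs GNU awk 4.0 or later" >&2
fi
rm -f "$logfile"
```

With this setting, both the users and each user's IP addresses are enumerated in string order, so the separate sort -k2 step is no longer needed. Note that string order places 200.242.94.133 before 50.63.172.108; a numeric ordering of IP addresses would need a custom comparison.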

Here is a much more pedestrian solution, which, however:

  • uses only POSIX awk features.
  • lists IP addresses in a predictable order (lexically sorted per user, as produced by the sort step).

It builds on a slightly simplified version of the OP's command.

awk '/Invalid user/ { print $8 ", " $10 }' logfile | sort -t":" -k2,2 | uniq -c |
awk '
    # Helper output function for printing an output line.
  function printLine(c, n, l) { 
    sub(/,$/, "", n); print c, n l
  }
    # End of previous block found (new username)?
  prevName != $2 {
      # Print summary line for previous block.
    if (NR>1) printLine(count, prevName, ipList)
      # Start new block.
    prevName=$2; count=0; ipList=""
  }
    # Build block summary values.
  { 
    count+=$1
    ipList=ipList ", (" $1 ") " $3
  }
    # Print summary line for last block.
  END { 
    printLine(count, prevName, ipList)
  }
  '
