Parsing ClamAV logs with regex in a Bash script for insertion into MySQL

Question

Morning/Evening all,

I've got a problem: I'm writing a script for work that uses ClamAV to scan for malware and then stores its results in MySQL, by running grep and awk over the resulting ClamAV logs to pull the right parts of the log into variables. Whilst I have the summary working, the syntax of the detection lines makes things slightly more difficult. I'm no expert at regex by any means and this is a bit of a learning experience, so there is probably a far better way of doing it than mine!

The lines I'm trying to parse look like these:

/net/nas/vol0/home/recep/SG4rt.exe: Worm.SomeFool.P FOUND
/net/nas/vol0/home/recep/SG4rt.exe: moved to '/srv/clamav/quarantine/SG4rt.exe'

As far as I was able to establish, I need a positive lookbehind to match what comes before and after the colon, without actually matching the colon or the space after it, and I can't see a clear way of doing it in RegExr without it thinking I'm trying to look for two colons. To make matters worse, we sometimes get these too...

WARNING: Can't open file /net/nas/vol0/home/laser/samples/sample1.avi: Permission denied

The end result should be that I can build a MySQL query that inserts the path, the malware found and where it was moved to or, if there was an error, the path and then the error encountered. To do that I need to get each element into a variable inside a while statement.

I've done the scan summary as follows:

Summary looks like:

----------- SCAN SUMMARY -----------
Known viruses: 329
Engine version: 0.97.1
Scanned directories: 17350
Scanned files: 50342
Infected files: 3
Total errors: 1
Data scanned: 15551.73 MB
Data read: 16382.67 MB (ratio 0.95:1)
Time: 3765.236 sec (62 m 45 s)

Parsing like this:

SCANNED_DIRS=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Scanned directories" | awk '{gsub("Scanned directories: ", "");print}')
SCANNED_FILES=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Scanned files" | awk '{gsub("Scanned files: ", "");print}')
INFECTED=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Infected files" | awk '{gsub("Infected files: ", "");print}')
DATA_SCANNED=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Data scanned" | awk '{gsub("Data scanned: ", "");print}')
DATA_READ=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Data read" | awk '{gsub("Data read: ", "");print}')
TIME_TAKEN=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Time" | awk '{gsub("Time: ", "");print}')
END_TIME=$(date +%s)
mysql -u scanner_parser --password=removed sc_live -e "INSERT INTO bs.live.bs_jobstat VALUES (NULL, '$CURRTIME', '$PID', '$IY', '$SCANNED_DIRS', '$SCANNED_FILES', '$INFECTED', '$DATA_SCANNED', '$DATA_READ', '$TIME_TAKEN', '$END_TIME');"
rm -f /srv/clamav/$IY-scan-$LOGTIME.log

Some of those variables are from other parts of the script and can be ignored. The reason I'm doing this is to reduce logfile clutter and have a simple web-based overview of the status of the system.

Any clues? Am I going about all this the wrong way? Thanks in advance for the help, I do appreciate it!

Answer 1

From what I can determine from the question, it seems you are asking how to distinguish the lines you want from the logger lines that start with WARNING, ERROR, INFO.

You can do this without getting too fancy with lookahead or lookbehind. Just grep for lines beginning with

"/net/nas/vol0/home/recep/SG4rt.exe: "

then use awk to extract the remainder of the line. Or you can gsub the prefix out as you are doing in the summary processing section.
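A minimal sketch of that idea, generalised so it does not depend on one specific path: awk splits each line at the first ": " and classifies it by its shape, so no lookbehind is needed. The helper name parse_detections and the record kinds (found, moved, error) are made up for illustration, and the sketch assumes paths never contain ": " themselves.

```shell
#!/bin/sh
# parse_detections is a hypothetical helper name (not part of ClamAV).
# It reads log lines on stdin and prints tab-separated records:
#   kind <TAB> path <TAB> detail
parse_detections() {
    awk -v q="'" '
        BEGIN { OFS = "\t" }

        # "WARNING: Can?t open file <path>: <reason>" (the dot matches the apostrophe)
        /^WARNING: Can.t open file / {
            rest = $0
            sub(/^WARNING: Can.t open file /, "", rest)
            i = index(rest, ": ")
            if (i > 0) print "error", substr(rest, 1, i - 1), substr(rest, i + 2)
            next
        }

        # "<path>: <signature> FOUND"
        / FOUND$/ {
            i = index($0, ": ")
            sig = substr($0, i + 2)
            sub(/ FOUND$/, "", sig)
            print "found", substr($0, 1, i - 1), sig
            next
        }

        # "<path>: moved to <quoted destination>"
        (i = index($0, ": moved to ")) > 0 {
            dest = substr($0, i + 11)          # 11 = length(": moved to ")
            gsub("^" q "|" q "$", "", dest)    # strip the surrounding single quotes
            print "moved", substr($0, 1, i - 1), dest
        }
    '
}

# Demo on the sample lines from the question.
parse_detections <<'EOF'
/net/nas/vol0/home/recep/SG4rt.exe: Worm.SomeFool.P FOUND
/net/nas/vol0/home/recep/SG4rt.exe: moved to '/srv/clamav/quarantine/SG4rt.exe'
WARNING: Can't open file /net/nas/vol0/home/laser/samples/sample1.avi: Permission denied
EOF
```

The records can then be read in a while loop with IFS set to a tab, which gives one variable per element. Since a path may itself contain a single quote, the values should be escaped before they are interpolated into an INSERT statement.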

As far as the question about processing the summary goes, what strikes me most is that you are processing the entire file multiple times, each time pulling out one kind of line. For tasks like this, I would use Perl, Ruby, or Python and make one pass through the file, collecting the pieces of each line after the colon, storing them in regular programming language variables (not env variables), and forming the MySQL insert string using interpolation.
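The one-pass idea can also be sketched without leaving the shell, which is a different route from the Perl/Ruby/Python one suggested above. This is only a sketch, assuming the summary format shown in the question; parse_summary is a hypothetical function name, and the variable names are the ones from the question's script.

```shell
#!/bin/sh
# parse_summary is a hypothetical helper: it reads a ClamAV log on stdin once
# and fills the same variables the original script set with six grep/awk runs.
parse_summary() {
    while IFS= read -r line; do
        case $line in
            # ${line#*: } drops everything up to and including the first ": "
            "Scanned directories: "*) SCANNED_DIRS=${line#*: } ;;
            "Scanned files: "*)       SCANNED_FILES=${line#*: } ;;
            "Infected files: "*)      INFECTED=${line#*: } ;;
            "Data scanned: "*)        DATA_SCANNED=${line#*: } ;;
            "Data read: "*)           DATA_READ=${line#*: } ;;
            "Time: "*)                TIME_TAKEN=${line#*: } ;;
        esac
    done
}

# Demo on part of the summary from the question. Feeding the function through
# a redirection (not a pipe) keeps the variables in the current shell.
parse_summary <<'EOF'
----------- SCAN SUMMARY -----------
Scanned directories: 17350
Scanned files: 50342
Infected files: 3
Data scanned: 15551.73 MB
Data read: 16382.67 MB (ratio 0.95:1)
Time: 3765.236 sec (62 m 45 s)
EOF
printf '%s dirs, %s files, %s infected\n' "$SCANNED_DIRS" "$SCANNED_FILES" "$INFECTED"
```

In the real script the call would be something like parse_summary < "/srv/clamav/$IY-scan-$LOGTIME.log", after which the existing mysql line can use the variables unchanged. Matching on the full line prefix also avoids the loose grep "Time" match, which would pick up any scanned path that happens to contain that word.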

Bash is great for some things but IMHO you are justified in using a more general scripting language (Perl, Python, Ruby come to mind).

Solution 1
1 2011-07-12 09:31:02
