繁体   English   中英

打印文件1与文件2的差异,而不从文件2中删除任何内容

[英]print differences of file1 to file2 without deleting anything from file2

我正在创建一个脚本,用于从a.csv日志文件中针对预定义的黑色IP列表搜索IP。

它首先导入日志文件,然后从中解析IP,然后针对预定义的黑色IP列表搜索解析的IP,最后需要询问用户(如果找到任何结果)将结果保存到导入的原始日志文件中。

文件1是代码中IP-output.csv的示例。

文件2是代码中$ filename的示例(原始导入的.csv)。

文件1:

107.147.166.60 ,SUSPICIOUS IP
107.147.167.26 ,SUSPICIOUS IP
108.48.185.186 ,SUSPICIOUS IP
108.51.114.130 ,SUSPICIOUS IP
142.255.102.68 ,SUSPICIOUS IP

档案2:

outlook.office365.com ,174.203.0.118 ,UserLoginFailed
outlook.office365.com ,107.147.166.60 ,UserLoginFailed
outlook.office365.com ,107.147.167.26 ,UserLoginFailed
outlook.office365.com ,174.205.17.24 ,UserLoginFailed
outlook.office365.com ,108.48.185.186 ,UserLoginFailed
outlook.office365.com ,174.226.15.21 ,UserLoginFailed
outlook.office365.com ,108.51.114.130 ,UserLoginFailed
outlook.office365.com ,67.180.23.93 ,UserLoginFailed
outlook.office365.com ,142.255.102.68 ,UserLoginFailed
outlook.office365.com ,164.106.75.235 ,UserLoginFailed

我想将文件2更改为:

outlook.office365.com ,174.203.0.118 ,UserLoginFailed
outlook.office365.com ,107.147.166.60 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,107.147.167.26 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.205.17.24 ,UserLoginFailed
outlook.office365.com ,108.48.185.186 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.226.15.21 ,UserLoginFailed
outlook.office365.com ,108.51.114.130 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,67.180.23.93 ,UserLoginFailed
outlook.office365.com ,142.255.102.68 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,164.106.75.235 ,UserLoginFailed

这是我创建的脚本:

#!/bin/bash
#
# IP Blacklist Checker
#Import .csv (File within working directory)
echo "Please import a .csv log file to parse/search the IP(s) and UserAgents: "
read filename
#Parsing IPs from .csv log file
echo "Parsing IP(s) from imported log file..."
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' $filename | sort | uniq > IP-list.txt
echo 'Done'
awk 'END {print NR,"IP(s) Found in imported log file"}' IP-list.txt
echo 'IPs found in imported log file:'
cat IP-list.txt
#searches parsed ip's against blacked ip lists
echo 'Searching parsed IP(s) from pre-defined Blacked IP List Databases...'
fgrep -w -f "IP-list.txt" "IPlist.txt" > IP-output.txt
awk 'END {print NR,"IP(s) Found Blacked IP List Databases"}' IP-output.txt
echo 'Suspicious IPs found in Blacked IP List Databases:'
cat IP-output.txt
while true; do
read -p "Do you want to add results to log file?" yn
case $yn in
    [Yy]* ) grep -Ff IP-output.txt $filename | sed 's/$/ ,SUSPICIOUS IP/' > IP-output.csv && awk 'FNR==NR {m[$1]=$0; next} {for (i in m) {match($0,i); val=substr($0, RSTART, RLENGTH); if (val) {sub(val, m[i]); print; next}};} 1' IP-output.csv $filename > $filename; break;;
    [Nn]* ) break;;
    * ) echo "Please answer yes or no.";;
esac
done
echo "Finished searching parsed IP(s) from pre-defined Blacked IP List Databases."
rm IP-list.txt IP-output.csv IP-output.txt 

我要导入的日志文件真的很长,只有15到20列,并且IPlist.txt(涂黑的IP)中包含超过15000个IP。 将结果保存到相同的日志文件后,.csv文件将为空,如果我将其保存为其他名称,则所有列均乱序,并且IP列旁边会出现“ SUSPICIOUS IP”列,而是需要它位于最后一列(行的末尾)。

我还不知道如何仅在发现任何内容后才提示保存文件,如果不仅提示什么也没有提示!

我得到的结果:

 outlook.office365.com ,174.203.0.118 ,UserLoginFailed
 outlook.office365.com ,107.147.166.60 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,107.147.167.26 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,174.205.17.24 ,UserLoginFailed
 outlook.office365.com ,108.48.185.186 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,174.226.15.21 ,UserLoginFailed
 outlook.office365.com ,108.51.114.130 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,67.180.23.93 ,UserLoginFailed
 outlook.office365.com ,142.255.102.68 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,164.106.75.235 ,UserLoginFailed

您的意思是这样的:

awk 'FNR==NR { m[$1]=$0; next; } { for (i in m) { idx = index($0, i); if (idx > 0) { print substr($0, 1, idx-1) m[i]; next; } } } 1' file1.txt file2.txt > newfile2.txt

它基本上按顺序处理file1.txtfile2.txt 对于第一个文件中的所有行, FNR==NR为true,其中映射m用替换模式构建(第一个空间映射到整行之前的所有内容)。 对于第二个文件,将检查每行中m的匹配项。 如果存在匹配项(使用index() ),则脚本将在匹配项之前打印所有内容,然后打印m的值。 哦,最后1将打印file2中不匹配的行。

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM