Print differences of file1 to file2 without deleting anything from file2

I'm creating a script that searches IPs from a .csv log file against pre-defined blacklisted IP lists.

It first imports the log file, then parses the IPs from it, then searches the parsed IPs against the pre-defined blacklist. Finally, if any results were found, it needs to ask the user whether to save the results to the original log file that was imported.

File 1 is an example of IP-output.csv in the code.

File 2 is an example of $filename in the code (the original imported .csv).


File 2:

outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed

I want to change File 2 to this:

outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com , ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com , ,UserLoginFailed
outlook.office365.com , ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com , ,UserLoginFailed

This is the script I created:

# IP Blacklist Checker
#Import .csv (File within working directory)
echo "Please import a .csv log file to parse/search the IP(s) and UserAgents: "
read filename
#Parsing IPs from .csv log file
echo "Parsing IP(s) from imported log file..."
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' $filename | sort | uniq > IP-list.txt
echo 'Done'
awk 'END {print NR,"IP(s) Found in imported log file"}' IP-list.txt
echo 'IPs found in imported log file:'
cat IP-list.txt
#searches parsed ip's against blacked ip lists
echo 'Searching parsed IP(s) from pre-defined Blacked IP List Databases...'
fgrep -w -f "IP-list.txt" "IPlist.txt" > IP-output.txt
awk 'END {print NR,"IP(s) Found Blacked IP List Databases"}' IP-output.txt
echo 'Suspicious IPs found in Blacked IP List Databases:'
cat IP-output.txt
while true; do
read -p "Do you want to add results to log file?" yn
case $yn in
    [Yy]* ) grep -Ff IP-output.txt $filename | sed 's/$/ ,SUSPICIOUS IP/' > IP-output.csv && awk 'FNR==NR {m[$1]=$0; next} {for (i in m) {match($0,i); val=substr($0, RSTART, RLENGTH); if (val) {sub(val, m[i]); print; next}};} 1' IP-output.csv $filename > $filename; break;;
    [Nn]* ) break;;
    * ) echo "Please answer yes or no.";;
esac
done
echo "Finished searching parsed IP(s) from pre-defined Blacked IP List Databases."
rm IP-list.txt IP-output.csv IP-output.txt

The log file I'm importing is really long, with 15-20 columns, and IPlist.txt (the blacklisted IPs) has over 15000 IPs in it. When I save the results to the same log file, the .csv file ends up empty. If I save it under a different name, all the columns go out of order and the ",SUSPICIOUS IP" column appears next to the IP column. I need it to be the last column instead (at the end of the line).

I also don't know how to prompt to save the file only if something was found, and otherwise just report that nothing was found.

The results I'm getting:

 outlook.office365.com , ,UserLoginFailed
 outlook.office365.com , ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com , ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com , ,UserLoginFailed
 outlook.office365.com , ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com , ,UserLoginFailed
 outlook.office365.com , ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com , ,UserLoginFailed
 outlook.office365.com , ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com , ,UserLoginFailed

You mean something like this:

awk 'FNR==NR { m[$1]=$0; next; } { for (i in m) { idx = index($0, i); if (idx > 0) { print substr($0, 1, idx-1) m[i]; next; } } } 1' file1.txt file2.txt > newfile2.txt

It basically processes file1.txt and file2.txt sequentially. FNR==NR is true for all lines from the first file, where a map m is built up with replacement patterns (everything before the first space is mapped to the entire line). For the second file, each line is checked for a match in m. If there's a match (using index()), the script prints everything before the match and then the value from m. Oh, and the final 1 prints non-matching lines from file2.
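
If the goal is only to append the tag at the end of each matching line, a variant of the same two-file awk idea may be simpler. The following is a minimal sketch under some assumptions, not a drop-in replacement: the function names tag_suspicious and annotate_in_place are hypothetical, the hits file is assumed to hold one bare IPv4 address per line (as IP-output.txt does in the question), and the tag text is taken from the desired output shown above. It uses FILENAME == ARGV[1] rather than FNR==NR so that an empty hits file is not mistaken for the log, and it writes to a temporary file first, because redirecting a command's output onto its own input file truncates that file before it is read.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not from the original script.

# Print log_file to stdout, appending " ,SUSPICIOUS IP" to every line that
# contains an IPv4 address listed (one per line) in hits_file.
tag_suspicious() {
    local hits_file="$1" log_file="$2"
    awk '
        # Lines of the first file: remember each listed IP.
        FILENAME == ARGV[1] { if ($0 != "") bad[$0] = 1; next }
        # Lines of the second file: test each IPv4-looking token for an exact hit.
        {
            rest = $0; hit = 0
            while (match(rest, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
                if (substr(rest, RSTART, RLENGTH) in bad) { hit = 1; break }
                rest = substr(rest, RSTART + RLENGTH)
            }
            print (hit ? $0 " ,SUSPICIOUS IP" : $0)
        }
    ' "$hits_file" "$log_file"
}

# Rewrite log_file with the tags added, but only if hits_file is non-empty.
annotate_in_place() {
    local hits_file="$1" log_file="$2" tmp
    # Nothing found: report it and leave the log untouched.
    if [ ! -s "$hits_file" ]; then
        echo "No suspicious IPs found."
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Write the result to a temporary file, then copy it back over the log.
    if tag_suspicious "$hits_file" "$log_file" > "$tmp"; then
        cat "$tmp" > "$log_file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

In the question's script, the [Yy]* branch could then call annotate_in_place IP-output.txt "$filename", and wrapping the whole prompt loop in a [ -s IP-output.txt ] test would skip the question when nothing was found. Because each address is compared as a whole token, an entry such as 10.0.0.2 should not tag a line that only contains 110.0.0.2.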

