
Print differences of file1 to file2 without deleting anything from file2

I'm creating a script that checks the IPs in a .csv log file against predefined lists of blacklisted IPs.

It first imports the log file, parses the IPs from it, and searches the parsed IPs against the predefined blacklist. Finally, if any matches were found, it should ask the user whether to save the results to the original log file that was imported.

File 1 is an example of IP-output.csv in the code.

File 2 is an example of $filename in the code (the original imported .csv).

File 1:

107.147.166.60 ,SUSPICIOUS IP
107.147.167.26 ,SUSPICIOUS IP
108.48.185.186 ,SUSPICIOUS IP
108.51.114.130 ,SUSPICIOUS IP
142.255.102.68 ,SUSPICIOUS IP

File 2:

outlook.office365.com ,174.203.0.118 ,UserLoginFailed
outlook.office365.com ,107.147.166.60 ,UserLoginFailed
outlook.office365.com ,107.147.167.26 ,UserLoginFailed
outlook.office365.com ,174.205.17.24 ,UserLoginFailed
outlook.office365.com ,108.48.185.186 ,UserLoginFailed
outlook.office365.com ,174.226.15.21 ,UserLoginFailed
outlook.office365.com ,108.51.114.130 ,UserLoginFailed
outlook.office365.com ,67.180.23.93 ,UserLoginFailed
outlook.office365.com ,142.255.102.68 ,UserLoginFailed
outlook.office365.com ,164.106.75.235 ,UserLoginFailed

I want to change File 2 to this:

outlook.office365.com ,174.203.0.118 ,UserLoginFailed
outlook.office365.com ,107.147.166.60 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,107.147.167.26 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.205.17.24 ,UserLoginFailed
outlook.office365.com ,108.48.185.186 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.226.15.21 ,UserLoginFailed
outlook.office365.com ,108.51.114.130 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,67.180.23.93 ,UserLoginFailed
outlook.office365.com ,142.255.102.68 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,164.106.75.235 ,UserLoginFailed

This is the script I created:

#!/bin/bash
#
# IP Blacklist Checker
#Import .csv (File within working directory)
echo "Please import a .csv log file to parse/search the IP(s) and UserAgents: "
read filename
#Parsing IPs from .csv log file
echo "Parsing IP(s) from imported log file..."
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' $filename | sort | uniq > IP-list.txt
echo 'Done'
awk 'END {print NR,"IP(s) Found in imported log file"}' IP-list.txt
echo 'IPs found in imported log file:'
cat IP-list.txt
#searches parsed ip's against blacked ip lists
echo 'Searching parsed IP(s) from pre-defined Blacked IP List Databases...'
fgrep -w -f "IP-list.txt" "IPlist.txt" > IP-output.txt
awk 'END {print NR,"IP(s) Found Blacked IP List Databases"}' IP-output.txt
echo 'Suspicious IPs found in Blacked IP List Databases:'
cat IP-output.txt
while true; do
read -p "Do you want to add results to log file?" yn
case $yn in
    [Yy]* ) grep -Ff IP-output.txt $filename | sed 's/$/ ,SUSPICIOUS IP/' > IP-output.csv && awk 'FNR==NR {m[$1]=$0; next} {for (i in m) {match($0,i); val=substr($0, RSTART, RLENGTH); if (val) {sub(val, m[i]); print; next}};} 1' IP-output.csv $filename > $filename; break;;
    [Nn]* ) break;;
    * ) echo "Please answer yes or no.";;
esac
done
echo "Finished searching parsed IP(s) from pre-defined Blacked IP List Databases."
rm IP-list.txt IP-output.csv IP-output.txt 

The log file I'm importing is really long, with 15-20 columns, and IPlist.txt (the blacklisted IPs) has over 15000 IPs in it. When I save the results to the same log file, the .csv file ends up empty. If I save them under a different name, the columns go out of order: the ",SUSPICIOUS IP" column appears next to the IP column, whereas I need it to be the last column (at the end of the line).

I also don't know how to prompt the user to save only if something was found, and otherwise just report that nothing was found.

The results I'm getting:

 outlook.office365.com ,174.203.0.118 ,UserLoginFailed
 outlook.office365.com ,107.147.166.60 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,107.147.167.26 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,174.205.17.24 ,UserLoginFailed
 outlook.office365.com ,108.48.185.186 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,174.226.15.21 ,UserLoginFailed
 outlook.office365.com ,108.51.114.130 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,67.180.23.93 ,UserLoginFailed
 outlook.office365.com ,142.255.102.68 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,164.106.75.235 ,UserLoginFailed

You mean something like this:

awk 'FNR==NR { m[$1]=$0; next; } { for (i in m) { idx = index($0, i); if (idx > 0) { print substr($0, 1, idx-1) m[i]; next; } } } 1' file1.txt file2.txt > newfile2.txt

It processes file1.txt and file2.txt sequentially. FNR==NR is true for all lines from the first file, where a map m is built up with replacement patterns (everything before the first space is mapped to the entire line). For the second file, each line is checked for a match in m. If there's a match (using index()), the script prints everything before the match and then the value from m. Finally, the trailing 1 prints the non-matching lines from file2.txt unchanged.
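Two caveats. The one-liner prints only the text before the match followed by the File 1 entry, so anything after the IP (such as UserLoginFailed) is dropped. Also, the original `awk ... $filename > $filename` empties the log because the shell truncates the output file before awk gets to read it. A minimal sketch that works around both is below. It is not a definitive implementation: `tag_suspicious` is a hypothetical helper name, and it assumes the IP occupies a whole comma-separated field (optionally padded with blanks) and that the first comma-separated field of each line in the hits file is the IP.

```shell
#!/bin/bash
# Sketch only. tag_suspicious is a hypothetical helper name, and the
# ts_* variables are illustrative.
# Usage: tag_suspicious HITS_FILE LOG_FILE
tag_suspicious() {
    ts_hits=$1
    ts_log=$2

    # Nothing matched: say so and leave the log untouched.
    if [ ! -s "$ts_hits" ]; then
        echo "No suspicious IPs found."
        return 1
    fi

    # Write to a temporary file first; redirecting straight to the log
    # would truncate it before awk reads it.
    ts_tmp=$(mktemp) || return 1

    awk -F',' '
        # Strip leading and trailing blanks from a field.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        # First file: collect the flagged IPs (first comma-separated field).
        FNR == NR { ip = trim($1); if (ip != "") bad[ip] = 1; next }

        # Second file: append the tag at the end of the line if any
        # field is exactly a flagged IP; otherwise print the line as is.
        {
            for (i = 1; i <= NF; i++) {
                f = trim($i)
                if (f in bad) { print $0 " ,SUSPICIOUS IP"; next }
            }
            print
        }
    ' "$ts_hits" "$ts_log" > "$ts_tmp" && mv "$ts_tmp" "$ts_log"
}
```

In the script from the question, this could be called as `tag_suspicious IP-output.txt "$filename"` inside the `[Yy]*` branch, and wrapping the prompt loop in `if [ -s IP-output.txt ]` would skip the prompt when nothing was found. Comparing whole fields rather than substrings also avoids tagging a line just because a flagged IP is a substring of a longer address. Note that `mv` replaces the log with the temporary file, so the log's original permissions are not preserved.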
