
Print differences of file1 to file2 without deleting anything from file2

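I am creating a script that searches the IPs found in a .csv log file against a predefined list of blacklisted IPs.

It first imports the log file, parses the IPs out of it, searches the parsed IPs against the predefined blacklist, and finally needs to ask the user (only if any results were found) whether to save the results into the originally imported log file.

File 1 is a sample of IP-output.csv in the code.

File 2 is a sample of $filename in the code (the originally imported .csv).

File 1: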

107.147.166.60 ,SUSPICIOUS IP
107.147.167.26 ,SUSPICIOUS IP
108.48.185.186 ,SUSPICIOUS IP
108.51.114.130 ,SUSPICIOUS IP
142.255.102.68 ,SUSPICIOUS IP

File 2:

outlook.office365.com ,174.203.0.118 ,UserLoginFailed
outlook.office365.com ,107.147.166.60 ,UserLoginFailed
outlook.office365.com ,107.147.167.26 ,UserLoginFailed
outlook.office365.com ,174.205.17.24 ,UserLoginFailed
outlook.office365.com ,108.48.185.186 ,UserLoginFailed
outlook.office365.com ,174.226.15.21 ,UserLoginFailed
outlook.office365.com ,108.51.114.130 ,UserLoginFailed
outlook.office365.com ,67.180.23.93 ,UserLoginFailed
outlook.office365.com ,142.255.102.68 ,UserLoginFailed
outlook.office365.com ,164.106.75.235 ,UserLoginFailed

I want to change File 2 into:

outlook.office365.com ,174.203.0.118 ,UserLoginFailed
outlook.office365.com ,107.147.166.60 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,107.147.167.26 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.205.17.24 ,UserLoginFailed
outlook.office365.com ,108.48.185.186 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.226.15.21 ,UserLoginFailed
outlook.office365.com ,108.51.114.130 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,67.180.23.93 ,UserLoginFailed
outlook.office365.com ,142.255.102.68 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,164.106.75.235 ,UserLoginFailed

Here is the script I created:

#!/bin/bash
#
# IP Blacklist Checker
#Import .csv (File within working directory)
echo "Please import a .csv log file to parse/search the IP(s) and UserAgents: "
read filename
#Parsing IPs from .csv log file
echo "Parsing IP(s) from imported log file..."
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' $filename | sort | uniq > IP-list.txt
echo 'Done'
awk 'END {print NR,"IP(s) Found in imported log file"}' IP-list.txt
echo 'IPs found in imported log file:'
cat IP-list.txt
#searches parsed ip's against blacked ip lists
echo 'Searching parsed IP(s) from pre-defined Blacked IP List Databases...'
fgrep -w -f "IP-list.txt" "IPlist.txt" > IP-output.txt
awk 'END {print NR,"IP(s) Found Blacked IP List Databases"}' IP-output.txt
echo 'Suspicious IPs found in Blacked IP List Databases:'
cat IP-output.txt
while true; do
read -p "Do you want to add results to log file?" yn
case $yn in
    [Yy]* ) grep -Ff IP-output.txt $filename | sed 's/$/ ,SUSPICIOUS IP/' > IP-output.csv && awk 'FNR==NR {m[$1]=$0; next} {for (i in m) {match($0,i); val=substr($0, RSTART, RLENGTH); if (val) {sub(val, m[i]); print; next}};} 1' IP-output.csv $filename > $filename; break;;
    [Nn]* ) break;;
    * ) echo "Please answer yes or no.";;
esac
done
echo "Finished searching parsed IP(s) from pre-defined Blacked IP List Databases."
rm IP-list.txt IP-output.csv IP-output.txt 

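The log files I am importing are really long, though only 15 to 20 columns wide, and IPlist.txt (the blacklisted IPs) contains more than 15000 IPs. After saving the results to the same log file, the .csv file ends up empty. If I save it under a different name, all the columns are out of order and the "SUSPICIOUS IP" column appears right next to the IP column, whereas I need it to be the last column (at the end of the line).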
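The empty file is consistent with the `> $filename` redirection in the script: the shell truncates the output file before awk gets to read it. A minimal sketch of the common workaround follows, assuming it is acceptable to write to a temporary file and then move it over the original. The one-line log and the `tmpfile` name are made up for illustration, and the awk program is only a placeholder for the real one.

```shell
#!/bin/bash
# Illustrative setup: a throwaway directory and a one-line stand-in for the log.
workdir=$(mktemp -d)
filename="$workdir/log.csv"
printf '%s\n' 'outlook.office365.com ,107.147.166.60 ,UserLoginFailed' > "$filename"

# Write the result to a temporary file first. Redirecting straight to
# "$filename" would truncate it before awk reads a single line.
tmpfile=$(mktemp "$workdir/log.XXXXXX")
awk '{ print $0 " ,SUSPICIOUS IP" }' "$filename" > "$tmpfile" &&
    mv "$tmpfile" "$filename"    # replace the original only if awk succeeded

cat "$filename"
```

Because of the `&&`, the original log is left untouched if the awk step fails.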

I also don't know how to prompt for saving the file only when something was found, and not prompt at all when nothing was found!
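One possible way to gate the prompt, sketched under the assumption that "something was found" means the match file (IP-output.txt in the script) is non-empty. The `should_prompt` helper and the sample files are hypothetical. The `-s` test is true only when a file exists and has a size greater than zero.

```shell
#!/bin/bash
# Illustrative setup: one file with a match, one empty file.
workdir=$(mktemp -d)
matches="$workdir/IP-output.txt"
empty="$workdir/empty.txt"
printf '%s\n' '107.147.166.60' > "$matches"
: > "$empty"

# Hypothetical helper: succeeds only if the file exists and is not empty.
should_prompt() {
    [ -s "$1" ]
}

for f in "$matches" "$empty"; do
    if should_prompt "$f"; then
        echo "matches found: ask the user whether to save"
    else
        echo "no matches: skip the prompt"
    fi
done
```

In the script above, the whole `while true` prompt loop could be wrapped in such an `if` so that it only runs when there are results.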

The result I get:

 outlook.office365.com ,174.203.0.118 ,UserLoginFailed
 outlook.office365.com ,107.147.166.60 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,107.147.167.26 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,174.205.17.24 ,UserLoginFailed
 outlook.office365.com ,108.48.185.186 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,174.226.15.21 ,UserLoginFailed
 outlook.office365.com ,108.51.114.130 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,67.180.23.93 ,UserLoginFailed
 outlook.office365.com ,142.255.102.68 ,SUSPICIOUS IP ,UserLoginFailed
 outlook.office365.com ,164.106.75.235 ,UserLoginFailed

Do you mean something like this:

awk 'FNR==NR { m[$1]=$0; next; } { for (i in m) { idx = index($0, i); if (idx > 0) { print substr($0, 1, idx-1) m[i]; next; } } } 1' file1.txt file2.txt > newfile2.txt

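It basically processes file1.txt and file2.txt in order. For all lines of the first file, FNR==NR is true, and the map m is built with the replacement patterns (everything before the first space maps to the whole line). For the second file, each line is checked for a match against the keys of m. If there is a match (found with index()), the script prints everything before the match and then the value from m. Oh, and the final 1 prints the lines of file2 that have no match.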
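Note that the command above prints only the text before the match followed by the File 1 line, so any columns after the IP are not carried over. A variant that keeps the whole line and appends the tag at the end could look like the sketch below. It assumes the fields are separated by a comma with optional spaces around it, as in the samples, and that each IP occupies a whole field. The temporary directory and the shortened sample data are illustrative only.

```shell
#!/bin/bash
# Illustrative setup: shortened copies of File 1 and File 2 from the question.
workdir=$(mktemp -d)

cat > "$workdir/file1.txt" <<'EOF'
107.147.166.60 ,SUSPICIOUS IP
108.48.185.186 ,SUSPICIOUS IP
EOF

cat > "$workdir/file2.txt" <<'EOF'
outlook.office365.com ,174.203.0.118 ,UserLoginFailed
outlook.office365.com ,107.147.166.60 ,UserLoginFailed
outlook.office365.com ,108.48.185.186 ,UserLoginFailed
EOF

# Split on a comma with optional surrounding spaces.
awk -F' *, *' '
    FNR == NR { tag[$1] = $2; next }    # first file: map IP -> tag text
    {
        # Second file: compare whole fields, so 1.2.3.4 cannot match 11.2.3.45.
        for (i = 1; i <= NF; i++) {
            if ($i in tag) { print $0 " ," tag[$i]; next }
        }
        print                           # no blacklisted IP: keep the line as is
    }
' "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/newfile2.txt"

cat "$workdir/newfile2.txt"
```

Looking up each field in the array avoids looping over all 15000 blacklist entries for every log line, which may matter for long logs.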
