
Bash script to print certain data from a file

I have a file that is constantly gathering data from website visitors, one line per visit, in this format:

IP-ADDR : DATE : BITCOIN-ADDR

I was wondering if there is a way to find lines that have the same IP-ADDR but a different BITCOIN-ADDR and print them.

For example, running the script on this file:

11.11.11.11 : 19-04-2017 08:01:33am  : 3N1zXzkjYYNcUSZHD98wcG7UXjNxkCXXXX
22.22.22.22 : 19-04-2017 08:01:35am  : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
12.12.12.12 : 19-04-2017 08:02:24am  : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBYYYY

every line is different, so no output is printed.

Also, it is very important that running on

11.11.11.11 : 19-04-2017 08:01:33am  : 3N1zXzkjYYNcUSZHD98wcG7UXjNxkCXXXX
22.22.22.22 : 19-04-2017 08:01:35am  : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
22.22.22.22 : 19-04-2017 08:02:24am  : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
22.22.22.22 : 19-04-2017 08:01:35am  : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
22.22.22.22 : 19-04-2017 08:02:24am  : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX

won't print anything.

BUT, running on

11.11.11.11 : 19-04-2017 08:01:33am  : 3N1zXzkjYYNcUSZHD98wcG7UXjNxkCXXXX
22.22.22.22 : 19-04-2017 08:01:35am  : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
22.22.22.22 : 19-04-2017 08:02:24am  : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBYYYY

will see that IP 22.22.22.22 has a different bitcoin address and will print:

1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
1HSJDWp5gLybnhowBZcnoYTBBmuJxBYYYY

I'm using some code that someone here helped me with a while ago:

awk -F " : " '{ printf "%s_%s\n" , $1, $3 }' test.txt | sort | sed 's/\(\s*\)\(.*\)\(\s\)/\2/' | uniq | perl -pe 's/(\s*)(.*?)_(.*)/\2/' | uniq -d

which, if run on the last example, will print

22.22.22.22

but I can't wrap my head around it to make it work for bitcoin addresses.

Here are three more examples:

1.1.1.1 : 19-04-2017 08:01:33am : aaaaa
2.2.2.2 : 19-04-2017 08:01:33am : bbbbb

3.3.3.3 : 19-04-2017 08:01:33am : ccccc
3.3.3.3 : 19-04-2017 08:01:33am : ccccc

4.4.4.4 : 19-04-2017 08:01:33am : ddddd
4.4.4.4 : 19-04-2017 08:01:33am : eeeee

First example: every IP and btc is different, so I don't mind.

Second example: same IP but also the same btc. I don't mind that either; it's just an honest returning visitor using the same btc over and over, so I don't want the script to show that.

Now, in the third example, there is a visitor who is abusing the rules and using different btc addresses from the same IP address. Using the script I posted, I am able to print his IP and, through another script, add it to an iptables firewall. But I need another script (the one I'm asking for help with here) to print the following output:

ddddd
eeeee

So I can use another script and block his access.

Some help, please? Thanks!

LE: Found the solution (thanks to @danielbmartin):

awk '{if (index(a[$1],$NF)==0) a[$1]=a[$1]" " $NF}
  END{for (j in a)
  {n=split(a[j],b);
   if (n>1) print j" references "a[j]}}' \
$InFile >$OutFile

$ cat ip.txt
1.1.1.1 : 19-04-2017 08:01:33am : aaaaa
2.2.2.2 : 19-04-2017 08:01:33am : bbbbb

3.3.3.3 : 19-04-2017 08:01:33am : ccccc
3.3.3.3 : 19-04-2017 08:01:33am : ccccc

4.4.4.4 : 19-04-2017 08:01:33am : ddddd
4.4.4.4 : 19-04-2017 08:01:33am : eeeee

$ awk -F: '($1 in a) && a[$1]!=$NF{print $1} {a[$1]=$NF}' ip.txt 
4.4.4.4 
  • -F: use : as field separator
  • {a[$1]=$NF} create an array with the first column as key and the last column as value
  • ($1 in a) && a[$1]!=$NF if the first column is already present as a key but the value doesn't match
    • print $1 print the first column
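
The same logic can be spelled out with comments. This is only a sketch, not the exact command above: it assumes the separator is always the three characters " : " (as in the sample data) and passes -F ' : ' so that the IP and the address come out without stray spaces. The function name changed_ips and the array name last are made up for illustration.

```shell
# Sample data from the question, written to a scratch file.
cat > ip.txt <<'EOF'
1.1.1.1 : 19-04-2017 08:01:33am : aaaaa
2.2.2.2 : 19-04-2017 08:01:33am : bbbbb

3.3.3.3 : 19-04-2017 08:01:33am : ccccc
3.3.3.3 : 19-04-2017 08:01:33am : ccccc

4.4.4.4 : 19-04-2017 08:01:33am : ddddd
4.4.4.4 : 19-04-2017 08:01:33am : eeeee
EOF

# Hypothetical helper: print an IP whenever its address differs
# from the one recorded on its previous line.
changed_ips() {
    awk -F ' : ' '
        NF < 3 { next }                     # skip blank or malformed lines
        ($1 in last) && last[$1] != $3 {    # IP seen before, but with a different address
            print $1
        }
        { last[$1] = $3 }                   # remember the latest address for this IP
    ' "$@"
}

changed_ips ip.txt
```

On the sample file this prints 4.4.4.4 with no trailing space. With -F: the timestamp is also split at its colons, which is harmless here only because the command reads just $1 and $NF.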


To print the last column instead:

$ awk -F: '($1 in a) && a[$1]!=$NF{print a[$1]"\n"$NF} {a[$1]=$NF}' ip.txt 
 ddddd
 eeeee

Note: this code does not account for more than one mismatch.
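
One possible way around that limitation, sketched under the assumption that fields are always separated by " : ": record every distinct (IP, address) pair, count the distinct addresses per IP, and at the end print all addresses of any IP that has more than one. The names abused_addrs, seen, count and addrs are illustrative, and the sample rows extend the question's third example with made-up lines.

```shell
# Made-up sample: 4.4.4.4 uses three different addresses, one of them twice.
cat > visits.txt <<'EOF'
1.1.1.1 : 19-04-2017 08:01:33am : aaaaa
3.3.3.3 : 19-04-2017 08:01:33am : ccccc
3.3.3.3 : 19-04-2017 08:02:10am : ccccc
4.4.4.4 : 19-04-2017 08:01:33am : ddddd
4.4.4.4 : 19-04-2017 08:02:24am : eeeee
4.4.4.4 : 19-04-2017 08:03:05am : ddddd
4.4.4.4 : 19-04-2017 08:04:41am : fffff
EOF

# Hypothetical helper: list every address used by an IP that has
# been seen with two or more distinct addresses.
abused_addrs() {
    awk -F ' : ' '
        NF < 3 { next }                       # skip blank or malformed lines
        !seen[$1, $3]++ {                     # first time this exact (IP, address) pair appears
            count[$1]++                       # one more distinct address for this IP
            addrs[$1] = addrs[$1] $3 "\n"     # keep first-seen order
        }
        END {
            for (ip in count)
                if (count[ip] > 1)            # only IPs with two or more distinct addresses
                    printf "%s", addrs[ip]
        }
    ' "$@"
}

abused_addrs visits.txt
```

On this sample it prints ddddd, eeeee and fffff, one per line and each only once. When several IPs are flagged, the order of the groups is whatever order awk's for-in loop happens to use, so pipe the result through sort if a stable order matters.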
