How can I use grep to output unique lines from a file?

Question

I have a large log file that contains lines such as:

82.117.22.206 - - [08/Mar/2013:20:36:42 +0000] "GET /key/0/www.mysite.org.uk/ HTTP/1.0" 200 0 "-" "-"

From each line that matches the above pattern, I want to extract only the IP (82.117.22.206), followed by a space and the host text (www.mysite.org.uk). The IP and the text can differ from line to line. So, given the above line, the line in the output file would be:

82.117.22.206 www.mysite.org.uk

How can I use grep or other commands in bash to make the output unique, so that the output file won't contain two identical lines? Can someone refer me to a good place to start learning more about this kind of shell scripting?

Answer 1

With perl you can capture the parts you need with a regex:

use strict;
use warnings;

# $1 = client IP, $2 = host name that follows /key/0/
if (m/^(\d+\.\d+\.\d+\.\d+)\s+-\s+-\s+\[.+?\]\s+\"GET\s+\/key\/0\/(.+?)\//) {
    print "$1 $2\n";
}

and call this as

perl -n script.pl logfile.txt | sort -u

This extracts the needed fields, then sorts the result and eliminates duplicate lines.

Answer 2

If you figure out the regex to use, you could do something like:

echo "Hello World" | grep "Hell" | sed 's/\(Hell\).*\(World\)/\1 \2/'

except that you'd cat your log instead of echoing a string.
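To make the idea above concrete for the log format in the question, here is a minimal sketch rather than a definitive solution. It assumes every line of interest looks like the sample line (IP first, then a "GET /key/0/<host>/ HTTP" request). The function name extract_pairs and the variable sample are made up for illustration.

```shell
#!/usr/bin/env bash

# extract_pairs: read access-log lines on stdin and print "IP host" for
# every line that matches the /key/0/<host>/ request pattern.
# -n suppresses default output; the trailing p prints only rewritten lines.
extract_pairs() {
    sed -nE 's|^([0-9.]+) .*"GET /key/0/([^/]+)/ HTTP.*|\1 \2|p'
}

# The sample line from the question, fed in twice to show de-duplication.
sample='82.117.22.206 - - [08/Mar/2013:20:36:42 +0000] "GET /key/0/www.mysite.org.uk/ HTTP/1.0" 200 0 "-" "-"'

# sort -u sorts the extracted pairs and keeps one copy of each.
printf '%s\n%s\n' "$sample" "$sample" | extract_pairs | sort -u
```

On a real file this would be run as something like extract_pairs < logfile.txt | sort -u > output.txt, where output.txt is a placeholder name.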

Answer 3

grep -Po "^[\d.]*|[^/]*(?=/ HTTP)" file|sed 'N;s/\n/ /'
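The pipeline above prints one "IP host" pair per matching line but does not remove duplicates itself. One possible way to add that step, sketched here as an assumption and not as part of the original answer, is an awk filter that keeps only the first occurrence of each line. Unlike sort -u, it preserves the order in which pairs first appear in the log. The function name unique_in_order and the sample pairs are made up for illustration.

```shell
#!/usr/bin/env bash

# unique_in_order: drop repeated lines while keeping first-seen order.
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
# is true only for that first occurrence, and awk prints the line.
unique_in_order() {
    awk '!seen[$0]++'
}

# Made-up sample pairs with one duplicate.
printf '%s\n' \
    '82.117.22.206 www.mysite.org.uk' \
    '10.0.0.1 example.org' \
    '82.117.22.206 www.mysite.org.uk' | unique_in_order
```

Appending | unique_in_order (or | sort -u, if sorted output is acceptable) to the grep and sed pipeline would then give unique lines.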

3 answers

Answer 1: score 2, accepted, 2013-03-08 21:02:53
Answer 2: score 0, 2013-03-08 21:07:23
Answer 3: score 0, 2013-03-08 21:12:17
