Need to remove the count from the output when using the "uniq -c" command
I am trying to read a file and sort it by the number of occurrences of a particular field. Suppose I want to find the most repeated date in a log file: I use the uniq -c option and sort the result in descending order, something like this:
uniq -c | sort -nr
This produces output like this:
809 23/Dec/2008:19:20
The first field, which is the count, is the problem for me. I want to get only the date from the above output, but I am not able to. I tried the cut command and did this:
uniq -c | sort -nr | cut -d' ' -f2
but this just prints blank space. Can someone please help me get only the date and chop off the count? I want only:
23/Dec/2008:19:20
Thanks
The count from uniq is preceded by spaces unless there are more than 7 digits in the count, so you need to do something like:
uniq -c | sort -nr | cut -c 9-
to get columns (character positions) 9 upwards. Or you can use sed:
uniq -c | sort -nr | sed 's/^.\{8\}//'
or:
uniq -c | sort -nr | sed 's/^ *[0-9]* //'
This second option is robust in the face of a repeat count of 10,000,000 or more; if you think that might be a problem, it is probably better than the cut alternative. And there are undoubtedly other options available too.
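The difference between the fixed-column and the pattern-based approach only shows up once the count outgrows its padding. The sketch below simulates uniq -c output with printf, assuming the GNU layout of a count right-aligned in 7 columns; the 8-digit count is made-up sample data, not a measured result.

```shell
# Simulated `uniq -c` output, assuming the GNU layout (count padded to 7 columns).
# The counts are made-up sample data.
normal=$(printf '%7d %s\n' 809 '23/Dec/2008:19:20')
wide=$(printf '%7d %s\n' 12345678 '23/Dec/2008:19:20')

# Fixed-column removal: correct only while the count fits in 7 digits.
cut_normal=$(printf '%s\n' "$normal" | cut -c 9-)
cut_wide=$(printf '%s\n' "$wide" | cut -c 9-)

# Pattern-based removal: independent of the width of the count.
sed_normal=$(printf '%s\n' "$normal" | sed 's/^ *[0-9]* //')
sed_wide=$(printf '%s\n' "$wide" | sed 's/^ *[0-9]* //')

# Brackets make a stray leading space visible.
printf '[%s]\n' "$cut_normal" "$cut_wide" "$sed_normal" "$sed_wide"
```

With an 8-digit count, cut -c 9- starts at the separator space, so the line keeps one leading blank, while the sed version still returns the bare date.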
Caveat: the counts were determined by experimentation on Mac OS X 10.7.3 but using GNU uniq from coreutils 8.3. The BSD uniq -c produced 3 leading spaces before a single digit count. The POSIX spec says the output from uniq -c shall be formatted as if with:
printf("%d %s", repeat_count, line);
which would not have any leading blanks. Given this possible variance in output formats, the sed script with the [0-9] regex is the most reliable way of dealing with the variability in observed and theoretical output from uniq -c:
uniq -c | sort -nr | sed 's/^ *[0-9]* //'
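As a rough check of that claim, the three layouts mentioned above (7-column padding, 3 leading spaces, no padding) can be simulated with printf and fed through the same sed command. strip_count is a hypothetical helper name used only for this sketch, and the sample lines are invented.

```shell
# strip_count is a hypothetical wrapper around the sed command from the answer.
strip_count() {
    sed 's/^ *[0-9]* //'
}

# Simulated `uniq -c` layouts; the input lines are made-up sample data.
gnu=$(printf '%7d %s\n' 3 'some line' | strip_count)    # count padded to 7 columns
bsd=$(printf '%4d %s\n' 3 'some line' | strip_count)    # 3 leading spaces
posix=$(printf '%d %s\n' 3 'some line' | strip_count)   # no padding, as in the POSIX format

# A line whose own text starts with digits: only the count is removed.
digits=$(printf '%7d %s\n' 3 '42 apples' | strip_count)

printf '[%s]\n' "$gnu" "$bsd" "$posix" "$digits"
```

All three layouts should reduce to the same text, and the last case suggests the pattern stops after the first number and its separator space.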
Instead of cut -d' ' -f2, try
awk '{$1="";print}'
Maybe you need to remove one more blank at the beginning:
awk '{$1="";print}' | sed 's/^.//'
or completely with sed, preserving the original whitespace:
sed -r 's/^[^0-9]*[0-9]+//'
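The remark about whitespace matters when the counted lines themselves contain runs of spaces: assigning to $1 makes awk rebuild the record with single spaces between fields. A small sketch with an invented sample line, comparing the awk approach with a sed pattern that leaves the rest of the line untouched:

```shell
# Simulated `uniq -c` line (GNU-style padding assumed); the text has a double space.
line=$(printf '%7d %s\n' 2 'two  spaces')

# awk: clearing $1 rebuilds the record, leaving a leading space and
# collapsing the double space inside the text.
awk_out=$(printf '%s\n' "$line" | awk '{$1=""; print}')

# sed: removes only the padding, the count and one separator space.
sed_out=$(printf '%s\n' "$line" | sed 's/^ *[0-9]* //')

printf '[%s]\n' "$awk_out" "$sed_out"
```

If the input lines never contain consecutive blanks, both give the same text apart from the leading space left by awk.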
The following awk may help you here.
awk '{a[$0]++} END{for(i in a){print a[i],i | "sort -k2"}}' Input_file
Solution 2: In case you want the order of the output to be the same as the input rather than sorted.
awk '!a[$0]++{b[++count]=$0} {c[$0]++} END{for(i=1;i<=count;i++){print c[b[i]],b[i]}}' Input_file
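If the counts are not needed at all and only the first-seen order matters, a shorter awk idiom may be enough. This is a sketch on invented sample input; seen is just an illustrative array name.

```shell
# Made-up sample input with repeated lines.
input=$(printf '%s\n' a b c a a g d d)

# Print each line the first time it appears; seen[] counts occurrences,
# so the pattern is true only on the first one.
first_seen=$(printf '%s\n' "$input" | awk '!seen[$0]++')

printf '%s\n' "$first_seen"
```

This keeps input order and prints no count, so there is nothing to strip afterwards, but it also gives no ranking by frequency.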
An alternative solution is this:
uniq -c | sort -nr | awk '{print $1, $2}'
You can also easily print a single field. Since you used -f2 with cut in your question, use:
cat file |sort |uniq -c | awk '{ print $2; }'
If you want to work with the count field downstream, the following command will reformat it to a 'pipe friendly' tab-delimited format without the left padding:
.. | sort | uniq -c | sed -r 's/^ +([0-9]+) /\1\t/'
For the original task it is a bit of an overkill, but after reformatting, cut can be used to remove the field, as the OP intended:
.. | sort | uniq -c | sed -r 's/^ +([0-9]+) /\1\t/' | cut -d $'\t' -f2-
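The -r flag and \t in the replacement are, as far as I know, GNU sed extensions, so they may not work with every sed. A possible variant that uses only basic regular expressions and a literal tab held in a shell variable (tab is an illustrative variable name, and the input line is simulated with printf):

```shell
# A literal tab character, produced portably.
tab=$(printf '\t')

# Simulated `uniq -c` line (GNU-style padding assumed), rewritten as count<TAB>text.
reformatted=$(printf '%7d %s\n' 809 '23/Dec/2008:19:20' |
    sed "s/^ *\([0-9][0-9]*\) /\1${tab}/")

# cut splits on tabs by default, so no -d option is needed.
date_only=$(printf '%s\n' "$reformatted" | cut -f2-)

printf '%s\n' "$reformatted" "$date_only"
```

Because the count and the text are now separated by exactly one tab, cut -f1 would return the count alone in the same way.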
Add tr -s to the pipe chain to "squeeze" multiple spaces into one space delimiter:
uniq -c | tr -s ' ' | cut -d ' ' -f3
tr is very useful in some obscure places. Unfortunately it doesn't get rid of the first leading space, hence the -f3.
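The field numbering can be checked on a simulated line. The sketch assumes GNU-style padding and uses printf to stand in for uniq -c; it also shows why cut -d' ' -f2 on the unsqueezed line comes back empty.

```shell
# Simulated `uniq -c` line: four spaces, the count, one space, the text.
line=$(printf '%7d %s\n' 809 '23/Dec/2008:19:20')

# Without squeezing, every leading space starts a new (empty) field.
raw_field2=$(printf '%s\n' "$line" | cut -d ' ' -f2)

# After tr -s one leading space remains, so field 1 is empty,
# field 2 is the count and field 3 is the text.
squeezed=$(printf '%s\n' "$line" | tr -s ' ')
field2=$(printf '%s\n' "$squeezed" | cut -d ' ' -f2)
field3=$(printf '%s\n' "$squeezed" | cut -d ' ' -f3)

printf '[%s]\n' "$raw_field2" "$squeezed" "$field2" "$field3"
```

If the counted lines can contain spaces themselves, -f3- rather than -f3 would keep the whole remainder, although tr -s will already have collapsed any runs of spaces inside the text.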
You could make use of sed to strip both the leading spaces and the numbers printed by uniq -c:
sort file | uniq -c | sed 's/^ *[0-9]* //'
I will illustrate this with an example. Consider a file:
winebottles.mkv
winebottles.mov
winebottles.xges
winebottles.xges~
winebottles.mkv
winebottles.mov
winebottles.xges
winebottles.xges~
The command
sort file | uniq -c | sed 's/^ *[0-9]* //'
would return
winebottles.mkv
winebottles.mov
winebottles.xges
winebottles.xges~
first solution
Just use sort, when the number of repetitions in the input does not need to be taken into account. sort has a unique option, -u:
sort -u file
sort -u < file
Example:
$ cat > file
a
b
c
a
a
g
d
d
$ sort -u file
a
b
c
d
g
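As a quick sanity check on the same sample letters, sort -u should give the same lines as sort piped into plain uniq. A minimal sketch, with the input generated by printf instead of a file:

```shell
# The same sample letters as in the example, generated inline.
input=$(printf '%s\n' a b c a a g d d)

# LC_ALL=C pins the collation order so the result does not depend on the locale.
with_u=$(printf '%s\n' "$input" | LC_ALL=C sort -u)
with_uniq=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

printf '%s\n' "$with_u"
```

Neither form keeps any information about how often a line occurred, which is why this only fits when the ranking by count is not needed.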
second solution
If sorting by the number of repetitions is important:
sort txt | uniq -c | sort -k1 -nr | sed 's/^ \+[0-9]\+ //g'
sort txt | uniq -c | sort -k1 -nr | perl -lpe 's/^ +[\d]+ +//g'
which has this output:
a
d
g
c
b
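The relative order of lines with equal counts depends on how sort breaks ties. If a predictable order is wanted, one possible variation adds the text as an explicit second sort key and uses * instead of \+ in sed, since \+ in a basic regular expression is, as far as I know, a GNU extension. The input below is the same sample, generated with printf.

```shell
# The same sample letters as above, generated inline.
input=$(printf '%s\n' a b c a a g d d)

# Count, sort by count (numeric, descending), break ties on the text
# (ascending), then strip the padding and the count.
ranked=$(printf '%s\n' "$input" |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -k1,1nr -k2 |
    sed 's/^ *[0-9]* //')

printf '%s\n' "$ranked"
```

With this sample, a and d still come first, and the three lines that occur once (b, c, g) should come out in alphabetical order.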