Need to remove the count from the output when using the "uniq -c" command

Question

I am trying to read a file and sort it by the number of occurrences of a particular field. Suppose I want to find the most repeated date in a log file: I use uniq -c and sort the result in descending order, something like this:

uniq -c | sort -nr

This produces output like this:

809 23/Dec/2008:19:20

The first field, which is the count, is the problem for me. I want to get only the date from the above output, but I am not able to. I tried the cut command like this:

uniq -c | sort -nr | cut -d' ' -f2

but this just prints blank space. Can someone please help me get only the date and chop off the count? I want only:

23/Dec/2008:19:20

Thanks

Answer 1

The count from uniq is preceded by spaces unless there are more than 7 digits in the count, so you need to do something like:

uniq -c | sort -nr | cut -c 9-

to get columns (character positions) 9 upwards. Or you can use sed:

uniq -c | sort -nr | sed 's/^.\{8\}//'

or:

uniq -c | sort -nr | sed 's/^ *[0-9]* //'

This second sed option is robust in the face of a repeat count of 10,000,000 or more; if you think that might be a problem, it is probably better than the cut alternative. And there are undoubtedly other options available too.

Caveat: the counts were determined by experimentation on Mac OS X 10.7.3 but using GNU uniq from coreutils 8.3. The BSD uniq -c produced 3 leading spaces before a single-digit count. The POSIX spec says the output from uniq -c shall be formatted as if with:

printf("%d %s", repeat_count, line);

which would not have any leading blanks. Given this possible variance in output formats, the sed script with the [0-9] regex is the most reliable way of dealing with the variability in observed and theoretical output from uniq -c:

uniq -c | sort -nr | sed 's/^ *[0-9]* //'
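To see the whole pipeline end to end, here is a minimal sketch on made-up sample data. The function names strip_count and sample are hypothetical, introduced only for this illustration, and the three timestamps are invented. It assumes nothing beyond a standard sort, uniq and sed.

```shell
# Hypothetical helper: drop the count that `uniq -c` puts at the start of each line.
# It tolerates any amount of left padding, including none at all.
strip_count() {
    sed 's/^ *[0-9]* //'
}

# Made-up sample data: three log timestamps, one of them repeated.
sample() {
    printf '%s\n' '23/Dec/2008:19:20' '24/Dec/2008:08:01' '23/Dec/2008:19:20'
}

# With a left-padded count, cut -d' ' -f2 picks the empty field between two
# leading spaces, which is why the attempt in the question prints blanks.
sample | sort | uniq -c | sort -nr | strip_count
```

Because the regex does not depend on the padding width, the same helper should work whether the count is padded to 7 columns, to 3, or not at all.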

Answer 2

Instead of cut -d' ' -f2, try

awk '{$1="";print}'

Maybe you need to remove one more blank at the beginning:

awk '{$1="";print}' | sed 's/^.//'

or completely with sed, preserving the original whitespace:

sed -r 's/^[^0-9]*[0-9]+//'
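One thing to keep in mind with the awk variant: assigning to $1 makes awk rebuild the record, so runs of whitespace inside the line collapse to single spaces. A possible alternative, sketched here with a hypothetical helper name (strip_count_awk) and made-up input, is to remove only the count prefix with sub() and leave the rest of the line alone.

```shell
# Hypothetical helper: remove the leading count with awk's sub(),
# leaving the remainder of the line (including inner spacing) untouched.
strip_count_awk() {
    awk '{ sub(/^ *[0-9]+ /, ""); print }'
}

# Made-up input in `uniq -c` style; note the double space after GET.
printf '%s\n' '      2 GET  /index.html' '      1 POST /login' | strip_count_awk
```

This also removes the single space that follows the count, so no extra sed step is needed.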

Answer 3

The following awk may help you here.

awk '{a[$0]++} END{for(i in a){print a[i],i | "sort -k2"}}'  Input_file

Second solution: in case you want the output in the same order as the input rather than sorted.

awk '!a[$0]++{b[++count]=$0} {c[$0]++} END{for(i=1;i<=count;i++){print c[b[i]],b[i]}}'  Input_file

Answer 4

An alternative solution is this:

uniq -c | sort -nr | awk '{print $1, $2}'

You can also easily print a single field, for example only $2.

Answer 5

Use (since you use -f2 with cut in your question):

cat file |sort |uniq -c | awk '{ print $2; }'
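Printing $2 works for the dates in the question because they contain no whitespace. For lines that do contain spaces, it keeps only the first word. The sketch below contrasts the two behaviours on made-up input; the function name counted is hypothetical.

```shell
# Made-up `uniq -c` style input: the second record contains a space.
counted() {
    printf '%s\n' '    809 23/Dec/2008:19:20' '     17 Not Found'
}

# Keeps only the first word after the count: "Not Found" becomes "Not".
counted | awk '{ print $2 }'

# Removes the count prefix and keeps the rest of the line intact.
counted | sed 's/^ *[0-9]* //'
```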

Answer 6

If you want to work with the count field downstream, the following command will reformat it to a 'pipe friendly' tab-delimited format without the left padding:

 .. | sort | uniq -c | sed -r 's/^ +([0-9]+) /\1\t/'

For the original task it is a bit of overkill, but after reformatting, cut can be used to remove the field, as the OP intended:

 .. | sort | uniq -c | sed -r 's/^ +([0-9]+) /\1\t/' | cut -d $'\t' -f2-
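The -r flag and the \t escape in the replacement are sed extensions that may not be available in every implementation. If that is a concern, the same reformatting can be sketched with awk alone. The helper name count_to_tsv is hypothetical and the input is made up.

```shell
# Hypothetical helper: turn "   COUNT LINE" into "COUNT<TAB>LINE" with awk.
count_to_tsv() {
    # Save the count first, then strip the prefix from the record and rejoin with a tab.
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}

# Tab-separated count and line.
printf '%s\n' b a b | sort | uniq -c | count_to_tsv

# cut uses tab as its default delimiter, so -d is not needed here.
printf '%s\n' b a b | sort | uniq -c | count_to_tsv | cut -f2-
```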

Answer 7

Add tr -s to the pipe chain to "squeeze" multiple spaces into one space delimiter:

uniq -c | tr -s ' ' | cut -d ' ' -f3

tr is very useful in some obscure places. Unfortunately it doesn't get rid of the first leading space, hence the -f3.
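A variation on the same idea, sketched under the assumption that lines may themselves contain spaces: strip only the left padding with sed, then let cut drop the first field and keep everything after it. The helper name drop_first_field is hypothetical and the input is made up.

```shell
# Hypothetical helper: remove the left padding, then drop the first
# space-separated field (the count) and keep the rest of the line.
drop_first_field() {
    sed 's/^ *//' | cut -d ' ' -f2-
}

# Made-up input: one padded count, one 8-digit count with no padding.
printf '%s\n' '    809 23/Dec/2008:19:20' '10000000 Not Found' | drop_first_field
```

Unlike tr -s ' ', this does not squeeze spaces inside the line, and it does not rely on a leading space being present.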

Answer 8

You could make use of sed to strip both the leading spaces and the numbers printed by uniq -c:

sort file | uniq -c | sed 's/^ *[0-9]* //'

I will illustrate this with an example. Consider a file:

winebottles.mkv
winebottles.mov
winebottles.xges
winebottles.xges~
winebottles.mkv
winebottles.mov
winebottles.xges
winebottles.xges~

The command

sort file | uniq -c | sed 's/^ *[0-9]* //'

would return

winebottles.mkv
winebottles.mov
winebottles.xges
winebottles.xges~

Answer 9

First solution
Just use sort when the number of repetitions does not matter. sort has a unique option, -u:

sort -u file
sort -u < file

Example:

$ cat > file
a
b
c
a
a
g
d
d
$ sort -u file
a
b
c
d
g

Second solution
If sorting based on repetition is important:

sort file | uniq -c | sort -k1 -nr | sed 's/^ \+[0-9]\+ //g'
sort file | uniq -c | sort -k1 -nr | perl -lpe 's/^ +[\d]+ +//g'

which has this output:

a
d
g
c
b
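Since the original question asks for the most repeated value, the pipeline can be cut down to its first line. A minimal sketch follows, with a hypothetical helper name (most_frequent) and the same sample letters as above; it reads from standard input.

```shell
# Hypothetical helper: print the single most frequent line read from stdin.
most_frequent() {
    sort | uniq -c | sort -nr | head -n 1 | sed 's/^ *[0-9]* //'
}

# Same sample data as in the example above.
printf '%s\n' a b c a a g d d | most_frequent
```

If several lines share the highest count, which one comes first depends on how sort breaks the tie, so only one of them is printed.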

9 answers

Answer 1
9 accepted 2012-04-10 06:33:34

Answer 2
5 2012-04-10 06:36:27

Answer 3
3 2018-07-08 10:25:57

Answer 4
2 2012-08-10 22:03:46

Answer 5
2 2018-07-08 10:24:56

Answer 6
1 2014-11-03 12:21:39

Answer 7
1 2017-01-13 16:46:34

Answer 8
0 2018-07-08 10:25:16

Answer 9
0 2018-07-08 10:32:25
