
bash script to find unique value in a file with index

I have a text file, test.txt, like:

shekhar cbv
ravi cbv
ravi sdf
asd df
ravi Df
ravi dfg
ravi df
ravi dfg
ravi df
afas cvb
sdf hgh
sasdg cfg

I want to sort based on the 2nd field and write only the 2nd field to "result.txt". Currently I am doing the following:

sort -k 2,2 test.txt | uniq -i -f 1 | cut -d ' ' -f2 > result.txt

which writes the 2nd field, i.e.

cbv 
cfg 
cvb 
df  
dfg 
hgh 
sdf 

For every unique 2nd-field value, I want all of its indexes (line numbers) in the original file. How is this possible?

Desired Output:

cbv 1 2
cfg 12
cvb 10
df 4 5 7 9 
dfg 6 8
hgh 11
sdf 3

One more thing: suppose there is also a 3rd field. How can I achieve the above so that only the 2nd field is used for sorting and for finding uniqueness?

Input with 3rd field:

    shekhar cbv rg
    ravi cbv fdf
    ravi sdf dfh
    asd df dfhdfh
    ravi Df fgh
    ravi dfg dfh
    ravi df dfgh
    ravi dfg dfgh
    ravi df dfhg
    afas cvb fhfg
    sdf hgh cgfhfg
    sasdg cfg fgh

Desired o/p is the same. Thanks, Ravi

Try this command to print the column with all original indexes:

awk '{k=tolower($2); arr[k]=arr[k] " " NR} END{for(v in arr) print v, arr[v]}' test.txt | sort -f -k 1,1

OUTPUT

cbv  1 2
cfg  12
cvb  10
df  4 5 7 9
dfg  6 8
hgh  11
sdf  3

Update: an awk-only solution (asorti is a GNU awk extension, so this needs gawk):

awk '{k=tolower($2); arr[k]=arr[k] " " NR} END{n=asorti(arr, dest); for(i = 1; i <= n; i++) print dest[i], arr[dest[i]]}' test.txt

Your file can have any number of columns, but this command only looks at the 2nd column.
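To check that claim against the three-column input from the question, here is a minimal sketch using only POSIX awk plus sort (so no asorti). The file names test3.txt and result3.txt and the array name idx are made up for this illustration; LC_ALL=C is an assumption added to keep the sort order reproducible.

```shell
# Recreate the three-column sample from the question (hypothetical file name).
cat > test3.txt <<'EOF'
shekhar cbv rg
ravi cbv fdf
ravi sdf dfh
asd df dfhdfh
ravi Df fgh
ravi dfg dfh
ravi df dfgh
ravi dfg dfgh
ravi df dfhg
afas cvb fhfg
sdf hgh cgfhfg
sasdg cfg fgh
EOF

# Key on the lower-cased 2nd field and append each line number (NR) to that
# key's list; the 3rd field is never read. awk's for-in order is unspecified,
# so the keys are sorted afterwards.
awk '{ k = tolower($2); idx[k] = idx[k] " " NR }
     END { for (k in idx) print k idx[k] }' test3.txt |
  LC_ALL=C sort -k 1,1 > result3.txt

cat result3.txt
```

Because each stored list already starts with a space, printing `k idx[k]` without a comma gives a single space after the key, which matches the desired output in the question.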

I think you want to use cut to extract the column you want and then do the sort and uniq stuff:

cut -f2 -d' ' test.txt | sort -f | uniq -i > result.txt

This assumes that the columns are separated by a single space.

Note that you'll want the -f switch on sort so that the sorting is case insensitive; otherwise rows that differ only in case won't be beside each other, and uniq -i probably won't do what you want it to do.
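The effect of -f can be seen with a small experiment. This is only a sketch: col2.txt and the two variable names are invented here, and LC_ALL=C is an assumption that forces byte-order collation (many other locales already fold case while sorting, which hides the problem).

```shell
# The second-column values of test.txt from the question, one per line.
printf '%s\n' cbv cbv sdf df Df dfg df dfg df cvb hgh cfg > col2.txt

# Without -f, byte order puts "Df" ahead of every lower-case value, so it is
# not adjacent to the "df" lines and uniq -i cannot merge it with them.
plain_count=$(LC_ALL=C sort col2.txt | uniq -i | wc -l | tr -d ' ')

# With -f, all four df/Df lines end up next to each other.
folded_count=$(LC_ALL=C sort -f col2.txt | uniq -i | wc -l | tr -d ' ')

echo "without -f: $plain_count distinct values"
echo "with -f:    $folded_count distinct values"
```

Under the C locale this reports 8 values without -f and 7 with it, the extra one being the stray "Df".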

You're so close. To get the value of the second column based on what you've done so far, you should use awk. It is made for processing a stream line by line and extracting just the parts you want.

Your code: sort -k 2,2 test.txt| uniq -i -f 1 | cut -d ' ' -f2 > result.txt

With awk: sort -k 2,2 test.txt | uniq -i -f 1 | awk '{print $2}' > result.txt

Awk splits each input line on whitespace, and print $2 takes the second text block. I'd recommend looking into awk - it is great for many problems.
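One practical difference between awk's whitespace splitting and cut -d ' ' shows up when fields are separated by more than one space. The sample line and variable names below are made up for this illustration:

```shell
line='ravi   dfg'   # three spaces between the two fields

# cut treats every single space as a delimiter, so field 2 is the empty
# string between the first and second space.
with_cut=$(printf '%s\n' "$line" | cut -d ' ' -f2)

# awk's default splitting collapses runs of blanks into one separator.
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut: [%s]\nawk: [%s]\n' "$with_cut" "$with_awk"
```

Here cut yields an empty field while awk yields dfg, so the awk form is the more forgiving choice when the column spacing is not strictly one space.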

For fun - perl:

perl -anle 'push(@{$s{$F[1]}},++$n);END{map{print "$_: @{$s{$_}}"} sort keys %s}'

or case insensitive:

perl -anle 'push(@{$s{lc($F[1])}},++$n);END{map{print "$_: @{$s{$_}}"} sort keys %s}'
