bash script to find unique value in a file with index
I have a text file, test.txt, like:
shekhar cbv
ravi cbv
ravi sdf
asd df
ravi Df
ravi dfg
ravi df
ravi dfg
ravi df
afas cvb
sdf hgh
sasdg cfg
I want to sort based on the 2nd field and write only the 2nd field to result.txt. Currently I am doing the following:
sort -k 2,2 test.txt| uniq -i -f 1 | cut -d ' ' -f2 > result.txt
which writes the 2nd field, i.e.:
cbv
cfg
cvb
df
dfg
hgh
sdf
For every unique 2nd-field value, I also want all of its indexes (line numbers) in the original file. How is this possible?
Desired Output:
cbv 1 2
cfg 12
cvb 10
df 4 5 7 9
dfg 6 8
hgh 11
sdf 3
One more thing: suppose there is a 3rd field as well. How can I achieve the above so that only the 2nd field is used for sorting and finding uniqueness?
Input with 3rd field:
shekhar cbv rg
ravi cbv fdf
ravi sdf dfh
asd df dfhdfh
ravi Df fgh
ravi dfg dfh
ravi df dfgh
ravi dfg dfgh
ravi df dfhg
afas cvb fhfg
sdf hgh cgfhfg
sasdg cfg fgh
Desired o/p is the same. Thanks, Ravi
Try this command to print the column with all original indexes:
awk '{k=tolower($2); arr[k]=arr[k] " " NR} END{for(v in arr) print v, arr[v]}' test.txt | sort -f -k 1,1
cbv 1 2
cfg 12
cvb 10
df 4 5 7 9
dfg 6 8
hgh 11
sdf 3
Or, with GNU awk, do the sorting inside awk using asorti (a gawk extension):
awk '{k=tolower($2); arr[k]=arr[k] " " NR} END{n=asorti(arr, dest); for(i = 1; i <= n; i++) print dest[i], arr[dest[i]]}' test.txt
Your file can have any number of columns, but this command will only look at the 2nd column.
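As a rough check of the claim above, the sketch below recreates the three-column sample from the question in a temporary file and groups line numbers by the lower-cased second field. The names tmpfile, result, and idx are illustrative assumptions, not part of the original answer, and the explicit if/else is only there to avoid the leading space the one-liner puts in front of the first index.

```shell
#!/bin/sh
# Recreate the three-column sample from the question in a temp file.
# (tmpfile, result and idx are illustrative names, not from the original post.)
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
shekhar cbv rg
ravi cbv fdf
ravi sdf dfh
asd df dfhdfh
ravi Df fgh
ravi dfg dfh
ravi df dfgh
ravi dfg dfgh
ravi df dfhg
afas cvb fhfg
sdf hgh cgfhfg
sasdg cfg fgh
EOF

# Key on the lower-cased 2nd field and append each line number (NR) to its list.
# The 3rd field is never referenced, so it cannot affect grouping or ordering.
result=$(awk '{
             k = tolower($2)
             if (k in idx) idx[k] = idx[k] " " NR
             else          idx[k] = NR
         }
         END { for (k in idx) print k, idx[k] }' "$tmpfile" | LC_ALL=C sort -k 1,1)

printf '%s\n' "$result"
rm -f "$tmpfile"
```

On the sample data this should print the same seven lines as the desired output in the question, since the extra column is ignored.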
I think you want to use cut to extract the column you want and then do the sort and uniq stuff:
cut -f2 -d' ' test.txt | sort -f | uniq -i > result.txt
This assumes that the columns are separated by a single space.
Note that you'll want the -f switch on sort so that the sorting will be case insensitive; otherwise rows that differ only in case won't be beside each other, and uniq -i probably won't do what you want it to do.
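To make the point about -f concrete, here is a small sketch that runs both pipelines on the second column of the sample and counts the surviving lines. The variable names are illustrative, and LC_ALL=C is set on sort as an assumption so that the plain sort uses byte order (uppercase before lowercase); in other locales a plain sort may already place Df next to df.

```shell
#!/bin/sh
# Second column of the sample file, in original order.
col2=$(printf '%s\n' cbv cbv sdf df Df dfg df dfg df cvb hgh cfg)

# Byte-order sort: "Df" lands before every lowercase value, away from "df",
# so uniq -i (which only merges adjacent lines) keeps both.
plain_count=$(printf '%s\n' "$col2" | LC_ALL=C sort | uniq -i | wc -l | tr -d ' ')

# Case-folding sort: "Df" and "df" become adjacent and uniq -i merges them.
folded_count=$(printf '%s\n' "$col2" | LC_ALL=C sort -f | uniq -i | wc -l | tr -d ' ')

echo "plain sort: $plain_count distinct values"
echo "sort -f:    $folded_count distinct values"
```

On this data the plain sort should leave 8 lines and sort -f should leave 7, the extra line being the stray Df.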
You're so close. To get the value of the second column based on what you've done so far, you should use awk. It is made for processing a stream line by line and extracting just the parts you want.
Your code:
sort -k 2,2 test.txt| uniq -i -f 1 | cut -d ' ' -f2 > result.txt
With awk (replacing the cut step, since awk '{print $2}' after cut would find no second field left to print):
sort -k 2,2 test.txt| uniq -i -f 1 | awk '{print $2}' > result.txt
Awk will split your input by whitespace, and print $2 will take the second text block. I'd recommend looking into awk - it is great for many problems.
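The whitespace splitting can be illustrated with a short sketch that compares awk and cut on a line with irregular spacing. The sample line and variable names are made up for illustration and do not come from test.txt.

```shell
#!/bin/sh
# A hypothetical line with irregular spacing: three spaces, then a tab.
line=$(printf 'ravi   Df\tfgh')

# awk treats any run of blanks/tabs as one separator, so $2 is "Df".
awk_field=$(printf '%s\n' "$line" | awk '{ print $2 }')

# cut -d ' ' treats every single space as a separator,
# so field 2 is the empty string between the first two spaces.
cut_field=$(printf '%s\n' "$line" | cut -d ' ' -f2)

echo "awk: [$awk_field]"
echo "cut: [$cut_field]"
```

This is one reason awk tends to be more forgiving than cut when the input is not strictly single-space separated.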
For fun - perl:
perl -anle 'push(@{$s{$F[1]}},++$n);END{map{print "$_: @{$s{$_}}"} sort keys %s}'
or case insensitive:
perl -anle 'push(@{$s{lc($F[1])}},++$n);END{map{print "$_: @{$s{$_}}"} sort keys %s}'