Word list from a file in bash

I want to write a Unix script that prints a list of the unique words in a file and, for each word, the list of line numbers where it appears.

For example, file.txt:

Lorem 
ipsum dolor elit,
Lorem elit.

Expected output:

Lorem 1,3
ipsum 2
dolor 2
elit 2,3

My code:

cat file.txt | tr '[:space:]' '[\n*]'| tr '[:digit:]' '[\n*]'| tr '[:punct:]' '[\n*]' | grep -v "^\s*$" | sort -f | uniq 

I don't know how to get the line numbers. Can someone help me?

This awk code works for your example:

awk '{for(i=1;i<=NF;i++){
        gsub(/[.,:;]/,"",$i)                # strip . , : ; from the word
        a[$i]=($i in a)?a[$i]","NR:NR}}     # append the current line number
     END{for(x in a)print x,a[x]}' file

Some write-only Perl:

perl -nE '
    push @{$refs{$_}}, $. for /(\w+)/g
  } END { 
    say $_, "\t", join(",", @{$refs{$_}}) for keys %refs
' file

Output:
elit    2,3
Lorem   1,3
ipsum   2
dolor   2

It does not output the words in the order they were encountered in the file: the order of `keys %refs` is unspecified.

Also, if a word appears twice on one line, that line number will be added twice. To fix this, deduplicate the words on each line:

perl -MList::Util=uniq -nE '
    push @{$refs{$_}}, $. for uniq /(\w+)/g
  } END { 
    say $_, "\t", join(",", @{$refs{$_}}) for keys %refs
' file

If you don't mind the words being printed in a different order than they appear in the file:

awk -F'[^[:alpha:]]' '{for (i=1; i<=NF;i++) 
                       if ($i) a[$i]=a[$i]?a[$i] "," NR:NR} 
                 END {for (e in a) print e,a[e]}' file
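The `if ($i)` guard matters because a one-character field separator produces empty fields: a trailing comma or two adjacent spaces leave an empty string between separators. A small sketch to see what awk splits a line into (`show_fields` is a hypothetical helper name):

```shell
# show_fields (hypothetical helper): print every field awk sees, in brackets.
show_fields() {
  awk -F '[^[:alpha:]]' '{
    for (i = 1; i <= NF; i++) printf "[%s]", $i
    print ""
  }'
}

printf 'ipsum dolor elit,\n' | show_fields    # prints [ipsum][dolor][elit][]
```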

Or, if you want the words in the order they are first encountered in the file (this reads the file twice, hence `file file`):

awk -F'[^[:alpha:]]' 'FNR==NR{for (i=1; i<=NF;i++) 
                       if ($i) a[$i]=a[$i]?a[$i] "," NR:NR
                    next}
                    {for (i=1; i<=NF;i++){
                           if ($i in seen) continue 
                           else if ($i) {
                                   print $i,a[$i]
                                   seen[$i] } }
                     }' file file
