
How to sort the contents of a text file using a shell script

I am new to shell scripting. I would like to know how to sort the contents of a file using a shell script.

Here is an example:

fap0089-josh.baker
fap00233-adrian.edwards
fap00293-bob.boyle
fap00293-bob.jones
fap002-brian.lopez
fap00293-colby.morris
fap00293-cole.mitchell
psf0354-SKOWALSKI
psf0354-SLEE
psf0382-SLOWE
psf0391-SNOMURA
psf0354-SPATEL
psf0364-SRICHARDS
psf0354-SSEIBERT
psf0354-SSIRAH
bsi0004-STRAN
bsi0894-STURBIC
unit054-SUNDERWOOD

Considering the data above (this is a small subset; I have more than 5.5 records), I would like to sort it like this:

  1. The number of entries starting with fap, psf, bsi, unit, etc.
  2. The total number of environments for each type, i.e. each number after the prefix (0004, 0382, 054, etc.) is an environment. E.g. psf has 4 unique environments.
  3. The sum total

Here's a Schwartzian transform to sort by 1) leading letters, then 2) digits:

sed -r 's/^([[:alpha:]]+)([[:digit:]]+)/\1 \2 /' filename | 
sort -t ' ' -k 1,1 -k 2,2n | 
sed 's/ //; s/ //'

Output:

bsi0004-STRAN
bsi0894-STURBIC
fap002-brian.lopez
fap0089-josh.baker
fap00233-adrian.edwards
fap00293-bob.boyle
fap00293-bob.jones
fap00293-colby.morris
fap00293-cole.mitchell
psf0354-SKOWALSKI
psf0354-SLEE
psf0354-SPATEL
psf0354-SSEIBERT
psf0354-SSIRAH
psf0364-SRICHARDS
psf0382-SLOWE
psf0391-SNOMURA
unit054-SUNDERWOOD
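
The pipeline above follows a decorate, sort, undecorate pattern. As a minimal sketch of the same idea with each stage labelled: the function name `sort_by_prefix_then_number` is hypothetical, `sed -E` is assumed to be available (the answer uses GNU sed's `-r`), and `LC_ALL=C` is an addition here to make the ordering reproducible across locales.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original answer.
# Stage 1 (decorate):   "fap0089-josh.baker" -> "fap 0089 -josh.baker"
# Stage 2 (sort):       field 1 as text, then field 2 as a number,
#                       so 002 < 0089 < 00233 despite the leading zeros
# Stage 3 (undecorate): remove the two spaces inserted in stage 1
sort_by_prefix_then_number() {
    sed -E 's/^([[:alpha:]]+)([[:digit:]]+)/\1 \2 /' "$1" |
        LC_ALL=C sort -t ' ' -k 1,1 -k 2,2n |
        sed 's/ //; s/ //'
}

# Usage: sort_by_prefix_then_number filename
```

The numeric key matters because a plain text sort would compare the zero-padded digit strings character by character and place 00233 before 002.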

To generate the metrics you mention, I'd use perl:

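perl -nE '
    # Capture the leading letters ($1) and the digits after them ($2).
    /^([[:alpha:]]+)(\d+)/ or next;
    $count{$1}++;          # entries per prefix
    $nenv{$1}{$2} = 1;     # distinct environments per prefix
    $total += $2;          # running sum of the environment numbers
    END {
        say "Counts:";
        say "$_ => $count{$_}" for sort keys %count;
        say "Number of environments";
        say "$_ => ", scalar keys %{$nenv{$_}} for sort keys %nenv;
        say "Total = $total";
    }
' filename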

Output:

Counts:
bsi => 2
fap => 7
psf => 8
unit => 1
Number of environments
bsi => 2
fap => 4
psf => 4
unit => 1
Total = 5355

Without using perl, it's less efficient because you have to read the file multiple times.

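echo Counts:
# Drop everything from the first digit on, leaving only the prefix.
sed 's/[0-9].*//' filename | sort | uniq -c
echo Number of environments:
# Keep "prefix number", deduplicate the pairs, then count pairs per prefix.
sed -r 's/^([a-z]+)([0-9]*).*/\1 \2/' filename | sort -u | cut -d" " -f1 | uniq -c
echo Total:
# [a-z0]+ also strips leading zeros so printf %d does not read the number as octal.
{ printf "%d+" $(sed -r 's/^[a-z0]+([0-9]*).*/\1/' filename); echo 0; } | bc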

Output:

Counts:
      2 bsi
      7 fap
      8 psf
      1 unit
Number of environments:
      2 bsi
      4 fap
      4 psf
      1 unit
Total:
5355
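
The efficiency concern above comes from running one pass over the file per metric. If perl is not available, awk can also collect all three metrics in a single pass. This is only a sketch, not part of the original answer: the function name `report_metrics` and the output layout are hypothetical, and it assumes lowercase prefixes, as in the sample data.

```shell
#!/bin/sh
# Hypothetical helper: one pass over the file, awk standing in for perl.
report_metrics() {
    awk '
        # Match leading lowercase letters followed by digits, e.g. "fap0089".
        match($0, /^[a-z]+[0-9]+/) {
            key = substr($0, 1, RLENGTH)
            prefix = key
            sub(/[0-9]+$/, "", prefix)             # "fap0089" -> "fap"
            env = substr(key, length(prefix) + 1)  # "fap0089" -> "0089"
            count[prefix]++
            # Count each (prefix, environment) pair only once.
            if (!((prefix, env) in seen)) {
                seen[prefix, env] = 1
                nenv[prefix]++
            }
            total += env                           # "0089" is added as 89
        }
        END {
            # Send the per-prefix lines through sort, then print the total.
            for (p in count)
                printf "%s: entries=%d environments=%d\n", p, count[p], nenv[p] | "sort"
            close("sort")
            printf "Total = %d\n", total
        }
    ' "$1"
}

# Usage: report_metrics filename
```

Like the perl version, this compares environments as strings, so 054 and 0054 would count as two different environments.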
