Sorting using Linux commands

I have data in the following form:

 Sub: Size:14Val: 4644613 Some long string here
 Sub: Size:2Val: 19888493 Some other long string here
 Sub: Size:1Val: 6490281 Some other long string here1
 Sub: Size:1Val: 320829337 Some other long string here2
 Sub: Size:1Val: 50281086 Some other long string here3
 Sub: Size:1Val: 209077847 Some other long string here4
 Sub: Size:3Val: 320829337 Some other long string here2
 Sub: Size:3Val: 50281086 Some other long string here3
 Sub: Size:3Val: 209077847 Some other long string here4

Now I want to extract every Size: field from this file. That is, I want to extract the following:

Size:14
Size:2
Size:1
Size:1
Size:1
Size:1
Size:3
Size:3
Size:3

I also want to count how often each Size value occurs (e.g. 14 occurs once, 2 occurs once, 1 occurs four times, 3 occurs three times) and print the counts in sorted order: (i) sorted by the number of occurrences and (ii) sorted by the value associated with Size. That is, I want the following results:

(i). sorted by number of occurrences
1->4
3->3
2->1
14->1

(ii). sorted by the value associated with Size:
1->4
2->1
3->3
14->1

I wrote a Python program and was able to sort them, but is there some way to do the same using Linux commands like grep? I am using Ubuntu 12.04.

To extract the size field,

grep -o 'Size:[0-9]*' data

Counting and sorting by number of occurrences can be done with sort | uniq -c | sort -rn. To sort by value instead, make a minor modification to the first sort (i.e. add -t : -k2n, or -k2rn for descending order) and leave off the sort -rn at the end. Massaging the final output into the format you require can easily be done with a simple sed script.

grep -o 'Size:[0-9]*' data |
sort -t : -k2n | uniq -c |
sed 's/^ *//;s/\([1-9][0-9]*\) Size:\([0-9]*\)/\2->\1/'
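
The pipeline above covers ordering by Size value. For ordering by number of occurrences, the sort | uniq -c | sort -rn recipe could be put together roughly as follows. This is only a sketch: the scratch file created with mktemp, the variable names data and by_count, and the LC_ALL=C prefix (added to keep the order of tied counts predictable) are assumptions made for illustration, not part of the original answer.

```shell
# Hypothetical scratch file holding the sample rows from the question.
data=$(mktemp)
cat > "$data" <<'EOF'
 Sub: Size:14Val: 4644613 Some long string here
 Sub: Size:2Val: 19888493 Some other long string here
 Sub: Size:1Val: 6490281 Some other long string here1
 Sub: Size:1Val: 320829337 Some other long string here2
 Sub: Size:1Val: 50281086 Some other long string here3
 Sub: Size:1Val: 209077847 Some other long string here4
 Sub: Size:3Val: 320829337 Some other long string here2
 Sub: Size:3Val: 50281086 Some other long string here3
 Sub: Size:3Val: 209077847 Some other long string here4
EOF

# 1. grep -o prints only the matched Size:N text, one match per line.
# 2. sort puts equal lines next to each other so uniq -c can count them.
# 3. sort -rn orders by that count, highest first (LC_ALL=C is an
#    assumption added here so that tied counts come out in a fixed order).
# 4. sed rewrites a line such as "      4 Size:1" as "1->4".
by_count=$(
  grep -o 'Size:[0-9]*' "$data" |
  sort | uniq -c | LC_ALL=C sort -rn |
  sed 's/^ *\([0-9][0-9]*\) Size:\([0-9]*\)/\2->\1/'
)
printf '%s\n' "$by_count"

rm -f "$data"
```

On the sample data this should print 1->4, 3->3, 2->1 and 14->1, one per line, which matches ordering (i) in the question. How sort -rn orders lines with equal counts is a side effect of its fallback comparison, so if a specific tie order matters it is safer to add an explicit second sort key.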
