
How to grep a string in bash with letters out of order?

My task is to find strings (acronyms) that repeat in a specific text file.

Here follows a sample:

...
the
the
het
het
het
teh
teh
teh
teh
...

As a first step, I can count how many times each of them appears with this command:

cat text_file.txt | sort | uniq -c | sort -gr

And the output is something like this:

4 teh
3 het
2 the

But I also need to sum these three counts into one, because the three strings use the same three characters in a different order.

Can you please give me some help with this?

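Sort the characters of each line to build a key, then count the keys. With GNU awk, which splits a string into characters when given a null separator and supports sorted_in for ordered array traversal: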

$ cat tst.awk
{
    split($0,chars,"")
    PROCINFO["sorted_in"] = "@val_str_asc"
    key = ""
    for (i in chars) {
        key = key chars[i]
    }
    cnt[key]++
}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (key in cnt) {
        print key, cnt[key]
    }
}

$ cat file
the
het
teh
foobar
fobar
oofrab

$ awk -f tst.awk file
abfoor 2
abfor 1
eht 3
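
If GNU awk is not available, the same idea (sort each line's letters into a key, then count the keys) can be sketched with standard tools and fed into the sort | uniq -c pipeline from the question. This is only a sketch: the function name normalize is made up for illustration, it assumes one word per line and single-byte (ASCII) characters, and it starts several processes per line, so it will be slow on large files.

```shell
#!/bin/sh
# normalize (hypothetical helper name): for every input line, print the
# same characters rearranged in sorted order, e.g. "teh" becomes "eht".
# Assumes single-byte characters, since fold -w1 may split multibyte ones.
normalize() {
    while IFS= read -r word; do
        # Put one character per line, sort them, then join them back up.
        printf '%s\n' "$word" | fold -w1 | sort | tr -d '\n'
        printf '\n'
    done
}

# Sample data from the answer above; group anagrams and count each group,
# highest count first.
printf '%s\n' the het teh foobar fobar oofrab |
    normalize | sort | uniq -c | sort -rn
```

For the sample data this should list eht with a count of 3, abfoor with 2 and abfor with 1; the amount of padding in front of each count depends on the uniq implementation. To run it against a real file, replace the printf line with normalize < text_file.txt.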
