How to grep a string in bash with letters out of order?
My task is to find strings (acronyms) that repeat in a specific text file.
Here is a sample:
...
the
the
het
het
het
teh
teh
teh
teh
...
As a first step, I can count how many times each string appears with this command:
sort text_file.txt | uniq -c | sort -gr
And the output is something like this:
4 teh
3 het
2 the
But I also need to sum these three counts, because the strings use the same three characters in a different order.
Can you please give me some help with this?
With GNU awk, which splits a string into individual characters when given a null separator and supports PROCINFO["sorted_in"] to control array traversal order:
$ cat tst.awk
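{
    # Split the line into single characters (a null separator is a GNU awk extension).
    split($0,chars,"")
    # Visit the characters in ascending order of value.
    PROCINFO["sorted_in"] = "@val_str_asc"
    key = ""
    for (i in chars) {
        key = key chars[i]
    }
    # Anagrams share the same sorted key.
    cnt[key]++
}
END {
    # Print the keys in ascending order.
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (key in cnt) {
        print key, cnt[key]
    }
}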
$ cat file
the
het
teh
foobar
fobar
oofrab
$ awk -f tst.awk file
abfoor 2
abfor 1
eht 3
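If GNU awk is not available, the same idea (build a key from the sorted characters of each line, then count per key) can probably be done with POSIX awk features only. The following is a minimal sketch under that assumption, not a tested replacement for the script above: the function name anagram_counts is made up for illustration, the characters are extracted with substr() and ordered with a hand-written insertion sort, and the final ordering is left to sort because plain awk does not guarantee any traversal order for "for (key in cnt)".

```shell
#!/bin/sh
# anagram_counts is a hypothetical helper name, not part of any tool.
# It reads one word per line (from stdin or from the files given as
# arguments) and prints "<sorted letters> <count>" for each anagram group.
anagram_counts() {
    awk '
    {
        n = length($0)
        # Copy the characters into an array with substr(), because
        # splitting on an empty separator is not portable.
        for (i = 1; i <= n; i++) c[i] = substr($0, i, 1)
        # Insertion sort of the characters of this line.
        for (i = 2; i <= n; i++) {
            v = c[i]
            for (j = i - 1; j >= 1 && c[j] > v; j--) c[j + 1] = c[j]
            c[j + 1] = v
        }
        # Join the sorted characters into the grouping key.
        key = ""
        for (i = 1; i <= n; i++) key = key c[i]
        cnt[key]++
    }
    END {
        for (key in cnt) print key, cnt[key]
    }' "$@" | LC_ALL=C sort
}

# Usage with the same sample data as above.
printf '%s\n' the het teh foobar fobar oofrab | anagram_counts
```

On the sample input this should print the same three lines as the GNU awk version. To get the most frequent group first, the result could be piped through sort -k2,2nr.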