How to count each letter in a file?

Question

I have a cord.txt file as shown below:

188H,190D,245H
187D,481E,482T
187H,194E,196D
386D,388E,389N,579H
44E,60D

I need to count each letter and produce a summary like the one below (expected output):

H,4
D,5
E,4
T,1

I know how to count a single letter using grep "<letter>" cord.txt | wc. But I have a huge file which contains many more letters, so please help me do the same for all of them.

Thanks in advance.

Answer 1

You're missing the N :-)

grep -o '[[:alpha:]]' cord.txt | sort | uniq -c

grep -o only outputs the matching part. With the POSIX class [[:alpha:]], it outputs every letter contained in the input, one per line.
sort groups identical letters together.
uniq -c reports unique lines with their counts. It needs sorted input, as it only compares the current line to the previous one.
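
To see why uniq -c needs sorted input, here is a small Python sketch that mimics its behaviour with itertools.groupby, which likewise only merges adjacent equal items. The helper name uniq_c is made up for illustration and is not part of any library.

```python
from itertools import groupby

def uniq_c(items):
    """Count runs of adjacent equal items, like `uniq -c` (hypothetical helper)."""
    return [(len(list(group)), key) for key, group in groupby(items)]

# Letters from the first two lines of cord.txt, in the order grep -o emits them.
letters = list("HDHDET")

# Without sorting, equal letters that are not adjacent are counted separately.
print(uniq_c(letters))
# After sorting, each letter forms a single run and gets a single count.
print(uniq_c(sorted(letters)))
```

The first call reports every letter with a count of 1, while the second merges the two H and the two D into one entry each, which is the reason for the sort step in the pipeline.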

Answer 2

The following command:

1. Removes any character that is not an ASCII letter;
2. Places every character on its own line;
3. Sorts the characters;
4. Counts the number of identical consecutive lines.

sed 's/[^a-zA-Z]//g' < input.txt | fold -w 1 -s | sort | uniq -c > output.txt
# ^                                ^              ^      ^
# 1.                               2.             3.     4.

Input:

188H,190D,245H
187D,481E,482T
187H,194E,196D
386D,388E,389N,579H
44E,60D

Output:

 5 D
 4 E
 4 H
 1 N
 1 T
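
For readers who want to trace the four steps outside the shell, the following is a rough Python sketch of the same idea. The function name count_ascii_letters is an assumption made for this example, and the sample string holds only the first two lines of the input. Python sorts by code point, so the order is only guaranteed to match sort for uppercase-only input like this one.

```python
import re
from collections import Counter

def count_ascii_letters(text):
    """Hypothetical helper mirroring the four steps of the shell pipeline."""
    # Step 1: drop everything that is not an ASCII letter (same class as the sed call).
    letters_only = re.sub(r"[^a-zA-Z]", "", text)
    # Step 2: treat every remaining character as its own item.
    chars = list(letters_only)
    # Steps 3 and 4: tally each character, then sort the (letter, count) pairs by letter.
    return sorted(Counter(chars).items())

sample = "188H,190D,245H\n187D,481E,482T\n"  # first two lines of the sample input
for letter, count in count_ascii_letters(sample):
    print(f"{count:7d} {letter}")
```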

Answer 3

You might use Python's collections.Counter as follows. Let the content of cord.txt be

188H,190D,245H
187D,481E,482T
187H,194E,196D
386D,388E,389N,579H
44E,60D

and counting.py be

import collections
counter = collections.Counter()
with open("cord.txt", "r") as f:
    for line in f:
        counter.update(i for i in line if i.isalpha())
for char, cnt in counter.items():
    print("{},{}".format(char,cnt))

then python counting.py outputs

H,4
D,5
E,4
T,1
N,1

Note that I used for line in f, where f is the file handle, to avoid loading the whole file into memory. Disclaimer: I used Python 3.7; older versions should work but might give a different order in the output, as collections.Counter is a subclass of dict, and dicts do not keep insertion order in older Python versions.
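
If the output order matters, one way to make it independent of the Python version is to sort the items explicitly before printing. The sketch below is a variation on the script above, not part of the original answer: the helper count_letters and the in-memory sample list (standing in for the lines of cord.txt) are assumptions for illustration.

```python
import collections

def count_letters(lines):
    """Tally alphabetic characters across an iterable of lines (hypothetical helper)."""
    counter = collections.Counter()
    for line in lines:
        counter.update(ch for ch in line if ch.isalpha())
    return counter

# Stand-in for iterating over the open file handle of cord.txt.
sample = [
    "188H,190D,245H\n",
    "187D,481E,482T\n",
    "187H,194E,196D\n",
    "386D,388E,389N,579H\n",
    "44E,60D\n",
]

counter = count_letters(sample)
# Sort by descending count, then alphabetically, so the order no longer
# depends on dict insertion order.
for char, cnt in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
    print("{},{}".format(char, cnt))
```

Because count_letters accepts any iterable of lines, passing an open file handle keeps the line-by-line reading described above.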

Answer 4

In short:

tr '[0-9],' \\n <input | sort | uniq -c
     43 
      5 D
      4 E
      4 H
      1 N
      1 T

OK, there are 43 other characters... You can drop them and match the requested format by adding sed:

tr '[0-9],' \\n </tmp/so/input | sort | uniq -c |
     sed -ne 's/^ *\([0-9]\+\) \(.\)/\2,\1/p'
D,5
E,4
H,4
N,1
T,1
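
To make the sed expression easier to read, here is an approximate Python equivalent of that reformatting step. The function name reformat_counts is hypothetical, and the sample text is the uniq -c output shown earlier in this answer. Like sed -n with the p flag, it only keeps lines where the substitution actually happened, which is how the line of 43 blanks disappears.

```python
import re

# Same idea as the sed pattern: optional leading spaces, a count, one space, one character.
PATTERN = re.compile(r"^ *([0-9]+) (.)")

def reformat_counts(uniq_output):
    """Turn 'count letter' lines into 'letter,count' lines (hypothetical helper)."""
    result = []
    for line in uniq_output.splitlines():
        new_line, replaced = PATTERN.subn(r"\2,\1", line)
        if replaced:  # mimic `sed -n ... p`: skip lines that did not match
            result.append(new_line)
    return result

uniq_output = "     43 \n      5 D\n      4 E\n      4 H\n      1 N\n      1 T\n"
print("\n".join(reformat_counts(uniq_output)))
```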

Answer 1
6 2021-07-26 09:34:28

Answer 2
4 accepted 2021-07-26 09:41:25

Answer 3
2 2021-07-26 09:36:37

Answer 4
0 2021-07-26 12:31:09
