
How to count each letter from a file?

I have a cord.txt file as shown below:

188H,190D,245H
187D,481E,482T
187H,194E,196D
386D,388E,389N,579H
44E,60D

I need to count each letter and produce a summary as shown below (expected output):

H,4
D,5
E,4
T,1

I know how to count a single letter by using grep "<letter>" cord.txt | wc. But I have a huge file that contains many more letters, so please help me do the same for all of them.

Thanks in advance.

You're missing the N :-)

grep -o '[[:alpha:]]' cord.txt | sort | uniq -c
  • grep -o only outputs the matching part. With the POSIX class [[:alpha:]], it outputs all the letters contained in the input, one per line.
  • sort groups the same letters together.
  • uniq -c reports unique lines with their counts. It needs sorted input, as it only compares the current line to the previous one.
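To make the role of each stage concrete, here is a minimal Python sketch of the same three steps. It is only an illustration, not part of the answer above: the function name count_letters is hypothetical, and it assumes ASCII letters, whereas [[:alpha:]] depends on the locale.

```python
import itertools
import re


def count_letters(text):
    """Mimic grep -o '[[:alpha:]]' | sort | uniq -c on a string."""
    # grep -o '[[:alpha:]]': emit every letter as its own item.
    # Assumption: ASCII letters only; [[:alpha:]] is locale-dependent.
    letters = re.findall(r"[A-Za-z]", text)
    # sort: bring identical letters next to each other.
    letters.sort()
    # uniq -c: count each run of identical adjacent items.
    return [(letter, len(list(run)))
            for letter, run in itertools.groupby(letters)]


sample = (
    "188H,190D,245H\n"
    "187D,481E,482T\n"
    "187H,194E,196D\n"
    "386D,388E,389N,579H\n"
    "44E,60D\n"
)

for letter, count in count_letters(sample):
    print(f"{letter},{count}")
```

Like uniq, itertools.groupby only merges adjacent equal items, so skipping the sort step would report the same letter several times.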

The following command:

  1. Removes any character that is not an ASCII letter;
  2. Places every character on its own line;
  3. Sorts the characters;
  4. Counts the number of identical consecutive lines.
sed 's/[^a-zA-Z]//g' < input.txt | fold -w 1 -s | sort | uniq -c > output.txt
# ^                                ^              ^      ^
# 1.                               2.             3.     4.

Input:

188H,190D,245H
187D,481E,482T
187H,194E,196D
386D,388E,389N,579H
44E,60D

Output:

 5 D
 4 E
 4 H
 1 N
 1 T

You might use Python's collections.Counter as follows. Let the content of cord.txt be

188H,190D,245H
187D,481E,482T
187H,194E,196D
386D,388E,389N,579H
44E,60D

and let counting.py be

import collections
counter = collections.Counter()
with open("cord.txt", "r") as f:
    for line in f:
        counter.update(i for i in line if i.isalpha())
for char, cnt in counter.items():
    print("{},{}".format(char,cnt))

then python counting.py outputs

H,4
D,5
E,4
T,1
N,1

Note that I used for line in f, where f is the file handle, to avoid loading the whole file into memory. Disclaimer: I used Python 3.7; older versions should work but might give a different order in the output, as collections.Counter is a subclass of dict, and dicts do not keep insertion order in older Python versions.
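If the output order matters, one option is to sort the counts explicitly rather than rely on dict ordering. The sketch below is an assumption-laden variation, not the answer's own code: the helper name count_letters_in_lines is hypothetical, and a list of strings stands in for the open file so that the example is self-contained.

```python
import collections


def count_letters_in_lines(lines):
    """Count letters across an iterable of lines (a file handle works too)."""
    counter = collections.Counter()
    for line in lines:
        # Update incrementally so only one line is held in memory at a time.
        counter.update(ch for ch in line if ch.isalpha())
    return counter


# Stand-in for the lines of cord.txt.
lines = [
    "188H,190D,245H\n",
    "187D,481E,482T\n",
    "187H,194E,196D\n",
    "386D,388E,389N,579H\n",
    "44E,60D\n",
]

counter = count_letters_in_lines(lines)

# Sort by count (descending), then by letter, so the order does not
# depend on the Python version or on where each letter first appears.
ordered = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
for char, cnt in ordered:
    print(f"{char},{cnt}")
```

With a real file, passing the handle from open("cord.txt") in place of the list should behave the same way.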

In short:

tr '[0-9],' \\n <input | sort | uniq -c
     43 
      5 D
      4 E
      4 H
      1 N
      1 T

OK, there are 43 other entries (the empty lines left by the replaced characters)... You could drop them and match your requested format by adding sed:

tr '[0-9],' \\n </tmp/so/input | sort | uniq -c |
     sed -ne 's/^ *\([0-9]\+\) \(.\)/\2,\1/p'
D,5
E,4
H,4
N,1
T,1
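For readers less familiar with sed, the substitution above can be sketched in Python. This is only an illustration of what the expression does: the names UNIQ_LINE and reformat are hypothetical, and the input string is copied from the uniq -c output shown earlier.

```python
import re

# Same pattern as the sed expression: optional leading spaces,
# a count, one space, then a single character.
UNIQ_LINE = re.compile(r"^ *([0-9]+) (.)")


def reformat(uniq_output):
    """Turn 'COUNT LETTER' lines into 'LETTER,COUNT', dropping the rest."""
    result = []
    for line in uniq_output.splitlines():
        new_line, replaced = UNIQ_LINE.subn(r"\2,\1", line)
        # Like sed -n with the p flag: keep only lines where the
        # substitution matched. The empty-line entry has no character
        # after the count, so it is skipped.
        if replaced:
            result.append(new_line)
    return result


uniq_output = (
    "     43 \n"
    "      5 D\n"
    "      4 E\n"
    "      4 H\n"
    "      1 N\n"
    "      1 T\n"
)

print("\n".join(reformat(uniq_output)))
```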

