Counting letters in a text file - frequency

I have a text file that lists the letters a, b, ..., z together with how often each one occurs. I wrote it like this:


"letter";"occurences"
a;105
b;29
...
z;0

I have to use this data to create a vector "freq" of length 26 containing the frequency of occurrence of each of the 26 letters from a to z.
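The file is plain semicolon-separated values with a quoted header, so one possible way to read it is Python's standard `csv` module with `delimiter=";"`. This is a sketch, not the asker's or the answerer's code. The function name `read_counts` is hypothetical, and the sample uses only the a, b and z rows shown above:

```python
import csv
import io


def read_counts(fileobj):
    """Parse 'letter;count' rows into a dict, skipping the quoted header row.

    Assumes the layout shown above: one header line, then one
    letter;count pair per line. Blank lines are ignored.
    """
    reader = csv.reader(fileobj, delimiter=";")
    next(reader, None)  # skip the '"letter";"occurences"' header
    counts = {}
    for row in reader:
        if not row:  # csv.reader yields [] for a blank line
            continue
        letter, count = row
        counts[letter.strip()] = int(count)
    return counts


# Small in-memory sample built only from the rows quoted above.
sample = io.StringIO('"letter";"occurences"\na;105\nb;29\nz;0\n')
print(read_counts(sample))
```

For the real file, `with open("small_text.txt", newline="") as f: counts = read_counts(f)` should work, provided the file follows exactly the layout above.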


def letterFrequency(small_text):
    filein = open("small_text.txt", "r") # Opens the file for reading
    lines = filein.readlines() # Reads all lines into an array
    smalltxt = "".join(lines) # Joins the lines into one big string
    freq = 0
    n = 1296
    for letter in lines:
        np.count_nonzero(letter)
        freq.append(letter)
        freq = letter/n
    return freq
print(letterFrequency('small_text.txt'))

The total number of letters is n = 1296, and the frequencies are relative to that total. They should be given in %, so the expected output is:

[ 8.10185185  2.23765432  2.4691358   4.55246914 12.34567901  2.00617284
  1.92901235  6.71296296  7.17592593  0.07716049  1.15740741  3.39506173
  1.08024691  6.71296296  7.87037037  1.46604938  0.07716049  6.01851852
  5.40123457 10.95679012  2.85493827  0.92592593  2.93209877  0.
  1.54320988  0.        ]

For example, 105/1296 ≈ 0.0810, which is 8.10 % for the letter a.
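Here is the same computation as a formula. Writing c_i for the count of the i-th letter, the total n and the percentage for each letter are:

$$
n = \sum_{i=1}^{26} c_i = 1296, \qquad \text{freq}_i = 100 \cdot \frac{c_i}{n}
$$

$$
\text{freq}_a = 100 \cdot \frac{105}{1296} \approx 8.10
$$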

My code isn't working, so I would be grateful if anyone could help me and point me in the right direction.

You need to create a list to store the values and append each value to it. `freq = 0` creates an integer, and integers have no `append` method.
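Here is a minimal illustration of that point. It uses only the a, b and z counts from the question and is not part of the original answer. The `try` block reproduces the failure caused by `freq = 0`. The second part starts from an empty list instead:

```python
# freq = 0 binds an int, and ints have no .append method.
freq = 0
try:
    freq.append(105)
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'append'

# Start from an empty list instead and append one value per letter.
freq = []
for count in (105, 29, 0):  # the a, b and z counts quoted in the question
    freq.append(count)
print(freq)
```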

Also, instead of hard-coding 1296, you should accumulate the total of all counts while reading the file and then divide by that total.
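The idea of deriving the divisor from the data can be shown in isolation. `to_percentages` below is a hypothetical helper, not from the answer. It takes a list of raw counts and scales them to percentages of their own sum. It also adds a guard for an all-zero input, which the answer's code does not have:

```python
def to_percentages(counts):
    """Turn raw counts into percentages of their own total.

    The total is computed from the data (sum of all counts) rather than
    hard-coded, so the same code works for any input file.
    """
    total = sum(counts)
    if total == 0:
        # Avoid ZeroDivisionError for an empty or all-zero input.
        return [0.0 for _ in counts]
    return [100 * c / total for c in counts]


print(to_percentages([105, 29, 0]))
```

With the full file, `sum(counts)` would come out as 1296 and the first entry as about 8.10.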

def letterFrequency(filename):
    frequencies = []
    letters = []
    accum = 0
    with open(filename, 'r') as fin:
        for line in fin:
            try:
                letter, freq = line.split(';')
                freq = int(freq)
            except ValueError:       # skips the header line (and any blank line)
                continue
            accum += freq
            letters.append(letter)
            frequencies.append(freq)

        # normalize frequencies
        frequencies = [i/accum for i in frequencies]

    # you need to keep a list of letters; otherwise how would you
    # know which letter each frequency belongs to?
    return letters, frequencies
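`letterFrequency` returns fractions that sum to 1, paired with the letters that actually appear in the file. The question asks for a length-26 vector in percent. One way to bridge the two is sketched below. The helper name `to_alphabet_vector` and its `percent` parameter are hypothetical. Letters missing from the file are assumed to count as 0:

```python
import string


def to_alphabet_vector(letters, frequencies, percent=True):
    """Align (letter, frequency) pairs to a fixed a..z vector of length 26.

    Letters missing from the input get 0. With percent=True the fractions
    (which sum to 1) are scaled to percentages, as in the expected output.
    """
    lookup = dict(zip(letters, frequencies))
    scale = 100 if percent else 1
    return [scale * lookup.get(ch, 0.0) for ch in string.ascii_lowercase]


# Example with fractions for a, b and z only; every other letter becomes 0.
vec = to_alphabet_vector(["a", "b", "z"], [105 / 134, 29 / 134, 0.0])
print(len(vec), vec[0])
```

With the real file, `letters, frequencies = letterFrequency("small_text.txt")` followed by `numpy.array(to_alphabet_vector(letters, frequencies))` should give the 26-element array from the question, with the first entry about 8.10.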
