
Calculating the Letter Frequency in Python

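I need to define a function that slices a string according to a certain character, sums up those indices, divides by the number of times the character occurs in the string, and then divides all of that by the length of the text.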
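Reading "indices" as the 0-based positions at which the character occurs, the requested quantity for a character $c$ in a text $s$ of length $L$ can be written as below. $P_c$ (the set of those positions) is our own notation, not the asker's:

$$
\operatorname{ave\_index}(c) = \frac{1}{L} \cdot \frac{1}{|P_c|} \sum_{i \in P_c} i, \qquad P_c = \{\, i : s_i = c \,\}
$$

In other words, it is the mean position of $c$, normalized by the length of the text.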

Here is what I have so far:


def ave_index(char):
  passage = "string"
  if char in passage:
    word = passage.split(char)
    words = len(word)
    number = passage.count(char)
    answer = word / number / len(passage)
    return(answer)

  elif char not in passage:
    return False

So far, the answer I get when I run this is way off.

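EDIT: The passage we were given to use as the string is: 'Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.'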

When char = 's', the answer should be 0.5809489252885479.

您可以使用Counter檢查頻率:

from collections import Counter
words = 'Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people\'s hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.'

freqs = Counter(words)  # iterating over a string yields its characters, so Counter tallies how often each character occurs
print(float(freqs['s']) / len(words)) 
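Note that this computes the relative frequency of the character (its count divided by the text length), not the average index the question asks for. A minimal sketch on the short string 'hello world', used here only for illustration:

```python
from collections import Counter

s = 'hello world'
freqs = Counter(s)  # iterating a string yields characters, so this tallies each one

# Counter's tally for a character matches str.count
assert freqs['l'] == s.count('l') == 3

# Characters that never occur simply count as 0 (no KeyError)
assert freqs['z'] == 0

# Relative frequency: occurrences divided by the total number of characters
rel_freq = freqs['l'] / len(s)
print(rel_freq)  # 3 / 11
```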

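The problem is how you are counting the letters. Take the string hello world as an example and suppose you are trying to count how many l's it contains. We know there are 3, but if you do a split: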

>>> s = 'hello world'
>>> s.split('l')
['he', '', 'o wor', 'd']

This results in a count of 4. In addition, we need the position of each occurrence of the character in the string.
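The off-by-one comes from the fact that splitting on a separator that occurs n times always produces n + 1 pieces (including empty strings between adjacent occurrences), and those pieces are substrings, not positions. A quick check, again on 'hello world':

```python
s = 'hello world'
c = 'l'

pieces = s.split(c)   # ['he', '', 'o wor', 'd']
print(len(pieces))    # 4 pieces...
print(s.count(c))     # ...but only 3 occurrences

# n occurrences of the separator always yield n + 1 pieces
assert len(pieces) - 1 == s.count(c)
```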

The built-in enumerate helps us with this:

>>> s = 'hello world'
>>> c = 'l'  # The letter we are looking for
>>> results = [k for k,v in enumerate(s) if v == c]
>>> results
[2, 3, 9]

Now we have the total number of occurrences, len(results), and the positions at which the letter appears in the string.
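Carrying the 'hello world' example through to the end: the positions [2, 3, 9] sum to 14, the mean position is 14 / 3, and dividing by the length 11 gives 14 / 33. The same steps in code:

```python
s = 'hello world'
c = 'l'

positions = [i for i, ch in enumerate(s) if ch == c]  # [2, 3, 9]
mean_pos = sum(positions) / len(positions)            # 14 / 3
result = mean_pos / len(s)                            # (14 / 3) / 11 = 14 / 33
print(result)
```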

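The last "trick" to this problem is making sure you divide by a float to get the correct result. This matters in Python 2, where dividing two integers with / truncates the result; in Python 3, / always performs true division.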
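A minimal illustration, run in Python 3, of what goes wrong without the float: the floor-division operator `//` reproduces what Python 2's `/` does with two positive ints, while `/` is true division. The numbers are the sum and count of the 'l' positions in 'hello world':

```python
results_sum, count = 14, 3   # sum and count of the 'l' positions in 'hello world'

print(results_sum // count)  # floor division: 4 (what Python 2's int / int gives here)
print(results_sum / count)   # true division: 4.666666666666667
```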

Running this against your sample text (stored in s):

>>> c = 's'
>>> results = [k for k,v in enumerate(s) if v == c]
>>> results_sum = sum(results)
>>> (results_sum / len(results)) / float(len(s))
0.5804132973944295
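Wrapped into a function in the shape of the question's `ave_index`, the enumerate approach might look like the sketch below. Passing the text as a second parameter `passage` is an assumption on our part (the original hard-codes it), and returning `False` when the character is absent mirrors the question's code.

```python
def ave_index(char, passage):
    """Mean 0-based position of `char` in `passage`, divided by len(passage).

    Returns False if `char` does not occur, as in the original question.
    """
    # Collect every position at which the character occurs
    positions = [i for i, ch in enumerate(passage) if ch == char]
    if not positions:
        return False
    # Average position, then normalize by the length of the text
    # (Python 3: / is true division, so no float() conversion is needed)
    return sum(positions) / len(positions) / len(passage)
```

For example, `ave_index('l', 'hello world')` evaluates to 14 / 33 (about 0.424), and `ave_index('z', 'hello world')` returns `False`. Because an absent character short-circuits before any division, an empty passage also returns `False` rather than raising `ZeroDivisionError`.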
