Counting word frequencies in a list and sorting by frequency

Question

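I am using Python 3.3.

I need to create two lists, one for the unique words and another for the frequencies of those words.

I have to sort the list of unique words based on the frequency list, so that the word with the highest frequency comes first in the list.

I have the design in text, but I am not sure how to implement it in Python.

The approaches I have found so far use either Counter or dictionaries, which we have not learned yet. I have already created the list from the file containing all the words, but I do not know how to find the frequency of each word in the list. I know I need a loop to do this, but I cannot figure it out.

Here is the basic design: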

 original list = ["the", "car",....]
 newlst = []
 frequency = []
 for word in the original list
       if word not in newlst:
           newlst.append(word)
           set frequency = 1
       else
           increase the frequency
 sort newlst based on frequency list

Answer 1

Use this:

from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
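Since the question asks for two lists ordered by frequency, here is a minimal sketch of how a Counter could be turned into them. The sample word list and the names `unique_words` and `frequencies` are made up for illustration; the order among words with equal counts may vary with the Python version.

```python
from collections import Counter

# Hypothetical sample data standing in for the words read from the file.
words = ["the", "car", "is", "red", "red", "red", "is"]

# most_common() with no argument returns every (word, count) pair,
# ordered from the highest count to the lowest.
pairs = Counter(words).most_common()

# Split the pairs into two parallel lists.
unique_words = [word for word, _ in pairs]
frequencies = [count for _, count in pairs]

print(unique_words)
print(frequencies)
```

The two lists stay aligned by index, so `unique_words[0]` is the most frequent word and `frequencies[0]` is its count.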

Answer 2

You can use

from collections import Counter

It is available in Python 2.7 and later; the collections documentation has more information.

1.

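>>> c = Counter('abracadabra')
>>> c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]

(Elements with equal counts may appear in a different order depending on the Python version.)

Using a dictionary:

>>> d = {1: 'one', 2: 'one', 3: 'two'}
>>> c = Counter(d.values())
>>> c.most_common()
[('one', 2), ('two', 1)]

However, you have to read the file first and then convert it to a dict.

2. This is the Python documentation example, using re and Counter: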

# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]

Answer 3

with open("test.txt", "r") as f:
    words = f.read().split()    # read the words into a list
uniqWords = sorted(set(words))  # remove duplicate words and sort alphabetically
for word in uniqWords:
    print(words.count(word), word)

Answer 4

A pandas answer:

import pandas as pd
original_list = ["the", "car", "is", "red", "red", "red", "yes", "it", "is", "is", "is"]
pd.Series(original_list).value_counts()

If you want it in ascending order, it is as simple as:

pd.Series(original_list).value_counts().sort_values(ascending=True)

Answer 5

Another solution, with a different algorithm that does not use collections:

def countWords(A):
    dic = {}
    for x in A:
        if x not in dic:        # Python 2.7: if not dic.has_key(x):
            dic[x] = A.count(x)
    return dic

dic = countWords(['apple', 'egg', 'apple', 'banana', 'egg', 'apple'])
sorted_items = sorted(dic.items())   # if you want it sorted (by word)

Answer 6

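One way is to make a list of lists, where each sublist in the new list contains a word and a count:

list1 = []    # this is your original list of words
list2 = []    # this is a new list of [word, count] pairs

for word in list1:
    for pair in list2:
        if pair[0] == word:      # word already seen: bump its count
            pair[1] += 1
            break
    else:                        # loop ended without break: first occurrence
        list2.append([word, 1])

Or, using try/except with a helper list of the words seen so far:

seen = []     # words in the same order as the pairs in list2
for word in list1:
    try:
        list2[seen.index(word)][1] += 1
    except ValueError:           # index() raises ValueError for a new word
        seen.append(word)
        list2.append([word, 1])

This is less efficient than using a dictionary, but it uses more basic concepts.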
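The list-of-pairs approach still leaves the sorting step the question asks for. A minimal sketch of one way to do it, assuming each pair is a `[word, count]` list; the sample data and the names `pairs`, `unique_words` and `frequencies` are illustrative only:

```python
# Hypothetical sample data standing in for the original word list.
word_list = ["the", "car", "is", "red", "red", "red", "is"]

pairs = []  # each element is a [word, count] list
for word in word_list:
    for pair in pairs:
        if pair[0] == word:      # word already recorded: increase its count
            pair[1] += 1
            break
    else:                        # no break happened: first time we see the word
        pairs.append([word, 1])

# Sort in place by the count (index 1), highest count first.
# list.sort is stable, so words with equal counts keep their original order.
pairs.sort(key=lambda pair: pair[1], reverse=True)

# Split into the two lists the question asks for.
unique_words = [pair[0] for pair in pairs]
frequencies = [pair[1] for pair in pairs]

print(unique_words)
print(frequencies)
```

Keeping each word next to its count means a single sort moves both together, which avoids having to keep two separate lists in sync while sorting.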

Answer 7

You can use reduce(), a functional approach.

from functools import reduce  # reduce is not a builtin in Python 3

words = "apple banana apple strawberry banana lemon"
reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})

Returns:

{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}

Answer 8

Using Counter would be the best way, but if you do not want to do that, you can implement it yourself like this.

# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
    freq[word] = word_list.count(word) / float(len(word_list))

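freq will end up holding the frequency of each word in the list you already have.

You need float there to convert one of the integers to a float, so that the resulting value is a float.

Edit:

If you cannot use a dict or a set, here is another, less efficient way: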
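The float() call matters on Python 2, where `/` between two integers discards the fractional part. On Python 3, which the question uses, `/` is always true division, so the conversion is harmless but not required. A small sketch with made-up sample data:

```python
# Hypothetical sample data.
word_list = ["a", "b", "a", "c"]

count = word_list.count("a")     # number of times "a" occurs
total = len(word_list)           # total number of words

true_div = count / total         # Python 3: true division, gives a float
floor_div = count // total       # floor division, gives an int
explicit = count / float(total)  # gives a float on both Python 2 and Python 3

print(true_div, floor_div, explicit)
```

Writing `count / float(total)` keeps the code correct on both major versions, which is presumably why the answer uses it.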

# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
    if word not in unique_words:
        unique_words += [word]
word_frequencies = []
for word in unique_words:
    word_frequencies += [float(word_list.count(word)) / len(word_list)]
for i in range(len(unique_words)):
    print(unique_words[i] + ": " + str(word_frequencies[i]))

The indexes of unique_words and word_frequencies will match.

Answer 9

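The ideal way is to use a dictionary that maps each word to its count. But if you cannot use one, you may want to use 2 lists: one storing the words and another storing the word counts. Note that the order of the words and of the counts matters here, because they are matched by index. This is harder to implement and not very efficient.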
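A minimal sketch of the two-list approach described above, without dictionaries or Counter. The function name `count_and_sort` and the sample data are made up for illustration, and sorting a list of index positions is only one of several ways to keep the two lists aligned:

```python
def count_and_sort(word_list):
    """Return (unique_words, frequencies), most frequent word first."""
    words = []   # unique words, in order of first appearance
    freqs = []   # freqs[i] is the count of words[i]

    for word in word_list:
        if word in words:
            # Find the word's position and bump the count at the same index.
            freqs[words.index(word)] += 1
        else:
            words.append(word)
            freqs.append(1)

    # Sort the index positions by their count, highest first, then
    # rebuild both lists in that order so they stay aligned.
    order = sorted(range(len(words)), key=lambda i: freqs[i], reverse=True)
    return [words[i] for i in order], [freqs[i] for i in order]


unique_words, frequencies = count_and_sort(
    ["the", "car", "the", "red", "the", "car"]
)
print(unique_words)
print(frequencies)
```

Each `in` test and each `index()` call scans the list, so the running time grows roughly with the number of words times the number of unique words. That is fine for small inputs but slower than a dictionary for large ones.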

Answer 10

Try this:

words = []
freqs = []

for line in sorted(original_list):  # goes through all the lines of the text in sorted order
    line = line.rstrip()            # strips trailing whitespace
    if line not in words:           # checks whether line is already in words
        words.append(line)          # if not, adds it to the end of words
        freqs.append(1)             # and adds 1 to the end of freqs
    else:
        index = words.index(line)   # if it is, finds its position in words
        freqs[index] += 1           # and adds 1 at the matching index in freqs

Answer 11

Here is code for your question. is_word() validates each string so that only valid strings are counted; a hash map is a dictionary in Python.

def is_word(word):
    cnt = 0
    for c in word:
        if 'a' <= c <= 'z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
            cnt += 1
    if cnt == len(word):
        return True
    return False

def words_freq(s):
    d = {}
    for i in s.split():
        if is_word(i):
            if i in d:
                d[i] += 1
            else:
                d[i] = 1
    return d

print(words_freq('the the sky$ is blue not green'))

Answer 12

words_dict = {}
for word in original_list:
    words_dict[word] = words_dict.get(word, 0) + 1

# Sort by count, highest first (dicts keep insertion order in Python 3.7+)
sorted_dt = {key: value for key, value in sorted(words_dict.items(), key=lambda item: item[1], reverse=True)}

keys = list(sorted_dt.keys())
values = list(sorted_dt.values())
print(keys)
print(values)

Answer 13

The simple way:

d = {}
l = ['Hi', 'Hello', 'Hey', 'Hello']
for a in l:
    d[a] = l.count(a)
print(d)
# Output: {'Hi': 1, 'Hello': 2, 'Hey': 1}

Answer 14

Words and their relative frequencies, if that is what you need:

def counter_(input_list_):
    lu = []
    for v in input_list_:
        # if you don't want a fraction, remove "/ len(input_list_)"
        ele = (v, input_list_.count(v) / len(input_list_))
        if ele not in lu:
            lu.append(ele)
    return lu

counter_(['a', 'n', 'f', 'a'])

Output:

[('a', 0.5), ('n', 0.25), ('f', 0.25)]

Answer 15

The best way is:

def wordListToFreqDict(wordlist):
    wordfreq = [wordlist.count(p) for p in wordlist]
    return dict(zip(wordlist, wordfreq))

Then try: wordListToFreqDict(originallist)

15 answers

Answer 1  190  2013-12-11 05:37:04
Answer 2  54  2013-12-11 05:09:28
Answer 3  20  2013-12-11 05:00:45
Answer 4  13  2019-03-21 06:24:57
Answer 5  9  2015-12-20 08:40:20
Answer 6  5  2013-12-11 05:01:47
Answer 7  5  2016-02-23 18:11:08
Answer 8  3  2013-12-11 04:58:49
Answer 9  1  2013-12-11 04:57:18
Answer 10  0  2016-10-05 23:37:44
Answer 11  0  2018-08-13 14:08:43
Answer 12  0  2021-06-16 14:35:09
Answer 13  0  2022-06-02 23:41:52
Answer 14  0  2022-07-22 11:05:51
Answer 15  -1  2016-11-06 11:37:34
