How to count only the words in a dictionary, while returning a count of the dictionary key name

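I want to text-mine my Excel file. First, I have to concatenate all the rows into one large text. Then, scan the text for words in a dictionary. If a word is found, count it under the dictionary key name. Finally, return the list of counted words as a relational table [word, count]. I can count words, but I can't get the dictionary part to work. My questions are:

  1. Am I going about this the right way?
  2. Is it even possible? How?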

Code adapted from the internet:


import collections
import re
import matplotlib.pyplot as plt
import pandas as pd
#% matplotlib inline
#file = open('PrideAndPrejudice.txt', 'r')
#file = file.read()

''' Convert excel column/ rows into a string of words'''
#text_all = pd.read_excel('C:\Python_Projects\Rake\data_file.xlsx')
#df=pd.DataFrame(text_all)
#case_words= df['case_text']
#print(case_words)
#case_concat= case_words.str.cat(sep=' ')
#print (case_concat)
text_all = ("Billy was glad to see jack. Jack was estatic to play with Billy. Jack and Billy were lonely without eachother. Jack is tall and Billy is clever.")
''' done'''
import collections
import pandas as pd
import matplotlib.pyplot as plt
#% matplotlib inline
# Read input file, note the encoding is specified here 
# It may be different in your text file

# Startwords
startwords = {'happy':'glad','sad': 'lonely','big': 'tall', 'smart': 'clever'}
#startwords = startwords.union(set(['happy','sad','big','smart']))

# Instantiate a dictionary, and for every word in the file, 
# Add to the dictionary if it doesn't exist. If it does, increase the count.
wordcount = {}
# To eliminate duplicates, remember to split by punctuation, and use case delimiters.
for word in text_all.lower().split():
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace(":","")
    word = word.replace("\"","")
    word = word.replace("!","")
    word = word.replace("“","")
    word = word.replace("‘","")
    word = word.replace("*","")
    if word  in startwords:
        if word  in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
# Print most common word
n_print = int(input("How many most common words to print: "))
print("\nOK. The {} most common words are as follows\n".format(n_print))
word_counter = collections.Counter(wordcount)
for word, count in word_counter.most_common(n_print):
    print(word, ": ", count)
# Close the file
#file.close()
# Create a data frame of the most common words 
# Draw a bar chart
lst = word_counter.most_common(n_print)
df = pd.DataFrame(lst, columns = ['Word', 'Count'])
df.plot.bar(x='Word',y='Count')

Error: Empty 'DataFrame': no numeric data to plot
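A likely cause, judging from the code above: "word in startwords" tests the dictionary's keys ('happy', 'sad', ...), while the text contains the values ('glad', 'lonely', ...). No word ever matches, so wordcount stays empty, and so does the DataFrame passed to plot.bar. The sketch below checks this without plotting; it reuses the question's startwords and text, and simplifies punctuation stripping to removing periods.

```python
import collections

import pandas as pd

text_all = ("Billy was glad to see jack. Jack was estatic to play with Billy. "
            "Jack and Billy were lonely without eachother. Jack is tall and Billy is clever.")
startwords = {'happy': 'glad', 'sad': 'lonely', 'big': 'tall', 'smart': 'clever'}

# `in` on a dict tests keys only, so the synonyms in the text never match
print('glad' in startwords)           # False
print('glad' in startwords.values())  # True

# Replicate the question's filter (periods stripped): nothing passes it
matched = [w.strip('.') for w in text_all.lower().split() if w.strip('.') in startwords]
print(matched)  # []

# An empty Counter gives an empty DataFrame, which plot.bar() cannot draw
df = pd.DataFrame(collections.Counter().most_common(), columns=['Word', 'Count'])
print(df.empty)  # True
```

Because no word matches, the swapped if/else branches never run either, which is why the symptom is an empty plot rather than a KeyError.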

Expected output:

  1. happy 1
  2. sad 1
  3. big 1
  4. smart 1
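The four steps described in the question (join the rows, scan for dictionary words, count each hit under its key, return a [word, count] table) can be sketched as below. This is a minimal sketch under assumptions: the column is named case_text (as in the commented-out read_excel code), an in-memory DataFrame with the question's text split into two rows stands in for the Excel file, each key maps to a list of words, and the helper name count_by_key is hypothetical.

```python
import collections
import re

import pandas as pd

# Stand-in for pd.read_excel(...): the question's text split over two rows (assumption)
df = pd.DataFrame({"case_text": [
    "Billy was glad to see jack. Jack was estatic to play with Billy.",
    "Jack and Billy were lonely without eachother. Jack is tall and Billy is clever.",
]})

# Each key lists the words that should be counted under it (assumed structure)
startwords = {"happy": ["glad"], "sad": ["lonely"], "big": ["tall"], "smart": ["clever"]}


def count_by_key(text, synonyms):
    """Hypothetical helper: count dictionary words in text, reported under their key."""
    # Invert the dictionary so each word points back to its key
    lookup = {word: key for key, words in synonyms.items() for word in words}
    # Lower-case and keep only runs of letters/apostrophes, dropping punctuation
    words = re.findall(r"[a-z']+", text.lower())
    # Count only words found in the lookup, under their key name
    return collections.Counter(lookup[w] for w in words if w in lookup)


# Step 1: concatenate all rows into one large text
text_all = df["case_text"].str.cat(sep=" ")
# Steps 2 and 3: scan and count; step 4: build the [Word, Count] table
counts = count_by_key(text_all, startwords)
table = pd.DataFrame(counts.most_common(), columns=["Word", "Count"])
print(table)
```

Inverting the dictionary once up front turns each lookup into a single dict access instead of a search through every key's word list.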

Here is an approach that works with the latest version of pandas (0.25.3 at the time of writing):

# Setup
import pandas as pd

df = pd.DataFrame({'case_text': ["Billy was glad to see jack. Jack was estatic to play with Billy. Jack and Billy were lonely without eachother. Jack is tall and Billy is clever."]})

startwords = {"happy":["glad","estatic"],
              "sad": ["depressed", "lonely"],
              "big": ["tall", "fat"],
              "smart": ["clever", "bright"]}

# First you need to rearrange (invert) your startwords dict: each word -> its key
startwords_map = {w: k for k, v in startwords.items() for w in v}

(df['case_text'].str.lower()                  # casts to lower case
 .str.replace(r'[.,*!?:]', '', regex=True)    # removes punctuation (regex=True is required on pandas >= 2.0)
 .str.split()                                 # splits the text on whitespace
 .explode()                                   # expands into a single pandas.Series of words
 .map(startwords_map)                         # maps the words to the startwords; other words become NaN
 .value_counts()                              # counts word occurrences, dropping NaN by default
 .to_dict())                                  # outputs to dict

[Out]

{'happy': 2, 'big': 1, 'smart': 1, 'sad': 1}
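If you want the [word, count] table from the question (and something plot.bar can draw) rather than a dict, the Series from value_counts() can be turned into a two-column DataFrame. A sketch, assuming pandas >= 0.25 for explode() and reusing the answer's startwords; reindex(..., fill_value=0) is an addition here so that keys with no matches still appear with a count of 0.

```python
import pandas as pd

df = pd.DataFrame({'case_text': ["Billy was glad to see jack. Jack was estatic to play with Billy. "
                                 "Jack and Billy were lonely without eachother. Jack is tall and Billy is clever."]})
startwords = {"happy": ["glad", "estatic"],
              "sad": ["depressed", "lonely"],
              "big": ["tall", "fat"],
              "smart": ["clever", "bright"]}
startwords_map = {w: k for k, v in startwords.items() for w in v}

counts = (df['case_text'].str.lower()
          .str.replace(r'[.,*!?:]', '', regex=True)
          .str.split()
          .explode()
          .map(startwords_map)                       # non-dictionary words become NaN
          .value_counts()                            # NaN is dropped by default
          .reindex(list(startwords), fill_value=0))  # every key, in dict order

# Two-column [Word, Count] table, ready for plotting
table = counts.rename_axis('Word').reset_index(name='Count')
print(table)
# table.plot.bar(x='Word', y='Count')  # requires matplotlib
```

rename_axis and reset_index(name=...) set the column names explicitly, so the result does not depend on how a given pandas version names the value_counts() output.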
if word in startwords:
    if word in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1

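This part looks wrong. It checks whether word is in startwords and then checks wordcount; if the word is already in wordcount, your logic says its count should go up, but the code resets it to 1 (and the else branch increments a key that does not exist yet). So I believe you have to swap the two branches: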

    if word in wordcount:
        # already in dict: count++
        wordcount[word] += 1
    else:
        # first time: set to 1
        wordcount[word] = 1
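Swapping the branches fixes the counting logic, but on its own the loop still counts nothing for the question's data, because "word in startwords" only looks at keys. Below is a minimal sketch of the loop with both fixes, assuming one synonym per key as in the question's startwords: the dictionary is inverted first (synonym_to_key is a hypothetical name), punctuation is stripped with one re.sub instead of repeated replace() calls, and dict.get stands in for the if/else as an alternative idiom.

```python
import re

text_all = ("Billy was glad to see jack. Jack was estatic to play with Billy. "
            "Jack and Billy were lonely without eachother. Jack is tall and Billy is clever.")
startwords = {'happy': 'glad', 'sad': 'lonely', 'big': 'tall', 'smart': 'clever'}

# Invert key -> synonym into synonym -> key so text words can be looked up directly
synonym_to_key = {synonym: key for key, synonym in startwords.items()}

wordcount = {}
for word in text_all.lower().split():
    # Remove the same characters the question strips with repeated replace() calls
    word = re.sub(r'[.,:"!“‘*]', '', word)
    if word in synonym_to_key:
        key = synonym_to_key[word]
        # Equivalent to the swapped if/else: increment if present, else start at 1
        wordcount[key] = wordcount.get(key, 0) + 1

print(wordcount)  # {'happy': 1, 'sad': 1, 'big': 1, 'smart': 1}
```

The resulting dict can be passed to collections.Counter and plotted exactly as in the question's code.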


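Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.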

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM