How do I count the occurrences of a word in a particular element of a text file?

Question

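This is my code so far. My problem is that it goes through every word in the text file, but I only want it to go through the last word of each line (the book's genre: religion, etc.).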

import string

# Open the file in read mode
text = open("book_data_file.txt", "r")

# Create an empty dictionary
d = dict()

# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()

    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()

    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))

    # Split the line into words
    words = line.split(" ")

    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1

# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

This is a screenshot of the book text file.

The output I want is religion: 4, science: 3, fiction: 2, and so on.

Any help would be much appreciated.

Answer 1

Using pandas:

import pandas as pd

# Read the file and count how often each value appears in the GENRE column
df = pd.read_csv('file.txt', sep=',')
words_count = df['GENRE'].value_counts()
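A minimal, self-contained sketch of the pandas approach follows. The column names and rows are hypothetical, since the real file layout is only known from the screenshot. It also assumes fields are separated by a comma followed by a space; in that case `skipinitialspace=True` keeps the header from being read as `' GENRE'` with a leading space.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file
sample = io.StringIO(
    "TITLE, AUTHOR, GENRE\n"
    "Book A, Author A, Religion\n"
    "Book B, Author B, Science\n"
    "Book C, Author C, Religion\n"
)

# skipinitialspace=True drops the space that follows each comma
df = pd.read_csv(sample, skipinitialspace=True)

# value_counts() returns a Series sorted by count, highest first
genre_counts = df["GENRE"].value_counts()
print(genre_counts)
```

To read from disk instead, pass the file path to `pd.read_csv` in place of the `StringIO` object.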

Edit:

Just take the last word by index: word = line.split(" ")[-1]. Skip the first line, since it holds the header, and skip any blank lines too, using:

if idx == 0 or len(line) == 0:
    continue

book.txt:

a, b, c, d
a1, b1, c1, d1
a2, b2, c2, d2
a3, b3, c3, d1
a4, b4, c4, d1
a5, b5, c5, d3

import string

# Create an empty dictionary
d = dict()

# Open the file in read mode; the with block closes it afterwards
with open("book.txt", "r") as text:
    # Loop through each line of the file
    for idx, line in enumerate(text):
        # Remove the leading spaces and newline character
        line = line.strip()

        # Skip the header line and any blank lines
        if idx == 0 or len(line) == 0:
            continue

        # Convert the characters in line to
        # lowercase to avoid case mismatch
        line = line.lower()

        # Remove the punctuation marks from the line
        line = line.translate(line.maketrans("", "", string.punctuation))

        # Keep only the last word of the line
        word = line.split(" ")[-1]

        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1

# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

d1 : 3
d2 : 1
d3 : 1
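One caveat: splitting on spaces keeps only the final word, so a two-word genre such as "science fiction" would be counted as "fiction". If the fields are comma-separated, as in the sample above, a possible alternative is to split on commas with the standard `csv` module. This is only a sketch; the function name `count_last_field` and the sample rows are made up for illustration.

```python
import csv
import io


def count_last_field(lines):
    """Count the values of the last comma-separated field, skipping the header."""
    counts = {}
    # skipinitialspace=True drops the space that follows each comma
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        genre = row[-1].strip().lower()
        counts[genre] = counts.get(genre, 0) + 1
    return counts


# Hypothetical data including a two-word genre and a blank line
sample = io.StringIO(
    "title, author, genre\n"
    "t1, a1, Religion\n"
    "t2, a2, Science Fiction\n"
    "\n"
    "t3, a3, religion\n"
)
result = count_last_field(sample)
print(result)  # {'religion': 2, 'science fiction': 1}
```

With a real file, pass the open file object: `count_last_field(open("book.txt", newline=""))`.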

Answer 2

If you don't want to use pandas, then a dict is the right approach. In fact, the standard library has a dict subclass that does exactly what you want: collections.Counter.

import string
from collections import Counter

def tokenize(line: str):
    # Remove the leading spaces and newline character
    line = line.strip()

    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()

    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))

    # Split the line into words and return them
    return line.split(" ")


def iter_tokens(lines):
    # Yield every word of every line, one at a time
    for line in lines:
        yield from tokenize(line)


# Open the file in read mode
with open("book.txt", "r") as text:
    counts = Counter(iter_tokens(text))
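Note that `iter_tokens` above feeds every word of every line into the counter, while the question only asks about the last item on each line. One way to adapt it, sketched here under the assumption that fields are comma-separated and the first line is a header, is to yield just the last field. The helper name `last_fields` is hypothetical, and the sample lines mirror the book.txt contents from Answer 1.

```python
from collections import Counter


def last_fields(lines):
    """Yield the last comma-separated field of each data line."""
    for idx, line in enumerate(lines):
        line = line.strip()
        # Skip the header row and blank lines
        if idx == 0 or not line:
            continue
        yield line.split(",")[-1].strip().lower()


# Same rows as the book.txt sample in Answer 1
sample_lines = [
    "a, b, c, d\n",
    "a1, b1, c1, d1\n",
    "a2, b2, c2, d2\n",
    "a3, b3, c3, d1\n",
    "a4, b4, c4, d1\n",
    "a5, b5, c5, d3\n",
]

counts = Counter(last_fields(sample_lines))

# most_common() lists entries from highest to lowest count
for genre, n in counts.most_common():
    print(genre, ":", n)
```

An open file object can be passed in place of `sample_lines`, since `last_fields` accepts any iterable of lines.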

Answer 1
2, accepted, 2020-12-04 13:50:39

Answer 2
1, 2020-12-04 14:47:32
