How to count occurrences of a word in a certain element in a text file?
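This is my code so far. My problem is that it iterates over every word in the text file, but I only want it to go through the last word of each line (the book's genre: religion, etc.).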
import string
# Open the file in read mode
text = open("book_data_file.txt", "r")
# Create an empty dictionary
d = dict()
# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
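This is the text file (screenshot of the book text file).
The output I want is religion: 4, science: 3, fiction: 2, and so on.
Any help would be greatly appreciated.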
Use pandas:
import pandas as pd
df = pd.read_csv('file.txt', sep=',')
words_count = df['GENRE'].value_counts()
Edit:
Just use indexing to take the last word: word = line.split(" ")[-1]
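As a small illustration of that indexing: `[-1]` picks the final item of the list that `split` returns. The sample line below is hypothetical, and the second variant assumes the fields are comma-separated, which the question does not confirm.

```python
# Hypothetical sample line; assumes fields are separated by ", ".
line = "The Selfish Gene, Richard Dawkins, 1976, Science\n"

# Splitting on spaces and taking [-1] gives the last word of the line.
last_word = line.strip().split(" ")[-1]

# Splitting once from the right on the comma keeps a multi-word
# last field (for example "Science Fiction") in one piece.
last_field = line.strip().rsplit(",", 1)[-1].strip()

print(last_word)
print(last_field)
```

If a genre can contain spaces, the comma-based variant is probably the safer choice, since the space-based split would keep only its final word.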
Skip the first line, since it holds the header, and skip any blank lines as well, by using:
if idx == 0 or len(line) == 0:
    continue
book.txt:
a, b, c, d
a1, b1, c1, d1
a2, b2, c2, d2
a3, b3, c3, d1
a4, b4, c4, d1
a5, b5, c5, d3
import string
# Open the file in read mode
text = open("book.txt", "r")
# Create an empty dictionary
d = dict()
# Loop through each line of the file
for idx, line in enumerate(text):
    # Remove the leading spaces and newline character
    line = line.strip()
    if idx==0 or len(line)==0:
        continue
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    # Split the line into words
    word = line.split(" ")[-1]
    # Check if the word is already in dictionary
    if word in d:
        # Increment count of word by 1
        d[word] = d[word] + 1
    else:
        # Add the word to dictionary with count 1
        d[word] = 1
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
d1 : 3
d2 : 1
d3 : 1
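For a comma-separated file, the standard library's `csv` module is an alternative to stripping punctuation by hand. The following is only a sketch under that assumption: the `count_last_column` helper is a hypothetical name, and an in-memory `io.StringIO` stands in for book.txt so the example is self-contained.

```python
import csv
import io

# Stand-in for book.txt so the sketch runs on its own; with a real
# file, pass open("book.txt", newline="") instead.
sample = io.StringIO(
    "a, b, c, d\n"
    "a1, b1, c1, d1\n"
    "a2, b2, c2, d2\n"
    "a3, b3, c3, d1\n"
    "\n"
    "a4, b4, c4, d1\n"
    "a5, b5, c5, d3\n"
)

def count_last_column(lines):
    """Count the values in the last column, skipping the header row."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row
    counts = {}
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        key = row[-1].strip().lower()
        counts[key] = counts.get(key, 0) + 1
    return counts

counts = count_last_column(sample)
for key, value in counts.items():
    print(key, ":", value)
```

Because the reader splits on commas rather than spaces, a last column containing several words stays intact.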
If you don't want to use pandas, then using a dict is the right approach. In fact, the standard library has a subclass of dict that does exactly what you want: collections.Counter.
import string
from collections import Counter
def tokenize(line: str):
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    # Split the line into words (split() with no argument
    # also drops empty strings from blank lines)
    words = line.split()
    # Return the list of words so the caller can iterate over it
    return words
def iter_tokens(lines):
    for line in lines:
        yield from tokenize(line)
# Open the file in read mode
with open("book.txt", "r") as text:
    # Count every token in the file
    counts = Counter(iter_tokens(text))
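The Counter code above counts every token on every line. To get only the per-genre totals the question asks for, the generator can yield just the last token of each data line. This is a minimal sketch: `last_tokens` is a hypothetical helper, and the list of lines mirrors the sample book.txt rather than the asker's real file.

```python
from collections import Counter

def last_tokens(lines):
    """Yield the last whitespace-separated token of each data line."""
    for idx, line in enumerate(lines):
        line = line.strip().lower()
        # Skip the header (first line) and any blank lines.
        if idx == 0 or not line:
            continue
        yield line.split()[-1]

# Lines mirroring the sample book.txt; a file object opened with
# open("book.txt") could be passed in the same way.
sample_lines = [
    "a, b, c, d\n",
    "a1, b1, c1, d1\n",
    "a2, b2, c2, d2\n",
    "a3, b3, c3, d1\n",
    "a4, b4, c4, d1\n",
    "a5, b5, c5, d3\n",
]

genre_counts = Counter(last_tokens(sample_lines))

# most_common() lists the entries from most to least frequent.
for genre, n in genre_counts.most_common():
    print(genre, ":", n)
```

Since Counter is a dict subclass, `genre_counts["d1"]` works like a normal lookup, and a missing key returns 0 instead of raising KeyError.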