JSON File: Count Unique Words Instead of Single Letters with Python
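For a current research project, I want to count the unique words of a specific object in a JSON file. However, the code only counts the single letters in the first row of the object "Text Main".
Without the JSON object selection text = data[0]["Text Main"], the code works on whole words. Is there a smart tweak that makes the code count words instead of letters?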
The output currently looks like this (excerpt):
The JSON file has the following structure:
[
{"Stock Symbol":"A",
"Date":"05/11/2017",
"Text Main":"I have been working",
"Text Pro":"Text sample 2",
"Text Con":"Text sample 3"}
]
The corresponding code looks like this:
# Import relevant libraries
import string
import json
import csv
import textblob
# Open JSON file and slice by object
file = open("Glassdoor_A.json", "r")
data = json.load(file)
text = data[0]["Text Main"]
# Create an empty dictionary
d = dict()
# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
# Save results as CSV
with open('Glassdoor_A.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Word", "Occurrences"])
    # Write one (word, count) row per dictionary entry
    writer.writerows(d.items())
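If I understand correctly, you should iterate over your data, take each object (I call it row), get its data element Text Main, and do the rest of the processing on that. In your code, text is a single string, so for line in text visits it one character at a time, which is why letters are counted instead of words.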
# your importing code, etc...
# processing:
for row in data:
    line = row['Text Main']
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
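For comparison, the same per-row loop can be written more compactly with collections.Counter from the standard library. This is only a sketch under a few assumptions: the helper name count_words and its key parameter are made up for illustration, every row is assumed to hold a string under "Text Main", and the sample data is just the one object shown in the question.

```python
import json
import string
from collections import Counter


def count_words(rows, key="Text Main"):
    """Count lowercase, punctuation-free words across the `key` field of every row.

    `count_words` and `key` are hypothetical names used for this sketch.
    """
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for row in rows:
        text = row.get(key, "")
        # split() with no argument splits on any run of whitespace and
        # never yields empty strings, unlike split(" ")
        counts.update(text.lower().translate(strip_punct).split())
    return counts


# Sample data mirroring the structure shown in the question
sample = json.loads(
    '[{"Stock Symbol": "A", "Date": "05/11/2017", '
    '"Text Main": "I have been working", '
    '"Text Pro": "Text sample 2", "Text Con": "Text sample 3"}]'
)

counts = count_words(sample)
for word, n in counts.most_common():
    print(word, ":", n)
```

With the real file, passing the result of json.load in place of sample should work the same way, and csv.writer's writerows(counts.items()) can write the word/count pairs to the CSV.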