JSON file: counting unique words instead of single letters with Python

Question

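For a current research project, I want to count the unique words in a specific object of a JSON file. However, the code only counts the individual letters in the first entry of the "Text Main" object.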

Without the JSON object selection text = data[0]["Text Main"], the code works on whole words. Is there a smart tweak that makes the code count words instead of letters?

The output currently looks like this (excerpt):

The JSON file has the following structure:

[
{"Stock Symbol":"A",
"Date":"05/11/2017",
"Text Main":"I have been working",
"Text Pro":"Text sample 2",
"Text Con":"Text sample 3"}
]

The corresponding code looks like this:

# Import relevant libraries
import string
import json
import csv
import textblob

# Open JSON file and slice by object
file = open("Glassdoor_A.json", "r")
data = json.load(file)
text = data[0]["Text Main"]

# Create an empty dictionary
d = dict()

# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()

    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()

    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))

    # Split the line into words
    words = line.split(" ")

    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1

# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

# Save results as CSV
with open('Glassdoor_A.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Word", "Occurrences"])
    writer.writerows(d.items())

Answer 1

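If I understand correctly, you should iterate over your data, take each object (I call it row), get its Text Main element, and do the rest of the processing on that. In your code, text = data[0]["Text Main"] is a single string, so for line in text walks over its characters one at a time, which is why letters are counted instead of words.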

# your importing code, etc...

# processing:
for row in data:
    line = row['Text Main']
    # Remove the leading spaces and newline character
    line = line.strip()

    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()

    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))

    # Split the line into words on any run of whitespace
    # (avoids empty strings when punctuation removal leaves double spaces)
    words = line.split()

    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
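
For comparison, here is a more compact sketch of the same idea using collections.Counter from the standard library. The function name count_words and the inline sample are my own, made up for illustration. The sample only mirrors the structure shown in the question, and I am assuming every object has a string under "Text Main".

```python
import json
import string
from collections import Counter


def count_words(records, field="Text Main"):
    """Count word occurrences in `field` across all records.

    count_words is a hypothetical helper name, not part of any library.
    """
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for row in records:
        # Assumption: missing fields are treated as empty text.
        text = row.get(field, "")
        # Lowercase, drop punctuation, then split on any run of whitespace.
        counts.update(text.lower().translate(strip_punct).split())
    return counts


# Inline sample with the same structure as the file in the question.
# With the real file: with open("Glassdoor_A.json") as f: data = json.load(f)
sample = """
[
  {"Stock Symbol": "A",
   "Date": "05/11/2017",
   "Text Main": "I have been working",
   "Text Pro": "Text sample 2",
   "Text Con": "Text sample 3"}
]
"""
data = json.loads(sample)
counts = count_words(data)

for word, n in counts.most_common():
    print(word, ":", n)
```

Since counts.most_common() returns (word, count) pairs, it can be passed straight to writer.writerows(...) after the header row if you still want the CSV output.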

Solution 1
1 Accepted 2020-05-04 16:14:22
