Python Loop: TypeError: string indices must be integers
For a current research project, I am planning to read the JSON field "Text Main" within a pre-defined time range using Python/Pandas. When running the word-counting loop, however, the code raises the error
TypeError: string indices must be integers
for the line
line = row['Text Main']
"Text Main" only contains strings/text and no integers. I have already been through troubleshooting threads but have not found a solution to this problem yet. Is there any helpful tweak to make this work?
The JSON file has the following structure:
[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]
And the relevant code excerpt looks like this:
import string
import json
import csv
import pandas as pd
import datetime
import numpy as np
# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])
# Create an empty dictionary
d = dict()
# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"
after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date
between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]
print(filtered_dates)
# Processing
for row in filtered_dates:
    line = row['Text Main']
Iterating over filtered_dates directly yields its column names, which are strings, so row['Text Main'] tries to index a string with another string. If you want to iterate over the rows, you should use iterrows().
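To see the difference in isolation, here is a minimal sketch built from the single sample record in the question. The inline DataFrame is an assumption standing in for Glassdoor_A.json, and the variable names are only illustrative.

```python
import pandas as pd

# Inline stand-in for the JSON file, using the sample record from the question.
df = pd.DataFrame(
    [{"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Sample text"}]
)

# Iterating over a DataFrame directly yields its column labels, not its rows.
labels = [item for item in df]
print(labels)

# Indexing one of those labels (a plain str) with another str raises the TypeError.
error_message = ""
try:
    labels[0]["Text Main"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# iterrows() yields (index, Series) pairs, so the column lookup works.
texts = [row["Text Main"] for _, row in df.iterrows()]
print(texts)
```

The exact wording of the error message varies slightly between Python versions, but the cause is the same in each case.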
Something like this should work:
import string
import json
import csv
import pandas as pd
import datetime
import numpy as np
# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])
# Create an empty dictionary
d = dict()
# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"
after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date
between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]
# Processing
for index, row in filtered_dates.iterrows():
    line = row['Text Main']
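The question mentions a word-counting loop but does not show its body. Below is a rough sketch of how it might continue, not the asker's actual code. It assumes that words are split on whitespace, lowercased, and stripped of ASCII punctuation, that dates are month/day/year, and it uses hypothetical inline rows shaped like the sample record instead of reading Glassdoor_A.json.

```python
import string
from collections import Counter

import pandas as pd

# Hypothetical rows shaped like the question's sample record.
data = [
    {"No": "121", "Stock Symbol": "A", "Date": "05/11/2011", "Text Main": "Sample text"},
    {"No": "122", "Stock Symbol": "A", "Date": "06/12/2012", "Text Main": "More sample text."},
    {"No": "123", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Outside the range"},
]
df = pd.DataFrame(data)
# The explicit format is an assumption (month/day/year); adjust it if the data is day-first.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Keep only rows whose date falls inside the range (inclusive on both ends).
in_range = df["Date"].between(pd.Timestamp("2009-01-01"), pd.Timestamp("2015-01-01"))
filtered_dates = df.loc[in_range]

# Translation table that deletes ASCII punctuation.
strip_punct = str.maketrans("", "", string.punctuation)

word_counts = Counter()
for _, row in filtered_dates.iterrows():
    line = row["Text Main"].lower().translate(strip_punct)
    word_counts.update(line.split())

print(word_counts)
```

Since only one column is needed here, looping over filtered_dates["Text Main"] directly would also work, because iterating over a single column yields its values.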