Python loop: TypeError: string indices must be integers

Question

For a current research project, I want to read the "Text Main" field of a JSON file within a predefined date range using Python and pandas. When I run the word-counting loop, however, the code raises TypeError: string indices must be integers at the line line = row['Text Main'].

Text Main contains only strings (text), no integers. I have already been through several troubleshooting threads but have not found a solution yet. Is there a tweak that would make this work?

The JSON file has the following structure:

[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]

The relevant code excerpt looks like this:

import string
import json
import csv

import pandas as pd
import datetime

import numpy as np


# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])


# Create an empty dictionary
d = dict()


# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"

after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date

between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]

print(filtered_dates)


# Processing
for row in filtered_dates:
    line = row['Text Main']

Answer 1

Iterating over filtered_dates directly yields the column names, which are strings, so row['Text Main'] ends up indexing a string with a string. To iterate over the rows, use iterrows().
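To make the difference concrete, here is a minimal sketch on a small in-memory DataFrame. The sample records are hypothetical and only mirror the structure shown in the question; no file is read.

```python
import pandas as pd

# Hypothetical sample data mirroring the JSON structure from the question
df = pd.DataFrame(
    [
        {"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Sample text"},
        {"No": "122", "Stock Symbol": "A", "Date": "06/11/2017", "Text Main": "More sample text"},
    ]
)

# Iterating a DataFrame directly yields its column labels (plain strings)
labels = [item for item in df]

# Indexing one of those strings with another string reproduces the error
try:
    labels[0]["Text Main"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# iterrows() yields (index, row) pairs, where each row is a Series
texts = [row["Text Main"] for _, row in df.iterrows()]

print(labels)
print(error_message)
print(texts)
```

The exact wording of the TypeError message can differ slightly between Python versions, but it is the same error as in the question.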

Something like this should work:

import string
import json
import csv

import pandas as pd
import datetime

import numpy as np


# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])


# Create an empty dictionary
d = dict()


# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"

after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date

between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]

# Processing
for index, row in filtered_dates.iterrows():
    line = row['Text Main']
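If the goal is only to count words, iterating over the 'Text Main' column itself may be simpler than iterrows(). The following is a rough sketch under several assumptions that are not in the original post: the records are made up, the dates are taken to be month/day/year, and collections.Counter stands in for the empty dictionary d.

```python
from collections import Counter
import string

import pandas as pd

# Hypothetical records with the same keys as the JSON file in the question
records = [
    {"No": "1", "Stock Symbol": "A", "Date": "03/15/2010", "Text Main": "Good pay, good team."},
    {"No": "2", "Stock Symbol": "A", "Date": "07/01/2013", "Text Main": "Long hours but good pay"},
    {"No": "3", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Outside the date range"},
]
df = pd.DataFrame(records)

# Assumption: dates are month/day/year; an explicit format avoids ambiguity
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# between() is inclusive on both ends by default, like the >= and <= pair
in_range = df["Date"].between(pd.Timestamp("2009-01-01"), pd.Timestamp("2015-01-01"))
filtered = df.loc[in_range]

# Iterating a single column (a Series) yields its values, one per row
word_counts = Counter()
strip_punct = str.maketrans("", "", string.punctuation)
for line in filtered["Text Main"]:
    words = line.lower().translate(strip_punct).split()
    word_counts.update(words)

print(word_counts.most_common(3))
```

Lowercasing and stripping punctuation are illustrative choices; whether they fit depends on how the words are meant to be counted.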

Answer 1: 2, accepted, 2020-05-13 10:05:50
