
Python Loop: TypeError: string indices must be integers

For a current research project, I am planning to use Python/Pandas to read the JSON field "Text Main" within a pre-defined date range. When running the word-counting loop, however, the code raises `TypeError: string indices must be integers` at `line = row['Text Main']`.

`Text Main` contains only strings/text, no integers. I have already been through troubleshooting threads but have not found a solution to this problem yet. Is there a tweak that would make this work?

The JSON file has the following structure:

```json
[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]
```
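For reference, here is a minimal sketch of how a record with this structure loads into a DataFrame. It inlines the single sample record instead of reading `Glassdoor_A.json`, and it assumes the dates are month-first (MM/DD/YYYY). That is also what `pd.to_datetime` assumes by default, and passing `format=` makes the assumption explicit rather than inferred.

```python
import pandas as pd

# Inline copy of the sample record (stands in for Glassdoor_A.json)
data = [{"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Sample text"}]

# Flat records become one column per key
df = pd.json_normalize(data)

# Assumption: dates are month-first; format= makes the parse explicit
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

print(list(df.columns))   # ['No', 'Stock Symbol', 'Date', 'Text Main']
print(df.loc[0, "Date"])  # 2017-05-11 00:00:00
```

If the data is actually day-first, `format="%d/%m/%Y"` would parse "05/11/2017" as 5 November 2017 instead.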

And the relevant code excerpt looks like this:

```python
import string
import json
import csv

import pandas as pd
import datetime

import numpy as np


# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])


# Create an empty dictionary
d = dict()


# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"

after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date

between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]

print(filtered_dates)


# Processing
for row in filtered_dates:
    line = row['Text Main']
```

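Iterating directly over a DataFrame, as in `for row in filtered_dates`, yields its column labels, which are strings. So `row` is a string such as `'No'`, and `row['Text Main']` tries to index that string with another string, which raises the TypeError. To iterate over the rows instead, use `iterrows()`, which yields `(index, row)` pairs where `row` is a Series indexed by the column names.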
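The difference is easy to see on a tiny frame. This sketch builds a hypothetical one-row DataFrame mirroring the JSON structure above, shows that plain iteration produces column-name strings, and shows that `iterrows()` produces rows that support label lookup.

```python
import pandas as pd

# Hypothetical one-row frame mirroring the JSON structure in the question
df = pd.DataFrame([{"No": "121", "Stock Symbol": "A",
                    "Date": "05/11/2017", "Text Main": "Sample text"}])

# Iterating a DataFrame directly yields its column labels (plain strings)
labels = [row for row in df]
print(labels)  # ['No', 'Stock Symbol', 'Date', 'Text Main']

# Indexing a string with a string is what raises the TypeError
try:
    labels[0]["Text Main"]
except TypeError as exc:
    print("TypeError:", exc)

# iterrows() yields (index, Series) pairs, so label lookup works
for index, row in df.iterrows():
    print(index, row["Text Main"])  # 0 Sample text
```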

Something like this should work:

```python
import string
import json
import csv

import pandas as pd
import datetime

import numpy as np


# Loading and reading dataset (the with-block closes the file afterwards)
with open("Glassdoor_A.json", "r") as file:
    data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])


# Create an empty dictionary
d = dict()


# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"

after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date

between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]

# Processing: iterrows() yields (index, row) pairs, where each row is a
# Series indexed by column name, so row['Text Main'] returns the text
for index, row in filtered_dates.iterrows():
    line = row['Text Main']
```
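Since the loop only needs one column, iterating that column directly is simpler and avoids building a Series for every row. The sketch below assumes the goal is a word count stored in `d`. The original counting logic isn't shown, so lower-casing, stripping ASCII punctuation via `string.punctuation`, and splitting on whitespace are all assumptions. It uses `collections.Counter` (a `dict` subclass) in place of the plain `d = dict()`, and a small hypothetical DataFrame stands in for `filtered_dates`.

```python
import string
from collections import Counter

import pandas as pd

# Hypothetical stand-in for filtered_dates from the answer above
filtered_dates = pd.DataFrame({"Text Main": ["Sample text", "More sample text."]})

# Translation table that deletes ASCII punctuation (assumed normalization)
strip_punct = str.maketrans("", "", string.punctuation)

d = Counter()
# dropna() skips missing texts, which would otherwise break .lower()
for line in filtered_dates["Text Main"].dropna():
    # Lower-case, drop punctuation, split on whitespace, then count
    words = line.lower().translate(strip_punct).split()
    d.update(words)

print(dict(d))  # {'sample': 2, 'text': 2, 'more': 1}
```

`iterrows()` remains the general fix when several columns are needed per row. For a single text column, iterating the Series is the more direct choice.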

Content licensed under CC BY-SA 4.0.