This is what Python returns when I try to tokenize tweets: TypeError: list indices must be integers or slices, not str

Question

I am trying to tokenize all the tweets that I previously saved in a JSON file. I am following this example: https://marcobonzanini.com/2015/03/09/mining-twitter-data-with-python-part-2/

import re
import json
 
emoticons_str = r"""
    (?:
        [:=;] # Eyes
        [oO\-]? # Nose (optional)
        [D\)\]\(\]/\\OpP] # Mouth
    )"""
 
regex_str = [
    emoticons_str,
    r'<[^>]+>', # HTML tags
    r'(?:@[\w_]+)', # @-mentions
    r"(?:\#+[\w_]+[\w\'_\-]*[\w_]+)", # hash-tags
    r'http[s]?://(?:[a-z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-f][0-9a-f]))+', # URLs
 
    r'(?:(?:\d+,?)+(?:\.?\d+)?)', # numbers
    r"(?:[a-z][a-z'\-_]+[a-z])", # words with - and '
    r'(?:[\w_]+)', # other words
    r'(?:\S)' # anything else
]
    
tokens_re = re.compile(r'('+'|'.join(regex_str)+')', re.VERBOSE | re.IGNORECASE)
emoticon_re = re.compile(r'^'+emoticons_str+'$', re.VERBOSE | re.IGNORECASE)
 
def tokenize(s):
    return tokens_re.findall(s)
 
def preprocess(s, lowercase=False):
    tokens = tokenize(s)
    if lowercase:
        tokens = [token if emoticon_re.search(token) else token.lower() for token in tokens]
    return tokens

When I add this at the end, everything works fine:

tweet = 'RT @marcobonzanini: just an example! :D http://example.com #NLP'
print(preprocess(tweet))

I want to tokenize the tweets I saved in a JSON file, and the site suggests doing it like this:

with open('mytweets.json', 'r') as f:
    for line in f:
        tweet = json.loads(line)
        tokens = preprocess(tweet['text'])
        do_something_else(tokens)

This is how I tried to open my JSON file:

with open('data/digitalhealth.json', 'r') as f:
...     for line in f:
...         tweet = json.loads(line)
...         tokens = preprocess(tweet['text'])
...         do_something_else(tokens)

This is what Python returns:

Traceback (most recent call last):
File "<stdin>", line 4, in <module>
TypeError: list indices must be integers, not str

Does anyone know how to fix this? I am new to all of this and really don't know what to do.

This is the code I used to collect the data from Twitter's API:

import tweepy
import json
API_KEY = 'xxx'
API_SECRET = 'xxx'
TOKEN_KEY = 'xxx'
TOKEN_SECRET = 'xxx'


auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(TOKEN_KEY, TOKEN_SECRET)



api = tweepy.API(auth, wait_on_rate_limit=True)

query = '#digitalhealth'
cursor = tweepy.Cursor(api.search, q=query, lang="en")


for page in cursor.pages():
    tweets = []
    for item in page:
        tweets.append(item._json)

with open('Twitter_project/digitalhealth.json', 'wb') as outfile:
    json.dump(tweets, outfile)

How can I change it now so that I only have dictionaries? Thanks, everyone, for your help! I really appreciate it.

Answer 1

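For some reason you are storing your JSON dictionaries inside a list. json.dump(tweets, outfile) writes the whole list as a single JSON array, so json.loads(line) returns a list rather than a dictionary, and indexing that list with 'text' raises the TypeError. Storing the tweets as individual dictionaries would be much easier for you, but if you want to access them as they are now, use tweet[0] to get the first dictionary and read its data from there, for example tweet[0]['text']. Still, consider reformatting the JSON properly.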
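As a rough sketch of both options mentioned above: the helper names load_tweets and save_tweets_as_lines are made up for illustration and are not part of tweepy or the tutorial. The first reads a file whether it holds one JSON array or one JSON object per line. The second writes one object per line, which is the layout the tutorial's line-by-line loop assumes.

```python
import json


def load_tweets(path):
    """Return a list of tweet dicts from either file layout.

    Handles a file holding a single JSON array (what json.dump(tweets, f)
    writes) as well as a file with one JSON object per line.
    """
    with open(path, 'r', encoding='utf-8') as f:
        content = f.read().strip()
    if not content:
        return []
    if content.startswith('['):
        # The whole file is one JSON list of tweet dicts.
        return json.loads(content)
    # Otherwise assume one tweet dict per non-empty line.
    return [json.loads(line) for line in content.split('\n') if line.strip()]


def save_tweets_as_lines(tweets, path):
    """Write one JSON object per line so a line-by-line reader gets dicts."""
    with open(path, 'w', encoding='utf-8') as outfile:
        for tweet in tweets:
            outfile.write(json.dumps(tweet) + '\n')
```

With the existing file, looping over load_tweets('data/digitalhealth.json') and calling preprocess(tweet['text']) on each item should then index a dictionary instead of a list. Two other things in the collection script are worth checking. First, tweets = [] sits inside the page loop, so only the last page is kept; moving it above the loop keeps every page. Second, on Python 3, json.dump needs a file opened in text mode ('w'), not 'wb'.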

Solution 1
0 2020-12-14 00:22:31