How to read a JSON file and fit an LSTM model?

I am trying to apply an LSTM to the HP (HuffPost) news dataset. The data is in JSON format ( https://www.kaggle.com/rmisra/news-category-dataset ). I tried the code below and got an error. What is wrong with this code?

from keras.layers import LSTM, Activation, Dense, Dropout, Input, Embedding
from keras.optimizers import RMSprop
from keras.preprocessing.text import Tokenizer
from keras.preprocessing import sequence
from keras.utils import to_categorical
from keras.callbacks import EarlyStopping
import json
from sklearn.preprocessing import LabelBinarizer


with open('News_Category_Dataset_v2.json', 'r') as f:
    train = json.load(f)
Y_train = list(train.values())
lb = LabelBinarizer()
X_train = lb.fit_transform(list(train.keys()))
##
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.15)
##
max_words = 1000
max_len = 150
tok = Tokenizer(num_words=max_words)
tok.fit_on_texts(X_train)
sequences = tok.texts_to_sequences(X_train)
sequences_matrix = sequence.pad_sequences(sequences,maxlen=max_len)
def RNN():
    inputs = Input(name='inputs',shape=[max_len])
    layer = Embedding(max_words,50,input_length=max_len)(inputs)
    layer = LSTM(64)(layer)
    layer = Dense(256,name='FC1')(layer)
    layer = Activation('relu')(layer)
    layer = Dropout(0.5)(layer)
    layer = Dense(1,name='out_layer')(layer)
    layer = Activation('softmax')(layer)
    model = Model(inputs=inputs,outputs=layer)
    return model
model = RNN()
model.summary()
model.compile(loss='binary_crossentropy',optimizer=RMSprop(),metrics=['accuracy'])
model.fit(sequences_matrix,Y_train,batch_size=128,epochs=10,
          validation_split=0.2,callbacks=[EarlyStopping(monitor='val_loss',min_delta=0.0001)])

I get this error:

Traceback (most recent call last):
  File ".\Hpnews.py", line 30, in <module>
    train = json.load(f)
  File "C:\Users\a\Anaconda3\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Users\a\Anaconda3\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "C:\Users\a\Anaconda3\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 366)

This is the format of my JSON file (one record, as shown in a JSON viewer):

"root": {6 items
"category": string "CRIME"
"headline": string "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV"
"authors": string "Melissa Jeltsen"
"link": string "huffingtonpost.com/entry/…"
"short_description": string "She left her husband. He killed their children. Just another day in America."
"date": string "2018-05-26"
}

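The file is not a single JSON document but ndJSON ("newline-delimited JSON"): every line is a separate JSON object. json.load expects exactly one document, so it parses the first line and then raises "Extra data" when it reaches the start of line 2, which is the position reported in your traceback.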
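To illustrate the difference, here is a minimal sketch using only the standard library. The two-record sample string and the helper name read_ndjson are made up for the demonstration; the real file has more fields per record.

```python
import io
import json


def read_ndjson(fp):
    """Parse a newline-delimited JSON stream into a list of dicts."""
    records = []
    for line in fp:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


# Hypothetical two-line sample in the same one-object-per-line layout.
sample = (
    '{"category": "CRIME", "date": "2018-05-26"}\n'
    '{"category": "SPORTS", "date": "2018-05-27"}\n'
)

# Parsing the whole text as one document fails at the second object.
try:
    json.load(io.StringIO(sample))
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg

# Parsing line by line works.
records = read_ndjson(io.StringIO(sample))
print(whole_file_error, len(records))
```

With the real dataset you would pass an open file handle instead of the io.StringIO object.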

You should use pandas to load your data:

import pandas as pd
data = pd.read_json('News_Category_Dataset_v2.json', lines=True)
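Once the file loads, you still need text inputs and labels for the model. The following is only a sketch of one way to do that. It assumes you want to classify category from headline plus short_description (the question does not say which fields to use), and load_texts_and_labels and the two sample records are hypothetical.

```python
import io

import pandas as pd


def load_texts_and_labels(path_or_buffer):
    """Return (texts, label_ids, class_names) from an ndJSON news file."""
    data = pd.read_json(path_or_buffer, lines=True)
    # Assumption: headline + short_description is the model input.
    texts = (data["headline"] + " " + data["short_description"]).tolist()
    # factorize maps each distinct category to an integer id,
    # numbered in order of first appearance.
    label_ids, class_names = pd.factorize(data["category"])
    return texts, label_ids, list(class_names)


# Made-up records standing in for News_Category_Dataset_v2.json.
sample = io.StringIO(
    '{"category": "CRIME", "headline": "First headline", '
    '"short_description": "First text"}\n'
    '{"category": "SPORTS", "headline": "Second headline", '
    '"short_description": "Second text"}\n'
)

texts, label_ids, class_names = load_texts_and_labels(sample)
print(texts, list(label_ids), class_names)
```

The texts list can then go through Tokenizer and pad_sequences as in your code. Note that with integer labels like these, the output layer would need one unit per class (Dense(len(class_names)) with softmax) and a loss such as sparse_categorical_crossentropy; a Dense(1) layer followed by softmax always outputs 1.0.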
