Dynamic batch size in Keras LSTM (TensorFlow 2.x)

Question

I have a data frame with 27k records as a training set and a separate test data set with 4k records. Each data set has 25 features.

x_train shape: (27000, 25), 
x_test shape: (4000, 25)

Sample of the training data:

|Subject ID|Feat_1|Feat_2|Feat_X|Hr_count|Label|
|s0001     |    89| 31   |  43  |   1    |  0  |
|s0001     |    94| 32   |  68  |   2    |  0  |
|s0001     |    38| 90   |  86  |   3    |  0  |
|s0001     |    79| 34   |  78  |   4    |  1  |
|s0001     |    85| 24   |  70  |   5    |  1  |
|s0002     |    7 | 9    |  32  |   1    |  0  |
|s0002     |    60| 56   |  72  |   2    |  0  |
|s0002     |    68| 72   |  23  |   3    |  0  |
|s0003     |    26| 88   |  1   |   1    |  0  |
|s0004     |    45| 27   |  22  |   1    |  0  |
|s0004     |    10| 80   |  67  |   2    |  0  |
|s0004     |    71| 48   |  21  |   3    |  0  |
|s0004     |    58| 9    |  60  |   4    |  1  |

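Hr_count: the number of hours each subject has stayed in the experiment

Label: the target variable for the classifier I am building. It is the flag a subject receives after staying in the experiment.

I trained an LSTM RNN model on the data, defined as follows: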

model = Sequential()
model.add(LSTM(100, activation='tanh', return_sequences=True, input_shape=(1, 25)))
model.add(LSTM(49, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))
 
model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    batch_size=32,
    epochs=200)

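Problem:

Because the data is sequential, I want the batch_size used when fitting the model to be dynamic: equal to the maximum Hr_count of each subject in the training set, so that the LSTM can learn the relationships within each subject's data separately. In other words, each batch should contain the samples of exactly one subject, sorted by Hr_count.

Keras / TensorFlow v2.x does not seem to offer the flexibility of a dynamic batch_size (unlike TensorFlow v1.x)...

How can I make the batch_size parameter dynamic?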

Answer 1

You can loop over the subjects, call model.fit() once per subject, and set the batch size from that subject's Hr_count:

for subject in list_of_subjects:
    hr_count, data = subject
    x_train, y_train = data
    # One fit call per subject; the batch size equals that subject's Hr_count
    model.fit(
        x_train, y_train,
        validation_data=(x_test, y_test),
        batch_size=hr_count,
        epochs=200)

For this code to run, list_of_subjects must have the following structure:

[[Hr_count, [x_train, y_train]], ...]
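The answer does not show how to build that list, so here is a minimal sketch of one way to do it. It uses only the standard library so the grouping logic stays visible. The helper name build_list_of_subjects and the tuple layout of rows are assumptions made for illustration; the rows themselves are copied from the sample table in the question, with only three feature columns instead of 25.

```python
from itertools import groupby
from operator import itemgetter

# Assumed row layout: (subject_id, feat_1, feat_2, feat_x, hr_count, label).
# Values are taken from the sample table in the question.
rows = [
    ("s0001", 89, 31, 43, 1, 0),
    ("s0001", 94, 32, 68, 2, 0),
    ("s0001", 38, 90, 86, 3, 0),
    ("s0001", 79, 34, 78, 4, 1),
    ("s0001", 85, 24, 70, 5, 1),
    ("s0002", 7, 9, 32, 1, 0),
    ("s0002", 60, 56, 72, 2, 0),
    ("s0002", 68, 72, 23, 3, 0),
    ("s0003", 26, 88, 1, 1, 0),
]


def build_list_of_subjects(rows):
    """Hypothetical helper: return [[hr_count, [x_train, y_train]], ...]."""
    # Sort by subject, then by Hr_count, so each subject's rows are
    # contiguous and in time order before grouping.
    ordered = sorted(rows, key=itemgetter(0, 4))
    subjects = []
    for _, subject_rows in groupby(ordered, key=itemgetter(0)):
        subject_rows = list(subject_rows)
        # Wrap each sample in a list so it has shape (1, n_features),
        # i.e. one timestep per sample as in the question's model.
        x_train = [[list(r[1:4])] for r in subject_rows]
        y_train = [r[5] for r in subject_rows]
        hr_count = max(r[4] for r in subject_rows)
        subjects.append([hr_count, [x_train, y_train]])
    return subjects


list_of_subjects = build_list_of_subjects(rows)
for hr_count, (x_train, y_train) in list_of_subjects:
    print(hr_count, len(x_train), y_train)
```

Before passing x_train and y_train to model.fit(), you would likely want to convert them to float arrays, for example with np.asarray(x_train, dtype="float32"), which gives shape (n_rows, 1, n_features).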

Answer 2

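I filtered by subject ID and then fed each subject's segment of the data to model.fit(). The model seems to learn quickly. Try it on a larger data set. The code is generalized to allow more features.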

import io
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import LabelEncoder

data="""SubjectID,Feat_1,Feat_2,Feat_X,Hr_count,Label
s0001,89,31,43,1,0
s0001,94,32,68,2,0
s0001,38,90,86,3,0
s0001,79,34,78,4,1
s0001,85,24,70,5,1
s0002,7 ,9 ,32,1,0
s0002,60,56,72,2,0
s0002,68,72,23,3,0
s0003,26,88,1 ,1,0
s0004,45,27,22,1,0
s0004,10,80,67,2,0
s0004,71,48,21,3,0
s0004,58,9 ,60,4,1
"""

df=pd.read_csv(io.StringIO(data),sep=",")
df.drop(columns='Hr_count',inplace=True)
encoder=LabelEncoder()
df['SubjectID']=encoder.fit_transform(df['SubjectID'])

print(df)

X_columns=[x for x in df.columns if x!='Label']
features=len(X_columns)
model = Sequential()
model.add(LSTM(100, activation='tanh', return_sequences=True, input_shape=(1, features)))
model.add(LSTM(49, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer="rmsprop", loss='binary_crossentropy', metrics=['accuracy'])


grouped=df.groupby('SubjectID')



# Train on one subject at a time: each fit call sees a single batch
# that holds all of that subject's rows.
for subject_id, group_df in grouped:
    group_df = group_df.dropna()
    X = group_df[X_columns].to_numpy(dtype='float32')
    # Add a timestep axis of length 1: (rows, features) -> (rows, 1, features)
    X = X.reshape(X.shape[0], 1, features)
    y = group_df['Label'].to_numpy(dtype='float32')
    print("\n", X)
    model.fit(X, y, batch_size=len(X),
              epochs=10)

Output:

  Epoch 10/10
  1/1 [==============================] - 0s 13ms/step - loss: 0.0588 - accuracy: 1.0000
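A variation on the same idea, not part of the answer above, is to keep a single model.fit() call and let a Python generator hand over one subject per batch. The sketch below covers only the batching side, using NumPy; subject_batches and the toy arrays are hypothetical names and data chosen for illustration.

```python
import numpy as np


def subject_batches(X, y, subject_ids):
    """Hypothetical generator: yield one (X_batch, y_batch) pair per subject.

    Loops forever, so the caller decides how many batches make up an epoch.
    """
    subject_ids = np.asarray(subject_ids)
    # np.unique returns the sorted unique ids; rows inside a subject
    # keep their original (Hr_count) order.
    unique_ids = np.unique(subject_ids)
    while True:
        for sid in unique_ids:
            mask = subject_ids == sid
            # Add a timestep axis of length 1:
            # (n_rows, n_features) -> (n_rows, 1, n_features)
            yield X[mask][:, np.newaxis, :], y[mask]


# Toy data (assumption): 6 rows, 3 features, 3 subjects with 3, 2 and 1 rows.
X = np.arange(18, dtype="float32").reshape(6, 3)
y = np.array([0, 0, 1, 0, 0, 1], dtype="float32")
subject_ids = np.array([0, 0, 0, 1, 1, 2])

gen = subject_batches(X, y, subject_ids)
batch_shapes = [next(gen)[0].shape for _ in range(3)]
print(batch_shapes)  # [(3, 1, 3), (2, 1, 3), (1, 1, 3)]
```

In TensorFlow 2.x, model.fit() accepts a generator that yields (inputs, targets) tuples, so a call along the lines of model.fit(gen, steps_per_epoch=3, epochs=10) should train on one subject per step. Here steps_per_epoch equals the number of subjects, and it is needed because the generator never ends. The batch size varies from step to step, which works as long as the model does not fix the batch dimension.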
