How to model structured parameters in TensorFlow Probability distributions?
TensorFlow Probability - want NN to output multiple distributions
I have a simple model that currently outputs a single numeric value. I have adapted it to instead output a distribution using TFP (mean + standard deviation), so that I can gauge how confident the model is in its predictions.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=[len(df.columns),], activation='relu'),  # Should only be one input, so [1,]
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(2 * len(target.columns)),  # there are 2 outputs, so we want a mean + standard deviation for EACH of the outputs
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))
    )
])
The 2 outputs of the current final Dense layer feed the mean + standard deviation of the output distribution.
In my real dataset I am trying to predict two numeric values from the input data. How can I make the model output two distributions? I think the final Dense layer needs 4 nodes (2 means and 2 standard deviations), but I'm not sure how to make that work with the DistributionLambda. I would like a single model that predicts both, rather than having to train one model per target output.
Edit: I created this Colab to make it easier to see what I'm doing. I've simplified the example a bit, which hopefully makes it easier to explain what I'm trying to accomplish:
https://colab.research.google.com/drive/1Wlucked4V0z-Bm_ql8XJnOJL0Gm4EwnE?usp=sharing
See the guide on shapes in TFP: https://www.tensorflow.org/probability/examples/Understanding_TensorFlow_Distributions_Shapes
IIUC, you want to output a distribution with batch_shape = [2]. That is effectively 2 distributions of the same family with different parameters. Computations done with this batch of distributions (sampling, pdf/log_pdf evaluation) are vectorized (run in parallel).
IIUC, and assuming you want to keep tfp.layers.DistributionLambda as-is, you have a few options you can try:
Option 1: use two Dense layers and the Keras functional API:
# Your code
# [.....]
tfd = tfp.distributions
sample_layer = tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))

def get_df_model():
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    outputs1 = tf.keras.layers.Dense(len(target.columns))(x)
    outputs2 = tf.keras.layers.Dense(len(target.columns))(x)  # there are 2 outputs, so we want a mean + standard deviation for EACH of the outputs
    outputs1 = sample_layer(outputs1)
    outputs2 = sample_layer(outputs2)
    model = tf.keras.Model(inputs, [outputs1, outputs2])
    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model

model = get_df_model()
model.summary()
model.fit(df, target, epochs=10)
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 1)] 0 []
dense_24 (Dense) (None, 10) 20 ['input_1[0][0]']
dense_25 (Dense) (None, 10) 110 ['dense_24[0][0]']
dense_26 (Dense) (None, 2) 22 ['dense_25[0][0]']
dense_27 (Dense) (None, 2) 22 ['dense_25[0][0]']
distribution_lambda_10 (Distri ((None, 1), 0 ['dense_26[0][0]',
butionLambda) (None, 1)) 'dense_27[0][0]']
==================================================================================================
Total params: 174
Trainable params: 174
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/10
157/157 [==============================] - 1s 2ms/step - loss: 522.2677 - distribution_lambda_10_loss: 247.8716 - distribution_lambda_10_1_loss: 274.3961
Epoch 2/10
157/157 [==============================] - 1s 3ms/step - loss: 20.3496 - distribution_lambda_10_loss: 9.5429 - distribution_lambda_10_1_loss: 10.8067
Epoch 3/10
157/157 [==============================] - 1s 6ms/step - loss: 13.7444 - distribution_lambda_10_loss: 6.6085 - distribution_lambda_10_1_loss: 7.1359
Epoch 4/10
157/157 [==============================] - 1s 7ms/step - loss: 11.3713 - distribution_lambda_10_loss: 5.5506 - distribution_lambda_10_1_loss: 5.8206
Epoch 5/10
157/157 [==============================] - 1s 4ms/step - loss: 10.2081 - distribution_lambda_10_loss: 5.0250 - distribution_lambda_10_1_loss: 5.1830
Epoch 6/10
157/157 [==============================] - 0s 3ms/step - loss: 9.5528 - distribution_lambda_10_loss: 4.7256 - distribution_lambda_10_1_loss: 4.8272
Epoch 7/10
157/157 [==============================] - 0s 2ms/step - loss: 9.1495 - distribution_lambda_10_loss: 4.5393 - distribution_lambda_10_1_loss: 4.6102
Epoch 8/10
157/157 [==============================] - 1s 6ms/step - loss: 8.8837 - distribution_lambda_10_loss: 4.4159 - distribution_lambda_10_1_loss: 4.4678
Epoch 9/10
157/157 [==============================] - 0s 3ms/step - loss: 8.7027 - distribution_lambda_10_loss: 4.3319 - distribution_lambda_10_1_loss: 4.3708
Epoch 10/10
157/157 [==============================] - 0s 3ms/step - loss: 8.5743 - distribution_lambda_10_loss: 4.2724 - distribution_lambda_10_1_loss: 4.3019
<keras.callbacks.History at 0x7f51001c2f50>
Note what the docs state about how the distribution behaves when using DistributionLambda:
"By default, a distribution is represented as a tensor via a random draw, e.g., tfp.distributions.Distribution.sample."
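A note on the scale parameterization used throughout these snippets: softplus maps the unconstrained Dense output to a positive number, and the 1e-3 floor keeps the standard deviation strictly above zero. A pure-Python sketch of that transform (no TensorFlow needed, purely for illustration):

```python
import math

def softplus(x):
    # log(1 + exp(x)): a smooth, always-positive approximation of max(0, x)
    return math.log1p(math.exp(x))

def scale_from_logits(t):
    # Mirrors scale = 1e-3 + tf.math.softplus(0.05 * t) from the model code.
    # The 0.05 factor softens the mapping; 1e-3 floors the result away from zero.
    return 1e-3 + softplus(0.05 * t)

print(scale_from_logits(-100.0))  # still positive, even for very negative logits
print(scale_from_logits(0.0))     # 1e-3 + log(2)
```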
Option 2: use one Dense layer and split the output in two:
def get_df_model():
    sample_layer = tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    x = tf.keras.layers.Dense(2 * len(target.columns))(x)
    x1, x2 = tf.split(x, num_or_size_splits=2, axis=-1)
    outputs1 = sample_layer(x1)
    outputs2 = sample_layer(x2)
    model = tf.keras.Model(inputs, [outputs1, outputs2])
    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model
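To make the split step concrete, here is a small sketch using NumPy in place of TensorFlow (purely for illustration; np.split with these arguments has the same semantics as the tf.split call above):

```python
import numpy as np

# A batch of 3 examples, each with 4 Dense outputs:
# columns [0, 1] go to the first head, columns [2, 3] to the second.
x = np.arange(12, dtype=np.float32).reshape(3, 4)
x1, x2 = np.split(x, 2, axis=-1)  # like tf.split(x, num_or_size_splits=2, axis=-1)

print(x1.shape, x2.shape)  # → (3, 2) (3, 2)
```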
Option 3: use a :2 slice:
# Your code
# [.....]
tfd = tfp.distributions
sample_layer = tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :2],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 2:])))

def get_df_model():
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    outputs = tf.keras.layers.Dense(2 * len(target.columns))(x)
    outputs = sample_layer(outputs)
    model = tf.keras.Model(inputs, [outputs])
    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model

model = get_df_model()
model.summary()
model.fit(df, target, epochs=10)
Additionally, if you want to explicitly use independent distributions parameterized by x1 and x2, try:
def get_df_model():
    inputs = tf.keras.layers.Input(shape=[len(df.columns),])
    x = tf.keras.layers.Dense(10, activation='relu')(inputs)
    x = tf.keras.layers.Dense(10, activation='relu')(x)
    x = tf.keras.layers.Dense(2 * len(target.columns))(x)
    x1, x2 = tf.split(x, num_or_size_splits=2, axis=-1)
    outputs1 = tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))(x1)
    outputs2 = tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])))(x2)
    model = tf.keras.Model(inputs, [outputs1, outputs2])
    negloglik = lambda y, rv_y: -rv_y.log_prob(y)
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
    return model