Using the correlation coefficient as a loss function in R keras/tensorflow

Question

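I am trying to use an RNN to generate a vector x that is orthogonal to another vector y (a very simplified case), and from that to infer the parameters of interest. If you know the true distribution that generates the variable/vector, ideally with only a few parameters, you can reasonably grid-search the possible parameter space. For example, if the variable x is generated by f_x(.;a), then given a range for the parameter a we can plug in different values of a with a small step size. Once I find the a that creates an orthogonal vector, I claim that this a is the true parameter.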

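Often the true a is a function of other variables rather than a constant, so I thought I could use a neural network to model a. I am not interested in finding the correct functional form, only in making correct predictions of a. The data are longitudinal, so a may also vary over time. This led me to use an RNN. With the return_sequences = TRUE option, I hope to use the value returned at each time point to compute the prediction, so that the loss function is the correlation between y_pred=f_x(.;a) and y_true=y.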

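I have read some posts saying that the correlation coefficient can be used as a loss function in a NN, for example here and here. So I went ahead and coded a custom loss function. I was thinking that if my y_true and y_pred both have size [100,3,1], then the loss should have size [3,1]. However, I noticed that the returned loss is always a scalar. I checked other discussions such as this one, and it seems that the final loss is the average over the batch. That is, the loss is computed for each sample and then averaged over the whole batch. On top of that, there may be a further sum/average over the time points to return a scalar. I am completely fine with the latter, since my ultimate goal is to minimize the correlation at all time points. But I am lost on the per-sample loss. If the loss were MSE/MAE, that would be easy to understand. In my case, I want the loss to be computed directly per batch, not per sample and then averaged. Is that not feasible? I read somewhere that you cannot use a positive rate as a loss because it is not computable for each sample, so the optimization problem is not convex. But to update the network, I only need to supply one scalar loss per batch, right? The network should be able to figure out where to go in order to minimize the loss.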

Below is the custom loss function I used. If I feed it a [4,3,1] tensor, it gives me a [3,1] tensor, and the result agrees with what the plain R function returns.

> x=array(c(2, 3,  4,
+           5, 6,  7,
+           2, 5,  0,
+           1,0,1), dim=c(4,3,1))
> y=array(c(1,2,3,0,
+           1,6,5,2,
+           2,3,4,1), dim=c(4,3,1))
> 
> x[1,,]
[1] 2 6 0
> y[1,,]
[1] 1 1 2
> 
> ##4 subjects, 3 time points, one covariate collected at each
> (cor(x[,1,],y[,1,]))^2
[1] 0.04
> (cor(x[,2,],y[,2,]))^2
[1] 0.01680672
> (cor(x[,3,],y[,3,]))^2
[1] 0.2
> 
> ##the correlation is calculated at each time point
> correlation_coefficient_loss<-function(y_true, y_pred){
+   shape<-k_cast(k_shape(x)[1],dtype='float32')
+   x = y_true
+   y = y_pred
+   mx = k_sum(x,axis=1)/shape
+   my = k_sum(y,axis=1)/shape
+   xm = x-mx
+   ym = y-my
+   r_num = k_sum(xm*ym,axis=1)
+   r_den = k_sqrt(k_sum(k_square(xm),axis=1)*k_sum(k_square(ym),axis=1))
+   r = r_num / (r_den+k_epsilon())
+   return (k_square(r))
+ }
> 
> k_eval(correlation_coefficient_loss(x,y))
           [,1]
[1,] 0.04000000
[2,] 0.01680672
[3,] 0.19999998
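
As a cross-check that needs no keras at all, the same per-time-point squared correlation can be computed in base R. This is only a sketch: the helper name `sq_cor_by_time` is hypothetical, and it assumes arrays laid out as [subjects, time points, 1], as in the example above.

```r
# Hypothetical helper (not part of keras): squared Pearson correlation
# across subjects, computed separately for each time point.
sq_cor_by_time <- function(a, b) {
  stopifnot(all(dim(a) == dim(b)))
  n_time <- dim(a)[2]
  sapply(seq_len(n_time), function(t) cor(a[, t, 1], b[, t, 1])^2)
}

# Same data as in the transcript above.
x <- array(c(2, 3, 4, 5, 6, 7, 2, 5, 0, 1, 0, 1), dim = c(4, 3, 1))
y <- array(c(1, 2, 3, 0, 1, 6, 5, 2, 2, 3, 4, 1), dim = c(4, 3, 1))

res <- sq_cor_by_time(x, y)
print(res)  # 0.04, 0.01680672, 0.2, matching the values printed above
```

Because the correlation is taken across subjects (the first axis), a single subject cannot contribute a loss value on its own; the quantity only exists for a batch as a whole.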

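When I use this function as the loss function in a neural network, the correlation seems to be computed per sample, which is definitely wrong. The code I used to define the model is: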

k_clear_session()
input_cs <- layer_input(shape = c(Nt, 3), 
                        dtype = 'float64', 
                        name = 'cs_input')

output_y = input_cs %>%
  # layer_masking() %>%
  layer_dense(units=1) 


model_gest <- keras_model(
  inputs = c(input_cs),
  outputs = c(output_y))

summary(model_gest)
Model: "model"
______________________________________________________________________________________________________
Layer (type)                                 Output Shape                             Param #         
======================================================================================================
cs_input (InputLayer)                        [(None, 3, 3)]                           0               
______________________________________________________________________________________________________
dense (Dense)                                (None, 3, 1)                             4               
======================================================================================================
Total params: 4
Trainable params: 4
Non-trainable params: 0
______________________________________________________________________________________________________

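For now I use only the simplest dense layer to compute the output vector. I then feed an arbitrary x into the model; with some initial weights the model outputs y_pred, and the correlation between f(x) and y is computed. You may get different numbers, but the code provides a minimal example. I just cannot reproduce the training and validation loss.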

> ##the correlation is calculated at each time point
> ##but need to consider masking - later
> correlation_coefficient_loss<-function(y_true, y_pred){
+   shape<-k_cast(k_shape(x)[1],dtype='float64')
+   x = y_true
+   y = y_pred
+   mx = k_sum(x,axis=1)/shape
+   my = k_sum(y,axis=1)/shape
+   xm = x-mx
+   ym = y-my
+   r_num = k_sum(xm*ym,axis=1)
+   r_den = k_sqrt(k_sum(k_square(xm),axis=1)*k_sum(k_square(ym),axis=1))
+   r = r_num / (r_den+k_epsilon())
+   r = k_maximum(k_minimum(r, 1.0), -1.0)
+   return (k_square(r))
+ }
> 
> model_gest %>%
+   compile(loss=correlation_coefficient_loss,
+           optimizer=optimizer_rmsprop(learning_rate=0.01))
> 
> x=array(c(rnorm(36)), dim=c(4,3,3))
> 
> ##create validation data, train on three samples and validate on 1
> val_x=x[4,,]
> val_y=y[4,,]
> dim(val_x) <- c(1,dim(val_x))
> dim(val_y) <- c(1,3,1)
> 
> model_gest%>%fit(x=list(x[1:3,,]),
+                  y=list(y[1:3,,]),
+                  epochs=1,
+                  validation_data = list(val_x,val_y)
+ )
Train on 3 samples, validate on 1 samples
3/3 [==============================] - 0s 153ms/sample - loss: 0.0414 - val_loss: 0.6667
> 
> weights1<-keras::get_weights(model_gest)
> 
> ##after one epoch of training, manually calculate the loss
> psi_output <- predict(model_gest,list(x))
> ##sum or average of this tensor is not the training loss
> (cor(psi_output[c(1:3),1,],y[c(1:3),1,]))^2
[1] 0.05621052
> (cor(psi_output[c(1:3),2,],y[c(1:3),2,]))^2
[1] 0.2522947
> (cor(psi_output[c(1:3),3,],y[c(1:3),3,]))^2
[1] 0.3354063
> ##I am validating on one sample, the correlation coefficient should be NA, how the val_loss
> ##is calculated in keras? if it's calculable because of the fuzz factor, the result should be 0
> (cor(psi_output[4,1,],y[4,1,]))^2
[1] NA
> (cor(psi_output[4,2,],y[4,2,]))^2
[1] NA
> (cor(psi_output[4,3,],y[4,3,]))^2
[1] NA
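
One possible explanation for `val_loss: 0.6667`, offered as a hypothesis rather than a confirmed diagnosis: inside the loss, `shape` is computed from `k_shape(x)` before `x` is assigned locally, so R resolves `x` to the global array (first dimension 4) instead of the current batch. With a validation batch of one sample, the "mean" is then the value divided by 4, the centered values are not zero, and r becomes +1 or -1 wherever y_true is non-zero. The base-R sketch below mimics that arithmetic. `y_pred` is a made-up prediction, and `eps` is an assumed stand-in for `k_epsilon()`.

```r
# Mimic the loss arithmetic for a single validation sample in base R.
n_global <- 4                  # first dimension of the global x, NOT the batch size (1)
eps      <- 1e-7               # assumed value of k_epsilon()
y_true   <- c(0, 2, 1)         # y[4, , ] from the example
y_pred   <- c(0.3, -1.2, 0.8)  # hypothetical model output; any non-zero values work

# With one sample, summing over the batch axis just returns the sample itself.
xm <- y_true - y_true / n_global  # not zero, because the divisor is 4 instead of 1
ym <- y_pred - y_pred / n_global

r  <- (xm * ym) / (sqrt(xm^2 * ym^2) + eps)
r2 <- r^2
print(r2)        # first entry is 0 (y_true is 0 there), the others are close to 1
print(mean(r2))  # close to 2/3, in line with the reported val_loss of 0.6667
```

If this is indeed the cause, taking the batch size from `y_true` itself (for instance by computing the means with `k_mean(..., axis = 1, keepdims = TRUE)`) should make the single-sample case return 0 through the fuzz factor, as the comments in the transcript anticipate.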

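Sorry for the long description. My questions, in summary:

1. Can correlation be used as a loss function?
2. If it can, what is wrong with my code? In general, I do not want the loss computed per sample but per batch. Is that possible? If I am using R keras, what should I do? I see that the loss function is wrapped in Python for training, so perhaps I have to code directly in Python to solve this?
3. This is a minimal example; I also want to take masking into account. In the regular loss calculation, the per-sample loss is multiplied by the mask tensor before the sum/average. I may be able to generate a mask tensor somewhere in R keras and plug it into my loss function. But the masking information will be picked up again later, because everything I define seems to apply to a single sample, and if any masking information is carried over, losses.py will process it again.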

Answer 1

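Sorry, but analytically speaking, you cannot. When we optimize a loss function in deep learning, minimizing the correlation between 2 data points (the ground truth against the predicted value) is irrelevant.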

Moreover, for any loss function to be used with a NN, you must make sure it is differentiable. So even if you decide to go with corrcoef as the loss function, is that function differentiable?
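
On the differentiability question, a quick numerical check can help. The sketch below, in base R with made-up predictions, compares a central finite-difference gradient of the squared Pearson correlation with respect to the predictions against a closed-form gradient. The function names are hypothetical. It only covers the case where both vectors have non-zero variance, and it says nothing about whether the resulting optimization problem is well behaved.

```r
# Squared Pearson correlation between a fixed target and predictions.
sq_cor <- function(y_true, y_pred) cor(y_true, y_pred)^2

# Closed-form gradient of r^2 with respect to y_pred:
#   d r / d y_i = xm_i / (sx * sy) - r * ym_i / sy^2,  d r^2 = 2 * r * d r
grad_sq_cor <- function(y_true, y_pred) {
  xm <- y_true - mean(y_true)
  ym <- y_pred - mean(y_pred)
  sx <- sqrt(sum(xm^2))
  sy <- sqrt(sum(ym^2))
  r  <- sum(xm * ym) / (sx * sy)
  2 * r * (xm / (sx * sy) - r * ym / sy^2)
}

# Central finite differences, one coordinate at a time.
num_grad <- function(f, v, h = 1e-6) {
  sapply(seq_along(v), function(i) {
    e <- replace(numeric(length(v)), i, h)
    (f(v + e) - f(v - e)) / (2 * h)
  })
}

y_true <- c(1, 2, 3, 0)           # y[, 1, ] from the question
y_pred <- c(0.5, -0.2, 1.3, 0.9)  # hypothetical predictions

g_analytic <- grad_sq_cor(y_true, y_pred)
g_numeric  <- num_grad(function(v) sq_cor(y_true, v), y_pred)
print(max(abs(g_analytic - g_numeric)))  # should be very small
```

The gradient is undefined when either vector is constant (zero variance), which is the situation a fuzz factor such as `k_epsilon()` in the denominator is meant to guard against.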

Solution 1
0 2022-01-12 20:50:29
