
How to control if input features contribute exclusively to one neuron in subsequent layer of a Tensorflow neural network?

I am trying to make the most basic of basic neural networks to get familiar with the functional API in Tensorflow 2.x.

Basically, what I'm trying to do with my simplified iris dataset (i.e. setosa or not) is the following:

  1. Use the 4 features as inputs
  2. Dense layer of 3
  3. Sigmoid activation function
  4. Dense layer of 2 (one for each class)
  5. Softmax activation
  6. Binary cross-entropy / log-loss as my loss function

However, I can't figure out how to control one key aspect of the model. That is, how can I make sure that each feature from my input layer contributes to only one neuron in the subsequent dense layer? Also, how can I allow a feature to contribute to more than one neuron?

This isn't clear to me from the documentation.

# Load data
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
X, y = load_iris(return_X_y=True, as_frame=True)
X = X.astype("float32")
X.index = X.index.map(lambda i: "iris_{}".format(i))
X.columns = X.columns.map(lambda j: j.split(" (")[0].replace(" ","_"))
y.index = X.index
y = y.map(lambda i:iris.target_names[i])
y_simplified = y.map(lambda i: {True:1, False:0}[i == "setosa"])
y_simplified = pd.get_dummies(y_simplified, columns=["setosa", "not_setosa"])

# Train/test split
from sklearn.model_selection import train_test_split
seed = 0
X_train, X_test, y_train, y_test = train_test_split(X, y_simplified, test_size=0.3, random_state=seed)

# Simple neural network
import tensorflow as tf
tf.random.set_seed(seed)


# Input[4 features] -> Dense layer of 3 neurons -> Activation function -> Dense layer of 2 (one per class) -> Softmax
inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(3)(inputs)
x = tf.keras.layers.Activation(tf.nn.sigmoid)(x)
x = tf.keras.layers.Dense(2)(x)
outputs = tf.keras.layers.Activation(tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="simple_binary_iris")
model.compile(loss="binary_crossentropy", metrics=["accuracy"] )
model.summary()

history = model.fit(X_train, y_train, batch_size=64, epochs=10, validation_split=0.2)

test_scores = model.evaluate(X_test, y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])

Results:

Model: "simple_binary_iris"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_44 (InputLayer)        [(None, 4)]               0         
_________________________________________________________________
dense_96 (Dense)             (None, 3)                 15        
_________________________________________________________________
activation_70 (Activation)   (None, 3)                 0         
_________________________________________________________________
dense_97 (Dense)             (None, 2)                 8         
_________________________________________________________________
activation_71 (Activation)   (None, 2)                 0         
=================================================================
Total params: 23
Trainable params: 23
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
2/2 [==============================] - 0s 40ms/step - loss: 0.6344 - accuracy: 0.6667 - val_loss: 0.6107 - val_accuracy: 0.7143
Epoch 2/10
2/2 [==============================] - 0s 6ms/step - loss: 0.6302 - accuracy: 0.6667 - val_loss: 0.6083 - val_accuracy: 0.7143
Epoch 3/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6278 - accuracy: 0.6667 - val_loss: 0.6056 - val_accuracy: 0.7143
Epoch 4/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6257 - accuracy: 0.6667 - val_loss: 0.6038 - val_accuracy: 0.7143
Epoch 5/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6239 - accuracy: 0.6667 - val_loss: 0.6014 - val_accuracy: 0.7143
Epoch 6/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6223 - accuracy: 0.6667 - val_loss: 0.6002 - val_accuracy: 0.7143
Epoch 7/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6209 - accuracy: 0.6667 - val_loss: 0.5989 - val_accuracy: 0.7143
Epoch 8/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6195 - accuracy: 0.6667 - val_loss: 0.5967 - val_accuracy: 0.7143
Epoch 9/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6179 - accuracy: 0.6667 - val_loss: 0.5953 - val_accuracy: 0.7143
Epoch 10/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6166 - accuracy: 0.6667 - val_loss: 0.5935 - val_accuracy: 0.7143
2/2 [==============================] - 0s 607us/step - loss: 0.6261 - accuracy: 0.6444
Test loss: 0.6261375546455383
Test accuracy: 0.644444465637207

How can I make sure each feature from the input layer contributes exclusively to only one neuron in my subsequent dense layer?

Have one input layer per feature and feed each input layer to a separate dense layer. Later, you can concatenate the outputs of all the dense layers and proceed, as in the working sample below.

NOTE: A neuron can take input of any size (in this case the input size is 1, since you want one feature to be used by the neuron), and its output size is always 1. A Dense layer with n units will have n neurons, and so an output size of n.

Working sample:

import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Model architecture
x1 = tf.keras.Input(shape=(1,))
x2 = tf.keras.Input(shape=(1,))
x3 = tf.keras.Input(shape=(1,))
x4 = tf.keras.Input(shape=(1,))

x1_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x1)
x2_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x2)
x3_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x3)
x4_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x4)

merged = tf.keras.layers.concatenate([x1_, x2_, x3_, x4_])
merged = tf.keras.layers.Dense(16, activation=tf.nn.relu)(merged)
outputs = tf.keras.layers.Dense(3, activation=tf.nn.softmax)(merged)

model = tf.keras.Model(inputs=[x1,x2,x3,x4], outputs=outputs)
model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"] )

# Load and prepare data
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit the model
model.fit([X_train[:,0],X_train[:,1],X_train[:,2],X_train[:,3]], y_train, batch_size=64, epochs=100, validation_split=0.25)

# Evaluate the model
test_scores = model.evaluate([X_test[:,0],X_test[:,1],X_test[:,2],X_test[:,3]], y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])

Output:

Epoch 1/100
2/2 [==============================] - 0s 75ms/step - loss: 1.6446 - accuracy: 0.4359 - val_loss: 1.6809 - val_accuracy: 0.5185
Epoch 2/100
2/2 [==============================] - 0s 10ms/step - loss: 1.4151 - accuracy: 0.6154 - val_loss: 1.4886 - val_accuracy: 0.5556
Epoch 3/100
2/2 [==============================] - 0s 9ms/step - loss: 1.2725 - accuracy: 0.6795 - val_loss: 1.3813 - val_accuracy: 0.5556
Epoch 4/100
2/2 [==============================] - 0s 9ms/step - loss: 1.1829 - accuracy: 0.6795 - val_loss: 1.2779 - val_accuracy: 0.5926
Epoch 5/100
2/2 [==============================] - 0s 10ms/step - loss: 1.0994 - accuracy: 0.6795 - val_loss: 1.1846 - val_accuracy: 0.5926
Epoch 6/100
.................. [ Truncated ] 
Epoch 100/100
2/2 [==============================] - 0s 2ms/step - loss: 0.4049 - accuracy: 0.9333
Test loss: 0.40491223335266113
Test accuracy: 0.9333333373069763

Pictorial representation of the above model architecture:

[image: model architecture diagram]

The Dense layer in Keras/TF is a fully connected layer. For example, when you use a dense layer as follows:

inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(3)(inputs)

all 4 input neurons are connected to all 3 output neurons.
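
You can confirm this from the layer's weights; below is a minimal sketch (an added illustration, not part of the original answer) showing that a Dense(3) layer built on 4 inputs holds one weight per input/output pair:

import tensorflow as tf

# Building the layer creates its kernel and bias variables
dense = tf.keras.layers.Dense(3)
dense.build(input_shape=(None, 4))

# One weight per input/output pair: 4 x 3 = 12 connections, plus 3 biases,
# matching the 15 params reported by model.summary() in the question
print(dense.kernel.shape)  # (4, 3)
print(dense.bias.shape)    # (3,)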

There isn't any predefined layer in Keras/TF that specifies how to connect input and output neurons. However, Keras/TF is very flexible and allows you to define custom layers easily.


Borrowing the idea from this answer, you can define a CustomConnected layer as follows:

class CustomConnected(tf.keras.layers.Dense):

    def __init__(self, units, connections, **kwargs):
        # Binary mask of shape (n_inputs, units): a 1 keeps the connection
        # between an input and a neuron, a 0 severs it
        self.connections = connections
        super(CustomConnected, self).__init__(units, **kwargs)

    def call(self, inputs):
        # Apply the mask on every call instead of reassigning self.kernel,
        # so that the underlying weight variable keeps receiving gradients
        output = tf.matmul(inputs, self.kernel * self.connections)
        if self.bias is not None:
            output += self.bias
        if self.activation is not None:
            output = self.activation(output)
        return output

Using this layer, you can then specify the connections between two layers through the connections argument. For example:

import numpy as np

inputs = tf.keras.Input(shape=(4,))
connections = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype="float32")
x = CustomConnected(3, connections)(inputs)

Here, the 1st, 2nd, and 3rd input neurons are connected to the 1st, 2nd, and 3rd output neurons, respectively. Additionally, the 4th input neuron is also connected to the 3rd output neuron.
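
To tie this back to the architecture from the question, here is a minimal sketch (an added illustration, not part of the original answer) that continues the snippet above and drops the masked layer into the question's model:

x = tf.keras.layers.Activation(tf.nn.sigmoid)(x)
x = tf.keras.layers.Dense(2)(x)
outputs = tf.keras.layers.Activation(tf.nn.softmax)(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs, name="masked_binary_iris")
model.compile(loss="binary_crossentropy", metrics=["accuracy"])

# The effective weights are the kernel with the mask applied; entries
# zeroed out by `connections` can never influence the output
print(model.layers[1].kernel * connections)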


UPDATE: As discussed in the comments section, an adaptive approach (e.g. using only the maximum weight for each output neuron) is also possible, but not recommended. You could implement it via the following layer:

class CustomSparse(tf.keras.layers.Dense):

    def call(self, inputs):
        nb_in, nb_out = self.kernel.shape
        argmax = tf.argmax(self.kernel, axis=0)  # Shape=(nb_out,)
        argmax_onehot = tf.transpose(tf.one_hot(argmax, depth=nb_in))  # Shape=(nb_in, nb_out)
        kernel_max = self.kernel * argmax_onehot
        # tf.print(kernel_max)  # Uncomment this line to print the weights
        out = tf.matmul(inputs, kernel_max)

        if self.bias is not None:
            out += self.bias

        if self.activation is not None:
            out = self.activation(out)

        return out

The main issue with this approach is that you cannot propagate gradients through the argmax operation used to select the maximum weights. As a result, the network will only "switch input neurons" when a selected weight is no longer the maximum one.
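
You can see this gradient behavior directly with tf.GradientTape; the following toy sketch (an added illustration, not part of the original answer) shows that only the weights currently selected by argmax receive non-zero gradients:

import tensorflow as tf

kernel = tf.Variable([[1.0, 0.1],
                      [0.2, 2.0],
                      [0.3, 0.5]])        # Shape=(nb_in=3, nb_out=2)
inputs = tf.constant([[1.0, 1.0, 1.0]])   # Shape=(1, 3)

with tf.GradientTape() as tape:
    argmax = tf.argmax(kernel, axis=0)                # [0, 1]: max row per column
    mask = tf.transpose(tf.one_hot(argmax, depth=3))  # Shape=(3, 2), piecewise constant
    loss = tf.reduce_sum(tf.matmul(inputs, kernel * mask))

# Only kernel[0, 0] and kernel[1, 1] get non-zero gradients; the other
# weights stay frozen until one of them happens to become the new maximum
print(tape.gradient(loss, kernel))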
