Model performance does not improve during federated learning training

Question

I followed this emnist tutorial to create an image classification experiment (7 classes), with the aim of training a classifier on 3 silos of data using the TFF framework.

Before training begins, I convert the model to a tf keras model using tff.learning.assign_weights_to_keras_model(model,state.model) to evaluate on my validation set. Regardless of the label, the model only predicts one class. This is to be expected, as no training has occurred yet. However, I repeat this step after each federated averaging round and the problem persists: all validation images are predicted as one class. I also save the tf keras model weights after each round and make predictions on the test set, and there is no change.

Some of the steps I have taken to find the source of the issue:

Checked whether the tf keras model weights are updating when the FL model is converted after each round. They are updating.
Ensured that the buffer size is greater than the training dataset size for each client.
Compared the predictions to the class distribution in the training datasets. There is a class imbalance, but the one class that the model predicts is not necessarily the majority class. Also, it is not always the same class. For the most part, it predicts only class 0.
Increased the number of rounds to 5 and epochs per round to 10. This is computationally very intensive, as it is quite a large model being trained with approx 1500 images per client.
Investigated the TensorBoard logs from each training attempt. The training loss decreases as the rounds progress.
Tried a much simpler model: a basic CNN with 2 conv layers. This allowed me to greatly increase the number of epochs and rounds. When evaluated on the test set, this model predicted 4 different classes, but the performance remained very bad. This would indicate that I just need to increase the number of rounds and epochs for my original model to increase the variation in predictions. This is difficult due to the long training time that would result.
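The check on predicted classes described above can be scripted so it runs after every round. This is a minimal sketch in plain Python, assuming the softmax outputs of tfkeras_model.predict have been converted to nested lists; the helper names prediction_histogram and is_collapsed and the example values are made up for illustration.

```python
from collections import Counter


def argmax(probabilities):
    """Return the index of the largest value in a list of class probabilities."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def prediction_histogram(batch_probabilities, num_classes):
    """Count how often each class index is the predicted class."""
    counts = Counter(argmax(p) for p in batch_probabilities)
    return [counts.get(c, 0) for c in range(num_classes)]


def is_collapsed(histogram):
    """True when every prediction falls into a single class."""
    return sum(1 for count in histogram if count > 0) <= 1


# Hypothetical softmax outputs for four validation images and three classes.
example_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.9, 0.05, 0.05],
]
histogram = prediction_histogram(example_probs, num_classes=3)
print(histogram, is_collapsed(histogram))  # [4, 0, 0] True
```

Logging this histogram per round next to the label histogram of the validation set would show whether the predictions start to spread out as training continues.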

Model details:

The model uses XceptionNet as the base model with the weights unfrozen. This performs well on the classification task when all the training images are pooled into a global dataset. Our aim is to achieve comparable performance with FL.

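# Imports assumed; the original snippet omitted them.
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

base_model = Xception(include_top=False,
                      weights=weights,
                      pooling='max',
                      input_shape=input_shape)
# `x` was undefined in the original snippet; it is assumed to be the base model output.
# With pooling='max' that output is already 2D (batch, features), so the original
# GlobalAveragePooling2D layer, which expects a 4D input, is dropped here.
x = base_model.output
predictions = Dense( num_classes, activation='softmax' )( x )
model = Model( base_model.input, outputs=predictions )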

Here is my training code:

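import os
import pathlib

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff
from tqdm import tqdm

# The functions below take `self`; they are methods of the training class.
def fit(self):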
    """Train FL model"""
    # self.load_data()
    summary_writer = tf.summary.create_file_writer(
        self.logs_dir
    )
    federated_averaging = self._construct_iterative_process()
    state = federated_averaging.initialize()
    tfkeras_model = self._convert_to_tfkeras_model( state )
    print( np.argmax( tfkeras_model.predict( self.val_data ), axis=-1 ) )
    val_loss, val_acc = tfkeras_model.evaluate( self.val_data, steps=100 )

    with summary_writer.as_default():
        # range( 1, n ) would stop after n - 1 rounds, so the upper bound is n + 1.
        for round_num in tqdm( range( 1, self.num_rounds + 1 ), ascii=True, desc="FedAvg Rounds" ):

            print( "Beginning fed avg round..." )
            # Round of federated averaging
            state, metrics = federated_averaging.next(
                state,
                self.training_data
            )
            print( "Fed avg round complete" )
            # Saving logs
            for name, value in metrics._asdict().items():
                tf.summary.scalar(
                    name,
                    value,
                    step=round_num
                )
            print( "round {:2d}, metrics={}".format( round_num, metrics ) )
            tff.learning.assign_weights_to_keras_model(
                tfkeras_model,
                state.model
            )
            # tfkeras_model = self._convert_to_tfkeras_model(
            #     state
            # )
            val_metrics = {}
            val_metrics["val_loss"], val_metrics["val_acc"] = tfkeras_model.evaluate(
                self.val_data,
                steps=100
            )
            for name, metric in val_metrics.items():
                tf.summary.scalar(
                    name=name,
                    data=metric,
                    step=round_num
                )
            self._checkpoint_tfkeras_model(
                tfkeras_model,
                round_num,
                self.checkpoint_dir
            )

def _checkpoint_tfkeras_model(self,
                              model,
                              round_number,
                              checkpoint_dir):
    # Obtaining model dir path
    model_dir = os.path.join(
        checkpoint_dir,
        f'round_{round_number}',
    )
    # Creating directory
    pathlib.Path(
        model_dir
    ).mkdir(
        parents=True
    )
    model_path = os.path.join(
        model_dir,
        f'model_file_round{round_number}.h5'
    )
    # Saving model
    model.save(
        model_path
    )

def _convert_to_tfkeras_model(self, state):
    """Converts the global TFF model to a TF Keras model

    Takes the weights of the global model
    and pushes them back into a standard
    Keras model

    Args:
        state: The state of the FL server
            containing the model and
            optimization state

    Returns:
        (model): TF Keras model

    """
    model = self._load_tf_keras_model()
    model.compile(
        loss=self.loss,
        metrics=self.metrics
    )
    tff.learning.assign_weights_to_keras_model(
        model,
        state.model
    )
    return model

def _load_tf_keras_model(self):
    """Loads tf keras models

    Raises:
        KeyError: A model name was not defined
            correctly

    Returns:
        (model): TF keras model object

    """
    model = create_models(
        model_type=self.model_type,
        input_shape=[self.img_h, self.img_w, 3],
        freeze_base_weights=self.freeze_weights,
        num_classes=self.num_classes,
        compile_model=False
    )

    return model

def _define_model(self):
    """Model creation function"""
    model = self._load_tf_keras_model()

    tff_model = tff.learning.from_keras_model(
        model,
        dummy_batch=self.sample_batch,
        loss=self.loss,
        # Using self.metrics throws an error
        metrics=[tf.keras.metrics.CategoricalAccuracy()] )

    return tff_model

def _construct_iterative_process(self):
    """Constructing federated averaging process"""
    iterative_process = tff.learning.build_federated_averaging_process(
        self._define_model,
        client_optimizer_fn=lambda: tf.keras.optimizers.SGD( learning_rate=0.02 ),
        server_optimizer_fn=lambda: tf.keras.optimizers.SGD( learning_rate=1.0 ) )
    return iterative_process

Answer 1

"Increased the number of rounds to 5..."

Running only a few rounds of federated learning sounds insufficient. One of the earliest Federated Averaging papers (McMahan 2016) required hundreds of rounds when the MNIST data had non-IID splits. More recently, (Reddi 2020) required thousands of rounds for CIFAR-100. One thing to note is that each "round" is one "step" of the global model. That step may be larger with more client epochs, but the client updates are averaged, and diverging clients may reduce the magnitude of the global step.
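To illustrate why diverging clients shrink the global step, here is a small sketch in plain Python. It is a simplification under stated assumptions, not TFF's implementation: the client updates are averaged without weighting by example count, and the function names and numbers are made up for illustration.

```python
def average_deltas(client_deltas):
    """Unweighted mean of per-client weight updates (one entry per weight)."""
    num_clients = len(client_deltas)
    return [sum(values) / num_clients for values in zip(*client_deltas)]


def server_step(global_weights, client_deltas, server_lr=1.0):
    """Apply one global step: weights + server_lr * mean(client deltas)."""
    mean_delta = average_deltas(client_deltas)
    return [w + server_lr * d for w, d in zip(global_weights, mean_delta)]


global_weights = [0.0, 0.0]

# Clients that agree: the averaged step keeps its full size.
agreeing = [[1.0, 0.5], [1.0, 0.5], [1.0, 0.5]]
# Clients that diverge: the updates cancel in the average.
diverging = [[1.0, 0.5], [-1.0, 0.5], [0.0, -1.0]]

print(server_step(global_weights, agreeing))   # [1.0, 0.5]
print(server_step(global_weights, diverging))  # [0.0, 0.0]
```

In the second case every client moved its local model, yet the global model did not move at all. With non-IID silos the real situation is usually somewhere between the two cases, which is one reason many rounds are needed.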

"I also save the tf keras model weights after each round and make predictions on the test set - no changes."

This can be concerning. It would be easier to debug if you could share the code used in the FL training loop.

Answer 2

Not sure this is an answer; it is more of a related observation.

I've been trying to characterize the learning process (accuracy and loss) on the Federated Learning for Image Classification notebook tutorial with TFF.

I'm seeing major improvements in the speed of convergence when modifying the epoch hyperparameter (changing epochs to 5, 10, 20, etc.). But I'm also seeing a major increase in training accuracy. I suspect overfitting is occurring, though when I evaluate on the test set, accuracy is still high.

Wondering what is going on?

My understanding is that the epoch param controls the number of forward/backward propagation passes on each client per round of training. Is this correct? So, for example, 10 rounds of training on 10 clients with 10 epochs would be 10 epochs x 10 clients x 10 rounds. I realise a larger range of clients is needed, etc., but I was expecting to see poorer accuracy on the test set.
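The arithmetic above can be written out as a short sketch. It assumes that one epoch means one full pass over a client's local data, so the number of gradient steps per client also depends on the dataset and batch size. The 100 examples per client and batch size of 20 are made-up values, and the function names are hypothetical.

```python
import math


def local_steps_per_round(examples_per_client, batch_size, epochs):
    """Gradient steps one client takes in one round."""
    return math.ceil(examples_per_client / batch_size) * epochs


def total_client_epochs(rounds, clients_per_round, epochs):
    """Total local passes over client data across all rounds."""
    return rounds * clients_per_round * epochs


# 10 epochs x 10 clients x 10 rounds, as in the example above.
print(total_client_epochs(rounds=10, clients_per_round=10, epochs=10))  # 1000
# Made-up dataset and batch sizes for one client.
print(local_steps_per_round(examples_per_client=100, batch_size=20, epochs=10))  # 50
```

Note that the global model is still updated only once per round, so 10 rounds means 10 global updates regardless of how many local passes each client makes.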

What can I do to see what's going on? Could I use the evaluation check with something like learning curves to see if overfitting is occurring?

test_metrics = evaluation(state.model, federated_test_data) only appears to give a single data point. How can I get the individual test accuracy for each test example validated?
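One possible way to get both per-example results and a learning curve is to copy the global weights into a Keras model after each round (as the question does with tff.learning.assign_weights_to_keras_model), call predict, and post-process the outputs. The sketch below covers only that post-processing step in plain Python. The helper names and all numbers are hypothetical.

```python
def per_example_correct(batch_probabilities, labels):
    """Return one boolean per example: predicted class == label."""
    predicted = [max(range(len(p)), key=lambda i: p[i]) for p in batch_probabilities]
    return [pred == label for pred, label in zip(predicted, labels)]


def generalization_gaps(train_accuracy_by_round, test_accuracy_by_round):
    """Train minus test accuracy per round; a growing gap suggests overfitting."""
    return [train - test
            for train, test in zip(train_accuracy_by_round, test_accuracy_by_round)]


# Hypothetical model outputs and labels for three test examples.
probs = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [0, 1, 1]
print(per_example_correct(probs, labels))  # [True, True, False]

# Hypothetical per-round accuracies collected inside the training loop.
train_acc = [0.5, 0.75, 1.0]
test_acc = [0.5, 0.5, 0.5]
print(generalization_gaps(train_acc, test_acc))  # [0.0, 0.25, 0.5]
```

Recording train and test accuracy once per round and plotting them against the round number gives the learning curve. If test accuracy keeps rising along with training accuracy, as described above, overfitting is less likely to be the explanation.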

Appreciate any thoughts you may have on the matter, Colin.

Answer 1
0 2020-05-08 15:16:21

Answer 2
0 2020-05-28 11:29:27