
PyTorch (GPU) slower than CPU slower than Keras

I'm just getting started with PyTorch and I wanted to run through a few toy problems. In the following case, I'm noticing a significant difference in how much time it takes for the model to train once over and issue one batch of predictions.

This is the PyTorch implementation. On the GPU, it takes ~17 seconds on my machine. The same model on the CPU takes ~11 seconds.

import torch

class LR(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(2, 20)
        self.linear2 = torch.nn.Linear(20, 1)

    def forward(self, x):
        x = torch.nn.functional.relu(self.linear1(x))
        x = torch.nn.functional.relu(self.linear2(x))
        return x


def fit_torch(df_train, df_test):
    sampler_tr = torch.utils.data.SubsetRandomSampler(df_train.index)
    train = torch.utils.data.DataLoader(
        torch.tensor(df_train.values, dtype=torch.float),
        batch_size=batch_size, sampler=sampler_tr)

    sampler_te = torch.utils.data.SubsetRandomSampler(df_test.index)
    test = torch.utils.data.DataLoader(
        torch.tensor(df_test.values, dtype=torch.float),
        batch_size=batch_size, sampler=sampler_te)

    model = LR()
    model = model.to(device)

    loss = torch.nn.MSELoss()
    optim = torch.optim.Adam(model.parameters(), lr=0.001)

    model.train()
    for _ in range(1000):
        for train_data in train:
            train_data = train_data.to(device)

            x_train = train_data[:, :2]
            y_train = train_data[:, 2]

            optim.zero_grad()

            pred = model(x_train)
            loss_val = loss(pred.squeeze(), y_train)

            loss_val.backward()
            optim.step()

    model.eval()
    with torch.no_grad():
        for test_data in test:
            test_data = test_data.to(device)

            pred = model(test_data[:, :2].float())
            break


This is the Keras implementation. It takes approximately 9 seconds to run.

import tensorflow as tf

def fit_tf(df_train, df_test):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(20, activation='relu'))
    model.add(tf.keras.layers.Dense(1, activation='relu'))

    model.compile(loss='mse', optimizer='adam')
    model.fit(
        df_train.values[:, :2],
        df_train.values[:, 2],
        batch_size=batch_size, epochs=1000, verbose=0)

    model.predict(df_test.iloc[:batch_size].values[:, :2])

The dataset and main functions.

import time

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

device = torch.device('cuda:0')
scaler = MinMaxScaler()

batch_size = 64

def create_dataset():
    dataset = []    
    random_x = np.random.randint(10, 1000, 1000)
    random_y = np.random.randint(10, 1000, 1000)

    for x, y in zip(random_x, random_y):
        dataset.append((x, y, 4 * x + 3 * y + 10))

    np.random.shuffle(dataset)
    df = pd.DataFrame(dataset)
    df = pd.DataFrame(scaler.fit_transform(df))

    return df

def __main__():
    df = create_dataset()
    df_train, df_test = train_test_split(df)

    start_time = time.time()
    fit_tf(df_train.reset_index(drop=True), df_test.reset_index(drop=True))
    print(time.time() - start_time)

PyTorch uses a dynamic computational graph by default, which is more flexible when you start to develop a neural network, since it gives more straightforward debug messages. TensorFlow, in contrast, builds a static computational graph, which is why you need to compile the model before using it. The compiler can optimize your model, but the tradeoff is that the neural network becomes difficult to debug. This may cause a minor difference in performance between the two frameworks, but it should not be a big deal.
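
As a rough illustration of that dynamic/static distinction (a sketch, not part of the original answer): in eager PyTorch every forward call executes operations immediately, while torch.jit.script can optionally compile a module into a static TorchScript graph that is built once and reused. The snippet below assumes the LR module defined in the question.

import torch

model = LR()
x = torch.randn(64, 2)

# Eager (default): ops execute immediately and the autograd graph is built on the fly.
eager_out = model(x)

# Scripted: torch.jit.script compiles the module into a static graph once;
# later calls reuse that compiled graph.
scripted = torch.jit.script(model)
scripted_out = scripted(x)

print(torch.allclose(eager_out, scripted_out))  # same numerics, different execution path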

Since your network is pretty small, the overhead of copying the model and each batch between CPU and GPU memory, and of initializing the CUDA subsystem, exceeds the benefit brought by the GPU. If you try a more complex neural network such as AlexNet, ResNet or even GoogLeNet, the benefit will be much more obvious.
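
One rough way to see how much of the gap is per-batch host-to-device copying (a sketch, not from the original answer; it reuses df_train, batch_size, device and the LR model from the question): move the whole training set to the GPU once, outside the loops, instead of calling .to(device) on every batch of every epoch.

def fit_torch_device_resident(df_train):
    # Copy the (small) dataset to the GPU a single time, up front.
    data = torch.tensor(df_train.values, dtype=torch.float, device=device)
    x_all, y_all = data[:, :2], data[:, 2]

    model = LR().to(device)
    loss = torch.nn.MSELoss()
    optim = torch.optim.Adam(model.parameters(), lr=0.001)

    model.train()
    for _ in range(1000):
        # Shuffle indices on the GPU so no host<->device transfers happen inside the loop.
        perm = torch.randperm(x_all.size(0), device=device)
        for start in range(0, x_all.size(0), batch_size):
            idx = perm[start:start + batch_size]
            optim.zero_grad()
            pred = model(x_all[idx]).squeeze()
            loss_val = loss(pred, y_all[idx])
            loss_val.backward()
            optim.step()

If the timing barely changes after this, the remaining cost is mostly per-kernel launch overhead, which a two-layer model this small cannot amortize, consistent with the point above.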
