简体   繁体   中英

PyTorch (GPU) slower than CPU slower than keras

I'm just getting started with PyTorch and I wanted to run through a few toy problems. In the following case, I'm noticing a significant difference in how much time it takes for the model to train once over and issue one batch of predictions.

This is the PyTorch implementation. On the GPU, it takes ~17 seconds on my machine. The same model on the CPU takes ~11 seconds.

class LR(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(2, 20)
        self.linear2 = torch.nn.Linear(20, 1)

    def forward(self, x):
        x = torch.nn.functional.relu(self.linear1(x))
        x = torch.nn.functional.relu(self.linear2(x))
        return x


def fit_torch(df_train, df_test):
    sampler_tr = torch.utils.data.SubsetRandomSampler(df_train.index)
    train = torch.utils.data.DataLoader(
        torch.tensor(df_train.values, dtype=torch.float),
        batch_size=batch_size, sampler=sampler_tr)

    sampler_te = torch.utils.data.SubsetRandomSampler(df_test.index)
    test = torch.utils.data.DataLoader(
        torch.tensor(df_test.values, dtype=torch.float),
        batch_size=batch_size, sampler=sampler_te)

    model = LR()
    model = model.to(device)

    loss = torch.nn.MSELoss()
    optim = torch.optim.Adam(model.parameters(), lr=0.001)

    model.train()
    for _ in range(1000):
        for train_data in train:
            train_data = train_data.to(device)

            x_train = train_data[:, :2]
            y_train = train_data[:, 2]

            optim.zero_grad()

            pred = model(x_train)
            loss_val = loss(pred.squeeze(), y_train)

            loss_val.backward()
            optim.step()

    model.eval()
    with torch.no_grad():
        for test_data in test:
            test_data = test_data.to(device)

            pred = model(test_data[:, :2].float())
            break


This is the keras implementation. It takes approximately 9 seconds to run.

def fit_tf(df_train, df_test):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(20, activation='relu'))
    model.add(tf.keras.layers.Dense(1, activation='relu'))

    model.compile(loss='mse', optimizer='adam')
    model.fit(
        df_train.values[:, :2],
        df_train.values[:, 2],
        batch_size=batch_size, epochs=1000, verbose=0)

    model.predict(df_test.iloc[:batch_size].values[:, :2])

The dataset and main functions.

device = torch.device('cuda:0')
scaler = MinMaxScaler()

batch_size = 64

def create_dataset():
    dataset = []    
    random_x = np.random.randint(10, 1000, 1000)
    random_y = np.random.randint(10, 1000, 1000)

    for x, y in zip(random_x, random_y):
        dataset.append((x, y, 4 * x + 3 * y + 10))

    np.random.shuffle(dataset)
    df = pd.DataFrame(dataset)
    df = pd.DataFrame(scaler.fit_transform(df))

    return df

def __main__():
    df = create_dataset()
    df_train, df_test = train_test_split(df)

    start_time = time.time()
    fit_tf(df_train.reset_index(drop=True), df_test.reset_index(drop=True))
    print(time.time() - start_time)

PyTorch uses a dynamic computational graph by default, which is more flexible when you start to develop a neural network since it will give a more straight forward debug message. TensorFlow, in contrast, will produce a static computational graph, and that is why you need to compile the model before use it. The compiler can optimize your model, but the tradeoff is the neural network becomes difficult to debug. This may cause minor difference between the performance of the two framework, but should not be a big deal.

Since your network is pretty small, the overhead to copy the network between GPU memory and CPU memory and to initiate the CUDA subsystem exceeds the benefit brought by the GPU. If you try some more complex neural network such as AlexNet, ResNet or even GoogLeNet, the benefit will be much more obvious.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM