PyTorch model prediction: GPU or CPU speed improvement

Question

Running a multilayer perceptron (MLP) model on the CPU is faster than running it on the GPU.

device  = torch.device("cuda")
MODEL = MLP(num_classes=len(MODEL_META["labels"])).to(device)
checkpoint = torch.load(path,map_location=device)
MODEL.load_state_dict(checkpoint)

I run it with the following code:

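# Slide a 256-sample window over data, one forward pass per window
# (window start index and stride 1 are assumed; the original sliced data[i:256])
for i in range(len(data) - 256 + 1):
    v = data[i:i + 256]
    v = v[0:1600]
    v = np.pad(v, (0, 1600 - 256), 'constant')  # zero-pad to the model's input width
    x = torch.from_numpy(v).float().view(-1, 1600).to(device=device)
    with torch.no_grad():
        out = MODEL(x)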

On the same data, the GPU finishes this loop in 3.1798946857452393 seconds and the CPU in 2.5446364879608154 seconds.
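One caveat with timings like these: CUDA kernels are launched asynchronously, so a wall-clock measurement around GPU code is only reliable if `torch.cuda.synchronize()` is called before reading the clock, and the first few calls also include one-time initialisation cost. Below is a minimal sketch of a fairer measurement. It assumes a small stand-in network (`toy_model` and `time_inference` are hypothetical names, not from the post) and falls back to the CPU when no GPU is available.

```python
import time

import torch
import torch.nn as nn


def time_inference(model, x, device, n_iters=100, n_warmup=10):
    """Return the mean seconds per forward pass of model(x) on `device`."""
    model = model.to(device).eval()
    x = x.to(device)
    with torch.no_grad():
        # Warm-up passes absorb one-time setup cost (CUDA context, allocator, ...)
        for _ in range(n_warmup):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued kernels before starting the clock
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for the timed kernels to finish
    return (time.perf_counter() - start) / n_iters


# Stand-in model for illustration only; not the MLP from the post
toy_model = nn.Sequential(nn.Linear(1600, 128), nn.ReLU(), nn.Linear(128, 6))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
seconds = time_inference(toy_model, torch.zeros(1, 1600), device)
print(f"{device.type}: {seconds * 1e3:.3f} ms per forward pass")
```

With a batch of one row, per-call overhead (kernel launches, host-to-device copies) tends to dominate on the GPU, which may explain part of the gap reported here.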

Now, if I load a convolutional neural network model trained on the same data, the GPU finishes in 4.280640602111816 seconds and the CPU in 8.113759756088257 seconds.

Using multiple processes, I can split the work when running models on the CPU like this:

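import multiprocessing

jobs = []
for i in range(threads):
    # parms is assumed to be the tuple of arguments for my_search_function
    p = multiprocessing.Process(target=my_search_function, args=parms)
    jobs.append(p)
    p.start()

# Wait for all worker processes to finish
for proc in jobs:
    proc.join()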

Just by using 2 CPU cores I get nearly GPU performance.
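The way the work is divided between processes is not shown in the post. One possible sketch, assuming the items can be processed independently, is to cut the index range into contiguous chunks and hand one chunk to each worker. `split_range` and `run_parallel` are hypothetical helpers, not the poster's `my_search_function`.

```python
import multiprocessing


def split_range(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, stop) chunks."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers each take one additional item
        stop = start + base + (1 if w < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def run_parallel(worker, n_items, n_workers):
    """Call worker(start, stop) once per chunk, each in its own process.

    Results come back in chunk order.
    """
    with multiprocessing.Pool(processes=n_workers) as pool:
        return pool.starmap(worker, split_range(n_items, n_workers))
```

`run_parallel` should be called under `if __name__ == "__main__":` with a worker defined at module level, and each process would need to load its own copy of the model.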

Running in a virtual machine (Proxmox): 12-core CPU (3900X) and GTX 1060 6 GB (passthrough), Ubuntu 20.04.4 LTS, 8 GB RAM.

Am I doing something wrong, or is this the expected behaviour? Any tips to improve performance?

Answer 1

@Zoom this is my NN model:

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, num_classes=6):
        super(MLP, self).__init__()

        hidden_1 = 512  # unused
        hidden_2 = 512  # unused
        self.fc1 = nn.Linear(1600, 128)
        self.fc2 = nn.Linear(128, 256)
        self.fc3 = nn.Linear(256, 512)
        self.fc4 = nn.Linear(512, 512)  # defined but not used in forward()
        self.fc5 = nn.Linear(512, num_classes)
        self.droput = nn.Dropout(0.2)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.softmax = nn.Softmax()  # defined but not used in forward()

    def forward(self, x):
        # flatten input to shape (batch, 1600)
        x = x.view(-1, 1600)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        x = self.relu3(x)
        x = self.droput(x)
        x = self.fc5(x)
        return x  # raw logits, no softmax applied
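Because the model contains a `Dropout` layer, it matters for prediction whether the module is in training or evaluation mode. A freshly constructed `nn.Module` starts in training mode, where dropout randomly zeroes activations, and `torch.no_grad()` does not change that. The snippets in this post never call `MODEL.eval()`. A small sketch of the difference, using a hypothetical stand-in network (`net`) rather than the real checkpoint:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in network with a dropout layer; not the MLP from the post
net = nn.Sequential(
    nn.Linear(1600, 512), nn.ReLU(), nn.Dropout(0.2), nn.Linear(512, 6)
)
x = torch.randn(4, 1600)

net.eval()  # evaluation mode: dropout is a no-op
with torch.no_grad():
    out_a = net(x)
    out_b = net(x)
same_in_eval = torch.equal(out_a, out_b)

net.train()  # training mode: dropout draws a new random mask on every call
with torch.no_grad():
    out_c = net(x)
    out_d = net(x)
same_in_train = torch.equal(out_c, out_d)

print("identical outputs in eval mode: ", same_in_eval)
print("identical outputs in train mode:", same_in_train)
```

Calling `MODEL.eval()` once after `load_state_dict` would be the corresponding change in the code from the post.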

and I use it like this:

device  = torch.device("cuda")
MODEL = MLP(num_classes=len(MODEL_META["labels"])).to(device)
checkpoint = torch.load(path,map_location=device)
MODEL.load_state_dict(checkpoint)

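# Slide a 256-sample window over data, one forward pass per window
# (window start index and stride 1 are assumed; the original sliced data[i:256])
for i in range(len(data) - 256 + 1):
    v = data[i:i + 256]
    v = v[0:1600]
    v = np.pad(v, (0, 1600 - 256), 'constant')  # zero-pad to the model's input width
    x = torch.from_numpy(v).float().view(-1, 1600).to(device=device)
    with torch.no_grad():
        out = MODEL(x)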

By my understanding of batching, I can pass a list of "x" tensors, and it will return a list of predictions, like this:

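L = []
# window start index and stride 1 are assumed; the original sliced data[i:256]
for i in range(len(data) - 256 + 1):
    v = data[i:i + 256]
    v = v[0:1600]
    v = np.pad(v, (0, 1600 - 256), 'constant')
    x = torch.from_numpy(v).float().view(-1, 1600).to(device=device)
    L.append(x)
with torch.no_grad():
    out = MODEL(L)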

where "out" will be a list of tensors of the same length as the input list?
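On the batching question: passing a Python list to the model would not work as written, because `forward` calls `x.view(...)`, which a list does not have. The usual approach is to stack the inputs into a single tensor of shape `(N, 1600)`. The output is then one tensor of shape `(N, num_classes)`, where row `k` is the prediction for input `k`. A minimal sketch, assuming sliding windows of 256 samples with stride 1 and a stand-in model (`net`, `data`, `WINDOW`, and `WIDTH` are illustrative, not the poster's actual objects):

```python
import numpy as np
import torch
import torch.nn as nn

WINDOW, WIDTH = 256, 1600
data = np.random.rand(1000).astype(np.float32)  # placeholder signal

# Build every window at once: shape (len(data) - WINDOW + 1, WINDOW)
windows = np.lib.stride_tricks.sliding_window_view(data, WINDOW)
# Zero-pad each window on the right up to the model's input width
batch_np = np.pad(windows, ((0, 0), (0, WIDTH - WINDOW)), "constant")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Stand-in model for illustration only; not the MLP from the post
net = nn.Sequential(nn.Linear(WIDTH, 128), nn.ReLU(), nn.Linear(128, 6))
net = net.to(device).eval()

batch = torch.from_numpy(batch_np).to(device)  # a single host-to-device copy
with torch.no_grad():
    out = net(batch)  # one forward pass for all windows, shape (N, 6)
predicted = out.argmax(dim=1).cpu()  # predicted class index per window
print(out.shape, predicted.shape)
```

This replaces many tiny forward passes and copies with one large one, which is the kind of workload a GPU is built for. For very long inputs, the batch could be processed in chunks of a few thousand rows to bound memory use.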


Answer 1
0 2022-04-05 09:16:27
