PyTorch model prediction on GPU or CPU speed improvement

Running a multilayer perceptron model on the CPU is faster than running it on the GPU.

import numpy as np
import torch

device = torch.device("cuda")
# MLP, MODEL_META and path are defined elsewhere in my code
MODEL = MLP(num_classes=len(MODEL_META["labels"])).to(device)
checkpoint = torch.load(path, map_location=device)
MODEL.load_state_dict(checkpoint)

I run it with the current code:

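# slide a 256-sample window over data (assumed here to be a 1-D NumPy array)
for i in range(len(data) - 256 + 1):
    v = data[i:i + 256]
    v = v[0:1600]
    # zero-pad the window on the right up to the 1600 inputs the model expects
    v = np.pad(v, (0, 1600 - 256), 'constant')
    x = torch.from_numpy(v).float().view(-1, 1600).to(device=device)
    with torch.no_grad():
        out = MODEL(x)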

On the same data, the GPU finishes this loop in 3.1798946857452393 seconds and the CPU in 2.5446364879608154 seconds.

Now, if I load a convolutional neural network model trained on the same data, the GPU executes in 4.280640602111816 seconds and the CPU in 8.113759756088257 seconds.
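When comparing numbers like these, it may help to time the forward pass in isolation. CUDA work is queued asynchronously, so a wall-clock timer around GPU code is only reliable if the device is synchronized before the clock is read. The following is a minimal sketch under that assumption; `time_forward`, `toy_model` and `toy_input` are hypothetical names for illustration and are not the models measured above.

```python
import time

import torch
import torch.nn as nn


def time_forward(model, x, device, repeats=20):
    """Return the average seconds per forward pass of model(x) on device."""
    model = model.to(device).eval()
    x = x.to(device)
    with torch.no_grad():
        model(x)  # warm-up pass so one-time setup cost is not measured
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work to finish
        start = time.perf_counter()
        for _ in range(repeats):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()  # make sure all timed work has completed
        elapsed = time.perf_counter() - start
    return elapsed / repeats


# Hypothetical stand-in model and input, not the MLP from the question.
toy_model = nn.Linear(1600, 6)
toy_input = torch.zeros(1, 1600)

cpu_time = time_forward(toy_model, toy_input, torch.device("cpu"))
print(f"CPU: {cpu_time:.6f} s per forward pass")

if torch.cuda.is_available():
    gpu_time = time_forward(toy_model, toy_input, torch.device("cuda"))
    print(f"GPU: {gpu_time:.6f} s per forward pass")
```

Timing the data preparation and the host-to-device copy separately from the forward pass would show which part dominates the loop.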

Using multiprocessing, I can split the work across cores when running models on the CPU, like this:

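import multiprocessing

jobs = []
for i in range(threads):
    # parms is the tuple of arguments passed to my_search_function
    p = multiprocessing.Process(target=my_search_function, args=parms)
    jobs.append(p)
    p.start()

# wait for all worker processes to finish
for proc in jobs:
    proc.join()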
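One way the work might be divided between those processes is sketched below. This is only an illustration: `split_chunks`, `worker` and `run_parallel` are hypothetical helpers, and `worker` sums its chunk as a placeholder for the real `my_search_function`, whose body is not shown in the question.

```python
import multiprocessing


def split_chunks(items, n_chunks):
    """Split items into n_chunks contiguous slices of nearly equal length."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        # the first `extra` chunks each take one additional element
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def worker(idx, chunk, results):
    # Placeholder for the real per-chunk work (e.g. running the model on it).
    results.put((idx, sum(chunk)))


def run_parallel(items, n_workers):
    """Process each chunk in its own process and return results in chunk order."""
    results = multiprocessing.Queue()
    jobs = []
    for idx, chunk in enumerate(split_chunks(items, n_workers)):
        p = multiprocessing.Process(target=worker, args=(idx, chunk, results))
        jobs.append(p)
        p.start()
    # drain the queue before joining so no child blocks on a full pipe
    collected = [results.get() for _ in jobs]
    for p in jobs:
        p.join()
    return [value for _, value in sorted(collected)]
```

`run_parallel` should be called from under an `if __name__ == "__main__":` guard so that it also works on platforms that start child processes by spawning a fresh interpreter.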

Just by using 2 CPU cores I get nearly GPU performance.

Running in a virtual machine (Proxmox): 12-core CPU (3900X) and GTX 1060 6 GB (passthrough), Ubuntu 20.04.4 LTS, 8 GB RAM.

Am I doing something wrong, or is this the correct behaviour? Are there any tips to improve performance?

@Zoom this is my NN model:

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, num_classes=6):
        super(MLP, self).__init__()

        hidden_1 = 512  # not used below
        hidden_2 = 512  # not used below
        self.fc1 = nn.Linear(1600, 128)
        self.fc2 = nn.Linear(128, 256)
        self.fc3 = nn.Linear(256, 512)
        self.fc4 = nn.Linear(512, 512)  # defined but never called in forward()
        self.fc5 = nn.Linear(512, num_classes)
        self.droput = nn.Dropout(0.2)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.softmax = nn.Softmax()  # defined but never called in forward()

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 1600)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        x = self.relu3(x)
        x = self.droput(x)
        x = self.fc5(x)
        return x

and I use it like this:

device  = torch.device("cuda")
MODEL = MLP(num_classes=len(MODEL_META["labels"])).to(device)
checkpoint = torch.load(path,map_location=device)
MODEL.load_state_dict(checkpoint)

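# slide a 256-sample window over data (assumed here to be a 1-D NumPy array)
for i in range(len(data) - 256 + 1):
    v = data[i:i + 256]
    v = v[0:1600]
    # zero-pad the window on the right up to the 1600 inputs the model expects
    v = np.pad(v, (0, 1600 - 256), 'constant')
    x = torch.from_numpy(v).float().view(-1, 1600).to(device=device)
    with torch.no_grad():
        out = MODEL(x)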
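The per-window slicing and padding in this loop could also be done for all windows at once in NumPy. The sketch below assumes `data` is a 1-D numeric array and that each window is 256 samples long, as the padding amount `1600-256` suggests; `build_batch` is a hypothetical helper name, and `sliding_window_view` requires NumPy 1.20 or newer.

```python
import numpy as np

WINDOW = 256       # samples per window (assumed from the padding amount)
INPUT_SIZE = 1600  # number of inputs the MLP expects


def build_batch(data, window=WINDOW, input_size=INPUT_SIZE):
    """Return a (num_windows, input_size) float32 array of zero-padded windows."""
    data = np.asarray(data, dtype=np.float32)
    # every contiguous window of length `window`, shape (len(data) - window + 1, window)
    windows = np.lib.stride_tricks.sliding_window_view(data, window)
    # zero-pad each row on the right up to input_size columns
    return np.pad(windows, ((0, 0), (0, input_size - window)), mode="constant")


batch = build_batch(np.arange(300))
print(batch.shape)
```

The resulting array can be turned into a tensor with a single `torch.from_numpy` call and moved to the device once, instead of once per window.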

By my understanding of batching, I can pass a list of "x" tensors, and it will produce a list of predictions, like this:

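L = []
# slide a 256-sample window over data (assumed here to be a 1-D NumPy array)
for i in range(len(data) - 256 + 1):
    v = data[i:i + 256]
    v = v[0:1600]
    v = np.pad(v, (0, 1600 - 256), 'constant')
    x = torch.from_numpy(v).float().view(-1, 1600).to(device=device)
    L.append(x)
with torch.no_grad():
    out = MODEL(L)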

where "out" will be a list of tensors of the same length as the input list?
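For reference, layers such as `nn.Linear` expect a tensor, not a Python list, so batching is usually done by combining the samples into one tensor of shape `(N, 1600)`. The output is then a single tensor of shape `(N, num_classes)` with one row per input. The sketch below illustrates this under stated assumptions: `toy_model` is a hypothetical stand-in for the trained MLP, `predict_batched` is an illustrative helper, and the `batch_size` of 1024 is an arbitrary choice.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the trained MLP (1600 inputs, 6 classes).
toy_model = nn.Sequential(
    nn.Linear(1600, 128),
    nn.ReLU(),
    nn.Linear(128, 6),
).to(device).eval()


def predict_batched(model, inputs, device, batch_size=1024):
    """Run model over inputs of shape (N, features) in chunks of batch_size rows."""
    outputs = []
    with torch.no_grad():
        for start in range(0, inputs.shape[0], batch_size):
            x = inputs[start:start + batch_size].to(device)
            outputs.append(model(x).cpu())
    # one row of class scores per input row
    return torch.cat(outputs, dim=0)


# 45 dummy samples; a real run would stack the padded windows instead.
inputs = torch.zeros(45, 1600)
out = predict_batched(toy_model, inputs, device)
print(out.shape)
```

If the samples already exist as a list of `(1, 1600)` tensors, `torch.cat(L, dim=0)` produces the `(N, 1600)` input expected here.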
