PyTorch Model Prediction on GPU or CPU speed improvement
Running a multilayer perceptron model on the CPU is faster than running it on the GPU:
device = torch.device("cuda")
MODEL = MLP(num_classes=len(MODEL_META["labels"])).to(device)
checkpoint = torch.load(path,map_location=device)
MODEL.load_state_dict(checkpoint)
I run it with the current code:
for i in data:
    v = data[i:256]
    v = v[0:1600]
    v = np.pad(v,(0,1600-256),'constant')
    x = torch.from_numpy(v).float().view(-1,1600).to(device=device)
    with torch.no_grad():
        out = MODEL(x)
On the same data, the GPU finishes this loop in 3.1798946857452393 seconds and the CPU in 2.5446364879608154 seconds.
Now, if I load a convolutional neural network model trained on the same data, the GPU executes in 4.280640602111816 seconds and the CPU in 8.113759756088257 seconds.
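One thing that may affect measurements like these: CUDA kernels are launched asynchronously and the first GPU call carries one-time setup cost, so wall-clock timings around GPU code are usually only comparable after a warm-up call and a `torch.cuda.synchronize()` before reading the clock. A minimal sketch, assuming a stand-in `nn.Linear` model and a hypothetical `time_inference` helper (neither comes from the code above); it falls back to the CPU when CUDA is unavailable:

```python
import time

import torch
import torch.nn as nn


def time_inference(model, x, device, n_iters=100):
    """Return the seconds taken by n_iters forward passes of model on x."""
    model = model.to(device).eval()
    x = x.to(device)
    with torch.no_grad():
        model(x)  # warm-up: the first call includes one-time setup cost
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for pending GPU work to finish
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()  # make sure all timed kernels completed
        return time.perf_counter() - start


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
stand_in = nn.Linear(1600, 6)  # placeholder model, not the MLP from the question
elapsed = time_inference(stand_in, torch.zeros(1, 1600), device)
print(f"{elapsed:.4f} s for 100 forward passes on {device}")
```

Timing the CPU and GPU runs with the same helper keeps the comparison like for like.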
Using multiprocessing, I can split the work when running models on the CPU like this:
jobs = []
for i in range(threads):
    # parms is assumed to be a tuple of arguments for my_search_function
    p = multiprocessing.Process(target=my_search_function, args=parms)
    jobs.append(p)
    p.start()
for proc in jobs:
    proc.join()
Just by using 2 CPU cores, I get nearly GPU performance.
Running in a virtual machine (Proxmox): 12-core 3900X CPU and GTX 1060 6 GB (passthrough), Ubuntu 20.04.4 LTS, 8 GB RAM.
Am I doing something wrong, or is this the correct behaviour? Any tips to improve performance?
@Zoom this is my NN model:
class MLP(nn.Module):
    def __init__(self,num_classes=6):
        super(MLP,self).__init__()
        hidden_1 = 512  # unused
        hidden_2 = 512  # unused
        self.fc1 = nn.Linear(1600, 128)
        self.fc2 = nn.Linear(128,256)
        self.fc3 = nn.Linear(256,512)
        self.fc4 = nn.Linear(512,512)  # not used in forward()
        self.fc5 = nn.Linear(512,num_classes)
        self.droput = nn.Dropout(0.2)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.softmax = nn.Softmax()  # not used in forward(); raw logits are returned

    def forward(self,x):
        # flatten image input
        x = x.view(-1,1600)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        x = self.relu3(x)
        x = self.droput(x)
        x = self.fc5(x)
        return x
and I use it like this:
device = torch.device("cuda")
MODEL = MLP(num_classes=len(MODEL_META["labels"])).to(device)
checkpoint = torch.load(path,map_location=device)
MODEL.load_state_dict(checkpoint)
for i in data:
    v = data[i:256]
    v = v[0:1600]
    v = np.pad(v,(0,1600-256),'constant')
    x = torch.from_numpy(v).float().view(-1,1600).to(device=device)
    with torch.no_grad():
        out = MODEL(x)
By my understanding of batching, I can pass a list of "x" tensors, and it will result in a list of predictions, like this:
L = []
for i in data:
    v = data[i:256]
    v = v[0:1600]
    v = np.pad(v,(0,1600-256),'constant')
    x = torch.from_numpy(v).float().view(-1,1600).to(device=device)
    L.append(x)
with torch.no_grad():
    out = MODEL(L)
where "out" will be a list of tensors of the same length as the input list?
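For reference, layers such as `nn.Linear` expect a tensor rather than a Python list, so a batch is usually built by stacking the samples into one tensor of shape `(N, 1600)`; the output is then a single tensor of shape `(N, num_classes)`, one row per input. A minimal sketch, assuming a small stand-in network `TinyMLP` and random windows in place of the original `MODEL` and `data` (both are hypothetical):

```python
import numpy as np
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    """Stand-in for the MLP in the question; the layer sizes are illustrative."""

    def __init__(self, num_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1600, 128), nn.ReLU(), nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.net(x.view(-1, 1600))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyMLP().to(device).eval()

# 32 fake windows of 256 samples each, zero-padded to 1600 as in the loop above
windows = [np.random.rand(256).astype(np.float32) for _ in range(32)]
padded = [np.pad(w, (0, 1600 - 256), "constant") for w in windows]

# Stack into one array and move it to the device in a single transfer
batch = torch.from_numpy(np.stack(padded)).to(device)  # shape (32, 1600)
with torch.no_grad():
    out = model(batch)  # one forward pass, shape (32, 6)

preds = out.argmax(dim=1).cpu()  # predicted class index for each row
print(out.shape, preds.shape)
```

With this layout there is one host-to-device copy and one forward pass per batch instead of one per sample, which is typically where a GPU starts to pay off for a model this small.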