How are "data" and "target" chosen in federated learning? (PySyft)
I can't understand how the variables (data, target) are chosen in the train() function below.
def train(args, model, device, federated_train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(federated_train_loader): # <-- now it is a distributed dataset
        model.send(data.location) # <-- NEW: send the model to the right location
I guess they are two tensors representing two random images from the training dataset, but then is the loss function
loss = F.nll_loss(output, target)
calculated at every iteration with a different target?
I also have a different question: I trained the network with images of cats, then tested it with images of cars, and the accuracy reached 97%. How is this possible? Is that a proper value, or am I doing something wrong?
Here is the entire code:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import syft as sy # <-- NEW: import the Pysyft library
hook = sy.TorchHook(torch) # <-- NEW: hook PyTorch ie add extra functionalities to support Federated Learning
bob = sy.VirtualWorker(hook, id="bob") # <-- NEW: define remote worker bob
alice = sy.VirtualWorker(hook, id="alice") # <-- NEW: and alice
class Arguments():
    def __init__(self):
        self.batch_size = 64
        self.test_batch_size = 1000
        self.epochs = 2
        self.lr = 0.01
        self.momentum = 0.5
        self.no_cuda = False
        self.seed = 1
        self.log_interval = 30
        self.save_model = False
args = Arguments()
use_cuda = not args.no_cuda and torch.cuda.is_available()
torch.manual_seed(args.seed)
device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
federated_train_loader = sy.FederatedDataLoader( # <-- this is now a FederatedDataLoader
    datasets.MNIST("C:\\users...\\train", train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ]))
    .federate((bob, alice)), # <-- NEW: we distribute the dataset across all the workers, it's now a FederatedDataset
    batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST("C:\\Users...\\test", train=False, download=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=args.test_batch_size, shuffle=True, **kwargs)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
def train(args, model, device, federated_train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(federated_train_loader): # <-- now it is a distributed dataset
        model.send(data.location) # <-- NEW: send the model to the right location
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        model.get() # <-- NEW: get the model back
        if batch_idx % args.log_interval == 0:
            loss = loss.get() # <-- NEW: get the loss back
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * args.batch_size, len(federated_train_loader) * args.batch_size,
                100. * batch_idx / len(federated_train_loader), loss.item()))
def test(args, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.argmax(1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=args.lr) # TODO momentum is not supported at the moment
for epoch in range(1, args.epochs + 1):
    train(args, model, device, federated_train_loader, optimizer, epoch)
    test(args, model, device, test_loader)
if (args.save_model):
    torch.save(model.state_dict(), "mnist_cnn.pt")
Think of it like this. When you hook torch, your torch objects gain additional functionality: methods such as .send() and .federate(), and attributes such as .location and ._objects. Because of .federate((bob, alice)), your data and target, which were once ordinary torch tensors, become pointers to tensors residing on different VirtualWorker objects.
Now data and target have additional attributes, including .location, which returns the location of the tensor that the pointer named data (or target) points to.
Federated learning sends the global model to this location, as seen in model.send(data.location).
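To make the batching concrete, here is a minimal pure-Python sketch of the idea, not PySyft's actual implementation. The helpers federate and federated_batches, the round-robin split, and the worker-by-worker iteration order are all assumptions made for illustration. It shows that each iteration yields one batch (with batch_size = 64 in the question's code, that would be 64 images and their 64 labels, not two single images), and that every batch lives on exactly one worker, which is why the model is sent to data.location.

```python
import random

def federate(samples, worker_ids):
    """Toy stand-in for .federate(): split samples across workers round-robin."""
    shards = {w: [] for w in worker_ids}
    for i, sample in enumerate(samples):
        shards[worker_ids[i % len(worker_ids)]].append(sample)
    return shards

def federated_batches(shards, batch_size, shuffle=True, seed=0):
    """Yield (location, data, target); each batch comes from a single worker."""
    rng = random.Random(seed)
    for location, samples in shards.items():
        samples = list(samples)
        if shuffle:
            rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            data = [x for x, _ in batch]      # the inputs of this batch
            target = [y for _, y in batch]    # the matching labels
            yield location, data, target

# Hypothetical dataset: 10 (image, label) pairs; images are just strings here.
samples = [(f"img{i}", i % 3) for i in range(10)]
shards = federate(samples, ["bob", "alice"])

batches = list(federated_batches(shards, batch_size=2))
for batch_idx, (location, data, target) in enumerate(batches):
    # In the real loop this is where model.send(data.location) would happen.
    print(batch_idx, location, data, target)
```

Every iteration sees a different data and target pair, so the loss is computed against different labels each time.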
Now model is a pointer to a model residing at that location, and data is a pointer to a tensor residing there too. Hence, when you compute output = model(data), the output also resides there, and all we (the central server, or in other words the VirtualWorker called 'me') get is a pointer to that output.
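The pointer mechanics described above can be sketched with a toy model in plain Python. The Worker, Pointer, and remote_call names are hypothetical and only mimic the behaviour described here; they are not PySyft classes. The point is that the computation runs where the objects live and the caller only receives a handle to the result.

```python
class Worker:
    """Toy stand-in for a VirtualWorker: it simply stores objects by id."""
    def __init__(self, name):
        self.name = name
        self._objects = {}
        self._next_id = 0

    def store(self, obj):
        obj_id = self._next_id
        self._next_id += 1
        self._objects[obj_id] = obj
        return Pointer(self, obj_id)

class Pointer:
    """Handle held by the central server; the real object stays on `location`."""
    def __init__(self, location, obj_id):
        self.location = location
        self.obj_id = obj_id

    def get(self):
        # Pull the object back and remove it from the remote worker.
        return self.location._objects.pop(self.obj_id)

def remote_call(fn_ptr, arg_ptr):
    """Run fn(arg) on the worker holding both; return a pointer to the result."""
    assert fn_ptr.location is arg_ptr.location, "model and data must be co-located"
    worker = fn_ptr.location
    fn = worker._objects[fn_ptr.obj_id]
    arg = worker._objects[arg_ptr.obj_id]
    return worker.store(fn(arg))

bob = Worker("bob")
data = bob.store([1.0, 2.0, 3.0])                            # data already lives on bob
model = data.location.store(lambda xs: [2 * x for x in xs])  # like model.send(data.location)
output = remote_call(model, data)                            # like output = model(data)
print(output.location.name)                                  # bob
result = output.get()                                        # only now does the value come back
```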
Regarding your question about the loss calculation: since output and target both reside at that same location, the loss is also calculated there. The same goes for the backward pass and the optimizer step.
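As a rough illustration of what is computed for each batch, the following pure-Python sketch reproduces the arithmetic of a log-softmax followed by a mean negative log-likelihood, which is what F.nll_loss does with its default reduction. The logits and targets are made-up numbers, and the helper names are only for this example.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax for one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs_batch, targets):
    """Mean negative log-likelihood of the target classes over the batch."""
    picked = [row[t] for row, t in zip(log_probs_batch, targets)]
    return -sum(picked) / len(picked)

# Hypothetical batch of two samples with three classes each.
logits_batch = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
targets = [0, 2]

output = [log_softmax(row) for row in logits_batch]
loss = nll_loss(output, targets)          # one scalar for the whole batch
wrong_loss = nll_loss(output, [1, 0])     # same outputs, wrong labels: larger loss
print(loss, wrong_loss)
```

So each iteration produces a single loss value for that batch's own targets, and the next iteration repeats the computation on a new batch.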
Finally, you can see model.get(); this is where the central server pulls the remote model back using the pointer called model. (I'm not sure whether it should be model = model.get(), though.)
So anything called with .get() is pulled from that worker and returned to our Python statement. Also note that .get() removes the object from its location when called. Hence, use .copy().get() if you are going to need it there afterwards.
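The difference between .get() and .copy().get() can be illustrated with another small toy, again with hypothetical Worker and Pointer classes rather than PySyft's real ones. Here copy() duplicates the object on the remote worker, so fetching the copy leaves the original in place.

```python
import copy

class Worker:
    """Toy remote worker that stores objects in a dict."""
    def __init__(self, name):
        self.name = name
        self._objects = {}

class Pointer:
    """Toy pointer: get() moves the object, copy() duplicates it remotely."""
    def __init__(self, location, obj_id):
        self.location = location
        self.obj_id = obj_id

    def get(self):
        # Fetch the object and delete it from the remote worker.
        return self.location._objects.pop(self.obj_id)

    def copy(self):
        # Duplicate the remote object and return a pointer to the duplicate.
        new_id = f"{self.obj_id}_copy"
        original = self.location._objects[self.obj_id]
        self.location._objects[new_id] = copy.deepcopy(original)
        return Pointer(self.location, new_id)

alice = Worker("alice")
alice._objects["loss"] = 0.25
loss_ptr = Pointer(alice, "loss")

value = loss_ptr.copy().get()               # fetches the duplicate only
still_remote = "loss" in alice._objects     # the original is still on alice
value_again = loss_ptr.get()                # this removes the original
gone = "loss" not in alice._objects
```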