Which GPU should I use on Google Cloud Platform (GCP)
Right now, I'm working on my master's thesis and I need to train a huge Transformer model on GCP. The fastest way to train deep learning models is to use a GPU, so I was wondering: which of the GPUs offered by GCP should I use? The ones available at the current moment are:
It all depends on what characteristics you're looking for. First, let's collect some information about these different GPU models and see which one suits you best. You can google each model's name and look up its specifications. I did that and created the following table:
| Model | FP32 (TFLOPS) | Price (USD/hour) | TFLOPS/dollar |
|---|---|---|---|
| Nvidia A100 | 19.5 | 2.933908 | 6.646425178 |
| Nvidia Tesla T4 | 8.1 | 0.35 | 23.14285714 |
| Nvidia Tesla P4 | 5.5 | 0.6 | 9.166666667 |
| Nvidia Tesla V100 | 14 | 2.48 | 5.64516129 |
| Nvidia Tesla P100 | 9.3 | 1.46 | 6.369863014 |
| Nvidia Tesla K80 | 8.73 | 0.45 | 19.4 |
In the previous table, you can see:

- **FP32**: stands for 32-bit floating point, and measures how fast the card performs single-precision floating-point operations. It's measured in TFLOPS (Tera Floating-Point Operations per Second); the higher, the better.
- **Price**: the hourly price on GCP.
- **TFLOPS/dollar**: simply how many operations you get for one dollar.

From this table, you can see that:
- Nvidia A100 is the fastest.
- Nvidia Tesla P4 is the slowest.
- Nvidia A100 is the most expensive.
- Nvidia Tesla T4 is the cheapest.
- Nvidia Tesla T4 has the highest operations per dollar.
- Nvidia Tesla V100 has the lowest operations per dollar.

And you can observe that clearly in the following figure:
I hope that was helpful.
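As a quick sanity check, the TFLOPS-per-dollar column above can be recomputed with a short script. The throughput and price figures are the ones from the table, which are a snapshot: GCP hourly rates change over time, so plug in current prices before relying on the ranking.

```python
# Recompute TFLOPS per dollar for each GCP GPU model.
# FP32 throughput (TFLOPS) and hourly prices (USD) are taken
# from the table above; treat them as a snapshot, not current rates.
gpus = {
    "Nvidia A100":       {"fp32_tflops": 19.5, "usd_per_hour": 2.933908},
    "Nvidia Tesla T4":   {"fp32_tflops": 8.1,  "usd_per_hour": 0.35},
    "Nvidia Tesla P4":   {"fp32_tflops": 5.5,  "usd_per_hour": 0.6},
    "Nvidia Tesla V100": {"fp32_tflops": 14.0, "usd_per_hour": 2.48},
    "Nvidia Tesla P100": {"fp32_tflops": 9.3,  "usd_per_hour": 1.46},
    "Nvidia Tesla K80":  {"fp32_tflops": 8.73, "usd_per_hour": 0.45},
}

# Derive the value-for-money column.
for spec in gpus.values():
    spec["tflops_per_dollar"] = spec["fp32_tflops"] / spec["usd_per_hour"]

# Rank from best to worst value for money.
ranked = sorted(gpus.items(),
                key=lambda kv: kv[1]["tflops_per_dollar"],
                reverse=True)

for name, spec in ranked:
    print(f"{name:18s} {spec['tflops_per_dollar']:6.2f} TFLOPS/$")
```

Running this reproduces the conclusion above: the T4 tops the value ranking and the V100 sits at the bottom, even though the A100 is by far the fastest card in absolute terms.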
Nvidia says that using the most modern, powerful GPUs is not only faster, it also ends up being cheaper: https://developer.nvidia.com/blog/saving-time-and-money-in-the-cloud-with-the-latest-nvidia-powered-instances/

Google came to a similar conclusion (this was a couple of years ago, before the A100 was available): https://cloud.google.com/blog/products/ai-machine-learning/your-ml-workloads-cheaper-and-faster-with-the-latest-gpus
I guess you could argue that both Nvidia and Google might be a little biased in making that judgement, but they are also well placed to answer the question, and I see no reason not to trust them.