Google recently announced that its second- and third-generation Cloud TPU Pods — its scalable cloud-based supercomputers with up to 1,000 of its custom Tensor Processing Units — are now publicly available in beta.
The newest-generation v3 models are especially powerful and are liquid-cooled. Each pod can deliver up to 100 petaFLOPS. As Google notes, that raw computing power would put it among the top five supercomputers worldwide, but you need to take that number with a grain of salt given that the TPU Pods operate at a far lower numerical precision.
You don’t need to use a full TPU Pod, though. Google also lets developers rent ‘slices’ of these machines. Either way, we’re talking about a very powerful machine that can train a standard ResNet-50 image classification model on the ImageNet dataset in two minutes.
The TPU v2 Pods feature up to 512 cores and are a bit slower than the v3 ones, of course. When using 256 TPUs, for instance, a v2 Pod will train the ResNet-50 model in 11.3 minutes, while a v3 Pod will only take 7.1 minutes. Using a single TPU, that takes 302 minutes.
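For a sense of scale, the quoted benchmark times translate into the following rough speedups relative to a single TPU; this is just a back-of-the-envelope sketch using the numbers above:

```python
# Quoted ResNet-50/ImageNet training times, in minutes.
single_tpu_min = 302.0  # a single TPU
v2_pod_min = 11.3       # v2 Pod (256 TPUs)
v3_pod_min = 7.1        # v3 Pod

# Speedup over a single TPU.
v2_speedup = single_tpu_min / v2_pod_min
v3_speedup = single_tpu_min / v3_pod_min

print(f"v2 Pod: {v2_speedup:.1f}x faster")  # ~26.7x
print(f"v3 Pod: {v3_speedup:.1f}x faster")  # ~42.5x
```

So the pods are dozens of times faster than a single TPU, though well short of linear scaling, which is one reason the raw petaFLOPS figure deserves that grain of salt.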
Unsurprisingly, Google argues that Pods are best used when you need to quickly train a model (and the price isn’t that much of an issue), need higher accuracy using larger datasets with millions of labeled examples, or when you want to quickly prototype new models.