
Is it possible to run regular python code on Google TPU?

So I'm pretty new to Google TPUs. From what I've researched, they are optimized specifically for training machine learning models written in TensorFlow. Currently, I am trying to see how the TPU performs with other types of functions that are not related to machine learning. I have been trying to adapt my code so it can run on the TPU in Google Colab, but I am not sure if it is working or if this is the best approach. This is the code I have for an O(n^3) matrix multiplication algorithm:

import os
import numpy as np
from random import seed
from random import random
import tensorflow as tf
import time

#check that this is running on the TPU
try:
  tpu = tf.contrib.cluster_resolver.TPUClusterResolver() # TPU detection

  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])  
except ValueError:
  print("Running on GPU or CPU")
  tpu = None

#TPU details
if 'COLAB_TPU_ADDR' not in os.environ:
  print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
else:
  tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print ('TPU address is', tpu_address)

def multiplicationComputation():
  #size of matrix
  row_size = 128
  col_size = 128
  N = row_size*col_size

  #class for matrix
  class MatrixMultiplication: 
    matrix1 = np.empty(N) #DO NOT USE np.arange(N)
    matrix2 = np.empty(N)
    product = np.empty(N) #product size is the matrix1.columns x matrix2.rows

  #create MatrixMultiplication object
  m = MatrixMultiplication()

  #fill object's data structures
  #seed for matrix 1
  seed(1) 
  for x in range(N):
    value = random()
    m.matrix1[x] = value

  #seed for matrix 2
  seed(7) 
  for x in range(N):
    value = random()
    m.matrix2[x] = value

  #multiply matrix1 and matrix2
  start = time.time()
  qtySaves = 0
  for i in range(row_size):
    for j in range(col_size):
      i_col = i * col_size
      sum = 0
      for k in range(row_size):
        k_col = k * col_size
        multiplication = m.matrix1[i_col + k] * m.matrix2[k_col + j]
        sum = sum + multiplication

      m.product[i_col + j] = sum #The result of the multiplication is saved on the product matrix
      qtySaves = qtySaves + 1

  end = time.time()
  #print result
  print()
  print("Result O(n^3): ")
  for i in range(N):
    if i % row_size == 0 and i > 0:
      print()  
    print(str(m.product[i]), end =" ")

  print()
  print("For n = " + str(N) + ", time is " + str(end - start))

#rewrite computation so it can be executed on the TPU
#tpuOperation = tf.contrib.tpu.rewrite(multiplicationComputation)
tpuOperation = tf.contrib.tpu.batch_parallel(multiplicationComputation, [], num_shards=8)

#run
session = tf.Session(tpu_address, config=tf.ConfigProto(isolate_session_state=True, log_device_placement=True)) #isolate session state = True for distributed runtime
try:
  session.run(tf.contrib.tpu.initialize_system()) #initializes a distributed TPU system
  session.run(tpuOperation)
finally:
  #TPU sessions must be shutdown separately from closing the session
  session.run(tf.contrib.tpu.shutdown_system())
  session.close()

My fear is that this is not running on the TPU. When calling session.list_devices() I see that there is a CPU listed, and I am afraid that my code might actually be running on the CPU and not on the TPU. This is the output of said command:

TPU devices: 
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 10448234186946304259),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 2088593175391423031),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 1681908406791603718),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 2618396797726491975),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 14243051360425930068),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 15491507241115490455),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 9239156557030772892),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 16970377907446102335),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 6145936732121669294),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 11372860691871753999),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 12653526146081894211)]

For now, I'm not looking for advice on what accelerator to use. I want to test the TPU and make sure my code is running on it. Please help!

I am afraid the presence or absence of TensorFlow has no effect on how NumPy (np) operations are executed.

In your example above when you specify

tpuOperation = tf.contrib.tpu.batch_parallel(multiplicationComputation, [], num_shards=8)

where multiplicationComputation contains no TPU-specific code to be parallelized, so it will run exactly the way it would normally run when you call multiplicationComputation directly: on the CPU.
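To make the mechanism concrete, here is a small sketch (using the same TF 1.x contrib API as your code, so details may differ on other versions) showing that a NumPy-only function handed to batch_parallel does its work on the host while TensorFlow traces the function; whatever you later session.run is an essentially empty TPU graph:

import numpy as np
import tensorflow as tf

def numpy_only():
  # Plain NumPy: this executes on the host CPU while TensorFlow traces the
  # function, and it creates no TensorFlow ops that could be placed on the TPU.
  a = np.random.rand(128, 128)
  b = np.random.rand(128, 128)
  print("NumPy matmul ran on the host, result shape:", (a @ b).shape)

# The print above fires during this call, i.e. during graph construction on
# the host, not when the TPU later executes tpu_op.
tpu_op = tf.contrib.tpu.batch_parallel(numpy_only, [], num_shards=8)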

You will have to rewrite your code using TF operations to allow it to run on the TPU; TensorFlow will then translate those operations into TPU-specific code.
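For example, here is a hedged sketch (built around the tf.contrib.tpu.rewrite call you already have commented out, TF 1.x contrib API) of how the 128x128 multiplication could be expressed with TF ops so that it can be compiled for a TPU core:

import os
import numpy as np
import tensorflow as tf

def matmul_fn(a, b):
  # tf.matmul is a TensorFlow op, so it can be lowered to TPU code.
  return tf.matmul(a, b)

# Placeholders for the two 128x128 operands.
a = tf.placeholder(tf.float32, [128, 128])
b = tf.placeholder(tf.float32, [128, 128])

# rewrite() compiles matmul_fn for execution on a TPU core.
tpu_matmul = tf.contrib.tpu.rewrite(matmul_fn, [a, b])

tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
session = tf.Session(tpu_address)
try:
  session.run(tf.contrib.tpu.initialize_system())
  result = session.run(tpu_matmul, feed_dict={
      a: np.random.rand(128, 128),
      b: np.random.rand(128, 128)})
  print(result)
finally:
  session.run(tf.contrib.tpu.shutdown_system())
  session.close()

The key difference from your version is that the work inside matmul_fn is a graph op rather than Python/NumPy, so it is the op itself that gets compiled and executed on the TPU.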

If you want to easily compare TPUs to other hardware, I'd suggest using the Estimator API.
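A rough sketch of what that can look like with the TF 1.x TPUEstimator; the toy model, model_dir and the (commented-out) input_fn are placeholders I'm assuming, not something from your post:

import tensorflow as tf

def model_fn(features, labels, mode, params):
  # A single dense layer standing in for a real model.
  logits = tf.layers.dense(features, 10)
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
  # CrossShardOptimizer aggregates gradients across the TPU cores.
  optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
  train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
  return tf.contrib.tpu.TPUEstimatorSpec(mode=mode, loss=loss, train_op=train_op)

tpu = tf.contrib.cluster_resolver.TPUClusterResolver()
run_config = tf.contrib.tpu.RunConfig(
    cluster=tpu,
    model_dir='/tmp/tpu_estimator_model',
    tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=100, num_shards=8))

estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=model_fn,
    config=run_config,
    use_tpu=True,          # set to False to run the same code on CPU/GPU
    train_batch_size=128)

# estimator.train(input_fn=my_input_fn, max_steps=1000)

The point of the Estimator route is the use_tpu flag: the same model_fn can be timed on the TPU and on CPU/GPU without rewriting the computation.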

TPUs are optimised for training and running inference on ML models, so they can do matrix multiplications very quickly, but code that tries to assess this with nested Python loops is unlikely to give you a good sense of the chip's capability.
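As a rough illustration of why nested Python loops mostly measure interpreter overhead rather than the hardware, you can compare your triple loop against a single vectorised np.dot on the same machine (numbers will vary; this is just a sketch):

import time
import numpy as np

n = 128
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# Pure-Python triple loop: every multiply-add goes through the interpreter.
start = time.time()
product = np.zeros((n, n))
for i in range(n):
  for j in range(n):
    s = 0.0
    for k in range(n):
      s += a[i, k] * b[k, j]
    product[i, j] = s
loop_time = time.time() - start

# Vectorised multiplication: one call into optimised native code.
start = time.time()
product_np = a.dot(b)
np_time = time.time() - start

print("triple loop: %.4fs, np.dot: %.6fs" % (loop_time, np_time))
print("results match:", np.allclose(product, product_np))

The gap only grows on an accelerator: a TPU helps once the multiplication is expressed as one large tensor op it can execute, not as many interpreted scalar operations.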
