繁体   English   中英

无法在 macOS M1 Pro 芯片上运行 Tensorflow

[英]Can't get Tensorflow working on macOS M1 Pro Chip

我一直在尝试进入 ML,我想学习它的课程,但它需要 Tensorflow,我一直在努力让它在我的系统上运行。 我有配备 M1 Pro 芯片的 2021 14" 16GB Macbook Pro,我正在运行 Ventura 13.1。我一直在关注这篇文章,并深入研究如何让 Tensorflow 在 M1 上工作,但无济于事。我设法获得了 tensorflow-macos安装在我的环境中以及 tensorflow-metal 但是当我尝试在 Juyter 中运行一些示例代码时,我收到一个我不明白的错误。在 Jupyter 中,当我运行时:

import tensorflow as tf print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

我得到

Num GPUs Available: 1

所以看起来我确实安装了 tensorflow 和金属,但是当我尝试运行代码的 rest 时,我得到:

TensorFlow version: 2.11.0
Num GPUs Available:  1
Metal device set to: Apple M1 Pro
WARNING:tensorflow:AutoGraph could not transform <function normalize_img at 0x14a4cec10> and will run it as-is.
Cause: Unable to locate the source code of <function normalize_img at 0x14a4cec10>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2022-12-13 13:54:33.658225: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-13 13:54:33.658309: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
WARNING:tensorflow:AutoGraph could not transform <function normalize_img at 0x14a4cec10> and will run it as-is.
Cause: Unable to locate the source code of <function normalize_img at 0x14a4cec10>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function normalize_img at 0x14a4cec10> and will run it as-is.
Cause: Unable to locate the source code of <function normalize_img at 0x14a4cec10>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Epoch 1/12
2022-12-13 13:54:34.162300: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-13 13:54:34.163015: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-13 13:54:35.383325: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.383350: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.389028: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.389049: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.401250: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.401274: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.405004: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
2022-12-13 13:54:35.405025: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x14a345660
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
File <timed exec>:45

File ~/conda/envs/mlp3/lib/python3.8/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/conda/envs/mlp3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_6' defined at (most recent call last):
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel_launcher.py", line 17, in <module>
      app.launch_new_instance()
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/traitlets/config/application.py", line 992, in launch_instance
      app.start()
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 711, in start
      self.io_loop.start()
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 215, in start
      self.asyncio_loop.run_forever()
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
      self._run_once()
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
      handle._run()
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/asyncio/events.py", line 81, in _run
      self._context.run(self._callback, *self._args)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
      await self.process_one()
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 499, in process_one
      await dispatch(*args)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
      await result
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 729, in execute_request
      reply_content = await reply_content
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/ipkernel.py", line 411, in do_execute
      res = shell.run_cell(
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/ipykernel/zmqshell.py", line 531, in run_cell
      return super().run_cell(*args, **kwargs)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2940, in run_cell
      result = self._run_cell(
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2995, in _run_cell
      return runner(coro)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
      coro.send(None)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3194, in run_cell_async
      has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3373, in run_ast_nodes
      if await self.run_code(code, result, async_=asy):
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "/var/folders/k4/vgd34_w913ndkfkmvgssqgjr0000gn/T/ipykernel_16072/1016625245.py", line 1, in <module>
      get_ipython().run_cell_magic('time', '', 'import tensorflow as tf\nimport tensorflow_datasets as tfds\nprint("TensorFlow version:", tf.__version__)\nprint("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices(\'GPU\')))\ntf.config.list_physical_devices(\'GPU\')\n(ds_train, ds_test), ds_info = tfds.load(\n    \'mnist\',\n    split=[\'train\', \'test\'],\n    shuffle_files=True,\n    as_supervised=True,\n    with_info=True,\n)\ndef normalize_img(image, label):\n  """Normalizes images: `uint8` -> `float32`."""\n  return tf.cast(image, tf.float32) / 255., label\nbatch_size = 128\nds_train = ds_train.map(\n    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)\nds_train = ds_train.cache()\nds_train = ds_train.shuffle(ds_info.splits[\'train\'].num_examples)\nds_train = ds_train.batch(batch_size)\nds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)\nds_test = ds_test.map(\n    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)\nds_test = ds_test.batch(batch_size)\nds_test = ds_test.cache()\nds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)\nmodel = tf.keras.models.Sequential([\n  tf.keras.layers.Conv2D(32, kernel_size=(3, 3),\n                 activation=\'relu\'),\n  tf.keras.layers.Conv2D(64, kernel_size=(3, 3),\n                 activation=\'relu\'),\n  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),\n#   tf.keras.layers.Dropout(0.25),\n  tf.keras.layers.Flatten(),\n  tf.keras.layers.Dense(128, activation=\'relu\'),\n#   tf.keras.layers.Dropout(0.5),\n  tf.keras.layers.Dense(10, activation=\'softmax\')\n])\nmodel.compile(\n    loss=\'sparse_categorical_crossentropy\',\n    optimizer=tf.keras.optimizers.Adam(0.001),\n    metrics=[\'accuracy\'],\n)\nmodel.fit(\n    ds_train,\n    epochs=12,\n    validation_data=ds_test,\n)\n')
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2417, in run_cell_magic
      result = fn(*args, **kwargs)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/IPython/core/magics/execution.py", line 1321, in time
      out = eval(code_2, glob, local_ns)
    File "<timed exec>", line 45, in <module>
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/Users/imigh/conda/envs/mlp3/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_6'
could not find registered platform with id: 0x14a345660
     [[{{node StatefulPartitionedCall_6}}]] [Op:__inference_train_function_1261]

很抱歉只是转储了整个错误代码,但正如您所看到的,有些地方出了问题。 它似乎只运行第一个 Epoch,我不确定出了什么问题。 我已经按照该指南中的所有内容以及来自tensor flow-metal的说明进行操作。 我环顾四周似乎无处不在,但这是经过数小时的战斗后我所得到的。 我今天刚刚更新了我的 Mac,所以 Xcode 命令行工具应该是最新的。 任何和所有建议或帮助我破译错误代码将不胜感激。 我只想学习机器学习,但如果没有这个工作,我什至无法完成我的课程。

我已经多次为 M1 卸载并重新安装 Conda Miniforge。 我已经在空白环境中创建并尝试了这些步骤。 我已经按照上面链接的指南中列出的步骤进行了多次检查。 我最初遇到 numpy、h5py、grcio 和 protobuf 的一些问题,但在修改这些版本后,我不再收到它们的错误代码,所以我不确定这是否一切都好,但我没有看到任何明确提及。 我也跑过

conda install -c conda-forge openblas

从有类似问题的人那里查看StackOverflow 的此页面后,我仍然收到此错误。

由于 Metal 的 PluggableDevice 实现存在差距, Apple 开发者论坛上提出了一个类似的问题,并提供了使用tf.keras.optimizers.legacy.Adam()解决此问题的解决方案。

或者,在使用pip安装时指定开始使用 tensorflow-metal中提到的已发布版本。

python -m pip install tensorflow-macos==2.9.0
python -m pip install tensorflow-metal==0.5.0

祝你好运!

怀疑tensorflow-macos不太适合 GPU,使用 CPU only with

with tf.device('/cpu:0'):

或者

tf.config.set_visible_devices([], 'GPU')

可以解决我的问题。 它可以按预期处理。

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM