Questions tagged [pycuda] - Stack Overflow

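cuLaunchKernel failed: too many resources requested for launch

I have been trying to parallelize my code with pycuda. I need to launch 10^5 threads, each running about 4000 iterations. This should fit within my GPU's block and grid limits (grid = (98,1,1), block = (1024,1,1)). However, running the program produces the following error: "cuLaunchKernel failed: too many resources requested for ...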
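The launch arithmetic in this question can be checked with a minimal sketch. The helper names `launch_config` and `max_regs_per_thread` are hypothetical, and the 65536 registers-per-block figure is an assumption that holds on many recent GPUs but should be queried from the actual device. The point is that a grid can satisfy the block and grid size limits while the kernel still needs more registers per thread than a 1024-thread block can supply, which is one common cause of this error.

```python
import math


def launch_config(n_threads, block_size=1024):
    """Return (grid, block) tuples for a 1D launch covering n_threads."""
    n_blocks = math.ceil(n_threads / block_size)
    return (n_blocks, 1, 1), (block_size, 1, 1)


def max_regs_per_thread(block_size, regs_per_block=65536):
    """Upper bound on registers per thread for one block.

    regs_per_block is an ASSUMED device limit; read the real value
    from the device attributes before relying on it.
    """
    return regs_per_block // block_size


grid, block = launch_config(10**5)
print(grid, block)                 # (98, 1, 1) (1024, 1, 1)
print(max_regs_per_thread(1024))   # 64
print(max_regs_per_thread(256))    # 256
```

If the compiled kernel uses more registers per thread than the bound printed for the chosen block size, a smaller block size (with a correspondingly larger grid) is one thing to try.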

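Installing pycuda on Windows

So I tried python -m pip install pycuda, but it failed (here is some of the output from the failed install): I have installed the Visual Studio 2022 Build Tools, but I have not added anything to my PATH. I think that is all I have to do, but I don't know what I need to add. The pycuda wiki's ...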

PyCuda C++ kernel "error: this declaration may not have extern "C" linkage"

I tried to use std::tuple in my kernel code, but I get many error: this declaration may not have extern "C" linkage errors pointing into utility and tuple, the headers it includes. Below is my repro. In my kernel code, do I need to ...

Compiling Cuda - nvcc cannot find a supported version of Microsoft Visual Studio

I recently updated CUDA to 11.6, and now when I try to use pyCuda I get pycuda.driver.CompileError: nvcc compilation of C:\Users\imsog\AppData\Local\Temp\tmpkgtu92cq\kernel.cu failed [command: nvcc --cu ...

RuntimeError no CUDA-capable device is detected

I have imported the pycuda.autoinit module before, but when I try to run the code I still get the error ----> 5 cuda.init() 6 7 from pycuda.tools import make_default_context # noqa: E402 RuntimeError: cu ...

Could not build wheels for pycuda

I get the following error when installing pycuda with pip on Ubuntu: ...

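Errors in PyCuda indexing Numpy array of integers

I am taking my first steps with PyCuda to do some parallel computing, and I ran into behavior I do not understand. I started from the very basic tutorial on the official PyCuda website (a simple script that doubles all the elements of an array, https://documen.tician.de/pycuda/tutorial.html). The code is as follows: Clearly ...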

Deleting an FFT plan in scikit-cuda destroys the pycuda context

I want to use the FFT functions from pycuda and scikit-cuda together. The code below creates a skcuda.fft.Plan, deletes that plan, and then tries to allocate a pycuda.gpuarray.GPUArray. import pycuda.autoinit import numpy as ...

Why does importing pycuda.driver under sudo lead to "libcurand.so.10: cannot open shared object file: No such file or directory"

I am trying to import pycuda-2021.1 in a Python script. My OS is Ubuntu 18.04. I installed CUDA toolkit 11.2, and my NVIDIA driver version is 460.27.04. My Python interpreter is Python 3.8. When I run it, I seem to be able to, without running su ...

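How to create a program from a binary in CUDA?

I have OpenCL code in which I use clCreateProgramWithBinary() to create a program from a binary. I am porting this application to CUDA, but I have not found any similar function. Can someone help me with how to create a program from a binary in CUDA, or an equivalent of clCreateProgramWithBi ...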

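Cupy array construction from existing GPU pointer

I want to build a CuPy GPU array view of an array that already exists on the GPU. I am given the following: a pointer to the array. I know the data type and the data size. I am also given a pitch. How can I construct the array view (preferably without copying)? I tried the following: import cupy as cp import numpy as np shape = ...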
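The role of the pitch in this question can be illustrated with a small sketch. It assumes a row-major 2D array whose consecutive rows start `pitch` bytes apart; the helper name `pitched_strides` is hypothetical and not part of CuPy. The resulting byte strides are what one would presumably pass as the `strides` argument of `cupy.ndarray`, together with a `memptr` wrapping the raw pointer (for example through `cupy.cuda.UnownedMemory` and `cupy.cuda.MemoryPointer`), to obtain a view without a copy.

```python
def pitched_strides(shape, itemsize, pitch):
    """Byte strides of a row-major 2D array with rows `pitch` bytes apart.

    shape    -- (rows, cols) of the logical array
    itemsize -- size of one element in bytes
    pitch    -- distance in bytes between the starts of consecutive rows
    """
    rows, cols = shape
    if pitch < cols * itemsize:
        raise ValueError("pitch is smaller than one row of data")
    # Stepping one row moves `pitch` bytes; stepping one column moves one element.
    return (pitch, itemsize)


# A 4 x 3 float32 array padded to 512-byte rows.
print(pitched_strides((4, 3), 4, 512))  # (512, 4)
```

The total allocation spanned by such a view is `rows * pitch` bytes, which is the size the wrapped memory object would need to report.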

PyCUDA -- problems importing pycuda.driver

Windows 10, Python 3.8, CUDA 11.5. I installed what I believe is the matching pycuda from this file: pycuda-2021.1+cuda115-cp38-cp38-win_amd64.whl. This simple example fails with this error: NVCC is on my PATH. Adding os.add_dll_ ...

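How to set the priority of a stream in pycuda?

The title says it all, but here is more detail on my problem: I am implementing, in Python + pycuda, a finite element solver that should run on a distributed system. To hide communication latency, I am trying to overlap computation and communication (using 2 separate streams). My problem is that the kernels used for communication (on one stream) are executed only at the end of the main compute kernel (see the figure below). My ...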

Pycuda or cuda failed to compile

When I try to run Python code that uses CUDA, I get the following error: Environment: Python: 3.6.13, PyCuda: 2020.1, CUDA Toolkit: 10.1 / 11.1, MSVC: 2019. I used pycuda-2020.1+cuda101-cp36-cp36m-win_amd ...

index-error: "invalid subindex in axis 0" with pycuda

I ran into an error after using PyCUDA GPUarrays with a for loop. I defined a function PropagatorS that uses a for loop; it works fine when I use only numpy, but fails after switching to cuda. Trying some values returns an IndexError for axis 0: "invalid ...

Error: The 'pycuda' distribution was not found and is required by the application

What causes this error? ...

Installing Cython Packages Error: Cannot compile

I am installing implicit via git+https://github.com/benfred/implicit.git@f33d2e7d753f3ab4da0901485bd68e47dba7b9eb. I hit this error during installation. I am using tensorflow/tensorflow:2.3.0-g ...

Release memory for Pycuda

How do I release memory after a Pycuda function call? For example, in the code below, how do I release the memory used by a_gpu so that I have enough memory to allocate b_gpu instead of getting the error below? I tried from pycuda.tools import PooledDeviceAllocatio ...

Pycuda returns error when declaring array with size by value in function parameter

When I try the following code with int d[1]; it works fine, but with int d[in_integer]; or int c[in_matrix[0]]; nvcc fails to compile. Can anyone suggest why? Is it possible to declare an array in pycuda whose size is determined by the value of a function parameter? Error ...

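Parallelize Pandas df.iterrows() by GPU kernel

I wrote a Python program in which I need to check whether a given value is in a column of a given dataset. To do this, I need to iterate over every row and check the column for equality in each row. This takes a lot of time, so I want to run it on the GPU. I have experience with CUDA C/C++ but none with parallelization in PyCuda. Can anyone help me with this? Note: this is ...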