
1.3. Installation¶

Numba is compatible with Python 2.7 and 3.5 or later, and NumPy versions 1.7 to 1.15.

Our supported platforms are:

  • Linux x86 (32-bit and 64-bit)
  • Linux ppc64le (POWER8)
  • Windows 7 and later (32-bit and 64-bit)
  • OS X 10.9 and later (64-bit)
  • NVIDIA GPUs of compute capability 2.0 and later
  • AMD ROC dGPUs (Linux only, and not for AMD Carrizo or Kaveri APUs)

Automatic parallelization with @jit is only available on 64-bit platforms, and is not supported in Python 2.7 on Windows.

1.3.2. Installing using conda on x86/x86_64/POWER Platforms¶

The easiest way to install Numba and get updates is by using conda, a cross-platform package manager and software distribution maintained by Anaconda, Inc. You can either use Anaconda to get the full stack in one download, or Miniconda, which will install the minimum packages required for a conda environment.

Once you have conda installed, just type:
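
$ conda install numba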

Note that Numba, like Anaconda, only supports PPC in 64-bit little-endian mode.

To enable CUDA GPU support for Numba, install the latest graphics drivers from NVIDIA for your platform. (Note that the open source Nouveau drivers shipped by default with many Linux distributions do not support CUDA.) Then install the cudatoolkit package:

$ conda install cudatoolkit

You do not need to install the CUDA SDK from NVIDIA.


1.3.3. Installing using pip on x86/x86_64 Platforms¶

Binary wheels for Windows, Mac, and Linux are also available from PyPI. You can install Numba using pip:
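
$ pip install numba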

This will download all of the needed dependencies as well. You do not need to have LLVM installed to use Numba (in fact, Numba will ignore all LLVM versions installed on the system) as the required components are bundled into the llvmlite wheel.

To use CUDA with Numba installed by pip, you need to install the CUDA SDK from NVIDIA. You may then need to set the following environment variables so Numba can locate the required libraries (an illustrative example follows the list):

  • NUMBAPRO_CUDA_DRIVER — Path to the CUDA driver shared library file
  • NUMBAPRO_NVVM — Path to the CUDA libNVVM shared library file
  • NUMBAPRO_LIBDEVICE — Path to the CUDA libNVVM libdevice directory which contains .bc files
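
For example, on a typical Linux install these might be set as follows (illustrative paths only; the actual locations depend on where the driver and CUDA SDK are installed):

$ export NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so
$ export NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so
$ export NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/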

1.3.4. Enabling AMD ROCm GPU Support¶

The ROCm Platform allows GPU computing with AMD GPUs on Linux. To enable ROCm support in Numba, conda is required, so begin with an Anaconda or Miniconda installation with Numba 0.40 or later installed. Then:

  1. Follow the ROCm installation instructions.
  2. Install the roctools conda package from the numba channel:
$ conda install -c numba roctools

See the roc-examples repository for sample notebooks.

1.3.5. Installing from source¶

Installing Numba from source is fairly straightforward (similar to other Python packages), but installing llvmlite can be quite challenging due to the need for a special LLVM build. If you are building from source for the purposes of Numba development, see Build environment for details on how to create a Numba development environment with conda.

If you are building Numba from source for other reasons, first follow the llvmlite installation guide. Once that is completed, you can download the latest Numba source code from GitHub:

$ git clone git://github.com/numba/numba.git

Source archives of the latest release can also be found on PyPI. In addition to llvmlite, you will also need:

  • A C compiler compatible with your Python installation. If you are using Anaconda, you can install the Linux compiler conda packages gcc_linux-64 and gxx_linux-64, or the macOS packages clang_osx-64 and clangxx_osx-64.
  • NumPy

Then you can build and install Numba from the top level of the source tree:
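
$ python setup.py install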

1.3.6. Checking your installation¶

You should be able to import Numba from the Python prompt:

$ python
Python 2.7.15 |Anaconda custom (x86_64)| (default, May 1 2018, 18:37:05)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numba
>>> numba.__version__
'0.39.0+0.g4e49566.dirty'

You can also try executing the numba -s command to report information about your system capabilities:

$ numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
2018-08-28 15:46:24.631054

__Hardware Information__
Machine          : x86_64
CPU Name         : haswell
CPU Features     : aes avx avx2 bmi bmi2 cmov cx16 f16c fma fsgsbase lzcnt mmx movbe pclmul popcnt rdrnd sse sse2 sse3 sse4.1 sse4.2 ssse3 xsave xsaveopt

__OS Information__
Platform         : Darwin-17.6.0-x86_64-i386-64bit
Release          : 17.6.0
System Name      : Darwin
Version          : Darwin Kernel Version 17.6.0: Tue May 8 15:22:16 PDT 2018; root:xnu-4570.61.1~1/RELEASE_X86_64
OS specific info : 10.13.5 x86_64

__Python Information__
Python Compiler       : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
Python Implementation : CPython
Python Version        : 2.7.15
Python Locale         : en_US UTF-8

__LLVM information__
LLVM version : 6.0.0

__CUDA Information__
Found 1 CUDA devices
id 0    GeForce GT 750M    [SUPPORTED]
        compute capability: 3.0
        pci device id: 0
        pci bus id: 1

(output truncated due to length)



Overview¶

Numba supports AMD ROC GPU programming by directly compiling a restricted subset of Python code into HSA kernels and device functions following the HSA execution model. Kernels written in Numba appear to have direct access to NumPy arrays.

Terminology¶

Several important terms used in HSA programming are listed here (a short sketch illustrating both follows the list):

  • kernel: a GPU function launched by the host and executed on the device
  • device function: a GPU function executed on the device which can only be called from the device (i.e. from a kernel or another device function)
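
As a minimal sketch of both terms, assuming the numba.roc API from the Numba 0.4x series (signature-less @roc.jit, and a launch configuration assumed here to mirror CUDA's kernel[blocks, threads_per_block] form):

import numpy as np
from numba import roc

# A device function: callable only from GPU code
@roc.jit(device=True)
def double(x):
    return x * 2

# A kernel: launched by the host, executed on the device
@roc.jit
def double_kernel(arr, out):
    i = roc.get_global_id(0)  # absolute index of this work item
    if i < arr.size:          # guard against out-of-range work items
        out[i] = double(arr[i])

arr = np.arange(256, dtype=np.float32)
out = np.zeros_like(arr)
double_kernel[4, 64](arr, out)  # 4 workgroups of 64 work items each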

Requirements¶

This document describes the requirements for using ROC. Essentially, an AMD dGPU is needed (Fiji, Polaris, and Vega families), along with a CPU that supports PCIe Gen3 and PCIe Atomics (AMD Ryzen and EPYC, and Intel CPUs from Haswell onwards); full details are in the linked document. A Linux operating system is also required; the supported and tested distributions are likewise listed in the linked document.

Installation¶

Follow this document for installation instructions to enable ROC support for the system. Be sure to use the binary packages for the system's Linux distribution to simplify the process. At this point the install should be tested by running:
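
$ /opt/rocm/bin/rocminfo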

The output should list at least two HSA Agents, at least one of which should be a CPU and at least one of which should be a dGPU.

Assuming the installation is working correctly, ROC support for Numba is provided by the roctools package, which can be installed via conda, along with Numba, from the numba channel as follows (creating an env called numba_roc):

$ conda create -n numba_roc -c numba numba roctools

Activating the env and then running the Numba diagnostic tool should confirm that Numba is running with ROC support enabled, e.g.:

$ source activate numba_roc
$ numba -s

The output of numba -s should contain a section similar to:

__ROC Information__
ROC available        : True
Available Toolchains : librocmlite library, ROC command line tools

Found 2 HSA Agents:
Agent id : 0
    vendor: CPU
    name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
    type: CPU
Agent id : 1
    vendor: AMD
    name: gfx803
    type: GPU

Found 1 discrete GPU(s) : gfx803

confirming that ROC is available, listing the available toolchains and displaying the HSA Agents and dGPU count.



Device management¶

For multi-GPU machines, users may want to select which GPU to use. By default, the CUDA driver selects the fastest GPU as device 0, which is the default device used by Numba.

The features introduced on this page are generally of interest only when working with systems that host more than one CUDA-capable GPU.

Device Selection¶

If at all required, device selection must be done before any CUDA feature is used.

from numba import cuda
cuda.select_device(0)

The device can be closed by:
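
cuda.close()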

Users can then create a new context with another device.

cuda.select_device(1) # assuming we have 2 GPUs 

numba.cuda.select_device(device_id)¶

Create a new CUDA context for the selected device_id. device_id should be the number of the device (starting from 0; the device order is determined by the CUDA libraries). The context is associated with the current thread. Numba currently allows only one context per thread.

If successful, this function returns a device instance.

numba.cuda.close()¶

Explicitly close all contexts in the current thread.

Compiled functions are associated with the CUDA context. This makes it not very useful to close and create new devices, though it is certainly useful for choosing which device to use when the machine has multiple GPUs.

The Device List¶

The Device List is a list of all the GPUs in the system, and can be indexed to obtain a context manager that ensures execution on the selected GPU.

numba.cuda.gpus¶
numba.cuda.cudadrv.devices.gpus¶

numba.cuda.gpus is an instance of the _DeviceList class, from which the current GPU context can also be retrieved:

class numba.cuda.cudadrv.devices._DeviceList¶

property current¶

Returns the active device or None if there’s no active device
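
For example (a minimal sketch, assuming a machine with at least one CUDA GPU):

from numba import cuda

print(len(cuda.gpus))  # number of GPUs visible to Numba

# Indexing the list yields a context manager that pins
# subsequent CUDA work to the chosen device
with cuda.gpus[0]:
    print(cuda.gpus.current)  # the device active inside this block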



Supported Python features in CUDA Python¶

This page lists the Python features supported in CUDA Python. This includes all kernel and device functions compiled with @cuda.jit and other higher-level Numba decorators that target the CUDA GPU.

Language¶

Execution Model¶

CUDA Python maps directly to the single-instruction multiple-thread (SIMT) execution model of CUDA. Each instruction is implicitly executed by multiple threads in parallel. With this execution model, array expressions are less useful because we don't want multiple threads to perform the same task. Instead, we want threads to perform a task in a cooperative fashion.

For details please consult the CUDA Programming Guide.
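
For instance, instead of a whole-array expression, a typical kernel has each thread handle a single element (a minimal sketch):

import numpy as np
from numba import cuda

@cuda.jit
def scale(x, out):
    i = cuda.grid(1)  # this thread's absolute position in the 1D grid
    if i < x.size:    # guard: the grid may be larger than the array
        out[i] = x[i] * 2.0

x = np.arange(1024, dtype=np.float32)
out = np.zeros_like(x)
threads_per_block = 128
blocks_per_grid = (x.size + threads_per_block - 1) // threads_per_block
scale[blocks_per_grid, threads_per_block](x, out)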

Constructs¶

The following Python constructs are not supported:

  • Exception handling ( try .. except , try .. finally )
  • Context management (the with statement)
  • Comprehensions (either list, dict, set or generator comprehensions)
  • Generator (any yield statements)

The raise statement is supported.

The assert statement is supported, but only has an effect when debug=True is passed to the numba.cuda.jit() decorator. This is similar to the behavior of the assert keyword in CUDA C/C++, which is ignored unless compiling with device debug turned on.

Printing of strings, integers, and floats is supported, but printing is an asynchronous operation — in order to ensure that all output is printed after a kernel launch, it is necessary to call numba.cuda.synchronize() . Eliding the call to synchronize is acceptable, but output from a kernel may appear during other later driver operations (e.g. subsequent kernel launches, memory transfers, etc.), or fail to appear before the program execution completes.
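
For example (a small sketch; the relative ordering of output between threads is not guaranteed):

from numba import cuda

@cuda.jit
def greet():
    print("hello from thread", cuda.threadIdx.x)

greet[1, 4]()       # one block of four threads
cuda.synchronize()  # ensure all kernel output is printed before continuing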

Built-in types¶

Support for built-in types is inherited from CPU nopython mode.

Built-in functions¶

The following built-in functions are supported (a short example follows the list):

  • abs()
  • bool
  • complex
  • enumerate()
  • float
  • int : only the one-argument form
  • len()
  • min() : only the multiple-argument form
  • max() : only the multiple-argument form
  • range
  • round()
  • zip()
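
For instance, len() and the multiple-argument forms of min() and max() can be used directly inside a kernel (a minimal sketch):

import numpy as np
from numba import cuda

@cuda.jit
def clamp(x, lo, hi, out):
    i = cuda.grid(1)
    if i < len(x):                       # len() is supported on arrays
        out[i] = min(max(x[i], lo), hi)  # multiple-argument min()/max()

x = np.linspace(-2.0, 2.0, 64)
out = np.empty_like(x)
clamp[1, 64](x, -1.0, 1.0, out)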

Standard library modules¶

cmath ¶

The following functions from the cmath module are supported:

math ¶

The following functions from the math module are supported:

operator ¶

The following functions from the operator module are supported:

  • operator.add()
  • operator.and_()
  • operator.eq()
  • operator.floordiv()
  • operator.ge()
  • operator.gt()
  • operator.iadd()
  • operator.iand()
  • operator.ifloordiv()
  • operator.ilshift()
  • operator.imod()
  • operator.imul()
  • operator.invert()
  • operator.ior()
  • operator.ipow()
  • operator.irshift()
  • operator.isub()
  • operator.itruediv()
  • operator.ixor()
  • operator.le()
  • operator.lshift()
  • operator.lt()
  • operator.mod()
  • operator.mul()
  • operator.ne()
  • operator.neg()
  • operator.not_()
  • operator.or_()
  • operator.pos()
  • operator.pow()
  • operator.rshift()
  • operator.sub()
  • operator.truediv()
  • operator.xor()

Numpy support¶

Due to the CUDA programming model, dynamic memory allocation inside a kernel is inefficient and often not needed. Numba therefore disallows any memory-allocating features, which rules out a large number of NumPy APIs. For best performance, users should write code such that each thread deals with a single element at a time.

Supported NumPy features (see the example after this list):

  • accessing ndarray attributes: .shape , .strides , .ndim , .size , etc.
  • scalar ufuncs that have equivalents in the math module, e.g. np.sin(x[0]) , where x is a 1D array
  • indexing and slicing
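
For example, attribute access and a scalar ufunc with a math-module equivalent can be combined like this (a minimal sketch):

import numpy as np
from numba import cuda

@cuda.jit
def sin_kernel(x, out):
    i = cuda.grid(1)
    if i < x.shape[0]:         # .shape access is supported
        out[i] = np.sin(x[i])  # scalar ufunc, equivalent to math.sin

x = np.linspace(0.0, 1.0, 64)
out = np.empty_like(x)
sin_kernel[1, 64](x, out)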

Unsupported NumPy features:


