ó ùµÈ[c@@sLdZddlmZddlmZddlZddlZddlZddlm Z m Z m Z m Z m Z mZddlmZmZmZmZmZddlmZmZmZi ejd 6ejd 6ejd 6ejd 6ejd 6ejd6ejd6ejd6ejd6Zde fd„ƒYZ!de fd„ƒYZ"dS(s0Interface to runtime cuda kernel compile module.i(tabsolute_import(tarrayNi(t_LIBtmx_uinttc_arrayt c_array_buft c_str_arrayt check_call(tc_strtCudaModuleHandletCudaKernelHandlet numeric_typest string_types(t_DTYPE_NP_TO_MXt_DTYPE_MX_TO_NPtNDArraytfloattdoublet__halftuint8_ttinttint32_ttint8_ttchartint64_tt CudaModulecB@s/eZdZddd„Zd„Zd„ZRS(sÌCompile and run CUDA code from Python. In CUDA 7.5, you need to prepend your kernel definitions with 'extern "C"' to avoid name mangling:: source = r''' extern "C" __global__ void axpy(const float *x, float *y, float alpha) { int i = threadIdx.x + blockIdx.x * blockDim.x; y[i] += alpha * x[i]; } ''' module = mx.rtc.CudaModule(source) func = module.get_kernel("axpy", "const float *x, float *y, float alpha") x = mx.nd.ones((10,), ctx=mx.gpu(0)) y = mx.nd.zeros((10,), ctx=mx.gpu(0)) func.launch([x, y, 3.0], mx.gpu(0), (1, 1, 1), (10, 1, 1)) print(y) Starting from CUDA 8.0, you can instead export functions by name. This also allows you to use templates:: source = r''' template __global__ void axpy(const DType *x, DType *y, DType alpha) { int i = threadIdx.x + blockIdx.x * blockDim.x; y[i] += alpha * x[i]; } ''' module = mx.rtc.CudaModule(source, exports=['axpy', 'axpy']) func32 = module.get_kernel("axpy", "const float *x, float *y, float alpha") x = mx.nd.ones((10,), dtype='float32', ctx=mx.gpu(0)) y = mx.nd.zeros((10,), dtype='float32', ctx=mx.gpu(0)) func32.launch([x, y, 3.0], mx.gpu(0), (1, 1, 1), (10, 1, 1)) print(y) func64 = module.get_kernel("axpy", "const double *x, double *y, double alpha") x = mx.nd.ones((10,), dtype='float64', ctx=mx.gpu(0)) y = mx.nd.zeros((10,), dtype='float64', ctx=mx.gpu(0)) func32.launch([x, y, 3.0], mx.gpu(0), (1, 1, 1), (10, 1, 1)) print(y) Parameters ---------- source : str Complete source code. options : tuple of str Compiler flags. For example, use "-I/usr/local/cuda/include" to add cuda headers to include path. exports : tuple of str Export kernel names. c C@s’t|tƒr|f}nt|tƒr6|f}ntƒ|_ttjt|ƒt|ƒt |ƒt|ƒt |ƒt j |jƒƒƒdS(N( t isinstanceR R thandleRRtMXRtcCudaModuleCreateRtlenRtctypestbyref(tselftsourcetoptionstexports((sI/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rtc.pyt__init___s         cC@sttj|jƒƒdS(N(RRtMXRtcCudaModuleFreeR(R ((sI/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rtc.pyt__del__msc C@sºtƒ}g}g}g}tjdƒ}tjdd|ƒjdƒ}xà|D]Ø} |j| ƒ} | s| jƒddkr”td| ƒ‚n|jt | jƒdƒƒ| jƒd} |jt | jƒd ƒƒ| t krt d | dj t j ƒƒfƒ‚n|jtt | ƒqOWttj|jt|ƒt|ƒttjtd |ƒƒttjtd |ƒƒttjtd |ƒƒtj|ƒƒƒt||||ƒS( søGet CUDA kernel from compiled module. Parameters ---------- name : str String name of the kernel. signature : str Function signature for the kernel. For example, if a kernel is declared as:: extern "C" __global__ void axpy(const float *x, double *y, int alpha) Then its signature should be:: const float *x, double *y, int alpha or:: const float *, double *, int Note that `*` in signature marks an argument as array and `const` marks an argument as constant (input) array. Returns ------- CudaKernel CUDA kernels that can be launched on GPUs. s/^\s*(const)?\s*([\w_]+)\s*(\*)?\s*([\w_]+)?\s*$s\s+t t,itconstsQInvalid function prototype "%s". Must be in the form of "(const) type (*) (name)"iis=Unsupported kernel argument type %s. Supported types are: %s.ti(R tretcompiletsubtsplittmatchtgroupst ValueErrortappendtboolt_DTYPE_CPP_TO_NPt TypeErrortjointkeysR RRtMXRtcCudaKernelCreateRRRRRtc_intRRt CudaKernel( R tnamet signaturethdlt is_ndarraytis_consttdtypestpatterntargstargR/tdtype((sI/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rtc.pyt get_kernelps:    "   (((t__name__t __module__t__doc__R$R&RE(((sI/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rtc.pyR*s4 R:cB@s,eZdZd„Zd„Zdd„ZRS(skConstructs CUDA kernel. Should be created by `CudaModule.get_kernel`, not intended to be used by users.cC@s?||_||_||_g|D]}t|^q"|_dS(N(Rt_namet _is_ndarrayRt_dtypes(R RR;R>R@R*((sI/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rtc.pyR$°s   cC@sttj|jƒƒdS(N(RRtMXRtcCudaKernelFreeR(R ((sI/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rtc.pyR&¶sic C@s|jdkstdƒ‚t|ƒdks9tdƒ‚t|ƒdksWtdƒ‚t|ƒt|jƒkstd|jt|jƒt|ƒfƒ‚g}g}xàtt||j|jƒƒD]À\}\} } } | r!t| t ƒstd|t | ƒfƒ‚|j | j ƒqÈt| t ƒsLtd|t | ƒfƒ‚|j tj| d| ƒƒ|j |d jjtjƒƒqÈWttj|j |jttj|ƒt|d ƒt|d ƒt|d ƒt|d ƒt|d ƒt|d ƒt|ƒƒ ƒd S(s®Launch cuda kernel. Parameters ---------- args : tuple of NDArray or numbers List of arguments for kernel. NDArrays are expected for pointer types (e.g. `float*`, `double*`) while numbers are expected for non-pointer types (e.g. `int`, `float`). ctx : Context The context to launch kernel on. Must be GPU context. grid_dims : tuple of 3 integers Grid dimensions for CUDA kernel. block_dims : tuple of 3 integers Block dimensions for CUDA kernel. shared_mem : integer, optional Size of dynamically allocated shared memory. Defaults to 0. tgpus'Cuda kernel can only be launched on GPUis'grid_dims must be a tuple of 3 integerss.CudaKernel(%s) expects %d arguments but got %ds9The %d-th argument is expected to be a NDArray but got %ss9The %d-th argument is expected to be a number, but got %sRDiÿÿÿÿiiiN(t device_typetAssertionErrorRRKRIt enumeratetzipRJRRttypeR2RR tnpRRtdata_astc_void_pRRtMXRtcCudaKernelCallt device_idRR( R RBtctxt grid_dimst block_dimst shared_memt void_argst ref_holderR*RCtis_ndRD((sI/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rtc.pytlaunch¹s4%4$ ''(RFRGRHR$R&R_(((sI/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rtc.pyR:­s  (#RHt __future__RRR+RtnumpyRStbaseRRRRRRRR R R R tndarrayR RRtfloat32tfloat64tfloat16tuint8tint32tint8tint64R4tobjectRR:(((sI/usr/local/lib/python2.7/site-packages/mxnet-1.3.1-py2.7.egg/mxnet/rtc.pyts(   .(         ƒ