"""Interface to runtime cuda kernel compile module."""
from __future__ import absolute_import

import ctypes
from .base import _LIB, NDArrayHandle, RtcHandle, mx_uint, c_array, check_call


class Rtc(object):
    """MXRtc object in mxnet.
    This class allows you to write CUDA kernels in Python
    and call them with NDArray.

    Parameters
    ----------
    name : str
        Name of the kernel.
    inputs : tuple of (str, mxnet.ndarray)
        List of input names and ndarray.
    outputs : tuple of (str, mxnet.ndarray)
        List of output names and ndarray.
    kernel : str
        The actual kernel code.
        Note that this is only the body of the kernel, i.e.
        after { and before }. Rtc will decorate the kernel.
        For example, if ``name = "mykernel"`` and
            inputs = [('x', mx.nd.zeros((10,)))]
            outputs = [('y', mx.nd.zeros((10,)))]
            kernel = "y[threadIdx.x] = x[threadIdx.x];",
        then the compiled kernel will be:
            extern "C" __global__ mykernel(float *x, float *y) {
                const int x_ndim = 1;
                const int x_dims = { 10 };
                const int y_ndim = 1;
                const int y_dims = { 10 };

                y[threadIdx.x] = x[threadIdx.x];
            }
    """
    def __init__(self, name, inputs, outputs, kernel):
        self.handle = RtcHandle()
        # Split the (name, ndarray) pairs into C arrays of names and handles.
        input_names = ctypes.cast(c_array(ctypes.c_char_p, [i[0] for i in inputs]),
                                  ctypes.POINTER(ctypes.c_char_p))
        output_names = ctypes.cast(c_array(ctypes.c_char_p, [i[0] for i in outputs]),
                                   ctypes.POINTER(ctypes.c_char_p))
        input_nds = ctypes.cast(c_array(NDArrayHandle, [i[1].handle for i in inputs]),
                                ctypes.POINTER(NDArrayHandle))
        output_nds = ctypes.cast(c_array(NDArrayHandle, [i[1].handle for i in outputs]),
                                 ctypes.POINTER(NDArrayHandle))
        # Compile the kernel; the resulting handle is written into self.handle.
        check_call(_LIB.MXRtcCreate(ctypes.c_char_p(name),
                                    mx_uint(len(inputs)),
                                    mx_uint(len(outputs)),
                                    input_names,
                                    output_names,
                                    input_nds,
                                    output_nds,
                                    ctypes.c_char_p(kernel),
                                    ctypes.byref(self.handle)))

    def __del__(self):
        check_call(_LIB.MXRtcFree(self.handle))

    def push(self, inputs, outputs, grid_dims, block_dims):
        """Run the kernel.

        Parameters
        ----------
        inputs : list of NDArray
            List of inputs. Can contain different NDArrays than those used for the constructor,
            but its elements must have the same shapes and appear in the same order.
        outputs : list of NDArray
            List of outputs. Can contain different ndarrays than used for the constructor,
            but must have the same shapes and appear in the same order.
        grid_dims : tuple of 3 uint
            Grid dimension for kernel launch.
        block_dims : tuple of 3 uint
            Block dimension for kernel launch.
        """
        input_nds = ctypes.cast(c_array(NDArrayHandle, [i.handle for i in inputs]),
                                ctypes.POINTER(NDArrayHandle))
        output_nds = ctypes.cast(c_array(NDArrayHandle, [i.handle for i in outputs]),
                                 ctypes.POINTER(NDArrayHandle))
        # Launch with the (x, y, z) grid and block dimensions.
        check_call(_LIB.MXRtcPush(self.handle,
                                  mx_uint(len(inputs)),
                                  mx_uint(len(outputs)),
                                  input_nds,
                                  output_nds,
                                  mx_uint(grid_dims[0]),
                                  mx_uint(grid_dims[1]),
                                  mx_uint(grid_dims[2]),
                                  mx_uint(block_dims[0]),
                                  mx_uint(block_dims[1]),
                                  mx_uint(block_dims[2])))