# TOPI Recipe: TVM Operator Optimization Recipes

TOPI is the operator collection library for TVM, intended to share the effort of crafting and optimizing TVM-generated kernels. Its goals are to:

- Provide syntactic sugar for operator declaration.
- Give common primitives for fused op creation.
- Provide commonly used schedules for each architecture.

## Guidelines

- Use numpy-style naming conventions for known ops.
- Separate operator declaration from schedule when possible.
  - This can be inconvenient but enables more general scheduling across ops.
  - We can always recover the tensors from the outputs by traversing the tree.
- Deliberately assert the requirements.
  - Some kernels have requirements on shape and data layout; assert them.
- Be data layout aware: if the layout is not specified in an argument or in the function, assume NCHW by default.

## Performance Tuning Workflow

Since TVM is a work in progress, some optimizations might not be perfect. One quick approach I find useful is to combine codegen with manual modification. The workflow is:

- Generate the GPU kernels and write them into a file, say ```perf/matexp_generated.cu```.
- Copy the generated file into another one, say ```perf/matexp_manual.cu```, and modify it according to your intuition.
- Set the use_manual flag in the script to continue the codegen workflow as normal, but piggyback the manually written code instead.
- Observe the performance difference.
- If the performance improves, mark the manual code and think of an optimization pass that generates the desired target code.
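
The file-handling part of the workflow above can be sketched in plain Python. This is a minimal illustration, not the script's actual code: the function name `select_kernel_source` and its parameters `task` and `perf_dir` are hypothetical, and how the function is wired into the codegen pipeline is left out. It assumes the generated kernel source is available as a string and that the `use_manual` flag decides which version is handed back to the rest of the workflow.

```python
import os
import shutil


def select_kernel_source(generated_code, task="matexp", perf_dir="perf",
                         use_manual=False):
    """Dump the generated kernel and optionally substitute a hand-edited copy.

    Hypothetical helper: `task` and `perf_dir` only determine the file names,
    e.g. perf/matexp_generated.cu and perf/matexp_manual.cu.
    """
    os.makedirs(perf_dir, exist_ok=True)
    generated_path = os.path.join(perf_dir, f"{task}_generated.cu")
    manual_path = os.path.join(perf_dir, f"{task}_manual.cu")

    # Step 1: always write the freshly generated kernel to disk for inspection.
    with open(generated_path, "w") as f:
        f.write(generated_code)

    if not use_manual:
        return generated_code

    # Step 2: seed the manual file from the generated one if it is missing,
    # so there is a starting point to edit by hand.
    if not os.path.exists(manual_path):
        shutil.copyfile(generated_path, manual_path)

    # Step 3: piggyback the manually written code instead of the generated one.
    with open(manual_path) as f:
        return f.read()
```

Calling the helper once with `use_manual=False` produces the generated file. After editing the manual copy, calling it again with `use_manual=True` returns the edited source, so the two versions can be timed against each other.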