# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
"""
Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU
========================================================================
**Author**: Grant Watson

This section contains an example of how to use TVM to run a model
on an Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU, using bare metal.
The Cortex(R)-M55 is a small, low-power CPU designed for use in embedded
devices. The Ethos(TM)-U55 is a microNPU, specifically designed to accelerate
ML inference in resource-constrained embedded devices.

In order to run the demo application without access to a Cortex(R)-M55
and Ethos(TM)-U55 development board, we will be running our sample
application on a Fixed Virtual Platform (FVP). The FVP, based on Arm(R)
Corstone(TM)-300 software, models a hardware system containing a
Cortex(R)-M55 and Ethos(TM)-U55. It provides a programmer's view that is
suitable for software development.

In this tutorial, we will be compiling a MobileNet v1 model and instructing
TVM to offload operators to the Ethos(TM)-U55 where possible.
"""

################################################################################
# Obtaining TVM
# -------------
#
# To obtain TVM for your platform, please visit https://tlcpack.ai/ and follow
# the instructions. Once TVM has been installed correctly, you should have
# access to ``tvmc`` from the command line.
#
# Typing ``tvmc`` on the command line should display the following:
#
# .. code-block:: text
#
#     usage: tvmc [-h] [-v] [--version] {tune,compile,run} ...
#
#     TVM compiler driver
#
#     optional arguments:
#       -h, --help          show this help message and exit
#       -v, --verbose       increase verbosity
#       --version           print the version and exit
#
#     commands:
#       {tune,compile,run}
#         tune              auto-tune a model
#         compile           compile a model.
#         run               run a compiled module
#
#     TVMC - TVM driver command-line interface
#

################################################################################
# Installing additional python dependencies
# -----------------------------------------
#
# In order to run the demo, you will need some additional python packages.
# These can be installed by using the requirements.txt file below:
#
# .. code-block:: text
#    :caption: requirements.txt
#    :name: requirements.txt
#
#     attrs==21.2.0
#     cloudpickle==2.0.0
#     decorator==5.1.0
#     ethos-u-vela==2.1.1
#     flatbuffers==1.12
#     lxml==4.6.3
#     nose==1.3.7
#     numpy==1.19.5
#     Pillow==8.3.2
#     psutil==5.8.0
#     scipy==1.5.4
#     synr==0.4
#     tflite==2.4.0
#     tornado==6.1
#
# These packages can be installed by running the following from the command line:
#
# .. code-block:: bash
#
#     pip install -r requirements.txt
#
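################################################################################
# As a quick sanity check that the environment is ready, you can try importing
# a few of the key packages (a minimal sketch, not part of the tutorial; the
# version pins above are simply what this tutorial was tested against):
#
# .. code-block:: python
#
#     import flatbuffers
#     import numpy as np
#     import PIL
#     import tflite
#
#     # If these imports succeed, the packages from requirements.txt are
#     # available to the interpreter.
#     print("numpy", np.__version__)
#     print("Pillow", PIL.__version__)
#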
################################################################################
# Obtaining the Model
# -------------------
#
# For this tutorial, we will be working with MobileNet v1.
# MobileNet v1 is a convolutional neural network designed to classify images,
# which has been optimized for edge devices. The model we will be using has
# been pre-trained to classify images into one of 1001 different categories.
# The network has an input image size of 224x224, so any input images will
# need to be resized to those dimensions before being used.
#
# For this tutorial we will be using the model in TFLite format.
#
# .. code-block:: bash
#
#     mkdir -p ./build
#     cd build
#     wget https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz
#     gunzip mobilenet_v1_1.0_224_quant.tgz
#     tar xvf mobilenet_v1_1.0_224_quant.tar
#

################################################################################
# Compiling the model for Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU
# -----------------------------------------------------------------------
#
# Once we've downloaded the MobileNet v1 model, the next step is to compile it.
# To accomplish that, we are going to use ``tvmc compile``. The output we get from
# the compilation process is a TAR package of the model compiled to the Model
# Library Format (MLF) for our target platform. We will be able to run that model
# on our target device using the TVM runtime.
#
# .. code-block:: bash
#
#     tvmc compile --target="ethos-u -accelerator_config=ethos-u55-256, c" \
#          --target-c-mcpu=cortex-m55 \
#          --runtime=crt \
#          --executor=aot \
#          --executor-aot-interface-api=c \
#          --executor-aot-unpacked-api=1 \
#          --pass-config tir.disable_vectorize=1 \
#          ./mobilenet_v1_1.0_224_quant.tflite \
#          --output-format=mlf
#

################################################################################
# .. note:: Explanation of tvmc compile arguments:
#
#    * ``--target="ethos-u -accelerator_config=ethos-u55-256, c"`` : Offload operators to the Ethos(TM)-U55 NPU where possible and fall back to generated C code on the Cortex(R)-M55 where an operator is not supported on the NPU.
#
#    * ``--target-c-mcpu=cortex-m55`` : Cross-compile for the Cortex(R)-M55.
#
#    * ``--runtime=crt`` : Generate glue code to allow operators to work with the C runtime.
#
#    * ``--executor=aot`` : Use Ahead-of-Time compilation instead of the Graph Executor.
#
#    * ``--executor-aot-interface-api=c`` : Generate a C-style interface with structures designed for integrating into C apps at the boundary.
#
#    * ``--executor-aot-unpacked-api=1`` : Use the unpacked API internally.
#
#    * ``--pass-config tir.disable_vectorize=1`` : Disable vectorization, since there are no standard vectorized types in C.
#
#    * ``./mobilenet_v1_1.0_224_quant.tflite`` : The TFLite model that is being compiled.
#
#    * ``--output-format=mlf`` : Output should be generated in the Model Library Format.
#

################################################################################
# Extracting the generated code into the current directory
# ---------------------------------------------------------
#
# .. code-block:: bash
#
#     tar xvf module.tar
#
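################################################################################
# If you would like to inspect what the Model Library Format archive contains
# before extracting it, a short Python snippet such as the following (a sketch,
# not part of the tutorial) will list its members:
#
# .. code-block:: python
#
#     import tarfile
#
#     # Print every file packaged in the MLF archive produced by tvmc.
#     with tarfile.open("./module.tar") as mlf:
#         for member in mlf.getnames():
#             print(member)
#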
################################################################################
# Getting ImageNet labels
# -----------------------
#
# When running MobileNet v1 on an image, the result is an index in the range 0
# to 1000. In order to make our application a little more user-friendly,
# instead of just displaying the category index, we will display the associated
# label. We will download these image labels into a text file now and use a
# python script to include them in our C application later.
#
# .. code-block:: bash
#
#     curl -sS https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/java/demo/app/src/main/assets/labels_mobilenet_quant_v1_224.txt \
#     -o ./labels_mobilenet_quant_v1_224.txt
#

################################################################################
# Getting the input image
# -----------------------
#
# As input for this tutorial, we will use the image of a cat, but you can
# substitute an image of your choosing.
#
# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
#    :height: 224px
#    :width: 224px
#    :align: center
#
# We download the image into the build directory and we will use a python
# script in the next step to convert the image into an array of bytes in a
# C header file.
#
# .. code-block:: bash
#
#     curl -sS https://s3.amazonaws.com/model-server/inputs/kitten.jpg -o ./kitten.jpg
#

################################################################################
# Pre-processing the image
# ------------------------
#
# The following script will create 2 C header files in the ``./include``
# directory:
#
# * ``inputs.h`` - The image supplied as an argument to the script will be converted
#   to an array of integers for input to our MobileNet v1 model.
# * ``outputs.h`` - An integer array of zeroes will reserve 1001 integer values
#   for the output of inference.
#
# .. code-block:: python
#    :caption: convert_image.py
#    :name: convert_image.py
#
#     #!python ./convert_image.py
#     import os
#     import pathlib
#     import sys
#     from PIL import Image
#     import numpy as np
#
#
#     def create_header_file(name, section, tensor_name, tensor_data, output_path):
#         """
#         This function generates a header file containing the data from the numpy array provided.
#         """
#         file_path = pathlib.Path(f"{output_path}/" + name).resolve()
#         # Create header file with npy_data as a C array
#         raw_path = file_path.with_suffix(".h").resolve()
#         with open(raw_path, "w") as header_file:
#             header_file.write(
#                 "#include <stddef.h>\n"
#                 + f"const size_t {tensor_name}_len = {tensor_data.size};\n"
#                 + f'uint8_t {tensor_name}[] __attribute__((section("{section}"), aligned(16))) = "'
#             )
#             data_hexstr = tensor_data.tobytes().hex()
#             for i in range(0, len(data_hexstr), 2):
#                 header_file.write(f"\\x{data_hexstr[i:i+2]}")
#             header_file.write('";\n\n')
#
#
#     def create_headers(image_name):
#         """
#         This function generates C header files for the input and output arrays required to run inferences
#         """
#         img_path = os.path.join("./", f"{image_name}")
#
#         # Resize image to 224x224
#         resized_image = Image.open(img_path).resize((224, 224))
#         img_data = np.asarray(resized_image).astype("float32")
#
#         # Convert input to NCHW
#         img_data = np.transpose(img_data, (2, 0, 1))
#
#         # Create input header file
#         input_data = img_data.astype(np.uint8)
#         create_header_file("inputs", "ethosu_scratch", "input", input_data, "./include")
#         # Create output header file
#         output_data = np.zeros([1001], np.uint8)
#         create_header_file(
#             "outputs",
#             "output_data_sec",
#             "output",
#             output_data,
#             "./include",
#         )
#
#
#     if __name__ == "__main__":
#         create_headers(sys.argv[1])
#
# Run the script from the command line:
#
# .. code-block:: bash
#
#     python convert_image.py ./kitten.jpg
#
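################################################################################
# As a quick sanity check (a sketch, not part of the tutorial), you can confirm
# that the generated input array has the expected 224 x 224 x 3 = 150528
# elements:
#
# .. code-block:: python
#
#     import re
#
#     # inputs.h declares "const size_t input_len = <N>;" - check that N
#     # matches the 224x224 RGB input expected by MobileNet v1.
#     with open("./include/inputs.h") as f:
#         header = f.read()
#     match = re.search(r"input_len = (\d+)", header)
#     assert match and int(match.group(1)) == 224 * 224 * 3
#     print("inputs.h looks good")
#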
################################################################################
# Pre-processing the labels
# -------------------------
#
# The following script will create a ``labels.h`` header file in the
# ``./include`` directory. The ``labels_mobilenet_quant_v1_224.txt`` file that
# we downloaded previously will be turned into an array of strings. This array
# will be used to display the label that our image has been classified as.
#
# .. code-block:: python
#    :caption: convert_labels.py
#    :name: convert_labels.py
#
#     #!python ./convert_labels.py
#     import pathlib
#     import sys
#
#
#     def create_labels_header(labels_file, section, output_path):
#         """
#         This function generates a header file containing the ImageNet labels as an array of strings
#         """
#         labels_path = pathlib.Path(labels_file).resolve()
#         file_path = pathlib.Path(f"{output_path}/labels.h").resolve()
#
#         with open(labels_path) as f:
#             labels = f.readlines()
#
#         with open(file_path, "w") as header_file:
#             header_file.write(f'char* labels[] __attribute__((section("{section}"), aligned(16))) = {{')
#
#             for label in labels:
#                 header_file.write(f'"{label.rstrip()}",')
#
#             header_file.write("};\n")
#
#
#     if __name__ == "__main__":
#         create_labels_header(sys.argv[1], "ethosu_scratch", "./include")
#
# Run the script from the command line:
#
# .. code-block:: bash
#
#     python convert_labels.py ./labels_mobilenet_quant_v1_224.txt
#
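################################################################################
# MobileNet v1 predicts one of 1001 categories, so as a quick check (a sketch,
# not part of the tutorial) the downloaded label file should contain 1001
# lines, one per category:
#
# .. code-block:: python
#
#     # One label per line; MobileNet v1 outputs 1001 category scores.
#     with open("./labels_mobilenet_quant_v1_224.txt") as f:
#         assert len(f.readlines()) == 1001
#     print("labels file looks good")
#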
################################################################################
# Writing the demo application
# ----------------------------
#
# The following C application will run a single inference of the MobileNet v1
# model on the image that we downloaded and converted to an array of integers
# previously. Since the model was compiled with a target of "ethos-u ...",
# operators supported by the Ethos(TM)-U55 NPU will be offloaded for
# acceleration. Once the application is built and run, our test image should be
# correctly classified as a "tabby" and the result should be displayed on the
# console. This file should be placed in ``./src``.
#
# .. code-block:: c
#    :caption: demo.c
#    :name: demo.c
#
#     #include <stdio.h>
#     #include <tvm_runtime.h>
#
#     #include "ethosu_mod.h"
#     #include "uart.h"
#
#     // Header files generated by convert_image.py and convert_labels.py
#     #include "inputs.h"
#     #include "labels.h"
#     #include "outputs.h"
#
#     int abs(int v) { return v * ((v > 0) - (v < 0)); }
#
#     int main(int argc, char** argv) {
#       uart_init();
#       printf("Starting Demo\n");
#       EthosuInit();
#
#       printf("Allocating memory\n");
#       StackMemoryManager_Init(&app_workspace, g_aot_memory, WORKSPACE_SIZE);
#
#       printf("Running inference\n");
#       struct tvmgen_default_outputs outputs = {
#           .output = output,
#       };
#       struct tvmgen_default_inputs inputs = {
#           .input = input,
#       };
#       struct ethosu_driver* driver = ethosu_reserve_driver();
#       struct tvmgen_default_devices devices = {
#           .ethos_u = driver,
#       };
#       tvmgen_default_run(&inputs, &outputs, &devices);
#       ethosu_release_driver(driver);
#
#       // Calculate index of max value
#       uint8_t max_value = 0;
#       int32_t max_index = -1;
#       for (unsigned int i = 0; i < output_len; ++i) {
#         if (output[i] > max_value) {
#           max_value = output[i];
#           max_index = i;
#         }
#       }
#       printf("The image has been classified as '%s'\n", labels[max_index]);
#
#       // The FVP will shut down when it receives "EXITTHESIM" on the UART
#       printf("EXITTHESIM\n");
#       while (1 == 1)
#         ;
#       return 0;
#     }
#
#
# In addition, you will need these header files from github in your ``./include`` directory:
#
# `include files <https://github.com/apache/tvm/tree/main/apps/microtvm/ethosu/include>`_

################################################################################
# Creating the linker script
# --------------------------
#
# We need to create a linker script that will be used when we build our
# application in the following section. The linker script tells the linker
# where everything should be placed in memory. The ``corstone300.ld`` linker
# script should be placed in your working directory.
#
# An example linker script for the FVP can be found here:
# `corstone300.ld <https://github.com/apache/tvm/blob/main/apps/microtvm/ethosu/corstone300.ld>`_

################################################################################
# .. note::
#
#   The code generated by TVM will place the model weights and the Arm(R)
#   Ethos(TM)-U55 command stream in a section named ``ethosu_scratch``.
#   For a model the size of MobileNet v1, the weights and command stream will
#   not fit into the limited SRAM available. For this reason it's important
#   that the linker script places the ``ethosu_scratch`` section into DRAM
#   (DDR).

################################################################################
# .. note::
#
#   Before building and running the application, you will need to update your
#   PATH environment variable to include the path to cmake 3.19.5 and the FVP.
#   For example if you've installed these in ``/opt/arm``, then you would do
#   the following:
#
#   ``export PATH=/opt/arm/FVP_Corstone_SSE-300_Ethos-U55/models/Linux64_GCC-6.4:/opt/arm/cmake/bin:$PATH``
#

################################################################################
# Building the demo application using make
# ----------------------------------------
#
# We can now build the demo application using make. The Makefile should be
# placed in your working directory before running ``make`` on the command line.
#
# An example Makefile can be found here:
# `Makefile <https://github.com/apache/tvm/blob/main/apps/microtvm/ethosu/Makefile>`_
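################################################################################
# Before running ``make``, it can be useful to confirm that everything is in
# the layout this tutorial assumes (a minimal sketch; adjust the paths if you
# placed files elsewhere):
#
# .. code-block:: python
#
#     import pathlib
#
#     # Files the build expects, using the directory layout from this tutorial.
#     for required in [
#         "src/demo.c",
#         "include/inputs.h",
#         "include/outputs.h",
#         "include/labels.h",
#         "corstone300.ld",
#         "Makefile",
#     ]:
#         assert pathlib.Path(required).exists(), f"missing {required}"
#     print("All required files are present")
#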
################################################################################
# Running the demo application
# ----------------------------
#
# Finally, we can run our demo application on the Fixed Virtual Platform (FVP),
# by using the following command:
#
# .. code-block:: bash
#
#     FVP_Corstone_SSE-300_Ethos-U55 -C cpu0.CFGDTCMSZ=15 \
#     -C cpu0.CFGITCMSZ=15 -C mps3_board.uart0.out_file=\"-\" -C mps3_board.uart0.shutdown_tag=\"EXITTHESIM\" \
#     -C mps3_board.visualisation.disable-visualisation=1 -C mps3_board.telnetterminal0.start_telnet=0 \
#     -C mps3_board.telnetterminal1.start_telnet=0 -C mps3_board.telnetterminal2.start_telnet=0 -C mps3_board.telnetterminal5.start_telnet=0 \
#     -C ethosu.extra_args="--fast" \
#     -C ethosu.num_macs=256 ./build/demo
#
# You should see the following output displayed in your console window:
#
# .. code-block:: text
#
#     telnetterminal0: Listening for serial connection on port 5000
#     telnetterminal1: Listening for serial connection on port 5001
#     telnetterminal2: Listening for serial connection on port 5002
#     telnetterminal5: Listening for serial connection on port 5003
#
#     Ethos-U rev dedfa618 --- Jan 12 2021 23:03:55
#     (C) COPYRIGHT 2019-2021 Arm Limited
#     ALL RIGHTS RESERVED
#
#     Starting Demo
#     ethosu_init. base_address=0x48102000, fast_memory=0x0, fast_memory_size=0, secure=1, privileged=1
#     ethosu_register_driver: New NPU driver at address 0x20000de8 is registered.
#     CMD=0x00000000
#     Soft reset NPU
#     Allocating memory
#     Running inference
#     ethosu_find_and_reserve_driver - Driver 0x20000de8 reserved.
#     ethosu_invoke
#     CMD=0x00000004
#     QCONFIG=0x00000002
#     REGIONCFG0=0x00000003
#     REGIONCFG1=0x00000003
#     REGIONCFG2=0x00000013
#     REGIONCFG3=0x00000053
#     REGIONCFG4=0x00000153
#     REGIONCFG5=0x00000553
#     REGIONCFG6=0x00001553
#     REGIONCFG7=0x00005553
#     AXI_LIMIT0=0x0f1f0000
#     AXI_LIMIT1=0x0f1f0000
#     AXI_LIMIT2=0x0f1f0000
#     AXI_LIMIT3=0x0f1f0000
#     ethosu_invoke OPTIMIZER_CONFIG
#     handle_optimizer_config:
#     Optimizer release nbr: 0 patch: 1
#     Optimizer config cmd_stream_version: 0 macs_per_cc: 8 shram_size: 48 custom_dma: 0
#     Optimizer config Ethos-U version: 1.0.6
#     Ethos-U config cmd_stream_version: 0 macs_per_cc: 8 shram_size: 48 custom_dma: 0
#     Ethos-U version: 1.0.6
#     ethosu_invoke NOP
#     ethosu_invoke NOP
#     ethosu_invoke NOP
#     ethosu_invoke COMMAND_STREAM
#     handle_command_stream: cmd_stream=0x61025be0, cms_length 1181
#     QBASE=0x0000000061025be0, QSIZE=4724, base_pointer_offset=0x00000000
#     BASEP0=0x0000000061026e60
#     BASEP1=0x0000000060002f10
#     BASEP2=0x0000000060002f10
#     BASEP3=0x0000000061000fb0
#     BASEP4=0x0000000060000fb0
#     CMD=0x000Interrupt. status=0xffff0022, qread=4724
#     CMD=0x00000006
#     00006
#     CMD=0x0000000c
#     ethosu_release_driver - Driver 0x20000de8 released
#     The image has been classified as 'tabby'
#     EXITTHESIM
#     Info: /OSCI/SystemC: Simulation stopped by user.
#
# You should see near the end of the output that the image has been correctly
# classified as 'tabby'.