## Introduction

The sub-project Dynasty is a deep learning inference engine. It supports [floating point](floating_point/README.md) inference and [dynamic fixed point](dynamic_fixed_point/README.md) inference. **Please read the corresponding part of this file before you do anything or ask any questions.** Although we are a start-up and need to code fast, it is your responsibility to write good comments and clean code. **Read the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) and the [Doxygen manual](http://www.doxygen.nl/manual/docblocks.html)**.

## Before You Start

If you are a consumer of Dynasty, you should only use the released versions produced by the [CI/CD pipelines](http://192.168.200.1:8088/TC/kneron_piano/pipelines) of the [release branch](http://192.168.200.1:8088/TC/kneron_piano/tree/release) or the [dev branch](http://192.168.200.1:8088/TC/kneron_piano/tree/dev). The release branch may lag behind dev but is more stable.

If you are a developer of Dynasty, you should build the whole project from source.

### Prerequisites

These dependencies are needed ONLY IF you develop the CUDA or CUDNN inferencers: [CUDA 10.1.243](https://developer.nvidia.com/cuda-10.1-download-archive-update2) is required. Installing CUDA is painful, so we encourage you to use NVIDIA's Docker images; see [this README](test/docker/cuda/README.md) for details. To install CUDNN, follow [this guide](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html) or refer to [the CI Dockerfile](test/docker/cuda/Dockerfile).

## Features

1. Uses the builder, pimpl, visitor, chain of responsibility, and template method patterns; the inferencer interfaces are clean and easy to extend to CUDA, OpenCL, MKL, etc.
2. The whole project is compiled into a single static library.
3. The project has a flexible CMakeList and can be compiled on any platform; inferencers that are not available on the current platform are compiled as dummies, for example, a dummy CUDAInferencer on desktops without a GPU.
4. A user-friendly [integration test framework](test/README.MD); adding test cases only requires editing a JSON file.
5. The CI system is built into a [Docker image](#CI-docker), which is highly portable.
6. Pure OOD; users only need to work with interfaces.

## Performance

[Available here](https://docs.google.com/document/d/1S1gU6qLA_JsJ7D2AczqsLz1uboVZOpUVEBxI7rpu998/edit?usp=sharing).

## Profile

A profiling build is available; enable it with the `PROFILE_NODES` option:

```bash
rm -rf build && mkdir build && cd build && cmake .. -DBUILD_DYNASTY_STATIC_LIB=ON -DPROFILE_NODES=ON && make -j8
```

## Supported Operations

See the section `Latest Supported Operations` [here](release/README.md).

## Build

### When you start development

You may want to build Dynasty and run tests, but not upgrade the library.

```bash
# enter the Piano project root
rm -rf build && mkdir build && cd build && cmake .. -DBUILD_DYNASTY_STATIC_LIB=ON && make -j8
# do not make install!!!
```

__iOS BUILD__ (when you have a macOS environment and Xcode):

```bash
# go to Piano project root
rm -rf build && mkdir build && cd build
cmake -DCMAKE_TOOLCHAIN_FILE=../compiler/toolchains/ios.toolchain.cmake -DENABLE_BITCODE=OFF -DBUILD_IOS=ON ..
make -j
# do not make install !!!
```

### When you finish development

You may want to create a pull request and upgrade the libraries. **DO NOT UPGRADE LIBS WITHOUT APPROVAL**.

1. Make sure the project compiles.
2. Run all the tests and make sure they pass.
3. Increment the version numbers of Dynasty in [CMakeLists.txt](./CMakeLists.txt); refer to [semantic versioning](https://semver.org/).
4. Update the [readme](release/README.md) files; record your changes, e.g. operation_xx [N] -> operation_xx [Y].
5. Push your changes to your remote repository and monitor the pipelines.
6. Create a PR if everything works.

## Usage of the Inferencer Binary

After building Dynasty as instructed above:

```bash
$ ./build/dynasty/run_inferencer
Kneron's Neural Network Inferencer
Usage:
  Inferencer [OPTION...]

  -i, --input arg    input config json file
  -e, --encrypt      use encrypted model or not
  -t, --type arg     inferencer type; AVAILABLE CHOICES: CPU, CUDA, SNPE, CUDNN, MKL, MSFT
  -h, --help         optional, print help
  -d, --device arg   optional, gpu device number
  -o, --output arg   optional, path_to_folder to save the outputs; CREATE THE FOLDER BEFORE CALLING THE BINARY
```

### `--input`

The input file is a [JSON file](conf/input_config_example.json). Specify each input operation name and the text file that stores its flattened vector; this guarantees the input order.

```json
{
  "model_path": "/path/to/example.onnx",  # or "/path/to/example.onnx.bie"
  "model_input_txts": [
    {
      "data_vector": "/path/to/input_1_h_w_c.txt",
      "operation_name": "input_1_o0"
    },
    {
      "data_vector": "/path/to/input_2_h_w_c.txt",
      "operation_name": "input_2_o0"
    }
  ]
}
```

### `--encrypt`

The options are `true` or `false`. If `true`, the input models should be `bie` files; if `false`, the input models should be `onnx` files.

### `--type`

1. CPU: Kneron's old CPU code; supports almost every operation but without acceleration or optimization.
2. CUDA: Kneron's CUDA code; reasonable performance.
3. SNPE: Qualcomm's Snapdragon Neural Processing Engine, for Qualcomm devices.
4. CUDNN: Kneron's GPU code using CUDNN; good performance; operations CUDNN does not support fall back to the CUDA code.
5. MSFT: Kneron's CPU code using ONNXRuntime; good performance; has constraints on operations; see the error messages if some operations cause problems.
6. MKL: Kneron's CPU code using MKL-DNN; good performance; operations MKL-DNN does not support fall back to the old CPU code.

### `--device`

Necessary for the CUDA, CUDNN, and MKL inferencers. Other inferencers ignore this option.

### `--output`

If the output directory is specified, all outputs are stored in that directory. The name of each file is the name of the corresponding output operation.

## Usage of Libraries

To get an instance of a specific type of inferencer, use the builder of that implementation. The order of the builder calls matters.

1. `WithDeviceID(uint i)`: valid for the CUDNN, CUDA, and MKL inferencers;
2. `WithGraphOptimization(uint i)`: 0 or 1; 0 means no optimization; 1 means some optimizations are applied to the graph;
3. `WithONNXModel(string s)`: path to the ONNX model;
4. `Build()`: build the instance.

For example, to build a CPU inferencer:

```c++
#include <memory>

#include "AbstractInferencer.h"
#include "PianoInferencer.h"
#include "CPUInferencer.h"

using std::unique_ptr;

int main() {
  using dynasty::inferencer::cpu::Inferencer;
  auto inferencer = Inferencer::GetBuilder()
                        ->WithGraphOptimization(1)
                        ->WithONNXModel("res/example_prelu.origin.hdf5.onnx")
                        ->Build();
  inferencer->Inference("res/input_config.json");
}
```
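For GPU back ends, `WithDeviceID` must be called as well. Below is a minimal, hedged sketch that selects a device and feeds in-memory input vectors through the map-based `Inference` overload documented in the next code block; the CUDA namespace (`dynasty::inferencer::cuda::Inferencer`), the header name `CUDAInferencer.h`, and the input name and size are assumptions for illustration only.

```c++
// Hedged sketch: the CUDA namespace, header name, input name, and tensor size
// below are illustrative assumptions, not verified against the code base.
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

#include "AbstractInferencer.h"
#include "PianoInferencer.h"
#include "CUDAInferencer.h"  // assumed header, mirroring CPUInferencer.h

int main() {
  using dynasty::inferencer::cuda::Inferencer;  // assumed namespace, mirroring ::cpu

  // WithDeviceID must be set for GPU back ends; the order of the builder calls matters.
  auto inferencer = Inferencer::GetBuilder()
                        ->WithDeviceID(0)
                        ->WithGraphOptimization(1)
                        ->WithONNXModel("res/example_prelu.origin.hdf5.onnx")
                        ->Build();

  // Feed in-memory inputs through the map-based overload documented below:
  // {operation node name -> flattened float vector}.
  std::unordered_map<std::string, std::vector<float>> inputs{
      {"input_1_o0", std::vector<float>(224 * 224 * 3, 0.5f)}  // assumed name and shape
  };
  auto outputs = inferencer->Inference(inputs, /*only_output_layers=*/true);

  for (auto const &kv : outputs) {
    std::cout << kv.first << ": " << kv.second.size() << " values\n";
  }
}
```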
The instance has several interfaces for inference:

```c++
/**
 * \param preprocess_input: [{operation_node_name, 1d_vector}]
 * \brief interface that needs to be implemented; packs output data path names and their float vectors, then returns
 * \return name_value_pair: {operation node name: corresponding float vector}
 */
std::unordered_map<std::string, std::vector<float>> Inference(
    std::unordered_map<std::string, std::vector<float>> const &preprocess_input,
    bool only_output_layers = true);

/**
 * \param preprocess_input: [{operation_node_name, path_to_1d_vector}]
 * \brief interface to inference from operation_name and txt pairs
 * \return name_value_pair: {operation node name: corresponding float vector}
 */
virtual std::unordered_map<std::string, std::vector<float>> Inference(
    std::unordered_map<std::string, std::string> const &preprocess_input,
    bool only_output_layers = true);

/**
 * \param preprocess_input_config: a json file specifying the config
 *   {
 *     "model_input_txts": [
 *       {
 *         "data_vector": "/path/to/input_1_h_w_c.txt",
 *         "operation_name": "input_1_o0"
 *       },
 *       {
 *         "data_vector": "/path/to/input_2_h_w_c.txt",
 *         "operation_name": "input_2_o0"
 *       }
 *     ]
 *   }
 * \param only_output_layers: if true, only results of output operations are returned; otherwise results of all operations are returned
 * \brief interface to inference from a config file
 * \return name_value_pair: {operation node name: corresponding float vector}
 */
std::unordered_map<std::string, std::vector<float>> Inference(
    std::string const &preprocess_input_config,
    bool only_output_layers = true);
```

See the corresponding headers for more details.

## Test

After building Dynasty as instructed above, you need to download the test models, which live on the server (@10.200.210.221:/home/nanzhou/piano_new_models) and are mounted via NFS. Follow the commands below to get the models:

```bash
sudo apt install nfs-common
sudo vim /etc/fstab
# add the following line
# 10.200.210.221:/home/nanzhou/piano_new_models /mnt/nfs-dynasty nfs ro,soft,intr,noatime,x-gvfs-show
sudo mkdir /mnt/nfs-dynasty
sudo mount -a -v
```

Now you can access the server's model directory in your file manager. Then run the following commands for the different inferencers. Note that the model paths in `test_config_cuda.json` point to `/mnt/nfs-dynasty` by default.

```bash
$ ./build/dynasty/test/inferencer_integration_tests --CPU dynasty/test/conf/test_config_cpu.json
$ ./build/dynasty/test/inferencer_integration_tests -e --CPU dynasty/test/conf/test_config_cpu_bie.json  # -e means use bie files
$ ./build/dynasty/test/inferencer_integration_tests --CUDA dynasty/test/conf/test_config_cuda.json
$ ./build/dynasty/test/inferencer_integration_tests --CUDNN dynasty/test/conf/test_config_cudnn.json
$ ./build/dynasty/test/inferencer_integration_tests --MSFT dynasty/test/conf/test_config_msft.json
$ ./build/dynasty/test/inferencer_integration_tests -e --MSFT dynasty/test/conf/test_config_msft_enc.json
$ ./build/dynasty/test/inferencer_integration_tests --MKL dynasty/test/conf/test_config_mkl.json
$ ./build/dynasty/test/inferencer_integration_tests --SNPE dynasty/test/conf/test_config_snpe.json
```

If you want to change the test cases, see the instructions [here](test/README.md).

## CI & Docker

Testing should be fully automatic, and the environment should be as clean as possible. Setting up the CI system directly on a physical machine is not clean, since it is very hard to version-control dependencies on shared machines. As a result, we build a Docker image that contains all the necessary dependencies for the CI system. Since the `--gpus` option is still [under development](https://gitlab.com/gitlab-org/gitlab-runner/issues/4585), we do not use the `docker executor` of the GitLab runner.
Instead, we install GitLab Runner inside the image and use the `shell` executor inside a running container. This is not perfect since it is difficult to scale; however, in a start-up with around 30 engineers, scaling is not a big concern.

Another issue is where to store the files. We decided to use a read-only NFS volume, with the NFS server running on the physical machine. The advantages are that state is kept out of the images, and consistency is good since only the server holds the state. In conclusion, we converged on Docker with an NFS volume for the CI system. Follow the instructions below to build a fresh CI system.

### Physical Machine Requirement

Find a server that satisfies:

1. GNU/Linux x86_64 with kernel version > 3.10
2. NVIDIA GPU with architecture > Fermi (2.1)
3. NVIDIA drivers ~= 361.93 (untested on older versions)

### Install Docker

Please install Docker version >= 19.03. The fastest way to install Docker is

```bash
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
```

After that, get the `nvidia-container-toolkit`:

```bash
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### Build or Pull the Image

If the latest image is already on Docker Hub, you only need to pull it:

```bash
sudo docker login -u nanzhoukneron
# the image name may vary
sudo docker pull nanzhoukneron/kneron_ci:nvcc_cmake_valgrind_gitlabrunner_sdk-1.0
```

Otherwise, build and push it; see and modify the [build script](dynasty/test/docker/cuda/build.sh) if necessary.

```bash
sh dynasty/test/docker/cuda/build.sh 1.0
```

### Enable an NFS Server

Start an NFS server with the commands below:

```bash
sudo apt install nfs-kernel-server
sudo vim /etc/exports
# add the following line
# /home/nanzhou/piano_dynasty_models 10.200.210.0/24(ro,sync,root_squash,subtree_check)
sudo exportfs -ra
sudo chmod 777 /home/nanzhou/piano_dynasty_models
sudo chmod 777 /home/nanzhou/piano_dynasty_models/*
sudo chmod 666 /home/nanzhou/piano_dynasty_models/*/*
```

### Create an NFS Volume

```bash
# change the server address if necessary
sudo docker volume create --opt type=nfs --opt o=addr=10.200.210.221,ro --opt device=:/home/nanzhou/piano_dynasty_models dynasty-models-volume
```

### Start a Container Attached to the NFS Volume

```bash
### Create a directory to store the gitlab-runner configuration
sudo mkdir cuda-runner-dir && cd cuda-runner-dir

### Start a container; change the image tag if necessary.
### Be aware of which GPU is assigned to the container:
### "--gpus all" assigns all available GPUs to the container, in which case the CUDNN inferencer
### crashes if a GPU device other than 0 is chosen (reason unknown). Therefore, assign only one GPU
### to the container; this also relieves pressure on the V100 devoted to training.
sudo docker run -it \
  -v dynasty-models-volume:/home/zhoun14/models \
  -v ${PWD}:/etc/gitlab-runner/ \
  --name piano-cuda \
  --network host \
  --gpus device=0 \
  nanzhoukneron/kneron_ci:nvcc_cmake_valgrind_gitlabrunner_sdk-1.0 bash

### Register a gitlab runner inside the container
root@compute01:/home/zhoun14# gitlab-runner register
Runtime platform                                    arch=amd64 os=linux pid=16 revision=05161b14 version=12.4.1
Running in system-mode.

Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com/):
http://192.168.200.1:8088/
Please enter the gitlab-ci token for this runner:
fLkY2cT78Wm2Q3G2D8DP
Please enter the gitlab-ci description for this runner:
[compute01]: nvidia stateless runner for CI
Please enter the gitlab-ci tags for this runner (comma separated):
piano-nvidia-runner
Registering runner... succeeded                     runner=fLkY2cT7
Please enter the executor: docker-ssh+machine, docker-ssh, ssh, docker+machine, shell, virtualbox, kubernetes, custom, docker, parallels:
shell
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!

root@compute01:/home/zhoun14# gitlab-runner run
Runtime platform                                    arch=amd64 os=linux pid=39 revision=05161b14 version=12.4.1
Starting multi-runner from /etc/gitlab-runner/config.toml...  builds=0
Running in system-mode.

Configuration loaded                                builds=0
Locking configuration file                          builds=0 file=/etc/gitlab-runner/config.toml pid=39
listen_address not defined, metrics & debug endpoints disabled  builds=0
[session_server].listen_address not defined, session endpoints disabled  builds=0
```

Now a single-node CI system is running. It is sufficient for small projects. You can edit the `config.toml` file in the `cuda-runner-dir` directory for higher parallelism.

## Design Patterns

We highly recommend that you read the following diagrams.

1. Interfaces and Implementations of Inferencers
   ![](design/inferencer_design.png)
2. An example: Implementations of CPUInferencer
   ![](design/inferencerImpl_design.png)
3. Other Utils and Namespaces
   ![](design/util_design.png)

### Template Method

Typical usage:

1. The pure virtual functions `Inference`, `GraphBasedInference`, and their overloads;
2. The functions `Initialize` and `CleanUp` of the inferencers, defined in `BaseInferencerImpl`.

### Builder

Each inferencer implementation has a builder. We do not implement a builder in the abstract classes, because if a user only uses CPUInferencer, the compiled executable should not contain objects of the other inferencers.

### Chain of Responsibility

Typical usage:

1. The handlers used in `Initialize` and `CleanUp`.

Chain of responsibility lets us easily achieve the following:

1. operations not supported by the CUDNN inferencer fall back to the CUDA inferencer;
2. operations not supported by the MKL inferencer fall back to the CPU inferencer.

### PImpl

PImpl is used to hide headers for all inferencers that extend [BaseInferencerImpl](floating_point/include/common/BaseInferencerImpl.h).

## Development

To add a new inferencer, implement the [Inferencer interface](include/inferencer/AbstractInferencer.h) with a corresponding builder. If the inferencer uses Piano's graph, we highly recommend extending the class [BaseInferencerImpl](floating_point/include/common/BaseInferencerImpl.h). See the sketch at the end of this file.

## Coding Style

We strictly follow the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html).
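## Appendix: New Inferencer Sketch (illustrative)

To complement the Development section, here is a minimal, hedged skeleton of a new inferencer with its own builder. It is a self-contained sketch: the simplified `AbstractInferencerSketch` interface, the class names, and the builder shape are illustrative assumptions modeled on the builder calls and `Inference` overloads shown earlier, not the actual declarations in `AbstractInferencer.h` or `BaseInferencerImpl`.

```c++
// Hedged sketch only: a simplified stand-in for AbstractInferencer is declared here
// so the example is self-contained; the real interface lives in
// include/inferencer/AbstractInferencer.h and may differ.
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumed, simplified interface mirroring the map-based Inference overload documented above.
class AbstractInferencerSketch {
 public:
  virtual ~AbstractInferencerSketch() = default;
  virtual std::unordered_map<std::string, std::vector<float>> Inference(
      std::unordered_map<std::string, std::vector<float>> const &preprocess_input,
      bool only_output_layers = true) = 0;
};

// A new back end implements the interface and ships its own builder,
// so that unused back ends are not linked into the executable.
class MyInferencer : public AbstractInferencerSketch {
 public:
  class Builder {
   public:
    Builder *WithGraphOptimization(unsigned int level) { opt_level_ = level; return this; }
    Builder *WithONNXModel(std::string path) { model_path_ = std::move(path); return this; }
    std::unique_ptr<MyInferencer> Build() {
      return std::unique_ptr<MyInferencer>(new MyInferencer(opt_level_, model_path_));
    }
   private:
    unsigned int opt_level_ = 0;
    std::string model_path_;
  };

  static Builder *GetBuilder() {
    static Builder builder;
    return &builder;
  }

  std::unordered_map<std::string, std::vector<float>> Inference(
      std::unordered_map<std::string, std::vector<float>> const &preprocess_input,
      bool only_output_layers = true) override {
    // Dummy pass-through "inference": a real implementation would run the graph here.
    (void)only_output_layers;
    return preprocess_input;
  }

 private:
  MyInferencer(unsigned int opt_level, std::string model_path)
      : opt_level_(opt_level), model_path_(std::move(model_path)) {}
  unsigned int opt_level_;
  std::string model_path_;
};

int main() {
  auto inferencer = MyInferencer::GetBuilder()
                        ->WithGraphOptimization(1)
                        ->WithONNXModel("res/example.onnx")
                        ->Build();
  auto outputs = inferencer->Inference({{"input_1_o0", {0.1f, 0.2f, 0.3f}}});
  std::cout << "outputs: " << outputs.size() << "\n";
}
```

A real back end would parse the ONNX model during construction and run the graph inside `Inference`; extending `BaseInferencerImpl` would additionally provide the `Initialize`/`CleanUp` chain described in the Design Patterns section.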