<!---
Contributor: Nan Zhou, Ryan Han, Yao Zou, Yunhan Ma
Manager: Kidd Su, Jenna Wu
Author of this file: Nan Zhou
-->
## Introduction
The sub-project Dynasty is a deep learning inference engine. It supports [floating point](floating_point/README.md)
inference and [dynamic fixed point](dynamic_fixed_point/README.md) inference.
**Please read the corresponding part of this file before you do anything or ask any questions.**
Although we are at a start-up and need to move fast, it is your responsibility to write clean code with good comments.
**Read the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) and
the [Doxygen manual](http://www.doxygen.nl/manual/docblocks.html)**.
## Before You Start
If you are a consumer of Dynasty, you should only use the versions released by the [CI/CD
pipelines](http://192.168.200.1:8088/TC/kneron_piano/pipelines) of the [release branch](http://192.168.200.1:8088/TC/kneron_piano/tree/release)
or the [dev branch](http://192.168.200.1:8088/TC/kneron_piano/tree/dev). The release branch may lag behind dev but is more stable.
If you are a developer of Dynasty, you should build the whole project from source.
### Prerequisites
These dependencies are needed ONLY IF you develop the CUDA or CUDNN inferencers; [CUDA 10.1.243](https://developer.nvidia.com/cuda-10.1-download-archive-update2) is required.
Installing CUDA is painful, so we encourage you to use NVIDIA's Docker images. See [this README](test/docker/cuda/README.md) for details.
To install cuDNN, follow [this guide](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html), or refer to
[the CI Dockerfile](test/docker/cuda/Dockerfile).
## Features
1. Uses the builder, PImpl, visitor, chain-of-responsibility, and template-method patterns; the inferencer interfaces are clean and easy to extend to CUDA, OpenCL, MKL, etc.
2. The whole project compiles to a single static library.
3. The project has a flexible CMakeLists and can be compiled on any platform; inferencers that cannot run on the current platform are compiled as dummy versions, for example a dummy CUDAInferencer on desktops without a GPU.
4. Has a user-friendly [integration test framework](test/README.MD); adding test cases only requires editing a JSON file.
5. The CI system is built into a [Docker image](#CI-docker), which is highly portable.
6. Pure object-oriented design; users only need to work with interfaces.
## Performance
[Available Here](https://docs.google.com/document/d/1S1gU6qLA_JsJ7D2AczqsLz1uboVZOpUVEBxI7rpu998/edit?usp=sharing).
## Profile
A profiling build is available; enable it with the `PROFILE_NODES` CMake option:
```bash
rm -rf build && mkdir build && cd build && cmake .. -DBUILD_DYNASTY_STATIC_LIB=ON -DPROFILE_NODES=ON && make -j8
```
## Supported Operations
See the section `Latest Supported Operations` [here](release/README.md).
## Build
### When you start development
You may want to build Dynasty and run tests without upgrading the released libraries.
```bash
# enter the Piano project root
rm -rf build && mkdir build && cd build && cmake .. -DBUILD_DYNASTY_STATIC_LIB=ON && make -j8
# do not make install!!!
```
__iOS BUILD__ (requires a macOS environment and Xcode):
```bash
# go to Piano project root
rm -rf build && mkdir build && cd build
cmake -DCMAKE_TOOLCHAIN_FILE=../compiler/toolchains/ios.toolchain.cmake -DENABLE_BITCODE=OFF -DBUILD_IOS=ON ..
make -j
# do not make install !!!
```
### When you finish development
You may want to create a pull request and upgrade the libraries. **DO NOT UPGRADE LIBS WITHOUT PERMISSION**.
1. Make sure the project compiles;
2. Run all the tests and make sure they pass;
3. Increment the Dynasty version number in [CMakeLists.txt](./CMakeLists.txt), following [semantic versioning](https://semver.org/);
4. Update the [readme](release/README.md) files and record your changes, e.g. operation_xx [N] -> operation_xx [Y];
5. Push your changes to your remote repository and monitor the pipelines;
6. Create a PR if everything works.
## Usage of the Inferencer Binary
After building Dynasty as instructed above, run:
```bash
$ ./build/dynasty/run_inferencer
Kneron's Neural Network Inferencer
Usage:
  Inferencer [OPTION...]

  -i, --input arg    input config json file
  -e, --encrypt      use encrypted model or not
  -t, --type arg     inferencer type; AVAILABLE CHOICES: CPU, CUDA, SNPE, CUDNN, MKL, MSFT
  -h, --help         optional, print help
  -d, --device arg   optional, gpu device number
  -o, --output arg   optional, path_to_folder to save the outputs; CREATE THE
                     FOLDER BEFORE CALLING THE BINARY
```
### `--input`
The input file is a [JSON file](conf/input_config_example.json). Specify the input data-path (operation) names and the files that store the corresponding vectors; this guarantees the input order.
```json
{
  "model_path": "/path/to/example.onnx",  # or "/path/to/example.onnx.bie"
  "model_input_txts": [
    {
      "data_vector": "/path/to/input_1_h_w_c.txt",
      "operation_name": "input_1_o0"
    },
    {
      "data_vector": "/path/to/input_2_h_w_c.txt",
      "operation_name": "input_2_o0"
    }
  ]
}
```
### `--encrypt`
The options are `true` or `false`. If `true`, the input models should be `bie` files; if `false`, the input models should be `onnx` files.
### `--type`
1. CPU: Kneron's legacy CPU code; supports almost any operation, but without acceleration or optimization;
2. CUDA: Kneron's CUDA code; reasonable performance;
3. SNPE: Qualcomm's Snapdragon Neural Processing Engine, for Qualcomm devices;
4. CUDNN: Kneron's GPU code using cuDNN; good performance; operations cuDNN does not support fall back to the CUDA code;
5. MSFT: Kneron's CPU code using ONNX Runtime; good performance; has constraints on operations, so check the error messages if some operations cause problems;
6. MKL: Kneron's CPU code using MKL-DNN; good performance; operations MKL-DNN does not support fall back to the legacy CPU code;
### `--device`
Necessary for the CUDA, CUDNN, and MKL inferencers; other inferencers ignore this option.
### `--output`
If the output directory is specified, all outputs will be stored in that directory. The name of each file is the name of each output operation.
## Usage of Libraries
To get an instance of a specific type of inferencer, use the builder of that implementation.
The order of the builder calls matters.
1. `WithDeviceID(uint i)`: valid for the CUDNN, CUDA, and MKL inferencers;
2. `WithGraphOptimization(uint i)`: 0 or 1; 0 means no optimization; 1 means apply some optimizations to the graph;
3. `WithONNXModel(string s)`: path to the ONNX model;
4. `Build()`: build the instance.
For example, to build a CPU inferencer:
```c++
#include <memory>

#include "AbstractInferencer.h"
#include "PianoInferencer.h"
#include "CPUInferencer.h"

using std::unique_ptr;

int main() {
  using dynasty::inferencer::cpu::Inferencer;
  auto inferencer = Inferencer<float>::GetBuilder()
                        ->WithGraphOptimization(1)
                        ->WithONNXModel("res/example_prelu.origin.hdf5.onnx")
                        ->Build();
  inferencer->Inference("res/input_config.json");
}
```
The instance has several interfaces to do inference.
```c++
/**
 * \param preprocess_input: [{operation_node_name, 1d_vector}]
 * \brief interface to run inference from operation-name and float-vector pairs;
 *        packs the output data path names and their float vectors, then returns them
 * \return name_value_pair: {operation node name: corresponding float vector}
 */
std::unordered_map<std::string, std::vector<T>> Inference(
    std::unordered_map<std::string, std::vector<T>> const &preprocess_input, bool only_output_layers = true);

/**
 * \param preprocess_input: [{operation_node_name, path_to_1d_vector}]
 * \brief interface to run inference from operation-name and txt-path pairs
 * \return name_value_pair: {operation node name: corresponding float vector}
 */
virtual std::unordered_map<std::string, std::vector<T>> Inference(
    std::unordered_map<std::string, std::string> const &preprocess_input, bool only_output_layers = true);

/**
 * \param preprocess_input_config: a json file specifying the config
 *        {
 *          "model_input_txts": [
 *            {
 *              "data_vector": "/path/to/input_1_h_w_c.txt",
 *              "operation_name": "input_1_o0"
 *            },
 *            {
 *              "data_vector": "/path/to/input_2_h_w_c.txt",
 *              "operation_name": "input_2_o0"
 *            }
 *          ]
 *        }
 * \param only_output_layers: if true, only the results of output operations are returned;
 *        otherwise the results of all operations are returned
 * \brief interface to run inference from a config file
 * \return name_value_pair: {operation node name: corresponding float vector}
 */
std::unordered_map<std::string, std::vector<T>> Inference(std::string const &preprocess_input_config,
                                                          bool only_output_layers = true);
```
See the corresponding headers for more details.
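As a quick illustration, here is a minimal sketch of calling the map-based overload from C++; the input name and shape are placeholders, while the builder chain and the `Inference` signature follow the examples above.
```c++
#include <string>
#include <unordered_map>
#include <vector>

#include "AbstractInferencer.h"
#include "PianoInferencer.h"
#include "CPUInferencer.h"

int main() {
  using dynasty::inferencer::cpu::Inferencer;
  // Build a CPU inferencer as in the previous example.
  auto inferencer = Inferencer<float>::GetBuilder()
                        ->WithGraphOptimization(1)
                        ->WithONNXModel("res/example_prelu.origin.hdf5.onnx")
                        ->Build();
  // Map each input operation name to its flattened (h * w * c) float vector;
  // the name and shape below are placeholders for your model's inputs.
  std::unordered_map<std::string, std::vector<float>> inputs{
      {"input_1_o0", std::vector<float>(224 * 224 * 3, 0.0f)}};
  // Returns {output operation name -> float vector}; pass false to also get intermediate layers.
  auto outputs = inferencer->Inference(inputs, /*only_output_layers=*/true);
  return outputs.empty() ? 1 : 0;
}
```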
## Test
After building Dynasty as instructed above, you need the test models, which are stored on the
server (10.200.210.221:/home/nanzhou/piano_new_models) and shared via NFS. Follow the commands below to mount them:
```bash
sudo apt install nfs-common
sudo vim /etc/fstab
# add the following line
# 10.200.210.221:/home/nanzhou/piano_new_models /mnt/nfs-dynasty nfs ro,soft,intr,noatime,x-gvfs-show
sudo mkdir /mnt/nfs-dynasty
sudo mount -a -v
```
Now you can access the server's model directory from your file manager.
Then run the following commands for the different inferencers. Note that the model paths in `test_config_cuda.json` point to
`/mnt/nfs-dynasty` by default.
```bash
$ ./build/dynasty/test/inferencer_integration_tests --CPU dynasty/test/conf/test_config_cpu.json
$ ./build/dynasty/test/inferencer_integration_tests -e --CPU dynasty/test/conf/test_config_cpu_bie.json # -e means use bie files
$ ./build/dynasty/test/inferencer_integration_tests --CUDA dynasty/test/conf/test_config_cuda.json
$ ./build/dynasty/test/inferencer_integration_tests --CUDNN dynasty/test/conf/test_config_cudnn.json
$ ./build/dynasty/test/inferencer_integration_tests --MSFT dynasty/test/conf/test_config_msft.json
$ ./build/dynasty/test/inferencer_integration_tests -e --MSFT dynasty/test/conf/test_config_msft_enc.json
$ ./build/dynasty/test/inferencer_integration_tests --MKL dynasty/test/conf/test_config_mkl.json
$ ./build/dynasty/test/inferencer_integration_tests --SNPE dynasty/test/conf/test_config_snpe.json
```
If you want to change the test cases, see the instructions [here](test/README.md).
## CI & Docker
<a name="CI-docker"></a>
Testing should be fully automatic and the environment should be as clean as possible. Setting up the CI system directly on a physical machine is not clean, since it is very hard to version-control dependencies on shared machines. As a result,
we build a Docker image that has all the necessary dependencies for the CI system.
Since the `--gpu` option is still [under development](https://gitlab.com/gitlab-org/gitlab-runner/issues/4585),
we do not use the `docker executor` of the GitLab runner. Instead,
we install a gitlab-runner executable inside the image and use the `shell executor` inside a running container. This is not perfect, since it is difficult to scale; however, in a start-up with around 30 engineers, scaling is not a big concern.
Another issue is where to store the files. We decided to use a read-only NFS volume, with the NFS server running
on the physical machine. The advantages are that state is kept out of the images, and consistency is good since only the
server keeps the state.
In conclusion, we converged on Docker with an NFS volume for the CI system. Follow the instructions below to build a fresh CI system.
### Physical Machine Requirement
Find a server that satisfies:
1. GNU/Linux x86_64 with kernel version > 3.10
2. NVIDIA GPU with Architecture > Fermi (2.1)
3. NVIDIA drivers ~= 361.93 (untested on older versions)
### Install Docker
Please install Docker version >= 19.03. The fastest way to install Docker is:
```bash
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
```
Afterwards, install the `nvidia-container-toolkit`:
```bash
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
### Build or Pull the Image
If the latest image is already on Docker Hub, you only need to pull it:
```bash
sudo docker login -u nanzhoukneron
# the image name may vary
sudo docker pull nanzhoukneron/kneron_ci:nvcc_cmake_valgrind_gitlabrunner_sdk-1.0
```
Otherwise, build and push it yourself. See and modify the [build script](dynasty/test/docker/cuda/build.sh) if necessary.
```bash
sh dynasty/test/docker/cuda/build.sh 1.0
```
### Enable an NFS Server
Start an NFS server with the commands below:
```bash
sudo apt install nfs-kernel-server
sudo vim /etc/exports
# add the following line
# /home/nanzhou/piano_dynasty_models 10.200.210.0/24(ro,sync,root_squash,subtree_check)
sudo exportfs -ra
sudo chmod 777 /home/nanzhou/piano_dynasty_models
sudo chmod 777 /home/nanzhou/piano_dynasty_models/*
sudo chmod 666 /home/nanzhou/piano_dynasty_models/*/*
```
### Create an NFS Volume
```bash
# change the server address if necessary
sudo docker volume create --opt type=nfs --opt o=addr=10.200.210.221,ro --opt device=:/home/nanzhou/piano_dynasty_models dynasty-models-volume
```
### Start a Container Attached to the NFS Volume
```bash
### Create a directory to store the gitlab-runner configuration
sudo mkdir cuda-runner-dir && cd cuda-runner-dir
### Start a container, change the image tag if necessary
### Please be aware of which GPU is assigned to the docker container.
### "--gpus all" assigns every available GPU to the container; with that setting, the CUDNN inferencer
### crashes if a GPU device other than 0 is chosen (reason unknown). Therefore, assign only one GPU to the
### container; this also relieves pressure on the V100 that is devoted to training.
sudo docker run -it \
-v dynasty-models-volume:/home/zhoun14/models \
-v ${PWD}:/etc/gitlab-runner/ \
--name piano-cuda \
--network host \
--gpus device=0 \
nanzhoukneron/kneron_ci:nvcc_cmake_valgrind_gitlabrunner_sdk-1.0 bash
### Register a gitlab runner inside the container
root@compute01:/home/zhoun14# gitlab-runner register
Runtime platform arch=amd64 os=linux pid=16 revision=05161b14 version=12.4.1
Running in system-mode.
Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com/):
http://192.168.200.1:8088/
Please enter the gitlab-ci token for this runner:
fLkY2cT78Wm2Q3G2D8DP
Please enter the gitlab-ci description for this runner:
[compute01]: nvidia stateless runner for CI
Please enter the gitlab-ci tags for this runner (comma separated):
piano-nvidia-runner
Registering runner... succeeded runner=fLkY2cT7
Please enter the executor: docker-ssh+machine, docker-ssh, ssh, docker+machine, shell, virtualbox, kubernetes, custom, docker, parallels:
shell
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!
root@compute01:/home/zhoun14# gitlab-runner run
Runtime platform arch=amd64 os=linux pid=39 revision=05161b14 version=12.4.1
Starting multi-runner from /etc/gitlab-runner/config.toml ... builds=0
Running in system-mode.
Configuration loaded builds=0
Locking configuration file builds=0 file=/etc/gitlab-runner/config.toml pid=39
listen_address not defined, metrics & debug endpoints disabled builds=0
[session_server].listen_address not defined, session endpoints disabled builds=0
```
Now a single-node CI system is running, which is sufficient for small projects. You can edit the file `config.toml` in the `cuda-runner-dir` directory
for higher parallelism.
## Design Patterns
We highly recommend reading the following diagrams.
1. Interfaces and Implementations of Inferencers
![](design/inferencer_design.png)
2. An example: Implementations of CPUInferencer
![](design/inferencerImpl_design.png)
3. Other Utils and Namespaces
![](design/util_design.png)
### Template Method
Typical Usage:
1. The pure virtual functions `Inference`, `GraphBasedInference`, and their overloads;
2. The functions `Initialize` and `CleanUp` of the inferencers defined in `BaseInferencerImpl` (see the sketch below).
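A minimal, hypothetical sketch of the pattern (class and member names such as `BaseInferencerSketch` are illustrative, not the actual Dynasty signatures): the base class fixes the inference skeleton and defers backend-specific steps to virtual hooks.
```c++
#include <iostream>

// Hypothetical names; only the shape of the template-method pattern is the point here.
class BaseInferencerSketch {
 public:
  virtual ~BaseInferencerSketch() = default;

  // The template method: the skeleton is fixed, the individual steps are overridable.
  void Run() {
    Initialize();           // hook with a default implementation
    GraphBasedInference();  // pure virtual hook supplied by each backend
    CleanUp();              // hook with a default implementation
  }

 protected:
  virtual void Initialize() { std::cout << "default init\n"; }
  virtual void CleanUp() { std::cout << "default cleanup\n"; }
  virtual void GraphBasedInference() = 0;
};

class CpuInferencerSketch : public BaseInferencerSketch {
 protected:
  void GraphBasedInference() override { std::cout << "run the graph on the CPU\n"; }
};

int main() {
  CpuInferencerSketch inferencer;
  inferencer.Run();  // Initialize -> GraphBasedInference -> CleanUp
}
```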
### Builder
Each inferencer implementation has its own builder. We do not implement a builder in the abstract classes, because if a user
only uses `CPUInferencer`, the compiled executable should not contain objects of the other inferencers.
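A minimal sketch of this design choice under hypothetical names (`CpuSketch`, `Builder`), not the real classes: the builder is nested in the concrete class, so linking against one backend never pulls in the others.
```c++
#include <iostream>
#include <memory>
#include <string>

// Hypothetical concrete inferencer; its builder lives inside the class itself,
// so no abstract base needs to know about any backend.
class CpuSketch {
 public:
  class Builder {
   public:
    Builder &WithGraphOptimization(unsigned level) { opt_level_ = level; return *this; }
    Builder &WithONNXModel(std::string path) { model_path_ = std::move(path); return *this; }
    std::unique_ptr<CpuSketch> Build() {
      return std::unique_ptr<CpuSketch>(new CpuSketch(opt_level_, model_path_));
    }

   private:
    unsigned opt_level_ = 0;
    std::string model_path_;
  };

  static Builder GetBuilder() { return Builder{}; }

 private:
  CpuSketch(unsigned opt_level, const std::string &model_path) {
    std::cout << "opt=" << opt_level << " model=" << model_path << "\n";
  }
};

int main() {
  auto inferencer =
      CpuSketch::GetBuilder().WithGraphOptimization(1).WithONNXModel("example.onnx").Build();
  return inferencer ? 0 : 1;
}
```
Note that the real builders are chained through pointers (`GetBuilder()->...`), as shown in the earlier CPU example; the value-based chaining here is only to keep the sketch short.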
### Chain of responsibility
Typical Usage:
1. The handlers used in `Initialize` and `CleanUp`.
Chain of responsibility lets us easily achieve the following (see the sketch after this list):
1. operations not supported by the CUDNN inferencer fall back to the CUDA inferencer;
2. operations not supported by the MKL inferencer fall back to the CPU inferencer.
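A minimal, hypothetical sketch of the fallback chain (names such as `OpHandlerSketch` are illustrative, not the actual Dynasty handlers): each handler either runs an operation or forwards it to the next backend in the chain.
```c++
#include <iostream>
#include <memory>
#include <string>

// Hypothetical handler: run the operation if supported, otherwise forward it.
class OpHandlerSketch {
 public:
  explicit OpHandlerSketch(std::string name, std::unique_ptr<OpHandlerSketch> next = nullptr)
      : name_(std::move(name)), next_(std::move(next)) {}
  virtual ~OpHandlerSketch() = default;

  void Handle(const std::string &op) {
    if (Supports(op)) {
      std::cout << name_ << " runs " << op << "\n";
    } else if (next_) {
      next_->Handle(op);  // fall back to the next backend in the chain
    } else {
      std::cout << "no backend supports " << op << "\n";
    }
  }

 protected:
  virtual bool Supports(const std::string &op) const = 0;

 private:
  std::string name_;
  std::unique_ptr<OpHandlerSketch> next_;
};

class CudaHandlerSketch : public OpHandlerSketch {
 public:
  using OpHandlerSketch::OpHandlerSketch;

 protected:
  bool Supports(const std::string &) const override { return true; }  // catch-all backend
};

class CudnnHandlerSketch : public OpHandlerSketch {
 public:
  using OpHandlerSketch::OpHandlerSketch;

 protected:
  bool Supports(const std::string &op) const override { return op == "Conv"; }  // toy rule
};

int main() {
  CudnnHandlerSketch chain("CUDNN", std::make_unique<CudaHandlerSketch>("CUDA"));
  chain.Handle("Conv");   // handled by CUDNN
  chain.Handle("PRelu");  // not supported by CUDNN, falls back to CUDA
}
```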
### PImpl
PImpl is used to hide implementation headers for all inferencers that extend [BaseInferencerImpl](floating_point/include/common/BaseInferencerImpl.h).
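A minimal sketch of the idiom under assumed names (`InferencerSketch`, `Impl`), not the actual Dynasty classes: the public header only forward-declares the implementation, so heavy backend headers stay inside the `.cpp` file.
```c++
// --- InferencerSketch.h (hypothetical public header) ---------------------
#include <memory>
#include <string>

class InferencerSketch {
 public:
  InferencerSketch();
  ~InferencerSketch();  // must be defined where Impl is a complete type
  void Inference(const std::string &config_path);

 private:
  class Impl;                   // defined only in the .cpp file
  std::unique_ptr<Impl> impl_;  // clients never see the implementation headers
};

// --- InferencerSketch.cpp (heavy backend includes would go here) ---------
#include <iostream>

class InferencerSketch::Impl {
 public:
  void Inference(const std::string &config_path) {
    std::cout << "running inference with " << config_path << "\n";
  }
};

InferencerSketch::InferencerSketch() : impl_(std::make_unique<Impl>()) {}
InferencerSketch::~InferencerSketch() = default;
void InferencerSketch::Inference(const std::string &config_path) { impl_->Inference(config_path); }

int main() {
  InferencerSketch inferencer;
  inferencer.Inference("res/input_config.json");
}
```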
## Development
To add a new inferencer, implement the [Inferencer Interface](include/inferencer/AbstractInferencer.h) together with a corresponding
builder. If the inferencer utilizes Piano's graph, we highly recommend extending the class [BaseInferencerImpl](floating_point/include/common/BaseInferencerImpl.h).
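As a rough, hypothetical sketch of the shape of this work (the real interface in `include/inferencer/AbstractInferencer.h` has more overloads and different signatures; `AbstractInferencerSketch` and `MyBackendInferencerSketch` are made-up names):
```c++
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for the real abstract interface.
template <typename T>
class AbstractInferencerSketch {
 public:
  virtual ~AbstractInferencerSketch() = default;
  virtual std::unordered_map<std::string, std::vector<T>> Inference(
      std::unordered_map<std::string, std::vector<T>> const &inputs,
      bool only_output_layers = true) = 0;
};

// A new backend implements the interface (and would ship its own builder).
template <typename T>
class MyBackendInferencerSketch : public AbstractInferencerSketch<T> {
 public:
  std::unordered_map<std::string, std::vector<T>> Inference(
      std::unordered_map<std::string, std::vector<T>> const &inputs,
      bool only_output_layers = true) override {
    (void)only_output_layers;
    return inputs;  // placeholder: a real backend would run the graph here
  }
};

int main() {
  MyBackendInferencerSketch<float> backend;
  auto outputs = backend.Inference({{"input_1_o0", {0.0f, 1.0f}}});
  return outputs.empty() ? 1 : 0;
}
```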
## Coding Style
We strictly follow the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html).