Description

An AI Asset enables accelerated deployment of deep learning models to resource-constrained, low-power embedded systems (the Deep Edge). The provided workflows deliver powerful and easy-to-deploy building blocks for creating complex AI models that can be deployed on cyber-physical systems.

By taking care of many end-to-end tooling dependencies and providing standardized interfaces, Bonseyes AI Assets enable users to focus on producing optimal solutions while allowing faster feedback during the implementation of end-user requirements. The goal is to facilitate easier deployment to the Deep Edge with the Bonseyes AI Marketplace.

Requirements

Hardware requirements

To utilize the full potential of AI Assets, especially for training, an NVIDIA graphics card (GTX 1060 or newer) with CUDA support is required on x86_64 environments. Nonetheless, AI Assets can also be run on Intel/AMD CPUs.

We also provide support for NVIDIA Jetson devices as well as for platforms with arm64v8 architectures. Support for these devices allows the user to evaluate any given AI Asset on them and obtain embedded-oriented benchmarks for a faster design process.

Software requirements

The following requirements need to be installed on the platform where the AI Asset will run:

Docker

To install Docker, follow the instructions here.

By default, Docker is not accessible to non-root users. To allow the current user to access Docker, run the following commands:

sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

Verify that you can run docker commands without sudo:

docker run hello-world

Git and Git LFS

Install git and git LFS by executing:

sudo apt-get install git git-lfs

Git LFS is not active by default. To make sure git lfs is active, run the following command:

git lfs install

The command may print some errors that can be safely ignored.

Due to a bug in Ubuntu 18.04 LTS, the binaries installed by pip are not available by default. To make sure that they are available, run the following command:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc

Unfortunately, due to limitations in Ubuntu 18.04 LTS, it is not possible to ensure that the new user groups are taken into account after a simple logout/login. To complete the setup, restart the machine so that the changes to PATH and groups take effect.

Packages

The remaining packages can be installed by executing the following command:

sudo apt-get install python3 python3-pip python3-wheel python3-setuptools

To benchmark your AI Asset on your hardware platform, you need to install the following packages, depending on the target platform:

x86 (CPU)

pip3 install psutil

x86 (Cuda)

pip3 install psutil nvidia-smi nvidia-ml-py3

arm64v8

pip3 install psutil

Nvidia Jetsons

pip3 install psutil jetson-stats

NVIDIA Drivers

If you are working on an x86 platform with an NVIDIA GPU, ensure that you have installed the appropriate NVIDIA drivers. On Ubuntu, the easiest way to get the right driver version is to install a version of CUDA, at least as new as the image you intend to use, from the official NVIDIA CUDA download page. For example, if you intend to use CUDA 10.2, ensure that you have the matching graphics drivers, as described here.

The following command can be used to verify your system for x86 platforms:

docker run --gpus all nvidia/cuda:10.2-base nvidia-smi

If you are using an NVIDIA Jetson device, it is sufficient to set up the device following the DPE workflow described in DPE.

Nvidia docker

You will also need to install the NVIDIA Container Toolkit to enable GPU device access within Docker containers. Installation instructions can be found here.
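As a rough sketch (the linked instructions remain the authoritative reference), a typical Ubuntu installation looks like the following, assuming the NVIDIA container toolkit apt repository has already been added as described there:

# Sketch only: the repository setup step from the official instructions must be done first
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

# Verify that containers can access the GPU (same check as in the NVIDIA Drivers section)
docker run --gpus all nvidia/cuda:10.2-base nvidia-smi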

Setup

AI Asset CLI

The AI Asset CLI is a command line interface that allows end users to interact with AI Assets, providing functionality for a variety of tasks such as export, processing (video, image, camera), evaluation, etc.

Install the Bonseyes AI Asset CLI on the intended device from the remote repository:

pip3 install git+https://gitlab.com/bonseyes-opensource/aiassets_cli.git

Add the user-level pip binary directory to your PATH:

export PATH=$PATH:/home/${USER}/.local/bin
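Note that this export only applies to the current shell session. If the directory is not already in your PATH via ~/.bashrc (as in the Software requirements section), you can persist it, for example:

# Persist the user-level bin directory for future sessions
echo 'export PATH=$PATH:/home/${USER}/.local/bin' >> ~/.bashrc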

For detailed AI Asset CLI usage, please refer to the official documentation.

Board setup

For board setup, please first follow the DPE workflow explained in DPE.

Usage

Currently available AI Assets:

  • 3D Face Landmark detection (68 keypoints)
    • Backbones: mobilenetv1, mobilenetv0.5

    • Input-sizes: 120x120

    • Datasets: aflw, aflw2000-3d

  • Access token
    • Username: gitlab+deploy-token-483452

    • Password: zsEqp4321jiCzWS-TUaG

  • Whole Body Pose estimation (133 keypoints)
    • Backbones: resnet22, shufflenetv2k30, shufflenetv2k16

    • Input-sizes: 128x96, 128x128, 256x256, 384x216, 512x384

    • Datasets: wholebody

  • Access token
    • Username: gitlab+deploy-token-557315

    • Password: AskgZQwcDRRYv3Da7BNB

Currently available platforms and environments:

  • x86_64 machines
    • cpu

    • cuda10.2_tensorrt7.0

    • cuda11.2_tensorrt7.2_rtx3070

    • cuda11.4_tensorrt8.0

  • Nvidia Jetson Devices
    • jetpack4.4

    • jetpack4.6

  • Arm CPUs
    • arm64v8

If you face issues during the workflow, you can export the DEBUG flag in your terminal to obtain more information about the issue:

export DEBUG=True

Installation

Download and initialize the specified demo AI Asset locally:

bonseyes_aiassets_cli init
    --task {3dface_landmarks, whole_body_pose}
    --platform {x86_64, jetson, rpi}
    --environment {cpu,cuda10.2_tensorrt7.0,cuda11.2_tensorrt7.2_rtx3070,cuda11.4_tensorrt8.0,jetpack4.4,jetpack4.6,arm64v8}
    --version {v1.0, v2.0, ...}
    --user gitlab+deploy-token-USERNAME
    --password PASSWORD
    [--camera-id CAMERA_ID]

Check supported options by running:

bonseyes_aiassets_cli init --help

Example:

bonseyes_aiassets_cli init \
    --task whole_body_pose \
    --platform x86_64 \
    --environment cuda10.2_tensorrt7.0 \
    --version v1.0 \
    --camera-id 0 \
    --user <username> \
    --password <password>

Check if the container is running and on what port by executing:

docker ps

If you want to stop a running AI Asset:

docker kill <task_name>

Switch between AI Assets

Use specific AI Asset locally:

bonseyes_aiassets_cli use --task {3dface_landmarks, whole_body_pose}

Check supported tasks by running:

bonseyes_aiassets_cli use --help

Train

Train the network and produce a model based on the available configuration files:

bonseyes_aiassets_cli train start --config <config_name>

Check for available configs by running:

bonseyes_aiassets_cli train start --help

Example:

bonseyes_aiassets_cli train start --config v1.0_shufflenetv2k30_default_641x641_fp32_config

Check training status by running:

bonseyes_aiassets_cli train status

Stop training process by running:

bonseyes_aiassets_cli train stop

Export

Export pretrained models from PyTorch format to ONNX and/or TensorRT format(s):

usage: bonseyes_aiassets_cli export [-h]
    --export-input-sizes EXPORT_INPUT_SIZES [EXPORT_INPUT_SIZES ...]
    --engine {all, onnxruntime, tensorrt}
    --precisions {fp32, fp16}
    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
    [--workspace-unit {MB, GB}]
    [--workspace-size WORKSPACE_SIZE]
    [--enable-dla]

Note: When exporting models to TensorRT format on devices with less RAM (<4 GB), it is recommended to specify a smaller workspace size in MB.

Example:

bonseyes_aiassets_cli export \
     --export-input-sizes 120x120 320x320 \
     --engine all \
     --backbone shufflenetv2k30 \
     --precisions fp32 fp16
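
On memory-constrained devices (see the note above), the documented --workspace-unit and --workspace-size options can be added to limit the TensorRT builder workspace. A hypothetical variant of the example, assuming a 512 MB workspace is sufficient for your model:

bonseyes_aiassets_cli export \
     --export-input-sizes 120x120 \
     --engine tensorrt \
     --backbone shufflenetv2k30 \
     --precisions fp16 \
     --workspace-unit MB \
     --workspace-size 512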

Optimize

Optimize exported models by performing PTQ (post-training quantization):

usage: bonseyes_aiassets_cli optimize [-h]
    --optimize-input-sizes OPTIMIZE_INPUT_SIZES [OPTIMIZE_INPUT_SIZES ...]
    --engine {all, onnxruntime, tensorrt}
    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
    [--workspace-unit {MB, GB}]
    [--workspace-size WORKSPACE_SIZE]
    [--enable-dla]

Note: When optimizing models for TensorRT format on devices with less RAM (<4 GB), it is recommended to specify a smaller workspace size in MB.

Example:

bonseyes_aiassets_cli optimize \
    --optimize-input-sizes 120x120 320x320 \
    --engine tensorrt \
    --backbone shufflenetv2k30
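
On Jetson devices that provide DLA cores, the documented --enable-dla flag can be added; whether the DLA is actually used depends on the device and on the layers supported by TensorRT. A sketch only:

bonseyes_aiassets_cli optimize \
    --optimize-input-sizes 120x120 \
    --engine tensorrt \
    --backbone shufflenetv2k30 \
    --enable-dla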

Process

Note: if you are using a VM in VirtualBox, you can share a camera (or a USB device) by selecting "Devices" > "Webcams" (or "USB") and ticking the device you want to share with the VM.

Image

Currently, the only supported format is .jpg.

bonseyes_aiassets_cli demo image
    [--input-size INPUT_SIZE]
    [--engine {pytorch, onnxruntime, tensorrt}]
    [--precision {fp32, fp16, int8}]
    [--device {gpu, cpu}]
    [--cpu-num CPU_NUM]
    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
    --image-input <image_absolute_path>

    3d face landmark specific:
    [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
    [--thickness THICKNESS]
    [--single-face-track]

Example:

# CPU
bonseyes_aiassets_cli demo image \
    --input-size 320x320 \
    --engine pytorch \
    --precision fp32 \
    --backbone shufflenetv2k30 \
    --device cpu \
    --image-input </path/to/img.jpg>
# GPU
bonseyes_aiassets_cli demo image \
    --input-size 320x320 \
    --engine pytorch \
    --precision fp32 \
    --backbone shufflenetv2k30 \
    --device gpu \
    --image-input </path/to/img.jpg>

Video

Currently, the only supported format is .mp4.

bonseyes_aiassets_cli demo video
    [--input-size INPUT_SIZE]
    [--engine {pytorch, onnxruntime, tensorrt}]
    [--precision {fp32, fp16, int8}]
    [--device {gpu, cpu}]
    [--cpu-num CPU_NUM]
    [--color COLOR]
    [--rotate {90, -90, 180}]
    --video-input <video_absolute_path>
    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}

    3d face landmark specific:
    [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
    [--thickness THICKNESS]
    [--single-face-track]

Example:

# CPU
bonseyes_aiassets_cli demo video \
    --input-size 320x320 \
    --engine pytorch \
    --precision fp32 \
    --backbone shufflenetv2k30 \
    --device cpu \
    --video-input </path/to/video.mp4>
# GPU
bonseyes_aiassets_cli demo video \
    --input-size 320x320 \
    --engine pytorch \
    --precision fp32 \
    --backbone shufflenetv2k30 \
    --device gpu \
    --video-input </path/to/video.mp4>

Camera

bonseyes_aiassets_cli demo camera
    [--input-size INPUT_SIZE]
    [--engine {pytorch, onnxruntime, tensorrt}]
    [--precision {fp32, fp16, int8}]
    [--device {gpu, cpu}]
    [--cpu-num CPU_NUM]
    [--color COLOR]
    [--rotate {90, -90, 180}]
    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}

    3d face landmark specific:
    [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
    [--thickness THICKNESS]
    [--single-face-track]

Example:

# CPU
bonseyes_aiassets_cli demo camera \
    --input-size 320x320 \
    --engine pytorch \
    --precision fp32 \
    --backbone shufflenetv2k30 \
    --device cpu \
    --camera-id 0
# GPU
bonseyes_aiassets_cli demo camera \
    --input-size 320x320 \
    --engine pytorch \
    --precision fp32 \
    --backbone shufflenetv2k30 \
    --device gpu \
    --camera-id 0

Server

bonseyes_aiassets_cli server start
    [--input-size INPUT_SIZE]
    [--engine {pytorch, onnxruntime, tensorrt}]
    [--precision {fp32, fp16, int8}]
    [--device {gpu, cpu}]
    [--cpu-num CPU_NUM]
    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}

    3d face landmark specific:
    [--render {2d_sparse, 2d_dense, 3d, pose, axis}]
    [--thickness THICKNESS]
    [--single-face-track]

Example:

# CPU
bonseyes_aiassets_cli server start \
    --input-size 320x320 \
    --engine pytorch \
    --precision fp32 \
    --backbone shufflenetv2k30 \
    --device cpu
# GPU
bonseyes_aiassets_cli server start \
    --input-size 320x320 \
    --engine pytorch \
    --precision fp32 \
    --backbone shufflenetv2k30 \
    --device gpu

You can test if the server is running correctly by calling:

curl --request POST --data-binary @/path/to/image.jpg http://localhost:<PORT>/inference

Use the <PORT> corresponding to the AI Asset you want to use. Each time you start the server, the port is printed to standard output; you can either save it or run:

docker ps

and find out which port the AI Asset container is exposing, e.g.:

CONTAINER ID  63bc638d1243
IMAGE         registry.gitlab.com/bonseyes/assets/bonseyes_openpifpaf_wholebody/x86_64:v1.0_cuda10.2_tensorrt7.0
COMMAND       "/usr/local/bin/nvid…"
CREATED       58 minutes ago
STATUS        Up 58 minutes
PORTS         0.0.0.0:59838->59838/tcp, :::59838->59838/tcp
NAMES         whole_body_pose
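
Alternatively, Docker can report the published port directly. A small sketch, assuming the container name matches the task (whole_body_pose in this example):

# Print the host port mapping of the AI Asset container
docker port whole_body_pose

# Extract just the port number and reuse it in the curl test above
PORT=$(docker port whole_body_pose | head -n 1 | awk -F: '{print $NF}')
curl --request POST --data-binary @/path/to/image.jpg http://localhost:${PORT}/inference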

To stop the server execute:

bonseyes_aiassets_cli server stop

Benchmark

Evaluate exported and pretrained models:

usage: bonseyes_aiassets_cli benchmark [-h]
    --benchmark-input-sizes INPUT_SIZES
    --engine {all, pytorch, onnxruntime, tensorrt}
    --backbone {mobilenetv1, mobilenetv0.5, resnet22, shufflenetv2k30, shufflenetv2k16}
    --device {gpu, cpu}

    3d face landmark specific:
    [--datasets {all, aflw, aflw2000-3d}]

Example:

# CPU
bonseyes_aiassets_cli benchmark \
    --benchmark-input-sizes 120x120 320x320 \
    --device cpu \
    --backbone shufflenetv2k30 \
    --engine pytorch onnxruntime \
    --datasets all
# GPU
bonseyes_aiassets_cli benchmark \
    --benchmark-input-sizes 120x120 320x320 \
    --device gpu \
    --backbone shufflenetv2k30 \
    --engine pytorch onnxruntime tensorrt \
    --datasets all