Johannes Traxler, MSc., 2021

# System on Module Deep-Learning-Inference board for Object Detection Made in Germany


Safe AI platform for High Performance / Deep-Learning inference on the Edge

EYYES GmbH | Im Wirtschaftspark 4 | A-3494 Gedersdorf | office@eyyes.com | www.eyyes.com


### **EYYES - We make machines see**

Technology leader for Safe Artificial Intelligence

EYYES stands for:

- ... making our world safer with artificial intelligence
- ... experts in computer vision, machine learning and embedded systems
- ... camera and sensing technology
- ... soft- and hardware development made in Europe
- ... technological, tailor-made solutions at the highest level through leadership, competence and innovation
- ... strong relationships with leading European machine learning research organizations







#### Sites



#### **KREMS AN DER DONAU / Austria**

#### Headquarters

- General Management
- Sales & Marketing
- Project Management & Execution
- Assembling / Test Field
- Procurement
- Research & Development



#### AACHEN / Germany

#### **Competence Center Software Engineering**

- Development Software
- Development Algorithmic
- Development Artificial Intelligence & Machine Learning (Deep Learning)
- Research & Development



#### FREITAL (DRESDEN) / Germany

#### Competence Center Embedded Systems

- Development Embedded Systems
- Service & Repair
- Electronics Laboratory & Certifications
- Safety Engineering
- Research & Development






## Smart solutions for safe mobility with deep learning





# EYYES Deep Learning Technology


Evolution and development


## **Initial Situation in 2016**

#### Deep Learning needed for several project opportunities

- Computation requirements about two orders of magnitude higher
- At the edge of physical limits
- GPUs require very high electrical power and produce a lot of waste heat
- No real alternative available



Soft- and Hardware R&D Projects realized

#### Projects realized with public and private funding:

- AIRVS "Artificial Intelligence Rear View System" together with SCCH Hagenberg
  - YOLO based network tests
  - Research on LSTM based tracking algorithms
  - Development of CNN software optimization algorithms
- **RailEye 3.0** in cooperation with TU Dresden
  - Development of a SoM for two-sensor real-time applications
  - FPGA implementation of H.264 core and first deep-learning processing
















#### How to solve the challenge of maximizing the performance of a CNN chip?

- Use quantisation to reduce the required memory bandwidth
- Decrease the required operations per second
- Improve the training algorithm
- Use explainability algorithms to monitor the functionality of the neural network
- Improve the parallel processing




**Challenge 1:** Use quantisation to reduce the required memory bandwidth

- EYYES developed a new approach to quantize the CNN parameters (patent pending)
- Method to determine the mean squared error (MSE)



TFL = TensorFlow Lite, https://www.tensorflow.org/lite/

Ours = EYYES Optimization Toolchain, https://www.eyyes.com/technology/deep-learning-optimizer/




Results compared with TensorFlow Lite (TFL<sup>\*</sup>)



| Metric                            | average             | minimum             | maximum             | σ                   |
|-----------------------------------|---------------------|---------------------|---------------------|---------------------|
| $\mathrm{MSE}_{\mathrm{TFL}}$     | $3.4 \cdot 10^{-2}$ | $5.4 \cdot 10^{-3}$ | $1.4 \cdot 10^{-1}$ | $1.9 \cdot 10^{-2}$ |
| $\mathrm{MSE}_{\mathrm{ours}}$    | $3.6 \cdot 10^{-3}$ | $1.5 \cdot 10^{-3}$ | $1.3 \cdot 10^{-2}$ | $1.5 \cdot 10^{-3}$ |
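To make the MSE metric concrete, here is a minimal sketch of how a quantisation error can be measured. It is not the patent-pending EYYES method: it assumes a plain uniform symmetric quantiser, random stand-in weights, and the hypothetical helper names `quantize_symmetric` and `mean_squared_error`. The slides do not state which tensors the reported MSE values are measured on, so this example simply measures it on the parameters themselves.

```python
import random

def quantize_symmetric(values, num_bits=8):
    """Uniform symmetric quantisation: map floats to signed integer levels and back."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, clamp to the valid range, then dequantise.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def mean_squared_error(reference, approximation):
    """MSE between the original values and their quantised counterparts."""
    return sum((r - a) ** 2 for r, a in zip(reference, approximation)) / len(reference)

random.seed(0)
weights = [random.gauss(0.0, 0.5) for _ in range(1000)]   # stand-in for CNN parameters (assumption)
mse_8bit = mean_squared_error(weights, quantize_symmetric(weights, 8))
mse_4bit = mean_squared_error(weights, quantize_symmetric(weights, 4))
print(f"MSE 8 bit: {mse_8bit:.3e}, MSE 4 bit: {mse_4bit:.3e}")
```

A lower MSE at the same bit width means the quantised parameters stay closer to the floating-point originals, which is the kind of comparison the table reports.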





**Challenge 2:** Decrease the required operations per second

- Reduce the required operations using
  - Pruning
  - Cutting
  - Specific additional reductions

| mAP deviation $\Delta_{mAP}$ | Parameter reduction $R_{N_{\mathrm{param}}}$ | Operation reduction $R_{N_{\mathrm{ops}}}$ |
|------------------------------|----------------------------------------------|--------------------------------------------|
| 0.5%                         | 3.3%                                         | 2.7%                                       |
| 1.0%                         | 3.6%                                         | 2.9%                                       |
| 2.5%                         | 5.4%                                         | 4.0%                                       |
| 5.0%                         | 12.4%                                        | 7.9%                                       |



**Challenge 3:** Improve the training algorithm

- EYYES developed a unique training mechanism
  - Auto-annotation
  - Quality measurement (mAP, IoU, ...)
  - Simulation of the network
  - Perturbation methods to challenge the DNN
  - Extend the variety of objects and noise using "Prototypes" and GANs
  - Explainability through stepwise analysis methods
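Of the quality measures listed, IoU (intersection over union) is the simplest to show in code. The sketch below uses the standard definition for axis-aligned boxes; the box format `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions made for this example, not taken from the EYYES toolchain.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (0.0, 0.0, 2.0, 2.0)
annotated = (1.0, 1.0, 3.0, 3.0)
print(iou(predicted, annotated))   # overlap 1, union 7
```

A detection is typically counted as correct when its IoU with the annotated box exceeds a chosen threshold, and mAP is then computed from those matches.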








**Challenge 4:** Improve the parallel processing










Single storage operation




IP core configuration parameters:

| Parameter          | Value           |
|--------------------|-----------------|
| Component Name     | ad da_wrapper_0 |
| Bit Depth          | 16              |
| C S Axi Addr Width | 6               |
| C S Axi Data Width | 32              |
| Leakyness Shift    | 4               |
| Max Img Height     | 320             |
| Max Img Width      | 320             |
| Num Channels       | 8               |
| Num Filters        | 8               |
| Scale Factor       | 8               |
| Sim                | "0"             |
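The parameters "Bit Depth" and "Leakyness Shift" suggest a fixed-point activation stage. The slides do not document their exact semantics, so the following sketch rests on two assumptions: that the leaky ReLU slope for negative inputs is $2^{-\text{shift}}$, implemented as an arithmetic right shift, and that results are saturated to a signed range of the given bit depth. The function names are hypothetical.

```python
def leaky_relu_shift(x, leakyness_shift=4):
    """Integer leaky ReLU: negative inputs are scaled by 2**-leakyness_shift.

    Python's >> on negative ints is an arithmetic shift (rounds toward minus infinity).
    """
    return x if x >= 0 else x >> leakyness_shift

def saturate(x, bit_depth=16):
    """Clamp an integer to the signed range representable with bit_depth bits."""
    lowest = -(1 << (bit_depth - 1))
    highest = (1 << (bit_depth - 1)) - 1
    return max(lowest, min(highest, x))

for value in (100, -160, -1, 40000):
    print(value, "->", saturate(leaky_relu_shift(value)))
```

Using a shift instead of a multiplication keeps the activation free of multipliers, which is a common choice in FPGA designs; whether the EYYES core does exactly this is not stated in the source.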







- Maximum parallelism
- Generalized processing unit
  - Kernel W 1-16, H 1-16
  - Strides W 1-2, H 1-2
  - Padding 0
  - Max pooling
  - Fully connected
  - Input size arbitrary
  - Convolution and depthwise convolution
  - Up to 32 cores
  - More than 10,000 operations per clock
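The kernel, stride and padding limits listed above determine the output size and the workload of a convolution layer. The sketch below uses the standard output-size formula and a rough cycle estimate; it assumes that one multiply-accumulate counts as two operations and takes 10,000 operations per clock as a round figure. The function names and the example layer (320 x 320 input, 8 channels, 8 filters, 3 x 3 kernel) are illustrative assumptions, not measurements of the EYYES hardware.

```python
def conv_output_size(input_size, kernel, stride, padding=0):
    """Standard convolution output size along one dimension."""
    return (input_size + 2 * padding - kernel) // stride + 1

def conv_macs(in_w, in_h, in_channels, num_filters, kernel_w, kernel_h,
              stride_w=1, stride_h=1):
    """Multiply-accumulate count of a standard convolution with padding 0."""
    out_w = conv_output_size(in_w, kernel_w, stride_w)
    out_h = conv_output_size(in_h, kernel_h, stride_h)
    return out_w * out_h * in_channels * num_filters * kernel_w * kernel_h

def cycles_needed(macs, ops_per_clock=10_000):
    """Ideal clock cycles, assuming 2 operations per MAC and full utilisation."""
    ops = 2 * macs
    return -(-ops // ops_per_clock)   # ceiling division

macs = conv_macs(320, 320, in_channels=8, num_filters=8, kernel_w=3, kernel_h=3)
print(conv_output_size(320, 3, 1))   # 318
print(macs)                          # 58247424
print(cycles_needed(macs))           # 11650
```

Real utilisation will be lower than this ideal bound, since memory transfers and layer shapes that do not map evenly onto the cores leave some units idle.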



Tera operations per second (TOPS) compared between the LPU, TPU and GPU at similar clock frequencies



# **REALTIME INTERFACE 3**


#### High Performance SoM for Deep Learning on the edge








The perfect Deep-Learning platform:

- Plug-and-play device in combination with the EYYES camera sensors
- Power supply via Power over Ethernet or direct power supply (low power)
- Process and control up to two independent camera sensors via FPD-Link III
- Process up to two different digital H.264 video streams
- Receive the object list directly via open standard protocols (ROS, ADTF, ...)
- Easy to configure via the web interface
- Process the sensor data in real time with deep learning at 20 TOPS
  - Preinstalled EYYESNET with 7/21 object classes
  - Specialization and replacement of the DNN via Update






### **EYYES Technology Evolution**

FPGA Driven Development and Outlook





Evolution from RTI1 to RTI3 and outlook








Example video from a test drive in Vienna