
AI-Driven Embedded Systems Showcase
About the Client
A technology company specializing in advanced microcontrollers and fusion processors with integrated AI/ML capabilities. It focuses on delivering scalable, secure, and power-efficient solutions for modern embedded systems.
Company’s Request
Technology Set
| Technology | Purpose |
| --- | --- |
| Alif Ensemble DevKit (Gen 2) | The development board used for this project, built around the Ensemble E7 SoC with dual Cortex-M55 cores and an Ethos-U55 NPU for AI acceleration. |
| Zephyr RTOS | The real-time operating system running on the board; used to bring up the peripherals and run the various LVGL examples. |
| LVGL | A lightweight graphics library for rendering the user interface. It enabled real-time visualization of AI inference, including displaying classification results. |
| Vela Compiler | A tool provided by Arm for optimizing TensorFlow Lite (TFLite) models for the Ethos-U NPU. |
| TensorFlow Lite (TFLite) | A machine learning framework used to run pre-trained AI models for applications such as keyword spotting, image classification, and face detection. |
| MobileNet V2 | A lightweight deep learning model used for image classification. It was optimized and deployed to recognize objects in real time on the Ensemble E7. |
| MicroNet | An efficient neural network model used for real-time speech processing and anomaly detection in audio signals. |
| Wav2Letter | A deep learning model for automatic speech recognition (ASR), providing speech-to-text conversion and real-time transcription. |
| YOLO Fastest | A lightweight neural network used for real-time object detection on embedded hardware. |
| RNNoise | A neural-network-based noise suppression model that filtered out background noise in audio applications, improving speech recognition accuracy. |
| MT9M114 | The primary camera module integrated into the project for real-time image classification and face detection. Its driver was customized for Zephyr RTOS. |
| ILI9806E Display | A touchscreen display used to visualize AI inference results and provide an intuitive user interface for model selection and execution. |
| Arm Clang / GNU GCC Compiler | Compilers used to build the AI applications for execution on embedded hardware. |
| CMake | Build configuration tool used to define build directories, target hardware, and application-specific settings. |
| Alif Security Toolkit (SETOOLS) | A specialized tool for flashing binaries onto the Alif Ensemble DevKit. |
| OSPI Flash Memory | External flash storage used for loading and running large AI models. |
| UART Debugging (Tera Term & J-Link Debugger) | Tools used for real-time debugging of system logs and AI inference outputs. |
| GitHub (Alif ML Embedded Evaluation Kit Repository) | The repository containing pre-built AI models, tools, and scripts for setting up and testing the Ensemble E7 platform. |
| Python 3.10+ (Development Environment) | Required for running TensorFlow model optimizations, Vela compilation, and automation scripts. |
Platform Setup
The project began with setting up the Alif Ensemble DevKit (Gen 2), which features dual Cortex-M55 cores and an Ethos-U55 NPU. This development board was chosen for its built-in AI acceleration capabilities and low-power architecture.
The first step was establishing power and communication connections between the Alif AI/ML AppKit and a development computer via micro-USB. This connection supplied power to the board and exposed a virtual COM port (/dev/ttyACM0 or /dev/ttyACM1) for serial communication, which we used for debugging, system monitoring, and data exchange: system logs, error reports, and AI inference results arrived in real time. To ensure reliable data transfer between the development environment and the embedded system, we configured the UART ports accordingly.
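As a minimal sketch of this logging path, assuming the default Zephyr console is routed to the UART behind the virtual COM port; the module name and messages below are illustrative, not the project's actual firmware:

```c
/* Bring-up sketch: periodic log output over the Zephyr console UART,
 * which appears on the host as /dev/ttyACM0 (or ttyACM1). */
#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>

LOG_MODULE_REGISTER(bringup, LOG_LEVEL_INF);

int main(void)
{
    LOG_INF("Alif Ensemble E7 bring-up: console up");

    while (1) {
        /* Heartbeat so the host-side terminal (e.g. Tera Term)
         * confirms the serial link is alive. */
        LOG_INF("heartbeat, uptime=%lld ms", k_uptime_get());
        k_sleep(K_SECONDS(5));
    }
    return 0;
}
```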
Once basic communication was established, we integrated external peripherals, including the MT9M114 MIPI CSI camera module and the ILI9806E MIPI DSI display (480×800 resolution). However, both required custom configurations to function properly with Zephyr RTOS.
The MT9M114 camera module did not have out-of-the-box support for Zephyr RTOS, requiring custom driver adaptations. The camera communicated with the processor via I2C, and we had to configure the I2C interface correctly to allow the camera to send image data to the AI processing pipeline. Additionally, we modified the device tree configuration to properly register the camera with the system. Debugging involved analyzing chip ID recognition logs, adjusting power settings, and fine-tuning initialization sequences.
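A sketch of the kind of chip-ID probe used during such driver bring-up is shown below, assuming a devicetree node labelled "mt9m114" on the camera's I2C bus; the register address 0x0000 and expected ID 0x2481 come from the sensor documentation and should be verified against the actual silicon:

```c
#include <errno.h>
#include <zephyr/device.h>
#include <zephyr/drivers/i2c.h>
#include <zephyr/logging/log.h>

LOG_MODULE_REGISTER(mt9m114_probe, LOG_LEVEL_INF);

/* Assumed devicetree node label; adjust to the actual overlay. */
static const struct i2c_dt_spec cam = I2C_DT_SPEC_GET(DT_NODELABEL(mt9m114));

int mt9m114_check_chip_id(void)
{
    uint8_t reg[2] = { 0x00, 0x00 };   /* 16-bit CHIP_ID register address */
    uint8_t val[2];

    if (!i2c_is_ready_dt(&cam)) {
        LOG_ERR("camera I2C bus not ready");
        return -ENODEV;
    }

    /* Write the register address, then read back the 16-bit chip ID. */
    int ret = i2c_write_read_dt(&cam, reg, sizeof(reg), val, sizeof(val));
    if (ret < 0) {
        LOG_ERR("chip ID read failed: %d", ret);
        return ret;
    }

    uint16_t id = ((uint16_t)val[0] << 8) | val[1];
    LOG_INF("MT9M114 chip ID: 0x%04x", id);
    return (id == 0x2481) ? 0 : -EINVAL;
}
```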
The ILI9806E display module also required custom device tree modifications. The display needed proper GPIO mappings for backlight control and reset sequences to function correctly. To verify the display output, we enabled test patterns to check for correct pixel synchronization and interface timing. After making these adjustments, the display successfully rendered real-time AI inference results.
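A rough sketch of such a reset-and-backlight sequence is given below, assuming "reset-gpios" and "backlight-gpios" properties on a devicetree node labelled "ili9806e"; the hold times are illustrative and would be tuned against the panel datasheet:

```c
#include <zephyr/drivers/gpio.h>
#include <zephyr/kernel.h>

/* Assumed devicetree node label and GPIO property names. */
static const struct gpio_dt_spec lcd_reset =
    GPIO_DT_SPEC_GET(DT_NODELABEL(ili9806e), reset_gpios);
static const struct gpio_dt_spec lcd_backlight =
    GPIO_DT_SPEC_GET(DT_NODELABEL(ili9806e), backlight_gpios);

int lcd_power_up(void)
{
    gpio_pin_configure_dt(&lcd_reset, GPIO_OUTPUT_ACTIVE);
    gpio_pin_configure_dt(&lcd_backlight, GPIO_OUTPUT_INACTIVE);

    /* Pulse reset, then wait for the controller to finish its internal
     * initialization before switching the backlight on. */
    gpio_pin_set_dt(&lcd_reset, 1);
    k_msleep(10);
    gpio_pin_set_dt(&lcd_reset, 0);
    k_msleep(120);

    gpio_pin_set_dt(&lcd_backlight, 1);
    return 0;
}
```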
Development Environment
The Alif Security Toolkit (SETOOLS) was installed to securely flash compiled binaries onto the Alif Ensemble E7, so AI applications could be written to MRAM and external OSPI flash storage. Since some AI models were too large to fit entirely in internal SRAM, we configured memory partitions to allow models to run efficiently from external flash.
We installed the ARM Clang compiler and the GNU GCC toolchain for compilation. Since Zephyr RTOS 3.6.0 has compiler-specific optimizations, we manually adjusted toolchain configuration files for compatibility with Cortex-M55 cores.
Our team used CMake as a build system to organize and compile different AI applications separately. Since the project involved multiple AI tasks – Keyword Spotting (KWS), Image Classification, and Speech Recognition – we needed separate build directories for each use case. This prevented compilation conflicts and unnecessary rebuilds.
To optimize AI models for execution on the Ethos-U55 NPU, we installed TensorFlow Lite (TFLite) and the Vela Compiler. Vela was responsible for transforming AI models to match the hardware’s memory constraints and computational capabilities for faster inference.
During setup, we encountered environment configuration issues where the system did not recognize some development tools due to missing environment variables. Our team resolved this by creating custom scripts to correctly set system paths for both Linux and macOS.
AI Model Integration
Our team used pre-trained AI models, including:
- MobileNet V2 for image classification
- MicroNet for keyword spotting, anomaly detection, and visual wake word detection
- Wav2Letter for automatic speech recognition (ASR)
- YOLO Fastest for real-time object detection
- RNNoise for noise suppression
Since these models were originally designed for larger computing environments, they required further optimization before they could run efficiently on an embedded AI platform.
Using the Vela Compiler, our team optimized these TFLite models for execution on the Ethos-U55 NPU. This included adjusting model layers, reducing memory overhead, and optimizing execution scheduling. We tested different configurations, including the H128 and H256 Ethos-U55 settings (128 vs. 256 MAC units), to find the best balance of speed and accuracy.
Due to memory limitations, we had to store larger models in external OSPI flash, modifying linker scripts to enable memory-mapped execution to avoid memory bottlenecks.
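The sketch below illustrates one way to place a large model in an OSPI-mapped region from C, assuming a custom output section named ".ospi_flash.rodata" that the modified linker script maps onto the external flash address range; the section name and array contents are placeholders:

```c
#include <stdint.h>

/* Converted TFLite flatbuffer (e.g. the Vela compiler output, dumped to a
 * C array offline). Placed in external OSPI flash and read in place. */
const uint8_t network_model_data[]
    __attribute__((section(".ospi_flash.rodata"), aligned(16))) = {
    /* 0x1c, 0x00, 0x00, 0x00, 'T', 'F', 'L', '3', ... */
};

const unsigned int network_model_len = sizeof(network_model_data);
```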
Building & Deploying Applications
Each AI task required a separate application binary to be compiled and flashed onto the Alif Ensemble E7. We used CMake to define specific build parameters for each AI application, targeting either M55-HE (High-Efficiency Core) or M55-HP (High-Performance Core), depending on workload requirements.
After compiling, we generated .bin firmware files and used SETOOLS to flash them onto the Alif Ensemble E7 to execute each application directly on the hardware.
To avoid performance issues, our team implemented multi-core task management policies, where AI workloads were distributed efficiently across the M55-HE and M55-HP cores.
Debugging & Optimization
Our team validated execution using UART logging, allowing us to monitor real-time AI inference results. For multi-core debugging, we configured J-Link with Segger Studio, enabling us to track execution across both M55-HE and M55-HP cores.
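A small sketch of the kind of latency instrumentation that can be logged over UART is shown below; run_inference() is a hypothetical wrapper around the TFLite/Ethos-U invocation, not a function from the project's codebase:

```c
#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>

LOG_MODULE_REGISTER(perf, LOG_LEVEL_INF);

extern int run_inference(void);   /* hypothetical model invocation */

void timed_inference(void)
{
    /* Measure elapsed hardware cycles around one inference and report
     * the result plus latency over the UART console. */
    uint32_t start = k_cycle_get_32();
    int result = run_inference();
    uint32_t cycles = k_cycle_get_32() - start;

    LOG_INF("inference result=%d, latency=%u us",
            result, k_cyc_to_us_near32(cycles));
}
```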
To improve performance, we preloaded key AI model layers into MRAM, reducing latency, and implemented dynamic voltage scaling to minimize power consumption.
Delivering an AI Demo
The final system successfully demonstrated real-time AI inference for keyword spotting, speech recognition, image classification, and object detection.
To make this possible, our team implemented an LVGL-based GUI, allowing users to select and run AI models. The system was fully standalone, executing all AI tasks directly on the Alif Ensemble E7 without external hardware or cloud processing.
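As an illustration of what such a model-selection screen can look like, here is a minimal LVGL sketch, assuming LVGL v8 (as shipped with Zephyr 3.6) and a hypothetical start_demo() hook that launches the chosen use case; the widget layout and option strings are illustrative:

```c
#include <lvgl.h>

extern void start_demo(uint16_t model_index);   /* hypothetical launcher */

/* Start the selected demo when the dropdown value changes. */
static void model_selected_cb(lv_event_t *e)
{
    lv_obj_t *dropdown = lv_event_get_target(e);
    start_demo(lv_dropdown_get_selected(dropdown));
}

void create_model_selector(void)
{
    lv_obj_t *dd = lv_dropdown_create(lv_scr_act());

    lv_dropdown_set_options(dd,
        "Keyword Spotting\n"
        "Image Classification\n"
        "Speech Recognition\n"
        "Object Detection");
    lv_obj_center(dd);
    lv_obj_add_event_cb(dd, model_selected_cb, LV_EVENT_VALUE_CHANGED, NULL);
}
```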
This scalable architecture leaves the platform open to future AI model integration, allowing for continuous expansion and development.