Iris Recognition System Development for Enhanced Security and Privacy

About the Client

The client is a tech company focused on security solutions. They specialize in biometrics, developing technology that combines strong data protection with easy, reliable user access. Their systems are designed to adapt to different industries and environments, aiming for high accuracy and the best user experience.
Customer: Confidential
Location: USA
Industry: Technology and Biometric Authentication

Company’s Request

The client needed a high-precision iris recognition system for fast, reliable biometric authentication. The system had to work in real time, accurately isolating the iris from other parts of the eye even in varied lighting conditions and eye positions, and it needed to operate efficiently on devices with limited resources without sacrificing performance. The client also required a secure encoding process that produces a unique, standardized iris code for each user, ensuring reliable recognition and data privacy.

Technology Set

Computer Vision and Deep Learning Models:
MobileNetSSD
Used for initial eye detection due to its lightweight architecture, which enables efficient inference on limited-resource devices like NVIDIA Jetson.
YOLOv8
A more complex model that provides higher accuracy for eye detection, especially useful in varied lighting and positional scenarios.
Transfer Learning:
Jetson-Inference Toolkit by NVIDIA
Utilized for transfer learning and training the MobileNetSSD model on custom eye-detection datasets.
Ultralytics YOLOv8 Toolkit
Used for training and fine-tuning YOLOv8 on specific eye-detection tasks, offering flexible training configuration and options for managing high memory usage.
Data Annotation and Datasets:
OpenImages V7
A large-scale, publicly available dataset providing annotated images of eyes, enabling model training without the need to create a custom dataset.
Worldcoin Iris Segmentation Model
Pre-trained model for iris segmentation, trained on a combination of databases such as ND-IRIS-0405, CASIA-Iris, LivDet2013-Iris, LivDet2015-Iris, and others, yielding a comprehensive dataset of over 18,000 infrared (IR) images.
Programming Languages:
C++
The core language for inference operations, including data pre-processing, post-processing, and running the model in real-time. Chosen for its efficiency, especially within Jetson environments, C++ handles the main computational tasks and integrates seamlessly with CUDA for optimized performance.
Python
Primarily used for dataset handling, model training, and preparation tasks. Python is valuable for its flexibility and extensive library support, allowing us to streamline data preparation and model training workflows before deploying the models in C++ for inference.
Image Processing and Segmentation Techniques:
Sobel Filter
Applied for edge detection within the iris, crucial for defining the iris boundaries by measuring gradients in horizontal and vertical directions.
Hough Circle Transform
Essential for detecting circular shapes to identify the inner and outer boundaries of the iris.
Rubber Sheet Transformation
Converts the circular iris area into a rectangular format, allowing for normalized encoding.
Z-Score Normalization
Implemented to standardize pixel values, enhancing consistency in feature extraction by scaling each pixel based on the mean and standard deviation of its distribution. This step further refines accuracy in the segmentation process.
Model Optimization and Inference Acceleration:
TensorRT
NVIDIA's high-performance deep learning inference library, used to convert models (such as MobileNetSSD) to optimized formats for real-time inference on the Jetson Xavier NX.
ONNX (Open Neural Network Exchange)
An open model-exchange format enabling cross-platform interoperability, allowing trained models to be exported and then optimized with TensorRT.
Training Optimization Techniques:
CUDA (Compute Unified Device Architecture)
NVIDIA's parallel computing platform, used to accelerate data processing, model training, and inference on compatible hardware.
Swap Memory Allocation
Used to expand the available memory on Jetson Xavier NX, allowing for efficient training of memory-intensive models like YOLOv8.
Feature Extraction for Iris Encoding:
2D Gabor Filters
Applied to the segmented iris region for extracting unique texture patterns, essential for generating an individual’s binary iris code used in biometric comparisons.

Solution

The solution was developed through three core stages: eye detection, iris segmentation, and encoding. Below is a breakdown of each stage, including challenges faced and methods employed to overcome them.

Stage 1: Human Eye Detection

At the outset, our team used pre-trained MobileNetSSD and YOLOv8 models to facilitate efficient and accurate eye detection. However, adapting these models to the specific requirements of iris recognition presented several challenges.

Here’s a step-by-step account of the process:

Initial Approach with Transfer Learning
Keeping in mind the limited computational power available, our team applied transfer learning to adapt the pre-trained MobileNetSSD and YOLOv8 models. We fine-tuned both models on our target dataset to improve eye detection accuracy. Using OpenImages V7, which includes over 20,000 images with human eye annotations, as the primary dataset, we optimized the models specifically for this application.

This approach allowed us to achieve high accuracy without requiring extensive new data or long training times.

Dataset Preparation and Model Training
Using NVIDIA’s jetson-inference for MobileNetSSD and Ultralytics tools for YOLOv8, we prepared to train our models. Since we operated on NVIDIA Jetson Xavier NX—a device with limited memory—we implemented swap memory adjustments to handle the memory-intensive training processes, particularly for YOLOv8. This setup allowed us to proceed with training, although it required additional optimization to achieve consistent detection rates across varied conditions.

Training Configuration for MobileNetSSD

  • We downloaded the eye dataset using OpenImages and pre-processed it to match the requirements of the MobileNetSSD model.
  • Each image was resized to 300×300 pixels in RGB format, and pixel values were normalized to the range [-1, 1] (see the pre-processing sketch after this list).
  • The model was trained with parameters fine-tuned for the specific demands of detecting human eyes, achieving an optimized performance within the Jetson environment.
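
To make these input requirements concrete, here is a minimal pre-processing sketch in Python using OpenCV and NumPy; the function name and exact scaling constants are illustrative rather than taken from the production code:

```python
import cv2
import numpy as np

def preprocess_for_ssd(image_path: str) -> np.ndarray:
    """Resize and scale an image to the MobileNetSSD input spec described above."""
    img = cv2.imread(image_path)                      # BGR, HWC, uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)        # model expects RGB
    img = cv2.resize(img, (300, 300))                 # 300x300 input resolution
    img = img.astype(np.float32) / 127.5 - 1.0        # scale pixel values to [-1, 1]
    return np.transpose(img, (2, 0, 1))[np.newaxis]   # HWC -> NCHW with batch dim
```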

Training Configuration for YOLOv8

  • The YOLOv8 model, more complex than MobileNetSSD, required significant computational power. Despite using swap memory, the training process was lengthy, estimated at around 65-70 hours on our Jetson Xavier NX. To reduce training time, we selectively limited the dataset size and adjusted batch sizes (a training sketch follows below).
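
As a rough illustration of that configuration, the sketch below uses the Ultralytics Python API; the dataset YAML name and the epoch, batch, and worker values are assumptions for illustration, not the exact settings we used:

```python
from ultralytics import YOLO

# Fine-tune pre-trained YOLOv8 weights on an eye-detection dataset.
# "eyes.yaml" is a placeholder dataset config; the small batch size and
# few workers reflect the memory limits of the Jetson Xavier NX.
model = YOLO("yolov8n.pt")
model.train(data="eyes.yaml", imgsz=640, epochs=50, batch=4, workers=2)
```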

Data Processing and Inference
Once trained, both models were configured for efficient inference. During inference:

  • MobileNetSSD required the input to be in NCHW format with dimensions 300×300, while YOLOv8 processed images resized to 640×640 pixels.
  • Both models used post-processing to identify bounding boxes for eye locations, comparing confidence values against pre-defined thresholds to refine detections (a simplified filtering sketch follows this list).
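
A simplified version of that confidence filtering might look like the following; the row layout (x1, y1, x2, y2, confidence) is an assumption, since MobileNetSSD and YOLOv8 emit differently shaped raw outputs:

```python
import numpy as np

def filter_detections(detections: np.ndarray, conf_threshold: float = 0.5):
    """Keep only bounding boxes whose confidence exceeds the threshold.

    Assumes each row holds (x1, y1, x2, y2, confidence); adapt the
    unpacking to the actual output layout of the model in use.
    """
    kept = []
    for x1, y1, x2, y2, conf in detections:
        if conf >= conf_threshold:
            kept.append((int(x1), int(y1), int(x2), int(y2), float(conf)))
    return kept
```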

Each model demonstrated unique strengths: MobileNetSSD provided quicker results, while YOLOv8 showed greater accuracy at detecting eyes in various orientations and lighting conditions. By integrating both models, we achieved a reliable baseline for iris segmentation, moving smoothly to the next stage.

Stage 2: Iris Segmentation

For accurate iris segmentation, we used a pre-trained model from Worldcoin, built on several large datasets tailored for iris recognition. This provided a strong foundation for our needs, but achieving precise segmentation in real-time required significant fine-tuning, along with careful data pre-processing and post-processing adjustments. Here’s how the segmentation process was handled in detail:

Model Selection and Dataset Preparation
The Worldcoin model was chosen for its accuracy in segmenting irises across a variety of environments. It was trained on a broad set of datasets, including ND-IRIS-0405, CASIA-Iris, and LivDet, totaling over 18,000 infrared (IR) images. These datasets provided robust examples for segmenting irises across different lighting conditions and image qualities, addressing one of the primary challenges in iris recognition.

The dataset was divided into training, validation, and test sets, with specific image dimensions (640×480) required by the model. Ensuring that each image was pre-processed to meet these specifications was necessary to avoid model misinterpretation during segmentation.

Data Pre-Processing for Enhanced Accuracy
To improve accuracy, we introduced a specific pre-processing step for each input image. Instead of resizing the entire image, we cropped it to focus solely on the eye region, adjusting it to fit the model’s required 640×480-pixel dimensions. Following this, we converted the image format from NHWC to NCHW, as the model processes input in this format. Finally, we scaled each pixel’s value to fall between 0 and 1, applying normalization with mean values (0.485, 0.456, 0.406) and standard deviations (0.229, 0.224, 0.225) to ensure accurate feature extraction.
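
A compact sketch of this pipeline, assuming OpenCV and NumPy (the helper name and the bounding-box input are illustrative):

```python
import cv2
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_eye_crop(frame: np.ndarray, box) -> np.ndarray:
    """Crop the detected eye region and normalize it for the segmentation model."""
    x1, y1, x2, y2 = box
    crop = frame[y1:y2, x1:x2]                        # focus on the eye region only
    crop = cv2.resize(crop, (640, 480))               # model expects 640x480 input
    crop = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    crop = (crop - MEAN) / STD                        # per-channel normalization
    return np.transpose(crop, (2, 0, 1))[np.newaxis]  # NHWC -> NCHW
```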

Inference and Data Post-Processing
When the model ran inference, it generated a four-channel tensor mapping specific regions within the eye: the eyeball, iris, pupil, and eyelashes. Post-processing was essential to clean up and refine these results. We started by resizing the segmented output to match the original image size, keeping it consistent with the input dimensions. Then we carefully filtered out any pixels flagged as belonging to the pupil or eyelashes, leaving only the iris pixels to avoid overlap and ensure accurate segmentation.
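
In code, this post-processing reduces to an argmax over the four channels followed by masking; the iris channel index here is an assumption based on the ordering described above:

```python
import cv2
import numpy as np

IRIS_CHANNEL = 1  # assumed order: eyeball, iris, pupil, eyelashes

def extract_iris_mask(logits: np.ndarray, out_size) -> np.ndarray:
    """Turn a raw (1, 4, H, W) segmentation output into a binary iris mask."""
    labels = np.argmax(logits[0], axis=0).astype(np.uint8)   # per-pixel class id
    mask = (labels == IRIS_CHANNEL).astype(np.uint8) * 255   # keep iris pixels only
    # Resize back to the original image size; nearest-neighbour keeps labels crisp.
    return cv2.resize(mask, out_size, interpolation=cv2.INTER_NEAREST)
```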

Accurate segmentation at this stage depended heavily on clear boundary definitions, especially to distinguish the iris from surrounding eye regions. 

While pre- and post-processing steps were standard, achieving this precision relied on a carefully controlled data acquisition setup. Factors such as camera lens type, lighting conditions, and the distance between the eye and camera were critical in reliably isolating high-quality iris segments. This level of accuracy ensured the data was ready for the next step: encoding.

Stage 3: Iris Encoding and Comparison

The final stage involved encoding the segmented iris region into a standardized format that could be compared across different users or instances for authentication. This process required transforming the circular iris structure into a flat, fixed-size rectangular image, followed by feature extraction for unique encoding.

Iris Normalization
To facilitate uniform encoding, the segmented iris image was normalized using a “rubber sheet” model. This method converted the circular iris into a rectangular format by unwrapping it. The steps included:

  • Edge Detection: Using the Sobel filter, we detected edges within the iris region. The Sobel filter provided both horizontal and vertical gradients, which were combined to create a clear outline of the iris boundaries.
  • Defining Inner and Outer Boundaries: By applying the Hough Circle Transform, we identified the inner and outer boundaries of the iris. This algorithm detected circular shapes within the image, allowing us to define the edges of the iris precisely.
  • Rubber Sheet Transformation: The iris was then “flattened” by mapping its Cartesian image coordinates into a normalized polar grid (radius and angle), with the center of the inner circle as the origin (see the sketch after these steps).
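
The sketch below ties these three steps together; the Hough parameters, radius bounds, and the 64×360 output size are illustrative choices rather than our production values:

```python
import cv2
import numpy as np

def unwrap_iris(gray: np.ndarray):
    """Detect iris boundaries and unwrap the annulus with a rubber-sheet mapping."""
    # Sobel gradients in x and y, combined into an edge-magnitude map.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))

    # Hough Circle Transform finds the circular inner and outer boundaries.
    circles = cv2.HoughCircles(edges, cv2.HOUGH_GRADIENT, dp=1.5, minDist=30,
                               param1=120, param2=30, minRadius=10, maxRadius=150)
    if circles is None or len(circles[0]) < 2:
        return None
    ordered = sorted(circles[0], key=lambda c: c[2])      # sort by radius
    (cx, cy, r_in), (_, _, r_out) = ordered[0], ordered[-1]

    # Rubber sheet: sample along rays from the inner-circle centre, mapping
    # the annular iris region onto a fixed 64x360 (radius x angle) rectangle.
    rows, cols = 64, 360
    unwrapped = np.zeros((rows, cols), dtype=gray.dtype)
    for j in range(cols):
        theta = 2.0 * np.pi * j / cols
        for i in range(rows):
            r = r_in + (r_out - r_in) * i / (rows - 1)
            x, y = int(cx + r * np.cos(theta)), int(cy + r * np.sin(theta))
            if 0 <= y < gray.shape[0] and 0 <= x < gray.shape[1]:
                unwrapped[i, j] = gray[y, x]
    return unwrapped
```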

Feature Extraction using Gabor Filters
Gabor filters are standard in iris recognition systems due to their ability to capture fine spatial features. These filters identify variations in texture and frequency within the iris, which are essential for creating a unique “iris code” for each individual. Using these features, we generated a binary code representation that could be stored and compared for authentication.

The Gabor filter application involved:

  • Applying 2D Gabor filters at various orientations and scales to highlight unique patterns in the iris.
  • Converting the filtered data into a binary code based on pixel intensity, ensuring that each iris had a unique code (see the encoding sketch below).
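
Here is a hedged sketch of the encoding plus a Hamming-distance comparison; the kernel parameters and the sign-based binarization are illustrative, not the exact parameters used:

```python
import cv2
import numpy as np

def encode_iris(unwrapped: np.ndarray) -> np.ndarray:
    """Build a binary iris code from Gabor responses at several orientations."""
    bits = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        kernel = cv2.getGaborKernel(ksize=(9, 9), sigma=2.0, theta=theta,
                                    lambd=8.0, gamma=0.5, psi=0)
        response = cv2.filter2D(unwrapped.astype(np.float32), cv2.CV_32F, kernel)
        bits.append((response > 0).astype(np.uint8))  # 1 where the response is positive
    return np.concatenate([b.ravel() for b in bits])

def hamming_distance(code_a: np.ndarray, code_b: np.ndarray) -> float:
    """Fraction of differing bits; lower values indicate the same iris."""
    return float(np.count_nonzero(code_a != code_b)) / code_a.size
```

Two codes captured from the same iris typically yield a low Hamming distance, so a threshold on this value decides whether authentication succeeds.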

Value Delivered

Consistent High Detection with Minimal Recalibration
Using MobileNetSSD and YOLOv8 models optimized with transfer learning, the system achieved high accuracy across varied lighting and angles, reducing the need for frequent recalibration and lowering maintenance costs across deployment environments.
Adaptable Across Different Settings
The detection models were built to perform well in both indoor and outdoor conditions, handling different lighting levels. This flexibility let the client deploy the system widely without costly adjustments for each location.
Lower Hardware Costs
By optimizing the system to run efficiently on low-resource devices, we minimized the need for high-powered hardware. This approach kept costs down and made it easier for the client to scale across multiple locations and device types.
Scalability for Diverse Markets
The system’s performance on embedded devices allowed the client to tap into markets that need portable, flexible biometric solutions, positioning them competitively in both traditional and emerging sectors.
Fast, Secure Authentication
The encoding mechanism generated a unique binary code for each iris, enabling quick and private authentication with low computational demands. This resulted in a smooth user experience, supporting high-speed access points where every second counts.
Reduced Operational Costs
Reliable detection, segmentation, and encoding reduced errors and minimized re-authentication needs, cutting down on training and support costs for users.
Stronger Market Position as a Secure Solution Provider
Delivering a solution with high accuracy, speed, and privacy reinforced the client’s reputation as a provider of secure biometric systems. This credibility opened doors to high-security sectors like finance, healthcare, and government.