AI Engineer at Practical DevSecOps

AI Engineer — Model Training & AI Exploration

About the Project

We are building a comprehensive Quran Recitation Learning Platform — a production system that helps users practice and improve their Quran recitation using real-time AI-powered speech recognition, Tajweed rule analysis, and personalized audio feedback. The platform consists of a React Native mobile app, a FastAPI backend, and multiple GPU-accelerated microservices.

Our AI pipeline currently processes thousands of audio recordings, combining ASR (Automatic Speech Recognition), Tajweed analysis, pronunciation validation, and TTS (Text-to-Speech) feedback generation — all running as containerized gRPC microservices with CUDA acceleration.

Role Overview

We are looking for an AI Engineer to own and advance the model training pipeline and explore new AI approaches to improve our Quran recitation system. You will work with production ASR models and Tajweed analysis — improving accuracy, reducing latency, and expanding capabilities.

This is a hands-on role focused on fine-tuning, evaluation, scoring improvement, and AI R&D, not just API integration. You will be the primary person responsible for improving our AI models and our scoring.

What You'll Do

Scoring Improvement

Develop and apply methods to improve Tajweed scoring and word error calculation
Create scripts for the evaluation test harness
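As a possible starting point for the word-error side of scoring and the harness, here is a minimal Python sketch of word-level Levenshtein alignment. It reports substitutions, deletions, and insertions along with WER = (S + D + I) / N. The names `align_words` and `ErrorCounts` are hypothetical, and the sketch assumes whitespace-tokenized, already normalized text. It is not the current scoring code.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    substitutions: int
    deletions: int
    insertions: int
    reference_words: int

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / N, where N is the number of reference words.
        if self.reference_words == 0:
            return 0.0 if self.insertions == 0 else float("inf")
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.reference_words


def align_words(reference: str, hypothesis: str) -> ErrorCounts:
    """Word-level Levenshtein alignment returning S/D/I counts on a minimum-cost path."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # cell[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    cell = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                cell[i][j] = cell[i - 1][j - 1]  # a match costs nothing
                continue
            c, s, d, k = cell[i - 1][j - 1]
            substitute = (c + 1, s + 1, d, k)
            c, s, d, k = cell[i - 1][j]
            delete = (c + 1, s, d + 1, k)
            c, s, d, k = cell[i][j - 1]
            insert = (c + 1, s, d, k + 1)
            # Tuples compare by cost first; ties prefer fewer substitutions.
            cell[i][j] = min(substitute, delete, insert)
    _, s, d, k = cell[n][m]
    return ErrorCounts(s, d, k, n)
```

Keeping the S/D/I breakdown, rather than only the final rate, lets the harness report which kind of mistake dominates (for example, skipped words versus misread words).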

Model Training & Fine-Tuning

Fine-tune ASR models for Quranic Arabic using NVIDIA NeMo (FastConformer Hybrid RNNT/CTC architecture)
Train and optimize custom models for Tajweed rule detection (currently Whisper-based)
Train pronunciation validation models using Wav2Vec2 for harakat (diacritics) error detection
Build and maintain training data pipelines — data collection, cleaning, augmentation, and quality control
Develop evaluation harnesses with automated metrics (WER, CER, Tajweed accuracy, speaker similarity)
Manage experiment tracking (MLflow / Weights & Biases) and model versioning
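For harakat-aware evaluation, one option is to score the same transcripts twice: once with diacritics kept and once with them stripped, so letter errors and harakat errors show up separately. The minimal sketch below assumes Unicode Arabic text; `normalize_arabic` and `cer` are hypothetical names. Stripping removes every character in Unicode category Mn (nonspacing marks), which covers the harakat but also many Quranic annotation signs, so whether that is too aggressive is a judgment call.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic kashida, a purely visual elongation character


def normalize_arabic(text: str, keep_harakat: bool = True) -> str:
    """NFC-normalize, drop tatweel, optionally drop nonspacing marks, collapse spaces."""
    # NFC also sorts stacked marks (e.g. shadda + fatha) into canonical order.
    text = unicodedata.normalize("NFC", text).replace(TATWEEL, "")
    if not keep_harakat:
        # Category "Mn" covers fatha, damma, kasra, sukun, shadda, tanwin,
        # superscript alef and many Quranic annotation signs.
        text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    return " ".join(text.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate via Levenshtein distance (spaces count as characters)."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        cur = [i]
        for j, h in enumerate(hypothesis, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    # Guard against empty references; any hypothesis text then counts as error.
    return prev[-1] / max(len(reference), 1)
```

With a reference of ba + kasra and a hypothesis of ba + fatha, the diacritic-aware CER is 0.5 while the stripped CER is 0.0, which is exactly the kind of harakat-only error the Wav2Vec2 validator is meant to catch.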

AI Exploration & R&D

Research and prototype new architectures for Quranic Arabic ASR (conformer variants, Whisper fine-tuning, custom tokenizers)
Explore on-device / edge deployment of lightweight ASR models for mobile inference
Experiment with LLM-based approaches for contextual recitation feedback and error explanation
Benchmark alternative models (e.g., Whisper large-v3, SeamlessM4T, custom conformer) against current pipeline
Research voice activity detection (VAD) and audio segmentation optimized for Quranic recitation patterns
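A simple energy-threshold VAD is a useful baseline to benchmark learned VAD models against. This sketch assumes mono float samples in [-1, 1] at 16 kHz, matching the audio the app sends. `energy_vad` is a hypothetical name, and its parameters (30 ms frames, a -40 dBFS threshold, 300 ms of silence to close a segment) are untuned placeholders. Recitation includes long elongations and breath pauses, so the silence gap in particular would likely need tuning on real recordings.

```python
import math
from typing import List, Optional, Sequence, Tuple


def energy_vad(
    samples: Sequence[float],
    sample_rate: int = 16_000,
    frame_ms: int = 30,
    threshold_db: float = -40.0,
    min_silence_ms: int = 300,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) speech segments from mono float samples in [-1, 1]."""
    frame_len = sample_rate * frame_ms // 1000
    gap_frames = max(1, min_silence_ms // frame_ms)  # silent frames needed to close a segment
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None        # first voiced frame of the open segment
    last_voiced: Optional[int] = None  # most recent voiced frame of the open segment

    def span(first: int, last: int) -> Tuple[float, float]:
        # Convert inclusive frame indices to seconds.
        return (first * frame_ms / 1000, (last + 1) * frame_ms / 1000)

    for f in range(len(samples) // frame_len):  # a trailing partial frame is ignored
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(rms) if rms > 0 else -math.inf
        if level_db >= threshold_db:
            if start is None:
                start = f
            last_voiced = f
        elif start is not None and f - last_voiced >= gap_frames:
            segments.append(span(start, last_voiced))
            start = last_voiced = None
    if start is not None:
        segments.append(span(start, last_voiced))
    return segments
```

Because short dips below the threshold do not close a segment until the full silence gap has passed, a held madd with a brief level drop stays in one segment.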

Current System You'll Improve

Our AI pipeline today:

Mobile App (React Native)
↓ Audio (WAV 16kHz)
Backend (FastAPI + Socket.IO)
↓ gRPC
├── QuranASRNemo (port 50051) -- NeMo FastConformer, streaming + offline
├── QuranASRTajweed (port 50053) -- Whisper-based Tajweed rule detection
├── QuranASRWav2Vec2 (port 50054) -- Raw pronunciation validation
└── QuranFeedback (port 50052) -- Coqui XTTS v2 TTS with voice cloning ## Disabled for now
↓
Weighted Scoring → Accuracy + Tajweed Violations + Pronunciation Errors ## Needs improvement
↓
Audio Feedback (TTS) + Text Feedback → Mobile App ## Disabled for now
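The weighted-scoring step could be made explicit and testable along these lines. This is a sketch only: the component names, the error-to-score mapping, and the weights are hypothetical placeholders, not the production 45/25/15/15 split. Keeping weights as data rather than hardcoding them would make it easier to calibrate them later against human ratings.

```python
from typing import Mapping


def error_rate_to_score(errors: int, opportunities: int) -> float:
    """Map an error count to a 0-100 sub-score, e.g. word errors per reference word."""
    if opportunities <= 0:
        return 100.0
    # Clamp at 0: insertions can push an error rate above 1.
    return max(0.0, 100.0 * (1.0 - errors / opportunities))


def weighted_score(components: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of 0-100 component scores; weights are normalized to sum to 1."""
    missing = sorted(set(weights) - set(components))
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * components[name] for name in weights) / total
```

For example, sub-scores of 80 (accuracy), 75 (Tajweed), and 0 (pronunciation) with weights 2:1:1 give (160 + 75 + 0) / 4 = 58.75.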

Known areas for improvement you'd tackle:

Hardcoded confidence scores (currently fixed at 0.9 regardless of actual model output)
GPU inference serialization bottleneck (single lock, no batching)
No model versioning or experiment tracking infrastructure
Scoring thresholds lack empirical calibration (current heuristic: 45/25/15/15 split)
TTS voice cloning path bug (hardcoded speaker reference)
No training data pipeline or data quality tooling exists yet
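One common alternative to a fixed 0.9 is to derive confidence from the model's own output probabilities. The sketch below assumes per-frame logits from a CTC head (for example, the CTC branch of the hybrid model) are available as plain lists. The function names are hypothetical, and the result is an uncalibrated heuristic (the geometric mean of the greedy winning-class probability over non-blank frames) that would still need calibration against labeled recitations.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame of logits."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def greedy_ctc_confidence(frame_logits: Sequence[Sequence[float]], blank_id: int) -> float:
    """Geometric mean of the winning-class probability over non-blank frames."""
    kept: List[float] = []
    for logits in frame_logits:
        log_probs = log_softmax(logits)
        best = max(range(len(log_probs)), key=log_probs.__getitem__)  # greedy argmax
        if best != blank_id:
            kept.append(log_probs[best])
    if not kept:
        return 0.0  # only blanks: no evidence for any token
    return math.exp(sum(kept) / len(kept))
```

Excluding blank frames keeps the many blank frames of a typical CTC output from dominating the average. Whether to also collapse repeated tokens first is a design choice worth testing.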

Notes

Model training and fine-tuning are not the primary focus for now, but they are welcome if you are interested.