Research & Innovation

Where Curiosity Meets Production Rigor

XMRetriever

Token-Efficient Vision-Language Inference on Complex Medical Tables via Cross-Modal Page-Level Few-Shot Retrieval

November 2025 Research Paper
Text path: PDF pages → OCR / text extraction → OpenAI embedding (1536-d) → text projection head (512-d). Vision path: page image render → ViT embedding (768-d) → vision projection head (512-d). The two 512-d projections are concatenated into a 1024-d page vector and indexed with FAISS for top-k retrieval; retrieved pages are packed into a token-budgeted prompt for GPT-4o inference with structured output.
XMRetriever Pipeline — Cross-Modal Page-Level Few-Shot Retrieval
75% Token Reduction (80k → 18k tokens/page)
53% Latency Reduction (210s → 98s/doc)
79.4% Exact Match (up from 72.1%)
1.2% Hallucination Rate (down from 8.6%)
Dual-Head Projection
Lightweight 3-layer MLPs (<5M params) aligning text and vision embeddings into a shared 512-d space for cross-modal similarity search.
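In outline, the two projection heads look like this: a numpy sketch with illustrative layer widths and random weights (the trained heads are small learned MLPs, not these initializations):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_head(dims):
    """Random parameters for a small MLP; dims = [d_in, h1, h2, d_out] gives 3 layers."""
    weights = [rng.standard_normal((a, b)) * 0.02 for a, b in zip(dims[:-1], dims[1:])]
    biases = [np.zeros(b) for b in dims[1:]]
    return weights, biases

def project(x, weights, biases):
    """3-layer MLP (ReLU hidden layers, linear output) followed by L2
    normalization, so dot products in the shared space are cosine similarities."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    h = h @ weights[-1] + biases[-1]
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

text_head = init_head([1536, 1024, 768, 512])    # OpenAI text embeds -> 512-d
vision_head = init_head([768, 1024, 768, 512])   # ViT page embeds    -> 512-d

t = project(rng.standard_normal((4, 1536)), *text_head)
v = project(rng.standard_normal((4, 768)), *vision_head)
page_vecs = np.concatenate([t, v], axis=1)       # 1024-d vectors for the FAISS index
```

Normalizing each head's output before concatenation keeps the text and vision halves on the same scale in the fused 1024-d vector.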
Neighbor-Aware Sampling
Novel triplet loss sampling strategy that draws hard negatives from adjacent page-count groups (g±1), improving retrieval precision on structurally similar documents.
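The sampling idea can be sketched as follows, with hypothetical page ids and group sizes (the real grouping comes from document page counts):

```python
import random

def neighbor_aware_triplets(groups, n_triplets, seed=0):
    """Sample (anchor, positive, negative) page ids for a triplet loss.
    Anchor and positive come from the same page-count group g; the hard
    negative is drawn from an adjacent group (g-1 or g+1), which is
    structurally similar but not a valid match."""
    rng = random.Random(seed)
    candidates = [g for g in sorted(groups) if len(groups[g]) >= 2]
    triplets = []
    while len(triplets) < n_triplets:
        g = rng.choice(candidates)
        neighbors = [h for h in (g - 1, g + 1) if groups.get(h)]
        if not neighbors:
            continue
        anchor, positive = rng.sample(groups[g], 2)
        negative = rng.choice(groups[rng.choice(neighbors)])
        triplets.append((anchor, positive, negative))
    return triplets

# documents bucketed by page count (hypothetical ids)
groups = {1: ["a1", "a2"], 2: ["b1", "b2", "b3"], 3: ["c1", "c2"]}
trips = neighbor_aware_triplets(groups, 10)
```

Drawing negatives from g±1 rather than uniformly at random forces the embedding to separate documents that differ only slightly in structure, which is where naive retrieval fails.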
Production-Ready
Trains in <3 hours on consumer hardware (Apple M4). FAISS retrieval completes in <3ms per query, making it viable for real-time production pipelines.
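At query time the retrieval step is just an inner-product search over normalized vectors; here is a brute-force numpy stand-in (faiss.IndexFlatIP performs the same computation behind an optimized index):

```python
import numpy as np

def search(index_vecs, query_vec, k=3):
    """Brute-force inner-product search over L2-normalized page vectors;
    with unit vectors this ranks by cosine similarity, like faiss.IndexFlatIP."""
    scores = index_vecs @ query_vec
    order = np.argsort(-scores)[:k]
    return order, scores[order]

rng = np.random.default_rng(0)
index = rng.standard_normal((100, 1024))
index /= np.linalg.norm(index, axis=1, keepdims=True)

ids, scores = search(index, index[42], k=3)  # querying with page 42 itself
```

For a corpus of a few thousand pages even the brute-force version is sub-millisecond; FAISS matters once the index grows.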
Read Full Analysis

Every research project starts with a production problem. The goal isn't to publish — it's to build systems that work at scale while pushing the boundaries of what's possible.

KDSML

Knowledge Distillation for Small Language Models — Compressing BERT-Large into a 6-Layer Student via Dual-Phase Distillation

2025 Research Project

A framework for transferring knowledge from a full-scale BERT teacher to a compact 6-layer student model. The dual-phase approach combines pre-training distillation on WikiText-103 (KL divergence + cross-entropy loss) with task-specific fine-tuning on the GLUE SST-2 and SQuAD benchmarks, achieving competitive performance while significantly reducing model size and compute requirements.

Teacher: BERT-Large (12 layers, 340M params) supplies soft targets and hidden states. Phase 1: pre-training distillation on the WikiText-103 corpus (65 epochs, DDP + AMP). Phase 2: task-specific fine-tuning on the SST-2 and SQuAD benchmarks. Loss: KL divergence (soft-target alignment) + cross-entropy (hard-label matching). Student: compact 6-layer BERT, ~50% smaller and inference-optimized, with stable training convergence over 65 epochs and competitive classification accuracy on SST-2 and SQuAD.
KDSML Pipeline — Dual-Phase Knowledge Distillation from BERT Teacher to 6-Layer Student
50% Model Size Reduction (12 → 6 layers)
65 Pre-training Epochs (WikiText-103)
2 Downstream Benchmarks (SST-2 + SQuAD)
DDP+AMP Training Infrastructure (distributed + mixed precision)
Dual-Phase Distillation
KL divergence for soft target alignment during pre-training, then task-specific fine-tuning. Separates general knowledge transfer from task adaptation.
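The combined objective can be sketched in numpy; the temperature T and mixing weight alpha below are illustrative hyperparameters, not the project's tuned values:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T)   (soft-target alignment)
       + (1 - alpha) * CE(student, labels)        (hard-label matching).
    The T^2 factor keeps gradient magnitudes comparable across temperatures."""
    p_t = softmax(teacher_logits / T)
    log_p_s = np.log(softmax(student_logits / T))
    kl = (p_t * (np.log(p_t) - log_p_s)).sum(axis=-1).mean()
    ce = -np.log(softmax(student_logits))[np.arange(len(labels)), labels].mean()
    return alpha * T**2 * kl + (1 - alpha) * ce

rng = np.random.default_rng(1)
student = rng.standard_normal((8, 5))
teacher = rng.standard_normal((8, 5))
labels = rng.integers(0, 5, size=8)
loss = distillation_loss(student, teacher, labels)
```

During phase 1 the soft-target term dominates (general knowledge transfer); phase 2 shifts weight to the hard-label term for task adaptation.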
Scalable Training Pipeline
PyTorch DistributedDataParallel with Automatic Mixed Precision. Checkpoint-based resumption for fault-tolerant training over 65+ epochs.
Open-Source & Reproducible
Full training pipeline, evaluation scripts, and convergence plots available. From pre-training through fine-tuning to benchmark comparison.
Knowledge Distillation · Model Compression · BERT · Transformers · PyTorch DDP · Mixed Precision
View on GitHub

Research-Driven Solutions

Each project started with a research question and ended with production impact

Cigna 2023 — Present

Enterprise GenAI Platform

"How do you build few-shot retrieval for heterogeneous table extraction at enterprise scale?"
Custom embedding + CNN model for cross-modal retrieval, with FAISS indexing and token-budgeted prompting
Enterprise-grade GenAI with guardrails, PII safety, evaluation gates
Architecture Lead · RAG · LoRA · vLLM · FAISS
Hexad / Volkswagen 2022 — 2023

Document Intelligence Platform

"Can CV-based layout understanding improve document extraction accuracy?"
Semantic merging + entity logic for PDF parsing, CV-based layout detection
2x throughput, +37% extraction accuracy
NLP Architect · Document AI · FLAN-T5
Hexad / Volkswagen 2022 — 2023

RAG Knowledge System

"How do you enable fast knowledge retrieval from hardware manuals?"
VectorDB-backed retrieval with LLM summarization, ROUGE/BLEU evaluation
Faster troubleshooting and knowledge reuse across engineering teams
RAG · Summarization · VectorDB · FLAN-T5
Trigyn 2020 — 2022

Video Analytics at Scale

"How do you reduce false alarms in real-time video analytics across 1000s of streams?"
Multi-model pipeline on edge devices (Jetson Nano/TX2), optimized inference with TensorRT
75% false alarm reduction, 17-20% cost reduction
Edge AI · Computer Vision · Jetson · CUDA
Inkers 2016 — 2020

Face Recognition Systems

"How do you improve face recognition accuracy with limited training data?"
Feedback loops for continuous improvement, data lifecycle management, experiment tracking
35% false positive improvement, 15% pipeline quality improvement
Deep Learning · Face Recognition · wandb
Impetus / American Express 2012 — 2016

Big Data Merchant Recommender

"How do you process terabytes of transaction data for personalized recommendations?"
SERT (Speed Engagement and Relevance Tool) architecture for merchant recommendation
Scalable recommendation engine for open card merchants
Big Data · Hadoop · Elasticsearch

Current Focus Areas

Cross-Modal Retrieval & Metric Learning
Aligning text and vision embeddings through projection heads and contrastive learning for document-level retrieval tasks.
Model Optimization
LoRA, quantization, pruning, and knowledge distillation techniques for deploying large models efficiently on constrained hardware.
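As a concrete example of why LoRA is cheap, a rank-r update adds only two small factor matrices on top of a frozen weight (numpy sketch; dimensions are illustrative):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """y = x W + (alpha / r) x A B: the frozen base weight W plus a
    trainable rank-r update factored as A (d_in x r) and B (r x d_out)."""
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A) @ B

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.standard_normal((d_in, d_out))       # frozen pretrained weight
A = rng.standard_normal((d_in, r)) * 0.01    # trainable down-projection
B = np.zeros((r, d_out))                     # trainable up-projection, zero at init
x = rng.standard_normal((2, d_in))
y = lora_forward(x, W, A, B)
```

With B initialized to zero the adapted layer starts out identical to the base model, and only r * (d_in + d_out) parameters train instead of d_in * d_out.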
Small Language Models for Edge
Exploring sub-3B parameter models for on-device inference, targeting latency-critical and privacy-sensitive applications.
RAG Architecture & Evaluation
Building robust retrieval-augmented generation pipelines with rigorous evaluation frameworks using ROUGE, BLEU, and exact-match metrics.
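As one example of such a metric, an exact-match scorer normalizes before comparing; this sketch follows SQuAD-style normalization (lowercasing, punctuation and article stripping), which may differ in detail from any given pipeline:

```python
import re
import string

def normalize(text):
    """Lowercase, drop punctuation, strip the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

em = exact_match(["The Cat!", "42"], ["cat", "41"])  # -> 0.5
```

Normalizing both sides keeps the metric from penalizing trivial formatting differences while still catching genuinely wrong answers.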
Computer Vision & Medical Imaging
Applying vision transformers and CNN architectures to medical document understanding, table detection, and structured data extraction.
Agentic AI & Multi-Agent Systems
Designing autonomous AI agents that reason, plan, and collaborate to solve complex multi-step enterprise workflows.

Interested in Collaboration?

Whether it's a research partnership, architecture review, or building the next production AI system — let's talk.

Get in Touch

Open to architecture consulting, advisory roles, and research collaborations

Send a Message

Powered by Formspree — your message goes directly to my inbox