Research & Innovation

Where Curiosity Meets Production Rigor

XMRetriever

Token-Efficient Vision-Language Inference on Complex Medical Tables via Cross-Modal Page-Level Few-Shot Retrieval

November 2025 Research Paper
Text path: PDF pages → OCR / text extraction → OpenAI embedding (1536-d) → text projection head (512-d). Vision path: page image render → ViT embedding (768-d) → vision projection head (512-d). The two 512-d projections are concatenated into a 1024-d page vector and indexed with FAISS for top-k retrieval; retrieved pages are packed into a token-budgeted prompt for GPT-4o inference with structured output.
XMRetriever Pipeline — Cross-Modal Page-Level Few-Shot Retrieval
75% Token Reduction (80k → 18k tokens/page)
53% Latency Reduction (210s → 98s/doc)
79.4% Exact Match (up from 72.1%)
1.2% Hallucination Rate (down from 8.6%)
Dual-Head Projection
Lightweight 3-layer MLPs (<5M params) aligning text and vision embeddings into a shared 512-d space for cross-modal similarity search.
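In outline, the two projection heads look like this: a numpy sketch with illustrative layer widths and random weights (the trained heads are small learned MLPs, not these initializations):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_head(dims):
    """Random parameters for a small MLP; dims = [d_in, h1, h2, d_out] gives 3 layers."""
    weights = [rng.standard_normal((a, b)) * 0.02 for a, b in zip(dims[:-1], dims[1:])]
    biases = [np.zeros(b) for b in dims[1:]]
    return weights, biases

def project(x, weights, biases):
    """3-layer MLP (ReLU hidden layers, linear output) followed by L2
    normalization, so dot products in the shared space are cosine similarities."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    h = h @ weights[-1] + biases[-1]
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

text_head = init_head([1536, 1024, 768, 512])    # OpenAI text embeds -> 512-d
vision_head = init_head([768, 1024, 768, 512])   # ViT page embeds    -> 512-d

t = project(rng.standard_normal((4, 1536)), *text_head)
v = project(rng.standard_normal((4, 768)), *vision_head)
page_vecs = np.concatenate([t, v], axis=1)       # 1024-d vectors for the FAISS index
```

Normalizing each head's output before concatenation keeps the text and vision halves on the same scale in the fused 1024-d vector.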
Neighbor-Aware Sampling
Novel triplet loss sampling strategy that draws hard negatives from adjacent page-count groups (g±1), improving retrieval precision on structurally similar documents.
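The sampling idea can be sketched as follows, with hypothetical page ids and group sizes (the real grouping comes from document page counts):

```python
import random

def neighbor_aware_triplets(groups, n_triplets, seed=0):
    """Sample (anchor, positive, negative) page ids for a triplet loss.
    Anchor and positive come from the same page-count group g; the hard
    negative is drawn from an adjacent group (g-1 or g+1), which is
    structurally similar but not a valid match."""
    rng = random.Random(seed)
    candidates = [g for g in sorted(groups) if len(groups[g]) >= 2]
    triplets = []
    while len(triplets) < n_triplets:
        g = rng.choice(candidates)
        neighbors = [h for h in (g - 1, g + 1) if groups.get(h)]
        if not neighbors:
            continue
        anchor, positive = rng.sample(groups[g], 2)
        negative = rng.choice(groups[rng.choice(neighbors)])
        triplets.append((anchor, positive, negative))
    return triplets

# documents bucketed by page count (hypothetical ids)
groups = {1: ["a1", "a2"], 2: ["b1", "b2", "b3"], 3: ["c1", "c2"]}
trips = neighbor_aware_triplets(groups, 10)
```

Drawing negatives from g±1 rather than uniformly at random forces the embedding to separate documents that differ only slightly in structure, which is where naive retrieval fails.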
Production-Ready
Trains in <3 hours on consumer hardware (Apple M4). FAISS retrieval completes in <3ms per query, making it viable for real-time production pipelines.
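At query time the retrieval step is just an inner-product search over normalized vectors; here is a brute-force numpy stand-in (faiss.IndexFlatIP performs the same computation behind an optimized index):

```python
import numpy as np

def search(index_vecs, query_vec, k=3):
    """Brute-force inner-product search over L2-normalized page vectors;
    with unit vectors this ranks by cosine similarity, like faiss.IndexFlatIP."""
    scores = index_vecs @ query_vec
    order = np.argsort(-scores)[:k]
    return order, scores[order]

rng = np.random.default_rng(0)
index = rng.standard_normal((100, 1024))
index /= np.linalg.norm(index, axis=1, keepdims=True)

ids, scores = search(index, index[42], k=3)  # querying with page 42 itself
```

For a corpus of a few thousand pages even the brute-force version is sub-millisecond; FAISS matters once the index grows.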
Read Full Analysis

Every research project starts with a production problem. The goal isn't to publish — it's to build systems that work at scale while pushing the boundaries of what's possible.

KDSML

Knowledge Distillation for Small Language Models — Compressing BERT-Large into a 6-Layer Student via Dual-Phase Distillation

2025 Research Project

A framework for transferring knowledge from a full-scale BERT teacher to a compact 6-layer student model. The dual-phase approach combines pre-training distillation on WikiText-103 (KL divergence + cross-entropy loss) with task-specific fine-tuning on the GLUE SST-2 and SQuAD benchmarks, achieving competitive performance while significantly reducing model size and compute requirements.

Teacher: BERT-Large (12 layers, 340M params) supplies soft targets and hidden states. Phase 1: pre-training distillation on the WikiText-103 corpus (65 epochs, DDP + AMP). Phase 2: task-specific fine-tuning on the SST-2 and SQuAD benchmarks. Loss: KL divergence (soft-target alignment) + cross-entropy (hard-label matching). Student: compact 6-layer BERT, ~50% smaller and inference-optimized, with stable training convergence over 65 epochs and competitive classification accuracy on SST-2 and SQuAD.
KDSML Pipeline — Dual-Phase Knowledge Distillation from BERT Teacher to 6-Layer Student
50% Model Size Reduction (12 → 6 layers)
65 Pre-training Epochs (WikiText-103)
2 Downstream Benchmarks (SST-2 + SQuAD)
DDP+AMP Training Infrastructure (distributed + mixed precision)
Dual-Phase Distillation
KL divergence for soft target alignment during pre-training, then task-specific fine-tuning. Separates general knowledge transfer from task adaptation.
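The combined objective can be sketched in numpy; the temperature T and mixing weight alpha below are illustrative hyperparameters, not the project's tuned values:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T)   (soft-target alignment)
       + (1 - alpha) * CE(student, labels)        (hard-label matching).
    The T^2 factor keeps gradient magnitudes comparable across temperatures."""
    p_t = softmax(teacher_logits / T)
    log_p_s = np.log(softmax(student_logits / T))
    kl = (p_t * (np.log(p_t) - log_p_s)).sum(axis=-1).mean()
    ce = -np.log(softmax(student_logits))[np.arange(len(labels)), labels].mean()
    return alpha * T**2 * kl + (1 - alpha) * ce

rng = np.random.default_rng(1)
student = rng.standard_normal((8, 5))
teacher = rng.standard_normal((8, 5))
labels = rng.integers(0, 5, size=8)
loss = distillation_loss(student, teacher, labels)
```

During phase 1 the soft-target term dominates (general knowledge transfer); phase 2 shifts weight to the hard-label term for task adaptation.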
Scalable Training Pipeline
PyTorch DistributedDataParallel with Automatic Mixed Precision. Checkpoint-based resumption for fault-tolerant training over 65+ epochs.
Open-Source & Reproducible
Full training pipeline, evaluation scripts, and convergence plots available. From pre-training through fine-tuning to benchmark comparison.
Knowledge Distillation · Model Compression · BERT · Transformers · PyTorch DDP · Mixed Precision
View on GitHub

Research-Driven Solutions

Each project started with a research question and ended with production impact

Cigna 2023 — Present

Enterprise GenAI Platform

"How do you build few-shot retrieval for heterogeneous table extraction at enterprise scale?"
Custom embedding + CNN model for cross-modal retrieval, with FAISS indexing and token-budgeted prompting
Enterprise-grade GenAI with guardrails, PII safety, evaluation gates
Architecture Lead · RAG · LoRA · vLLM · FAISS
Hexad / Volkswagen 2022 — 2023

Document Intelligence Platform

"Can CV-based layout understanding improve document extraction accuracy?"
Semantic merging + entity logic for PDF parsing, CV-based layout detection
2x throughput, +37% extraction accuracy
NLP Architect · Document AI · FLAN-T5
Hexad / Volkswagen 2022 — 2023

RAG Knowledge System

"How do you enable fast knowledge retrieval from hardware manuals?"
VectorDB-backed retrieval with LLM summarization, ROUGE/BLEU evaluation
Faster troubleshooting and knowledge reuse across engineering teams
RAG · Summarization · VectorDB · FLAN-T5
Trigyn 2020 — 2022

Video Analytics at Scale

"How do you reduce false alarms in real-time video analytics across 1000s of streams?"
Multi-model pipeline on edge devices (Jetson Nano/TX2), optimized inference with TensorRT
75% false alarm reduction, 17-20% cost reduction
Edge AI · Computer Vision · Jetson · CUDA
Inkers 2016 — 2020

Face Recognition Systems

"How do you improve face recognition accuracy with limited training data?"
Feedback loops for continuous improvement, data lifecycle management, experiment tracking
35% false positive improvement, 15% pipeline quality improvement
Deep Learning · Face Recognition · wandb
Impetus / American Express 2012 — 2016

Big Data Merchant Recommender

"How do you process terabytes of transaction data for personalized recommendations?"
SERT (Speed Engagement and Relevance Tool) architecture for merchant recommendation
Scalable recommendation engine for open card merchants
Big Data · Hadoop · Elasticsearch

Current Focus Areas

Cross-Modal Retrieval & Metric Learning
Aligning text and vision embeddings through projection heads and contrastive learning for document-level retrieval tasks.
Model Optimization
LoRA, quantization, pruning, and knowledge distillation techniques for deploying large models efficiently on constrained hardware.
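As a concrete example of why LoRA is cheap, a rank-r update adds only two small factor matrices on top of a frozen weight (numpy sketch; dimensions are illustrative):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """y = x W + (alpha / r) x A B: the frozen base weight W plus a
    trainable rank-r update factored as A (d_in x r) and B (r x d_out)."""
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A) @ B

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.standard_normal((d_in, d_out))       # frozen pretrained weight
A = rng.standard_normal((d_in, r)) * 0.01    # trainable down-projection
B = np.zeros((r, d_out))                     # trainable up-projection, zero at init
x = rng.standard_normal((2, d_in))
y = lora_forward(x, W, A, B)
```

With B initialized to zero the adapted layer starts out identical to the base model, and only r * (d_in + d_out) parameters train instead of d_in * d_out.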
Small Language Models for Edge
Exploring sub-3B parameter models for on-device inference, targeting latency-critical and privacy-sensitive applications.
RAG Architecture & Evaluation
Building robust retrieval-augmented generation pipelines with rigorous evaluation frameworks using ROUGE, BLEU, and exact-match metrics.
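As one example of such a metric, an exact-match scorer normalizes before comparing; this sketch follows SQuAD-style normalization (lowercasing, punctuation and article stripping), which may differ in detail from any given pipeline:

```python
import re
import string

def normalize(text):
    """Lowercase, drop punctuation, strip the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

em = exact_match(["The Cat!", "42"], ["cat", "41"])  # -> 0.5
```

Normalizing both sides keeps the metric from penalizing trivial formatting differences while still catching genuinely wrong answers.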
Computer Vision & Medical Imaging
Applying vision transformers and CNN architectures to medical document understanding, table detection, and structured data extraction.
Agentic AI & Multi-Agent Systems
Designing autonomous AI agents that reason, plan, and collaborate to solve complex multi-step enterprise workflows.

Interested in Collaboration?

Whether it's a research partnership, architecture review, or building the next production AI system — let's talk.

Get in Touch

Open to architecture consulting, advisory roles, and research collaborations

Send a Message

Powered by Formspree — your message goes directly to my inbox