Where Curiosity Meets Production Rigor
Token-Efficient Vision-Language Inference on Complex Medical Tables via Cross-Modal Page-Level Few-Shot Retrieval
Every research project starts with a production problem. The goal isn't to publish — it's to build systems that work at scale while pushing the boundaries of what's possible.
Knowledge Distillation for Small Language Models — Compressing BERT-Large into a 6-Layer Student via Dual-Phase Distillation
A framework for transferring knowledge from a full-scale BERT-Large teacher to a compact 6-layer student model. The dual-phase approach combines pre-training distillation on WikiText-103 (KL divergence + cross-entropy loss) with task-specific fine-tuning on the GLUE SST-2 and SQuAD benchmarks. The student achieves competitive performance while significantly reducing model size and computational requirements.
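A minimal sketch of the combined loss described above, assuming a PyTorch implementation: KL divergence between temperature-softened teacher and student distributions, blended with standard cross-entropy on the hard labels. The `temperature` and `alpha` values and the toy tensors are illustrative placeholders, not the project's actual hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL divergence with hard-label cross-entropy.

    alpha weights the distillation term against the supervised term;
    both alpha and temperature are assumed values for illustration.
    """
    # Soften both distributions with the temperature, then measure how
    # far the student's distribution is from the teacher's.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kl = kl * temperature ** 2

    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kl + (1.0 - alpha) * ce

# Toy usage: a batch of 4 examples over a 2-class head (e.g. SST-2 polarity).
student_logits = torch.randn(4, 2, requires_grad=True)
teacher_logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In the pre-training phase this loss would be applied over the vocabulary-sized masked-language-modeling head on WikiText-103; in the fine-tuning phase, over the task head as shown here.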
Each project started with a research question and ended with production impact
Whether it's a research partnership, architecture review, or building the next production AI system — let's talk.
Open to architecture consulting, advisory roles, and research collaborations