CUDA-Optimized Large Vision-Language Models
- Achieved a 390× speedup over CPU inference for Vision-Language Models (VLMs), with negligible loss in task-specific accuracy
- Reduced model size by more than 95% with network compression, and accelerated inference through parallelization
- Achieved inference speed of 20–30 FPS on CPU
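The compression figure above can be illustrated with a toy sketch of two standard techniques, magnitude pruning plus int8 quantization (the project's exact method is not specified here, so this is an illustrative assumption; the array, pruning ratio, and resulting percentage are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(100_000).astype(np.float32)  # toy dense fp32 layer

# Magnitude pruning: drop the 90% of weights closest to zero.
threshold = np.quantile(np.abs(weights), 0.90)
mask = np.abs(weights) > threshold
pruned = weights[mask]

# Symmetric int8 quantization of the surviving weights.
scale = np.abs(pruned).max() / 127.0
quantized = np.clip(np.round(pruned / scale), -127, 127).astype(np.int8)

# Compare storage: dense fp32 vs. int8 values plus a 1-bit presence mask.
dense_bytes = weights.nbytes                  # 4 bytes per weight
sparse_bytes = quantized.nbytes + mask.size // 8
reduction = 1 - sparse_bytes / dense_bytes
print(f"size reduction: {reduction:.1%}")
```

At this pruning ratio the toy example lands around a 94% reduction; pushing past 95% in practice depends on the pruning ratio, bit-width, and how the sparse indices are encoded.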