• Accelerated BERT inference by 34% on an NVIDIA T4 GPU by fusing the GEMM, bias-add, and GELU operations into a single kernel (illustrative sketch below).
  • Reduced training step time by 38% by applying the fused kernel during back-propagation, speeding up model iteration and lowering projected GPU cloud costs.
  • Identified bottlenecks consuming ~83% of GPU time with profiling, focusing optimization effort where it delivered the greatest return.
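
A minimal CUDA sketch of the fusion idea above, for illustration only: it assumes row-major float32 matrices, a one-thread-per-output-element mapping, and the tanh GELU approximation. The kernel and parameter names are hypothetical, and a production version on the T4 would use tiling and tensor cores rather than this naive inner loop.

  #include <cuda_runtime.h>
  #include <math.h>

  // Simplified fused kernel: each thread computes one element of
  // C = GELU(A * B + bias), so the GEMM, bias-add, and GELU activation
  // happen in a single pass with no intermediate writes to global memory.
  // A is M x K, B is K x N, bias has length N (all row-major).
  // Shapes, names, and the tanh GELU approximation are illustrative assumptions.
  __global__ void fused_gemm_bias_gelu(const float* A, const float* B,
                                       const float* bias, float* C,
                                       int M, int N, int K) {
      int row = blockIdx.y * blockDim.y + threadIdx.y;
      int col = blockIdx.x * blockDim.x + threadIdx.x;
      if (row >= M || col >= N) return;

      // GEMM: dot product of one row of A with one column of B.
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
          acc += A[row * K + k] * B[k * N + col];
      }

      // Bias-add fused into the same kernel.
      acc += bias[col];

      // GELU (tanh approximation) applied before the single write-back.
      float x3 = acc * acc * acc;
      C[row * N + col] =
          0.5f * acc * (1.0f + tanhf(0.7978845608f * (acc + 0.044715f * x3)));
  }

Because the bias-add and GELU run inside the same kernel as the matrix multiply, the intermediate GEMM output is never written to and re-read from global memory, and two extra kernel launches are avoided; that reduction in memory traffic and launch overhead is where the fused kernel's speedup comes from.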
