Vector Processor Simulator

Developed a cycle-accurate simulator for VMIPS-based vector processors with six pipelined functional units
Implemented machine learning kernels in the VMIPS ISA, including dot product, matrix multiplication, and strided convolution layers
Conducted design space exploration to optimize configurations, resulting in up to 35% faster execution across benchmarks
Introduced novel architectural optimizations, improving performance by 15% (dot product), 20% (matrix multiplication), and 10% (convolution)
Improved memory access efficiency via parallel memory bank access, reducing dot product execution time by 33% and convolution by 27%
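
To illustrate why parallel bank access matters for vector loads, here is a minimal sketch of a banked-memory timing model. It is not the simulator's actual code: the function name, its parameters, and the one-address-per-cycle issue assumption are all hypothetical, chosen only to show how stride and bank count interact.

```python
def strided_load_cycles(n_elems, stride, num_banks, bank_busy):
    """Cycles until the last of n_elems word accesses completes.

    Assumptions (illustrative only): at most one address is issued per
    cycle, element i maps to bank (i * stride) % num_banks, and a bank
    is occupied for bank_busy cycles per access.
    """
    free_at = [0] * num_banks  # cycle at which each bank can accept a new access
    cycle = 0                  # earliest cycle the next address can be issued
    for i in range(n_elems):
        bank = (i * stride) % num_banks
        start = max(cycle, free_at[bank])  # stall if the target bank is still busy
        free_at[bank] = start + bank_busy
        cycle = start + 1
    return max(free_at)


if __name__ == "__main__":
    # Unit stride spreads accesses over all banks; a stride equal to the
    # bank count sends every access to the same bank and serializes them.
    print(strided_load_cycles(64, 1, 8, 6))
    print(strided_load_cycles(64, 8, 8, 6))
```

Under these assumptions, a stride that is a multiple of the bank count behaves like a single-bank memory, which is the kind of conflict that parallel bank access is meant to avoid.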