As tech companies race to deliver on-device AI, we are seeing a growing body of research and techniques for creating small language models (SLMs) that can run on resource-constrained devices. The latest such effort, from a research team at Nvidia, leverages recent advances in pruning and distillation to create Llama-3.1-Minitron 4B, a compressed version of […]
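For readers unfamiliar with the second of those techniques, the sketch below illustrates the core idea of logit-based knowledge distillation in generic PyTorch: a smaller student model is trained to match the softened output distribution of a larger teacher. This is a minimal, illustrative sketch of the general technique, not Nvidia's training code; the `distillation_loss` function and the toy tensors are hypothetical.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean reduction matches the mathematical definition of KL divergence;
    # scaling by T^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (temperature ** 2)

# Toy example: a batch of 4 positions over a 32-token vocabulary
student_logits = torch.randn(4, 32, requires_grad=True)
teacher_logits = torch.randn(4, 32)

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student's logits
print(loss.item())
```

In practice this distillation term is typically combined with the standard next-token cross-entropy loss, and pruning (removing layers, attention heads, or hidden dimensions) is applied before or alongside it to shrink the student architecture itself.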