Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences
Published at CHI 2024 | Honolulu, HI
Abstract
On-device machine learning (ML) promises to improve the privacy, responsiveness,
and proliferation of new, intelligent user experiences by moving ML computation
onto everyday personal devices. However, today's large ML models must be
drastically compressed to run efficiently on-device, a hurdle that requires
deep, yet currently niche expertise. To engage the broader human-centered ML
community in on-device ML experiences, we present the results from an interview
study with 30 experts at Apple who specialize in producing efficient models. We
compile tacit knowledge that experts have developed through practical experience
with model compression across different hardware platforms. Our findings offer
pragmatic considerations missing from prior work, covering the design process,
trade-offs, and technical strategies that go into creating efficient models.
Finally, we distill design recommendations for tooling to help ease the
difficulty of this work and bring on-device ML into more widespread practice.