Technical presentation - 30 minutes (including Q&A)
Running AI workloads on ARM can be challenging due to the need for specialized hardware configurations, runtime optimizations, and dependency management. With tools like RamaLama, krunkit, libkrun, podman-machine, llama.cpp, and vLLM, however, developers can now deploy and manage AI models on ARM platforms with far less friction. In this talk, we’ll showcase how these tools come together to simplify AI development and deployment on ARM systems, focusing on practical workflows for AI environments.

RamaLama, an open-source framework, streamlines AI model management by leveraging container technology (Podman, Docker), integrating seamlessly with model registries (Ollama, Hugging Face, OCI), and supporting AI runtimes optimized for ARM (llama.cpp and vLLM). We’ll also explore how podman-machine and Vulkan enable performant workloads on ARM GPUs.

Through live demonstrations and examples (sketched in the snippets below), attendees will see how to:

- Set up ARM-based systems for AI workloads using RamaLama.
- Leverage containerized runtimes for predictable, repeatable deployments.
- Optimize GPU performance using Vulkan and llama.cpp.
- Deploy at scale with Kubernetes YAML and Podman Quadlets for edge environments.

Join us to discover how this ecosystem makes ARM a first-class citizen in the world of AI workloads, bridging the gap between experimentation and production with tools that prioritize simplicity and performance.
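As a taste of the RamaLama workflow covered in the demos, here is a minimal sketch of pulling a model from a registry and running it locally. The model name is illustrative, and exact transport syntax may vary across RamaLama releases:

    # Pull a small model from the Ollama registry (model name is an example)
    ramalama pull ollama://tinyllama

    # Chat with it interactively; RamaLama selects a container image with a
    # runtime (llama.cpp by default) matched to the local hardware
    ramalama run tinyllama

    # Or expose it as a local REST endpoint for applications
    ramalama serve tinyllama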
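On an Apple-silicon (ARM) Mac, podman-machine can use the libkrun provider via krunkit, which is what lets the VM see the GPU for Vulkan-accelerated inference. A minimal sketch of the setup, assuming krunkit is already installed:

    # Select libkrun as the virtualization provider (macOS-specific setting)
    export CONTAINERS_MACHINE_PROVIDER=libkrun

    # Create and start the VM that containers will run in
    podman machine init
    podman machine start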
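For the scale-out step, RamaLama can emit the deployment artifacts itself. The sketch below assumes the --generate option available in recent RamaLama releases, producing either Kubernetes YAML or a Podman Quadlet unit suitable for systemd-managed edge systems:

    # Generate Kubernetes YAML for the model service
    ramalama serve --generate kube tinyllama

    # Or generate a Quadlet unit that systemd runs via Podman
    ramalama serve --generate quadlet tinyllama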
Red Hat engineer working with the CentOS Automotive SIG. Upstream maintainer of inotify-tools, ostree, and other open-source projects.