Technical presentation - 30 minutes (including Q&A)
Running AI workloads on ARM can be challenging due to the need for specialized hardware configurations, runtime optimizations, and dependency management. However, with the rise of tools like RamaLama, krunkit, libkrun, podman-machine, llama.cpp, and vLLM, developers can now deploy and manage AI models on ARM platforms with ease. In this talk, we'll showcase how these tools come together to simplify AI development and deployment on ARM systems, focusing on practical workflows for AI environments.

RamaLama, an open-source framework, streamlines AI model management by leveraging container technology (Podman, Docker), integrating seamlessly with model registries (Ollama, Hugging Face, OCI), and supporting AI runtimes optimized for ARM (llama.cpp and vLLM). We'll also explore how podman-machine and Vulkan enable performant workloads on ARM GPUs.

Through live demonstrations and examples, attendees will see how to:

- Set up ARM-based systems for AI workloads using RamaLama.
- Leverage containerized runtimes for predictable, repeatable deployments.
- Optimize GPU performance using Vulkan and llama.cpp.
- Deploy at scale with Kubernetes YAML and Podman Quadlets for edge environments.

Join us to discover how this ecosystem makes ARM a first-class citizen in the world of AI workloads, bridging the gap between experimentation and production with tools that prioritize simplicity and performance.
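The workflow above can be sketched with a few RamaLama commands. This is a minimal illustration, not the exact demo script: the model name is a placeholder, and the `--generate` flag names should be checked against `ramalama serve --help` for the installed version.

```shell
# Pull a model from a registry; Ollama, Hugging Face, and OCI
# sources are all supported (placeholder model name).
ramalama pull ollama://tinyllama

# Run the model interactively. RamaLama selects a container image and
# an AI runtime (llama.cpp or vLLM) appropriate for the host, including
# ARM systems with Vulkan-capable GPUs.
ramalama run tinyllama

# Serve the model over HTTP instead of chatting locally.
ramalama serve tinyllama

# Generate deployment artifacts for scale-out and edge use:
# Kubernetes YAML or a Podman Quadlet unit.
ramalama serve --generate kube tinyllama
ramalama serve --generate quadlet tinyllama
```

Because the model runs in a container, the same commands behave consistently across Linux hosts and podman-machine VMs on macOS.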
Technical presentation - 30 minutes (including Q&A)
libkrun is a dynamic library providing virtualization-based process isolation capabilities or, in fewer words, a VMM in library form. Written in Rust and designed for minimal boot time and a small footprint, it has evolved since its initial inception as a companion for crun (the OCI runtime used by Podman) to cover multiple different use cases:

- Enabling containers on macOS to do AI inference by exposing a paravirtualized GPU to the guest.
- Running x86_64 games on Asahi Linux (AArch64) using DRM native context, guest-to-host shared memory, and PipeWire redirection through vsock.
- Launching Confidential Computing workloads leveraging technologies such as SNP, TDX, and ARM CCA.
- Extending podman+crun to seamlessly run containers inside microVMs in AutoSD (Automotive Stream Distribution).

In this talk I'll briefly present libkrun's main features and characteristics, and then we'll explore in depth how each of these use cases has benefited from them.
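To give a feel for the "VMM in library form" idea, here is a hedged C sketch of the typical libkrun call sequence: create a context, configure the VM, point it at a root filesystem, and enter the guest. The rootfs path is a placeholder, error handling is trimmed for brevity, and the program must be linked against libkrun (`-lkrun`); consult libkrun's public header for the authoritative signatures.

```c
/* Minimal libkrun sketch: run a command inside a microVM whose root
 * filesystem is an existing directory tree (e.g. an unpacked container
 * image). Build with: cc demo.c -lkrun */
#include <stdio.h>
#include <libkrun.h>

int main(void)
{
    /* Create a configuration context; returns the context id, or < 0 on error. */
    int ctx = krun_create_ctx();
    if (ctx < 0) {
        fprintf(stderr, "krun_create_ctx failed\n");
        return 1;
    }

    /* 1 vCPU, 512 MiB of RAM. */
    krun_set_vm_config(ctx, 1, 512);

    /* Directory to expose as the guest's root filesystem (placeholder path). */
    krun_set_root(ctx, "/path/to/rootfs");

    /* Program to execute inside the guest, plus its arguments and environment. */
    const char *const args[] = { NULL };
    const char *const envp[] = { "PATH=/usr/bin:/bin", NULL };
    krun_set_exec(ctx, "/bin/sh", args, envp);

    /* Boots the microVM with the guest's stdio attached to this terminal.
     * On success this call does not return to the caller. */
    krun_start_enter(ctx);
    fprintf(stderr, "krun_start_enter failed\n");
    return 1;
}
```

This embedding model is what lets crun, krunkit, and muvm treat "start a microVM" as just another library call rather than managing an external VMM process.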
Working in the Automotive Team at Red Hat with a focus on Virtualization. Lead developer of libkrun, maintainer of the "microvm" machine type in QEMU, co-developer of krunkit and muvm, and trying to put a microVM everywhere.