TL;DR
- We’ll use Distrobox to create an Ubuntu container, install the necessary tools, and compile llama.cpp with GPU support.
- In a couple of minutes, you’ll be able to run a model like Gemma 3 1B (or any 4B-7B model) directly on your Steam Deck’s GPU.
- The Deck’s GPU uses around 10-11W when generating responses. The whole machine uses around 20-25W.
- You don’t need to modify your SteamOS at all. Everything runs inside a container.
(this is a real-time demonstration, not a sped-up video)
The Steam Deck is a surprisingly capable device for running local LLMs. With a bit of setup, you can have a portable, powerful inference machine.
| Component | Specification |
|---|---|
| APU | 7 nm AMD APU |
| CPU | Zen 2, 4 cores / 8 threads, 2.4-3.5 GHz |
| GPU | 8 RDNA 2 compute units, up to 1.6 GHz |
| RAM | 16 GB LPDDR5 @ 5500 MT/s |
A Note on VRAM
The Steam Deck shares its 16GB of RAM between the system and the GPU. By default, the GPU is allocated 1GB of this shared memory. However, you can increase this to 4GB in the BIOS, which is highly recommended for running larger models.
To change the allocation:
- Shut down the Steam Deck completely.
- Hold Volume Up (+) and press the Power button. Release the power button but keep holding Volume Up until you hear a chime.
- Select Setup Utility.
- Navigate to Advanced.
- Set UMA Frame Buffer Size to 4G.
- Go to Save & Exit and select Save Changes and Exit.
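After rebooting, you can sanity-check the allocation from a shell. This is a sketch: the sysfs path is standard for amdgpu, but the card index may differ on your system.

```shell
# Reported dedicated VRAM for the GPU, in bytes (assumption: card0 is the APU's GPU)
vram_file=/sys/class/drm/card0/device/mem_info_vram_total
if [ -r "$vram_file" ]; then
  echo "VRAM: $(( $(cat "$vram_file") / 1024 / 1024 )) MiB"
else
  echo "No amdgpu VRAM info at $vram_file"
fi
```

With the 4G UMA setting applied, this should report roughly 4096 MiB.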
1. Enable SSH
First, we need to get shell access to the Steam Deck.
- Switch to Desktop Mode: Press the Steam button, go to Power, and select Switch to Desktop.
- Open Konsole: Click the bottom-left button, go to System, and open Konsole.
- Set a Password: Press the X button to open the on-screen keyboard. Type `passwd` and press Enter to set a password for the `deck` user. You’ll need this to SSH in.
- Enable SSH: Run the following command to start the SSH server:

```
sudo systemctl enable sshd --now
```

- Find Your IP Address: Click the Wi-Fi icon in the bottom-right corner, click the arrow on your current connection, and choose Details to find your IP address.
- Connect via SSH: From another computer on the same network, you can now SSH into your Steam Deck:

```
ssh deck@your-ip-address
```
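To avoid retyping the address every time, you can add a host alias to `~/.ssh/config` on the machine you connect from. The address below is a placeholder; use the IP you found above.

```
Host steamdeck
    HostName your-ip-address
    User deck
```

After that, `ssh steamdeck` is enough.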
2. Create a Distrobox Container
Next, we’ll create an Ubuntu container to house our development environment, which keeps the main SteamOS clean. We’ll use Distrobox, a tool for creating and managing containerized development environments on top of the host system.
- Create the container (the `--name` is arbitrary; pick whatever you like):

```
distrobox create --image ubuntu:24.04 --name ubuntu-llama
```

- Enter the container:

```
distrobox enter ubuntu-llama
```

From now on, everything we do will be inside the container (your SteamOS will remain untouched). You can exit the container at any time by typing `exit`.
3. Install Dependencies
Inside the Distrobox container, we need to install the tools required to build `llama.cpp` with GPU acceleration.
```
sudo apt update && sudo apt install -y \
  build-essential cmake git ccache \
  mesa-vulkan-drivers libvulkan-dev vulkan-tools \
  glslang-tools glslc libshaderc-dev spirv-tools \
  libcurl4-openssl-dev ca-certificates kmod \
  rocminfo radeontop \
  nano ncdu nload screen tmux pigz unzip iotop htop
```
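Once the packages are installed, it’s worth confirming that the container can actually see the Deck’s GPU through Vulkan. A small sketch using `vulkaninfo` (from the `vulkan-tools` package installed above):

```shell
# Print the Vulkan device name; on the Deck this should report the AMD RDNA 2 APU.
if command -v vulkaninfo >/dev/null 2>&1; then
  gpu=$(vulkaninfo --summary 2>/dev/null | grep -i deviceName | head -n1)
  echo "Vulkan device: ${gpu:-none found}"
else
  gpu=""
  echo "vulkaninfo not found - check that vulkan-tools installed correctly"
fi
```

If no device shows up here, the Vulkan build of `llama.cpp` will fall back to CPU, so it’s worth fixing before continuing.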
4. Build llama.cpp
Now we can clone the llama.cpp repository and build it with Vulkan support.
- Clone the repo:

```
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
```

- Configure the build:

```
rm -rf build
cmake -B build -DGGML_VULKAN=ON -DGGML_CCACHE=ON
```

- Build it:

```
cmake --build build --config Release -j
```
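The build can take a while on the Deck’s four Zen 2 cores. When it finishes, the binaries land in `build/bin`; a quick check from the `llama.cpp` checkout:

```shell
# Verify the main binary was produced (run from the llama.cpp directory)
bin=build/bin/llama-cli
if [ -x "$bin" ]; then
  echo "$bin is ready"
else
  echo "$bin not found - check the cmake output for errors"
fi
```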
5. Run an LLM!
With `llama.cpp` built, you can now run a model. This command will download and run Gemma 3 1B, using the Steam Deck’s GPU for acceleration.

```
./build/bin/llama-cli \
  -hf unsloth/gemma-3-1b-it-GGUF \
  --gpu-layers -1 \
  -p "Say hello from Steam Deck GPU."
```
You should see the model generate a response, with the GPU taking on the bulk of the work.
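If you want to try larger models, a very rough sizing rule helps decide what fits in the 4 GB UMA buffer. The 0.6 bytes-per-parameter figure below is an assumption for Q4_K_M-style quantization, and the KV cache and compute buffers add more on top:

```shell
# Back-of-envelope weight sizes at ~0.6 bytes/parameter (illustrative only).
# Integer arithmetic in tenths of a GB: params_in_billions * 6.
for b in 1 4 7; do
  tenths=$(( b * 6 ))
  echo "${b}B model: ~$(( tenths / 10 )).$(( tenths % 10 )) GB of weights"
done
```

A 7B model at ~4.2 GB slightly overflows a 4 GB allocation, but quantized 7B models can still run since the APU can also draw on shared system memory beyond the dedicated buffer.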
6. Power Consumption
A quick note on power: while generating with the Gemma 3 1B model, the GPU alone draws around 10-11 W, and total system draw sits between 20 W and 25 W. This makes the Steam Deck a surprisingly efficient device for running even quantized 7B models.
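Those numbers translate directly into battery life. A sketch, assuming the Deck’s 40 Wh battery (LCD model) and a steady 25 W draw:

```shell
# Runtime in minutes = battery capacity (Wh) / draw (W) * 60
battery_wh=40
draw_w=25
echo "$(( battery_wh * 60 / draw_w )) minutes of continuous generation"
# prints: 96 minutes of continuous generation
```

So roughly an hour and a half of nonstop generation on battery; bursty chat use with idle gaps between prompts would stretch this considerably.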
Conclusion
The ability to run quantized 7B models in such a small and power-efficient form factor is incredible. I wonder if there are any good role-play finetunes that fit in the Deck’s VRAM? Who else is thinking about building an AI companion that you can speak to (and that can speak back)?