How I Made Gemma 4 10x Faster on Jetson Orin Nano

JetsonHacks Apr 17, 2026 11:17

33,424 views · 870 likes

Join this channel to get access to perks:
https://www.youtube.com/channel/UCQs0lwV6E4p7LQaGJ6fgy5Q/join

Gemma 4 is supposed to be edge-friendly.

So why did the recommended path fail on a Jetson Orin Nano?
In this video, I show what happened, what changed, and how that turns into a bigger lesson about running LLMs on small edge machines. This is not just about getting Gemma 4 to load. It is about understanding the tradeoffs between model size, quantization, context window, and memory, so the system becomes actually usable.

The same lesson applies to every Jetson shown in the benchmarks: Orin Nano, AGX Orin, and AGX Thor.
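To make the tradeoff between model size, quantization, context window, and memory a little more concrete, here is a rough back-of-envelope sketch in Python. It is not taken from the video. The function names and every number in it (parameter count, layer count, KV heads, head dimension) are hypothetical placeholders, not Gemma 4's actual configuration, and a real runtime adds overhead this ignores.

```python
# Rough memory estimate for a local LLM. All model numbers are HYPOTHETICAL
# placeholders for illustration, not Gemma 4's real architecture.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights at a given quantization."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size for a plain full-attention transformer.

    The leading 2 counts one key tensor and one value tensor per layer.
    bytes_per_elem=2 assumes a 16-bit cache (an assumption, not a measurement).
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def total_gib(n_params: float, bits_per_weight: float, n_layers: int,
              n_kv_heads: int, head_dim: int, context_len: int) -> float:
    """Weights plus KV cache, in GiB. Ignores activations and runtime overhead."""
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_len)
    return total / GIB


if __name__ == "__main__":
    # Hypothetical 4-billion-parameter model with made-up layer shapes.
    shape = dict(n_params=4e9, n_layers=32, n_kv_heads=8, head_dim=128)
    for bits in (16, 8, 4):
        for ctx in (4096, 32768):
            gib = total_gib(bits_per_weight=bits, context_len=ctx, **shape)
            print(f"{bits:>2}-bit weights, context {ctx:>6}: ~{gib:.1f} GiB")
```

Under these assumptions, lowering the bits per weight shrinks the weights while the context window only drives the KV cache, so on a small shared-memory board both knobs have to be tuned together.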

If you are working with Jetson, llama.cpp, local models, or edge AI, this video should help.

Used in the video:
Jetson Orin Nano Developer Kit: https://amzn.to/4ctd3o1
Jetson AGX Orin Developer Kit: https://amzn.to/4vvBDgN
Jetson AGX Thor Developer Kit: https://amzn.to/4sCcqhO

00:00 Intro
01:19 Model Shopping
02:47 Model Demo
03:24 Image Processing
04:51 Reasoning
05:30 Tool Use
07:09 Model quantization and context explained

As an Amazon Associate I earn from qualifying purchases.
Visit the JetsonHacks storefront on Amazon: https://www.amazon.com/shop/jetsonhacks

Visit the website at https://jetsonhacks.com
Sign up for the newsletter! https://newsletter.jetsonhacks.com
GitHub accounts: https://github.com/jetsonhacks
https://github.com/jetsonhacksnano
Twitter: http://twitter.com/jetsonhacks

Some of the links here are affiliate links. Purchases made through them come at no extra cost to you.

Category (YouTube): Science & Technology
