How I Made Gemma 4 10x Faster on Jetson Orin Nano
JetsonHacks 11:17
28,754 views · 780 likes
Join this channel to get access to perks:
https://www.youtube.com/channel/UCQs0lwV6E4p7LQaGJ6fgy5Q/join
Gemma 4 is supposed to be edge-friendly.
So why did the recommended path fail on a Jetson Orin Nano?
In this video, I show what happened, what changed, and how that turns into a bigger lesson about running LLMs on small edge machines. This is not just about getting Gemma 4 to load. It is about understanding the tradeoffs between model size, quantization, context window, and memory, so that the system is actually usable.
The same holds for every Jetson in the benchmarks: Orin Nano, AGX Orin, and AGX Thor.
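The arithmetic behind that tradeoff can be sketched roughly. Memory use is approximately the quantized weights plus the KV cache, and the KV cache grows linearly with the context window. The sketch below makes several assumptions. The model shape (4B parameters, 32 layers, 8 KV heads, head size 128) and the 5 GiB budget are hypothetical and do not reflect Gemma 4's real configuration or the Orin Nano's measured free memory. The KV cache is assumed to be stored in 16-bit precision. Sliding-window attention is ignored. The Q8_0 and Q4_0 sizes follow from llama.cpp's block layout. This is a back-of-envelope estimate, not the method used in the video.

```python
# Back-of-envelope memory estimate: quantized weights + KV cache.
# All model dimensions and the memory budget are hypothetical.
from dataclasses import dataclass

GIB = 1024 ** 3

# Effective bits per weight. Block formats store an fp16 scale per block of 32,
# so they take slightly more than their nominal width.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,  # 32 x int8 + 1 x fp16 scale = 34 bytes per 32 weights
    "Q4_0": 4.5,  # 32 x 4-bit + 1 x fp16 scale = 18 bytes per 32 weights
}


@dataclass
class ModelShape:
    n_params: float  # total parameter count
    n_layers: int
    n_kv_heads: int
    head_dim: int


def weight_bytes(shape: ModelShape, quant: str) -> float:
    """Bytes needed to hold the weights at the given quantization."""
    return shape.n_params * BITS_PER_WEIGHT[quant] / 8


def kv_cache_bytes(shape: ModelShape, n_ctx: int, bytes_per_elem: int = 2) -> int:
    """K and V for every layer and every position in the context window.

    Assumes full attention in every layer; sliding-window layers would use less.
    """
    return 2 * shape.n_layers * n_ctx * shape.n_kv_heads * shape.head_dim * bytes_per_elem


def fits(shape: ModelShape, quant: str, n_ctx: int, budget_gib: float) -> tuple[float, bool]:
    """Return (estimated GiB, whether it fits in the budget)."""
    total_gib = (weight_bytes(shape, quant) + kv_cache_bytes(shape, n_ctx)) / GIB
    return total_gib, total_gib <= budget_gib


if __name__ == "__main__":
    model = ModelShape(n_params=4e9, n_layers=32, n_kv_heads=8, head_dim=128)  # hypothetical
    budget = 5.0  # assumed GiB available to the model after the OS and desktop
    for quant in ("F16", "Q8_0", "Q4_0"):
        for n_ctx in (4096, 32768, 131072):
            gib, ok = fits(model, quant, n_ctx, budget)
            print(f"{quant:5s} ctx={n_ctx:6d} -> {gib:5.1f} GiB  {'fits' if ok else 'too big'}")
```

With these made-up numbers, the 16-bit weights alone exceed the budget. A 4-bit model at a 4K context fits comfortably, but the same model at a 32K context does not, because the KV cache grows from 0.5 GiB to 4 GiB. Shrinking the context window can matter as much as choosing a smaller quantization.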
If you are working with Jetson, llama.cpp, local models, or edge AI, this video should help.
Used in the video:
Jetson Orin Nano Developer Kit: https://amzn.to/4ctd3o1
Jetson AGX Orin Developer Kit: https://amzn.to/4vvBDgN
Jetson AGX Thor Developer Kit: https://amzn.to/4sCcqhO
00:00 Intro
01:19 Model Shopping
02:47 Model Demo
03:24 Image Processing
04:51 Reasoning
05:30 Tool Use
07:09 Model Quantization and Context Explained
As an Amazon Associate I earn from qualifying purchases.
Visit the JetsonHacks storefront on Amazon: https://www.amazon.com/shop/jetsonhacks
Visit the website at https://jetsonhacks.com
Sign up for the newsletter! https://newsletter.jetsonhacks.com
GitHub accounts: https://github.com/jetsonhacks
https://github.com/jetsonhacksnano
Twitter: http://twitter.com/jetsonhacks
Some of these links here are affiliate links. As an Amazon Associate I earn from qualifying purchases at no extra cost to you.