GPU clusters forgive a lot of sloppy engineering. Jetson devices do not. Models that hit 80 FPS on an A10 often run at 6 FPS on a Jetson Orin Nano out of the box — and the difference is almost never the model weights themselves.
Quantization is the first real lever
INT8 quantization routinely gives 2–3× speedup with under 1% accuracy loss on most CNN backbones. We treat it as mandatory, not optional. TensorRT is the path of least resistance on Jetson; ONNX Runtime with the TensorRT execution provider is the compromise when the toolchain needs to stay portable.
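To make the INT8 mechanism concrete, here is a minimal NumPy sketch of symmetric per-tensor quantization, the general scheme behind INT8 weight quantization. This is an illustration, not TensorRT code; the function names are ours, and real deployments let TensorRT or ONNX Runtime handle this internally.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8: map [-amax, amax] onto [-127, 127]."""
    amax = float(np.abs(x).max())
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from INT8 values and a scale."""
    return q.astype(np.float32) * scale

# Weights at a typical conv-layer magnitude; rounding error is bounded by scale/2.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())
```

The bounded per-weight error (at most half the quantization step) is why accuracy loss stays small on most CNN backbones; the step size itself is what calibration determines for activations.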
Calibration data matters more than most tutorials suggest. Use a representative sample from the target environment — not the training distribution. Calibration fixes the activation ranges the quantizer clamps to, so a set that misses the deployment scene's lighting, noise, and dynamic range will clip real inputs at inference time. Synthetic calibration sets produce exactly these surprises in the field.
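A sketch of the sampling side, assuming frames have already been captured on-site as an array of HWC uint8 images; the function name and batch parameters are ours, and the generator would be wired into whatever calibrator interface the toolchain expects.

```python
import numpy as np

def calibration_batches(frames, batch_size=8, num_batches=50, seed=0):
    """Yield preprocessed calibration batches drawn from field-recorded frames.

    frames: uint8 array of shape (N, H, W, C) captured in the deployment
    environment, not sampled from the training distribution.
    """
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        idx = rng.choice(len(frames), size=batch_size, replace=False)
        # Apply the exact same preprocessing the model sees at inference time.
        batch = frames[idx].astype(np.float32) / 255.0
        yield np.transpose(batch, (0, 3, 1, 2))  # NHWC -> NCHW

# Demo with stand-in frames; in practice, load frames recorded on-site.
frames = np.zeros((100, 32, 32, 3), dtype=np.uint8)
batches = list(calibration_batches(frames, batch_size=4, num_batches=3))
```

The one non-negotiable detail: preprocessing inside the calibration loop must match inference exactly, or the measured activation ranges describe a distribution the model never sees.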
Thermals are a product decision
Every Jetson has a thermal throttling curve, and every enclosure trades cooling for size. A model that hits advertised throughput on a bench can fail the same day in a sealed metal box in the sun. We build thermal margin into the evaluation loop — not just the mechanical design.
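One way to build that margin into an evaluation loop is to read the SoC temperature from sysfs between benchmark passes and fail the run when it creeps toward the throttle point. A minimal sketch, assuming a Linux thermal zone path and an 85 °C throttle point; the zone index and throttle temperature vary per Jetson module and should be checked against the device's own configuration.

```python
from pathlib import Path

# Zone index varies per Jetson module; inspect /sys/class/thermal/ on the device.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")

def read_temp_c(path: Path = THERMAL_ZONE) -> float:
    """sysfs reports the temperature in millidegrees Celsius as plain text."""
    return int(path.read_text().strip()) / 1000.0

def thermal_margin_ok(temp_c: float, throttle_c: float = 85.0,
                      margin_c: float = 15.0) -> bool:
    """Fail the eval run while we are within margin_c of the throttle point."""
    return temp_c <= throttle_c - margin_c
```

Run the check between passes of a sustained benchmark (not a one-shot latency test), since a sealed enclosure can take many minutes to reach its steady-state temperature.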