Godel Space · Edge Hardware Report

Prithvi-EO-2.0-300M Profiling

NVIDIA Jetson Orin Nano 8GB Super · FP16 TensorRT · 400-tile sustained workload

Latency (p50 @ 25W): 68 ms
Energy (@ 15W): 1.29 J/tile
FP16 vs FP32 accuracy: 0.9948 cosine similarity
Max thermal (sustained): 62°C

01

Test Configuration

Hardware

Platform: NVIDIA Jetson Orin Nano 8GB Super
GPU: Ampere, 1024 CUDA cores / 32 Tensor Cores
Memory: 8GB unified LPDDR5, 68 GB/s
JetPack: 6.2 (L4T R36.4.7)
TensorRT: 10.3.0
CUDA: 12.6

Test Parameters

Engine: prithvi_fp16.trt (620 MB)
Input shape: (1, 6, 4, 224, 224) (batch, bands, time steps, height, width)
Tiles per run: 400
Warmup tiles: 10
Cooldown: 120 s between modes
Clock governor: Dynamic (default)

02

Sustained Throughput

Power mode  Throughput     p50      p95      p99      100 tiles   400 tiles
7W          3.6 tiles/s    273 ms   273 ms   273 ms   27 s        110 s
15W         9.7 tiles/s    102 ms   102 ms   102 ms   10 s        41 s
25W         14.4 tiles/s   68 ms    68 ms    69 ms    7 s         28 s
MAXN        14.3 tiles/s   69 ms    69 ms    70 ms    7 s         28 s
Recommended modes: 15W (energy-constrained) and 25W (latency-constrained). Even the slowest mode, 7W, processes 400 tiles in under 2 minutes, well within the 20-minute orbital pass window.
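The p50/p95/p99 columns and the 10 warmup tiles imply a post-processing step along these lines. This is a minimal sketch, assuming per-tile latencies are recorded in milliseconds in run order and that throughput is taken as timed tiles divided by summed latency. The report does not state its percentile method, so the nearest-rank definition here is an assumption, and `percentile` and `summarize` are hypothetical helpers.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (assumed method; the report does not specify one)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank = ceil(p * n / 100), clamped to at least 1
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, warmup=10):
    """Drop warmup tiles, then report p50/p95/p99 latency and throughput.

    Throughput assumes tiles run back-to-back, so wall time is the sum of
    per-tile latencies (host-side overhead between tiles is ignored).
    """
    timed = latencies_ms[warmup:]
    total_s = sum(timed) / 1000.0
    return {
        "p50_ms": percentile(timed, 50),
        "p95_ms": percentile(timed, 95),
        "p99_ms": percentile(timed, 99),
        "tiles_per_s": len(timed) / total_s,
    }


# Synthetic example: 10 slow warmup tiles, then 400 tiles at 68 ms each
stats = summarize([90.0] * 10 + [68.0] * 400)
print(stats)
```

Because throughput derived this way ignores host-side overhead, it can come out slightly above a measured wall-clock rate.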
03

Power Efficiency

Fig 1. Efficiency Pareto frontier: tiles/s per W vs p50 latency. 15W and 25W lie on the frontier; 15W is the most efficient choice for energy-constrained deployment.
Mode   Mean power   J/tile   Tiles/s per W
7W     7.8 W        2.13     0.47
15W    12.5 W       1.29     0.77
25W    20.3 W       1.41     0.71
MAXN   21.5 W       1.50     0.67
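The J/tile and tiles/s per W columns follow from mean power and throughput. A minimal sketch of that arithmetic, assuming J/tile is computed as mean power divided by sustained throughput (the report does not spell out its formula):

```python
# Mode -> (mean power in W, sustained throughput in tiles/s), from the tables above
MODES = {
    "7W": (7.8, 3.6),
    "15W": (12.5, 9.7),
    "25W": (20.3, 14.4),
    "MAXN": (21.5, 14.3),
}


def joules_per_tile(mean_w, tiles_per_s):
    # W / (tiles/s) = (J/s) * (s/tile) = J/tile
    return mean_w / tiles_per_s


def tiles_per_s_per_w(mean_w, tiles_per_s):
    # (tiles/s) / W = tiles/J, the reciprocal of J/tile
    return tiles_per_s / mean_w


for mode, (watts, tps) in MODES.items():
    print(f"{mode}: {joules_per_tile(watts, tps):.2f} J/tile, "
          f"{tiles_per_s_per_w(watts, tps):.2f} tiles/s per W")
```

With these rounded inputs, 7W comes out near 2.17 J/tile rather than the reported 2.13, a gap that falls within the rounding of 7.8 W and 3.6 tiles/s; the other modes reproduce the J/tile column.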

Key Finding

15W is the energy-optimal mode at 1.29 J/tile. 25W delivers the best raw throughput (14.4 tiles/s). MAXN offers no benefit over 25W: it draws 1.2 W more (21.5 W vs 20.3 W) for essentially identical throughput (14.3 vs 14.4 tiles/s).
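Pareto optimality can be checked directly: a mode is on the frontier if no other mode is at least as fast and at least as efficient while being strictly better on one of the two axes. A minimal sketch using the p50 latencies and tiles/s per W values reported in this section; `dominates` and `pareto_frontier` are hypothetical helpers.

```python
# Mode -> (p50 latency in ms, tiles/s per W), from the tables in this report
POINTS = {
    "7W": (273.0, 0.47),
    "15W": (102.0, 0.77),
    "25W": (68.0, 0.71),
    "MAXN": (69.0, 0.67),
}


def dominates(a, b):
    """True if a is no worse than b on both axes and strictly better on one.

    Lower latency is better; higher efficiency is better.
    """
    lat_a, eff_a = a
    lat_b, eff_b = b
    no_worse = lat_a <= lat_b and eff_a >= eff_b
    strictly_better = lat_a < lat_b or eff_a > eff_b
    return no_worse and strictly_better


def pareto_frontier(points):
    """Names of modes not dominated by any other mode, sorted."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


print(pareto_frontier(POINTS))  # ['15W', '25W']
```

The check shows 7W dominated by 15W and MAXN dominated by 25W, consistent with the recommendations at the end of this report, which list only 15W and 25W.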

04

Thermal Behavior

Fig 2. GPU temperature over 400 tiles. Temperature stays below 62°C in all power modes, with no thermal throttling.

Fig 3. Total power draw over 400 tiles. Power consumption is stable, with no spikes or runaway behavior.

Power mode  Peak GPU temp  Headroom to TJ_MAX
7W          53.1°C         52°C
15W         56.4°C         49°C
25W         61.0°C         44°C
MAXN        61.9°C         43°C

TJ_MAX = 105°C (thermal shutdown). All modes operate with >40°C thermal headroom. Zero swap activity across all runs.
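The headroom figures are TJ_MAX minus each mode's peak temperature. A small sketch that recomputes them and applies the 40°C margin quoted above; `MIN_HEADROOM_C` is a hypothetical acceptance threshold chosen to match that statement, not a documented NVIDIA limit.

```python
TJ_MAX_C = 105.0       # thermal shutdown temperature stated in this report
MIN_HEADROOM_C = 40.0  # hypothetical acceptance margin, matching the ">40°C" claim

# Peak GPU temperature per power mode over the 400-tile run
PEAK_C = {"7W": 53.1, "15W": 56.4, "25W": 61.0, "MAXN": 61.9}


def headroom_c(peak_c, limit_c=TJ_MAX_C):
    """Degrees of margin between the observed peak and the shutdown limit."""
    return limit_c - peak_c


for mode, peak in PEAK_C.items():
    margin = headroom_c(peak)
    status = "OK" if margin > MIN_HEADROOM_C else "LOW"
    print(f"{mode}: {margin:.1f} °C headroom ({status})")
```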

05

Latency Stability

Fig 4. Per-tile latency over 400 tiles. Latency is stable in every power mode: the p50 of the last 10 tiles is within 1% of the p50 of the first 10 (a negative change means the later tiles ran faster).
Power mode  First 10 tiles (p50)  Last 10 tiles (p50)  Change
7W          272.61 ms             272.63 ms            +0.01%
15W         102.36 ms             101.83 ms            -0.52%
25W         68.13 ms              68.31 ms             +0.25%
MAXN        69.17 ms              68.49 ms             -0.98%
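The change column compares the median latency of the first and last 10 tiles. A minimal sketch, assuming both windows are taken from the ordered per-tile latency list; whether the report's first window starts before or after the 10 warmup tiles is not stated. `pct_change` and `latency_drift` are hypothetical helpers.

```python
import statistics


def pct_change(first_ms, last_ms):
    """Percent change between two p50 values; negative means later tiles ran faster."""
    return (last_ms - first_ms) / first_ms * 100.0


def latency_drift(latencies_ms, window=10):
    """Return (first-window p50, last-window p50, percent change)."""
    if len(latencies_ms) < window:
        raise ValueError("need at least one full window of samples")
    first = statistics.median(latencies_ms[:window])
    last = statistics.median(latencies_ms[-window:])
    return first, last, pct_change(first, last)


print(f"{pct_change(102.36, 101.83):+.2f}%")  # -0.52%
```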
06

Scene Processing Estimates

Fig 5. Scene processing time heatmap: estimated wall-clock time by scene size and power mode. All estimates fall within the 20-minute orbital window.

Estimated wall-clock time for a 400-tile scene:

Power mode  Time        Profile
7W          1 min 50s   Ultra-low power
15W         41 s        Energy optimal
25W         28 s        Latency optimal
MAXN        28 s        No benefit over 25W
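The scene estimates scale tile count by sustained throughput. A minimal sketch, assuming a scene is cut into non-overlapping 224 × 224 px tiles with partial edge tiles padded to full size (the report does not describe its tiling); the 4480 × 4480 px scene is a hypothetical example that yields exactly 400 tiles, and `tile_count` and `scene_seconds` are hypothetical helpers.

```python
import math

TILE_PX = 224             # model input height/width from the test configuration
PASS_WINDOW_S = 20 * 60   # 20-minute orbital pass window

# Sustained throughput per mode (tiles/s), from the Sustained Throughput table
THROUGHPUT = {"7W": 3.6, "15W": 9.7, "25W": 14.4, "MAXN": 14.3}


def tile_count(width_px, height_px, tile=TILE_PX):
    """Non-overlapping tiles needed to cover a scene (edge tiles padded, assumed)."""
    return math.ceil(width_px / tile) * math.ceil(height_px / tile)


def scene_seconds(width_px, height_px, mode):
    """Estimated wall-clock seconds to process a scene at a given power mode."""
    return tile_count(width_px, height_px) / THROUGHPUT[mode]


# Hypothetical 4480 x 4480 px scene: 20 x 20 = 400 tiles
for mode in THROUGHPUT:
    t = scene_seconds(4480, 4480, mode)
    print(f"{mode}: {t:.0f} s, fits pass window: {t <= PASS_WINDOW_S}")
```

At 7W this gives about 111 s rather than the 110 s shown above, a difference consistent with rounding of the 3.6 tiles/s figure.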

07

Precision Constraints

FP16 Accuracy Validation

MAE
4.55e-02 (threshold < 5e-02) — PASS
Cosine Similarity
0.9948 (threshold > 0.99) — PASS
Method
20 tiles, PyTorch FP32 CPU ground truth
Verdict
Functionally identical to FP32
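Both accuracy checks are straightforward to reproduce. A minimal pure-Python sketch, assuming the FP32 reference and FP16 outputs are flattened into equal-length sequences and compared against the thresholds listed above; whether the report pools all 20 tiles or averages per-tile scores is not stated, and `validate` is a hypothetical helper. The toy vectors are illustrative, not measured data.

```python
import math


def mae(ref, out):
    """Mean absolute error between two equal-length flat sequences."""
    if len(ref) != len(out) or not ref:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(r - o) for r, o in zip(ref, out)) / len(ref)


def cosine_similarity(ref, out):
    """Cosine of the angle between the two output vectors."""
    dot = sum(r * o for r, o in zip(ref, out))
    norm = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(o * o for o in out))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def validate(ref, out, mae_max=5e-2, cos_min=0.99):
    """Apply the report's thresholds: MAE < 5e-02 and cosine similarity > 0.99."""
    m = mae(ref, out)
    c = cosine_similarity(ref, out)
    return m, c, (m < mae_max and c > cos_min)


# Toy vectors standing in for flattened FP32 (reference) and FP16 outputs
fp32 = [0.10, -0.52, 1.30, 0.75]
fp16 = [0.11, -0.50, 1.28, 0.76]
print(validate(fp32, fp16))
```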

INT8 Memory Ceiling

Building an INT8 engine for Prithvi-300M requires more than 578 MB of contiguous GPU memory for the fused constants of block 23. On the 8GB Orin Nano, this exceeds the memory still available once blocks 0-22 have been compiled.

Tested across native Python, Docker, the trtexec CLI, opt_level 0-3, and workspace sizes of 256-512 MB. Every configuration fails at the same point, which indicates a physical memory limitation rather than a configuration issue.

INT8 deployment therefore targets the Jetson AGX Orin 64GB (275 TOPS, 204 GB/s memory bandwidth). Expected INT8 p50 on that platform: ~10-15 ms.

08

Recommendations

Energy-Constrained

FP16 @ 15W

1.29 J/tile · 9.7 tiles/s · 0.77 tiles/W

Maximizes mission endurance on a limited solar/battery budget. 400-tile scene in 41 seconds.

Latency-Constrained

FP16 @ 25W

68 ms p50 · 14.4 tiles/s · 0.71 tiles/W

Minimizes per-tile latency for time-critical detection. 400-tile scene in 28 seconds.
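The two recommendations reduce to a simple rule: pick the lowest-energy mode whose latency fits the mission budget. A minimal sketch of that rule using this report's measurements; `pick_mode` and the latency budgets in the example are hypothetical.

```python
# Per-mode measurements from this report: p50 latency (ms) and energy (J/tile)
MODES = {
    "7W": {"p50_ms": 273.0, "j_per_tile": 2.13},
    "15W": {"p50_ms": 102.0, "j_per_tile": 1.29},
    "25W": {"p50_ms": 68.0, "j_per_tile": 1.41},
    "MAXN": {"p50_ms": 69.0, "j_per_tile": 1.50},
}


def pick_mode(latency_budget_ms):
    """Lowest-energy mode whose p50 latency fits the budget, or None if none fits."""
    feasible = [
        (m["j_per_tile"], name)
        for name, m in MODES.items()
        if m["p50_ms"] <= latency_budget_ms
    ]
    if not feasible:
        return None
    return min(feasible)[1]


print(pick_mode(150))  # 15W
print(pick_mode(80))   # 25W
```

A loose budget selects 15W, the energy-constrained recommendation; tightening the budget below 102 ms forces 25W, the latency-constrained one.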

Stack

NASA/IBM Prithvi-EO-2.0-300M · PyTorch 2.10 · ONNX opset 17 · TensorRT 10.3 · CUDA 12.6 · JetPack 6.2 · FP16 precision · Apache 2.0