Pushing the limits of visual navigation in ultra-sparse environments

As mentioned in our previous post, a key blocker to autonomy for mobile robots is the narrow nature of existing navigation stacks, which limits generalization across platforms.
Even for well-engineered perception stacks, performance degrades sharply in visually sparse environments. Most current solutions rely heavily on classical approaches and domain-specific assumptions, causing them to fail in real-world settings that lack uniquely identifiable visual features, such as clear landmarks or high-contrast lines.
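To make this failure mode concrete, here is a minimal sketch (not our pipeline) of the classical feature-matching step such approaches depend on, using OpenCV's ORB detector; the filenames are placeholders. Over low-texture desert imagery, both the keypoint count and the number of matches surviving the ratio test tend to collapse, which is exactly where classical odometry and map matching break down.

```python
# Minimal sketch: classical feature matching between two frames.
# Filenames are placeholders; this is illustrative, not our pipeline.
import cv2

def count_good_matches(path_a: str, path_b: str, ratio: float = 0.75):
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    if img_a is None or img_b is None:
        raise FileNotFoundError("could not read one of the input frames")

    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # Over low-texture terrain the detector may return few or no keypoints at all.
    if des_a is None or des_b is None:
        return len(kp_a), len(kp_b), 0

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des_a, des_b, k=2)

    # Lowe ratio test: keep a match only if its best distance is clearly better
    # than the second best. In feature-sparse imagery most candidates fail this.
    good = []
    for pair in pairs:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return len(kp_a), len(kp_b), len(good)

if __name__ == "__main__":
    n_a, n_b, n_good = count_good_matches("frame_t.png", "frame_t1.png")
    print(f"keypoints: {n_a}/{n_b}, matches surviving ratio test: {n_good}")
```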
Two weeks ago, in the Ocotillo desert, we set out to push our transformer-based visual navigation system to its operational limit. In the process, we achieved a remarkable breakthrough: ultra-low (<100 ft AGL) flights over desert terrain with almost no identifiable features. Crucially, the entire system ran in real time on a limited compute node with a narrow-FOV camera.
12 m error at 75 ft AGL
Here are the key results across dozens of long-duration flights over the Ocotillo desert:
- <30m of positional error sustained across entire flights
- <5m of error during multiple sustained segments
- Altitudes as low as 50 ft AGL, at speeds up to 60 mph
- All results generated live, with no post-processing
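To put the real-time constraint in perspective: at 60 mph the aircraft covers roughly 27 meters every second, so each additional 100 ms of end-to-end latency corresponds to nearly 3 m of travel between position updates. A quick back-of-envelope sketch (the latency values are illustrative assumptions, not measured figures from our system):

```python
# Back-of-envelope: distance traveled per inference cycle at a given speed.
# Latency values below are illustrative assumptions, not measured system numbers.
MPH_TO_MPS = 0.44704

def meters_per_update(speed_mph: float, latency_s: float) -> float:
    return speed_mph * MPH_TO_MPS * latency_s

if __name__ == "__main__":
    for latency_ms in (50, 100, 200):
        d = meters_per_update(60.0, latency_ms / 1000.0)
        print(f"{latency_ms:>3} ms latency at 60 mph -> {d:.1f} m of travel per update")
```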