Frame Conventions

This page describes the coordinate frame conventions used in viser.

Scene Tree Naming

Each object added to the scene in viser is instantiated as a node in a hierarchical scene tree. The structure of this tree is determined by the names assigned to the nodes.

If we add a coordinate frame called /base_link/shoulder/wrist, it creates three nodes:

  • wrist is a child of shoulder

  • shoulder is a child of base_link

  • base_link is the root node

When we set the transformation of a parent node like /base_link/shoulder:

  • ✅ Both the node and all its children (e.g., /base_link/shoulder/wrist) will move

  • ❌ Its parent (/base_link) remains unaffected
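The naming rule above can be illustrated with a small, self-contained sketch. It does not use viser itself: node_chain and world_position are hypothetical helpers written for this page, and rotations are omitted so that only translations accumulate down the tree.

```python
def node_chain(name):
    """Return the scene-tree nodes implied by a slash-separated name, root-most first."""
    parts = [p for p in name.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def world_position(name, local_positions):
    """Sum local translations from the root-most node down to `name`.

    Simplification (assumption for this sketch): rotations are ignored,
    so each node contributes only its translation.
    """
    total = [0.0, 0.0, 0.0]
    for node in node_chain(name):
        offset = local_positions.get(node, (0.0, 0.0, 0.0))
        total = [a + b for a, b in zip(total, offset)]
    return tuple(total)


# Local (parent-relative) positions for each node in the example tree.
positions = {
    "/base_link": (0.0, 0.0, 1.0),
    "/base_link/shoulder": (0.0, 0.0, 0.5),
    "/base_link/shoulder/wrist": (0.25, 0.0, 0.0),
}
wrist_before = world_position("/base_link/shoulder/wrist", positions)

# Moving the shoulder moves the wrist with it, but leaves base_link alone.
positions["/base_link/shoulder"] = (1.0, 0.0, 0.5)
wrist_after = world_position("/base_link/shoulder/wrist", positions)
base_after = world_position("/base_link", positions)
```

Here wrist_before and wrist_after differ by exactly the shoulder's displacement, while base_after is unchanged.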

Poses

Poses in viser are defined using two components:

Field      Description
wxyz       Unit quaternion orientation term (always 4D: w, x, y, z)
position   Translation vector (always 3D: x, y, z)

These correspond to a transformation from coordinates in the local frame to the parent frame:

\[\begin{split}p_\mathrm{parent} = \begin{bmatrix} R & t \end{bmatrix}\begin{bmatrix}p_\mathrm{local} \\ 1\end{bmatrix}\end{split}\]

where wxyz represents the quaternion form of the \(\mathrm{SO}(3)\) rotation matrix \(R\) and position represents the \(\mathbb{R}^3\) translation vector \(t\).
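As a minimal sketch of the equation above, the following pure-Python snippet converts a wxyz quaternion to a rotation matrix and applies p_parent = R p_local + t. The helper names quat_to_matrix and local_to_parent are illustrative and not part of viser's API, and the quaternion is assumed to be unit length.

```python
import math


def quat_to_matrix(wxyz):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = wxyz
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def local_to_parent(wxyz, position, p_local):
    """Map a point from local-frame to parent-frame coordinates: R @ p + t."""
    R = quat_to_matrix(wxyz)
    return tuple(
        sum(R[i][j] * p_local[j] for j in range(3)) + position[i]
        for i in range(3)
    )


# A frame rotated 90 degrees about +Z and raised 1 unit along Z.
half_angle = math.pi / 4
wxyz = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
position = (0.0, 0.0, 1.0)

# The local +X unit point ends up near (0, 1, 1) in the parent frame.
p_parent = local_to_parent(wxyz, position, (1.0, 0.0, 0.0))
```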

World Coordinates

In the world coordinate space, +Z points upward by default. This can be overridden with viser.SceneApi.set_up_direction().

Camera Conventions

In viser, all camera parameters use the COLMAP/OpenCV convention:

Axis      Direction
Forward   +Z
Up        -Y
Right     +X

Note

Difference from Nerfstudio

This is different from Nerfstudio, which uses the OpenGL/Blender convention:

  • Forward: -Z, Up: +Y, Right: +X

Conversion: Rotating the camera frame by 180° about its local X-axis converts between the two conventions. This flips the Y and Z axes and leaves X unchanged.
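The conversion can be sketched on wxyz quaternions, assuming the camera orientation maps camera coordinates to the parent frame as in the Poses section. A rotation about the camera's local X-axis is then a right-multiplication by the quaternion (0, 1, 0, 0). The helper names quat_multiply and flip_camera_convention are hypothetical and not part of viser's API.

```python
def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


# Quaternion for a 180-degree rotation about the X-axis.
FLIP_X_180 = (0.0, 1.0, 0.0, 0.0)


def flip_camera_convention(wxyz):
    """Convert a camera orientation between OpenCV and OpenGL conventions.

    Right-multiplying applies the rotation in the camera's local frame.
    The same function converts in both directions.
    """
    return quat_multiply(wxyz, FLIP_X_180)


opencv_wxyz = (1.0, 0.0, 0.0, 0.0)
opengl_wxyz = flip_camera_convention(opencv_wxyz)

# Converting back yields the negated quaternion, which is the same rotation.
round_trip = flip_camera_convention(opengl_wxyz)
```

Because q and -q represent the same rotation, the sign change after a round trip is harmless.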

