Frame Conventions

In this note, we describe the coordinate frame conventions used in viser.

Scene tree naming

Each object that we add to the scene in viser is instantiated as a node in a scene tree. The structure of this tree is determined by the names assigned to the nodes.

If we add a coordinate frame called /base_link/shoulder/wrist, it signifies three nodes: the wrist is a child of the shoulder which is a child of the base_link.

If we set the transformation of a given node like /base_link/shoulder, both it and its child /base_link/shoulder/wrist will move. Its parent, /base_link, will be unaffected.

Poses

Poses in viser are defined using a pair of fields:

  • wxyz, a unit quaternion orientation term. This should always be 4D.

  • position, a translation term. This should always be 3D.

These correspond to a transformation from coordinates in the local frame to the parent frame:

\[\begin{split}p_\mathrm{parent} = \begin{bmatrix} R & t \end{bmatrix}\begin{bmatrix}p_\mathrm{local} \\ 1\end{bmatrix}\end{split}\]

where wxyz is the quaternion form of the \(\mathrm{SO}(3)\) matrix \(R\) and position is the \(\mathbb{R}^3\) translation term \(t\).

World coordinates

In the world coordinate space, +Z points upward by default. This can be overridden with viser.SceneApi.set_up_direction().

Cameras

In viser, all camera parameters exposed to the Python API use the COLMAP/OpenCV convention:

  • Forward: +Z

  • Up: -Y

  • Right: +X

Confusingly, this is different from Nerfstudio, which adopts the OpenGL/Blender convention:

  • Forward: -Z

  • Up: +Y

  • Right: +X

Conversion between the two is a simple 180 degree rotation around the local X-axis.