Frame Conventions

In this note, we describe the coordinate frame conventions used in viser.

Scene tree naming

Each object that we add to the scene in viser is instantiated as a node in a scene tree. The structure of this tree is determined by the names assigned to the nodes.

If we add a coordinate frame called /base_link/shoulder/wrist, the name implies three nodes: wrist is a child of shoulder, which is in turn a child of base_link.

If we set the transformation of a given node like /base_link/shoulder, both it and its child /base_link/shoulder/wrist will move. Its parent, /base_link, will be unaffected.
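The parent/child behavior described above can be sketched without viser at all. The following is a minimal illustration, not viser's implementation: it assumes each node stores a (wxyz, position) pose relative to its parent, and the helper names (`compose`, `world_pose`, `local_poses`) and the numeric poses are hypothetical.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def rotate(q, v):
    """Rotate 3D vector v by unit quaternion q, computed as q * v * conj(q)."""
    w, x, y, z = q
    rotated = quat_multiply(quat_multiply(q, (0.0, *v)), (w, -x, -y, -z))
    return rotated[1:]

def compose(parent, child):
    """Chain two (wxyz, position) poses; the result maps child-local
    coordinates into the frame that `parent` is expressed in."""
    (qp, tp), (qc, tc) = parent, child
    position = tuple(a + b for a, b in zip(tp, rotate(qp, tc)))
    return quat_multiply(qp, qc), position

def world_pose(name, local_poses):
    """Compose local poses along the path from the root down to `name`."""
    parts = name.strip("/").split("/")
    pose = ((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0))  # identity pose
    for i in range(1, len(parts) + 1):
        pose = compose(pose, local_poses["/" + "/".join(parts[:i])])
    return pose

# Hypothetical local poses, each relative to the node's parent.
identity = (1.0, 0.0, 0.0, 0.0)
local_poses = {
    "/base_link": (identity, (1.0, 0.0, 0.0)),
    "/base_link/shoulder": (identity, (0.0, 0.0, 0.5)),
    "/base_link/shoulder/wrist": (identity, (0.3, 0.0, 0.0)),
}
base_before = world_pose("/base_link", local_poses)[1]
wrist_before = world_pose("/base_link/shoulder/wrist", local_poses)[1]

# Rotate the shoulder 90 degrees about +Z; only its own local pose changes.
half = math.radians(90.0) / 2
local_poses["/base_link/shoulder"] = (
    (math.cos(half), 0.0, 0.0, math.sin(half)),
    (0.0, 0.0, 0.5),
)

base_after = world_pose("/base_link", local_poses)[1]
wrist_after = world_pose("/base_link/shoulder/wrist", local_poses)[1]
```

In this sketch the wrist's world position changes even though its own local pose was never edited, while /base_link stays where it was.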

Poses

Poses in viser are defined using a pair of fields:

wxyz, a unit quaternion orientation term in (w, x, y, z) order. This should always be 4D.
position, a translation term. This should always be 3D.

These correspond to a transformation from coordinates in the local frame to the parent frame:

\[\begin{split}p_\mathrm{parent} = \begin{bmatrix} R & t \end{bmatrix}\begin{bmatrix}p_\mathrm{local} \\ 1\end{bmatrix}\end{split}\]

where wxyz is the quaternion form of the \(\mathrm{SO}(3)\) matrix \(R\) and position is the \(\mathbb{R}^3\) translation term \(t\).
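As a worked instance of the equation above, the sketch below converts a wxyz quaternion to its rotation matrix with the standard unit-quaternion formula and applies \(R\, p_\mathrm{local} + t\). It uses only the standard library; the function names `wxyz_to_matrix` and `local_to_parent` are illustrative and not part of viser.

```python
import math

def wxyz_to_matrix(wxyz):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix R."""
    w, x, y, z = wxyz
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def local_to_parent(wxyz, position, p_local):
    """Apply p_parent = [R | t] [p_local; 1], which equals R @ p_local + t."""
    R = wxyz_to_matrix(wxyz)
    return [
        sum(R[i][j] * p_local[j] for j in range(3)) + position[i]
        for i in range(3)
    ]

# Example pose: a 90 degree rotation about +Z and a translation of 1 along +X.
half = math.radians(90.0) / 2
wxyz = (math.cos(half), 0.0, 0.0, math.sin(half))
position = (1.0, 0.0, 0.0)

# The local +X unit vector rotates onto +Y and is then translated,
# giving approximately (1, 1, 0) in the parent frame.
p_parent = local_to_parent(wxyz, position, (1.0, 0.0, 0.0))
```

Note the order of operations: the rotation is applied to the local point first, and the translation is added afterwards in the parent frame.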

World coordinates

In the world coordinate space, +Z points upward by default. This can be overridden with viser.SceneApi.set_up_direction().

Cameras

In viser, all camera parameters exposed to the Python API use the COLMAP/OpenCV convention:

Forward: +Z
Up: -Y
Right: +X

Confusingly, this is different from Nerfstudio, which adopts the OpenGL/Blender convention:

Forward: -Z
Up: +Y
Right: +X

Conversion between the two is a 180-degree rotation about the camera's local X-axis, which negates the local Y and Z axes and leaves the camera position unchanged.
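A minimal sketch of this conversion on a wxyz quaternion follows. It assumes the quaternion describes the camera's orientation in its parent frame (camera-to-parent), so a rotation about the camera's local axis is applied by multiplying on the right. The names `FLIP_X` and `flip_camera_convention` are hypothetical helpers, not viser API.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def rotate(q, v):
    """Rotate 3D vector v by unit quaternion q, computed as q * v * conj(q)."""
    w, x, y, z = q
    rotated = quat_multiply(quat_multiply(q, (0.0, *v)), (w, -x, -y, -z))
    return rotated[1:]

# A 180 degree rotation about the X-axis, written as (w, x, y, z).
FLIP_X = (0.0, 1.0, 0.0, 0.0)

def flip_camera_convention(wxyz):
    """Convert a camera orientation between the OpenCV and OpenGL conventions.

    Right-multiplication applies the flip about the camera's local X-axis.
    The same function converts in both directions.
    """
    return quat_multiply(wxyz, FLIP_X)

# Example camera orientation in the OpenCV convention: 90 degrees about +Z.
half = math.radians(90.0) / 2
opencv_wxyz = (math.cos(half), 0.0, 0.0, math.sin(half))
opengl_wxyz = flip_camera_convention(opencv_wxyz)

# Both conventions should describe the same physical viewing direction:
# OpenCV looks along local +Z with up -Y; OpenGL looks along local -Z with up +Y.
forward_cv = rotate(opencv_wxyz, (0.0, 0.0, 1.0))
forward_gl = rotate(opengl_wxyz, (0.0, 0.0, -1.0))
up_cv = rotate(opencv_wxyz, (0.0, -1.0, 0.0))
up_gl = rotate(opengl_wxyz, (0.0, 1.0, 0.0))
```

Applying the flip twice yields the negated quaternion, which represents the same rotation as the original, so the conversion is its own inverse.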