Frame Conventions
In this note, we describe the coordinate frame conventions used in viser.
Scene tree naming
Each object that we add to the scene in viser is instantiated as a node in a scene tree. The structure of this tree is determined by the names assigned to the nodes.
If we add a coordinate frame called /base_link/shoulder/wrist, it signifies
three nodes: the wrist is a child of the shoulder which is a child of the
base_link.
If we set the transformation of a given node like /base_link/shoulder, both
it and its child /base_link/shoulder/wrist will move. Its parent,
/base_link, will be unaffected.
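The naming rule above can be sketched in plain Python. This is an illustration only, not viser's implementation: the helper names `scene_node_chain` and `is_affected_by` are hypothetical and exist just to show how a slash-separated name implies a chain of nodes, and which nodes move when one node's transformation is set.

```python
def scene_node_chain(name: str) -> list[str]:
    """Return the path of every node implied by a slash-separated name, root-most first."""
    parts = [p for p in name.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def is_affected_by(moved: str, node: str) -> bool:
    """True if `node` is `moved` itself or one of its descendants."""
    prefix = moved.rstrip("/") + "/"
    return node == moved or node.startswith(prefix)


# One name implies three nodes: base_link, shoulder, and wrist.
chain = scene_node_chain("/base_link/shoulder/wrist")

# Moving the shoulder moves the wrist, but not the base_link.
moved = "/base_link/shoulder"
affected = [node for node in chain if is_affected_by(moved, node)]
```

Here `chain` lists the three nodes in parent-to-child order, and `affected` contains only the shoulder and the wrist.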
Poses
Poses in viser are defined using a pair of fields:
wxyz, a unit quaternion orientation term. This should always be 4D.
position, a translation term. This should always be 3D.
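These correspond to a transformation from coordinates in the local frame to coordinates in the parent frame:
\[
p_\mathrm{parent} = \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} p_\mathrm{local} \\ 1 \end{bmatrix} = R \, p_\mathrm{local} + t
\]
where wxyz is the quaternion form of the \(\mathrm{SO}(3)\) matrix
\(R\) and position is the \(\mathbb{R}^3\) translation term
\(t\).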
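As a minimal sketch of what a pose means, the following standard-library Python converts a wxyz unit quaternion to its rotation matrix \(R\) and maps a point from the local frame to the parent frame as \(R \, p_\mathrm{local} + t\). The function names `quat_to_matrix` and `local_to_parent` are hypothetical helpers for illustration and are not part of viser.

```python
import math


def quat_to_matrix(wxyz):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix (row-major lists)."""
    w, x, y, z = wxyz
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def local_to_parent(wxyz, position, p_local):
    """Apply p_parent = R @ p_local + t for a pose given as (wxyz, position)."""
    R = quat_to_matrix(wxyz)
    return tuple(
        sum(R[i][j] * p_local[j] for j in range(3)) + position[i] for i in range(3)
    )


# A 90-degree rotation about the Z-axis, followed by a translation.
wxyz = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
position = (1.0, 2.0, 3.0)
p_parent = local_to_parent(wxyz, position, (1.0, 0.0, 0.0))
```

The local X-axis unit point is rotated onto the parent's Y-axis and then shifted by position, so `p_parent` is approximately (1, 3, 3).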
World coordinates
In the world coordinate space, +Z points upward by default. This can be
overridden with viser.SceneApi.set_up_direction().
Cameras
In viser, all camera parameters exposed to the Python API use the
COLMAP/OpenCV convention:
Forward: +Z
Up: -Y
Right: +X
Confusingly, this is different from Nerfstudio, which adopts the OpenGL/Blender convention:
Forward: -Z
Up: +Y
Right: +X
Converting between the two is a 180-degree rotation around the camera's local X-axis, which negates the local Y and Z axes.
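The conversion can be sketched with quaternions, assuming the camera orientation is stored as a wxyz quaternion that maps camera coordinates to world coordinates. Rotating about the local X-axis then corresponds to right-multiplying by the quaternion (0, 1, 0, 0). The names `quat_multiply`, `FLIP_X_180`, and `opengl_to_opencv` are hypothetical and are not viser or Nerfstudio functions.

```python
def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


# 180-degree rotation about the X-axis: w = cos(90 deg) = 0, x = sin(90 deg) = 1.
FLIP_X_180 = (0.0, 1.0, 0.0, 0.0)


def opengl_to_opencv(wxyz_opengl):
    """Rotate a camera orientation by 180 degrees about its own (local) X-axis."""
    # Right-multiplication applies the flip in the camera's local frame.
    return quat_multiply(wxyz_opengl, FLIP_X_180)


# An OpenGL-style camera with identity orientation looks along world -Z.
wxyz_opencv = opengl_to_opencv((1.0, 0.0, 0.0, 0.0))
```

Applying the same function a second time yields the negated original quaternion, which represents the same rotation, so the one helper converts in both directions.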