Frame Conventions¶
In this note, we describe the coordinate frame conventions used in viser
.
Scene tree naming¶
Each object that we add to the scene in viser is instantiated as a node in a scene tree. The structure of this tree is determined by the names assigned to the nodes.
If we add a coordinate frame called /base_link/shoulder/wrist
, it signifies
three nodes: the wrist
is a child of the shoulder
which is a child of the
base_link
.
If we set the transformation of a given node like /base_link/shoulder
, both
it and its child /base_link/shoulder/wrist
will move. Its parent,
/base_link
, will be unaffected.
Poses¶
Poses in viser
are defined using a pair of fields:
wxyz
, a unit quaternion orientation term. This should always be 4D.position
, a translation term. This should always be 3D.
These correspond to a transformation from coordinates in the local frame to the parent frame:
where wxyz
is the quaternion form of the \(\mathrm{SO}(3)\) matrix
\(R\) and position
is the \(\mathbb{R}^3\) translation term
\(t\).
World coordinates¶
In the world coordinate space, +Z points upward by default. This can be
overridden with viser.SceneApi.set_up_direction()
.
Cameras¶
In viser
, all camera parameters exposed to the Python API use the
COLMAP/OpenCV convention:
Forward: +Z
Up: -Y
Right: +X
Confusingly, this is different from Nerfstudio, which adopts the OpenGL/Blender convention:
Forward: -Z
Up: +Y
Right: +X
Conversion between the two is a simple 180 degree rotation around the local X-axis.