The objective of this step is to find a transformation matrix that takes points expressed in world space to view space. View space can be pictured as what a camera sees when it is placed at a known point of view and captures some of the objects in the scene.

$$ \mathbf{v}_{view} = \mathbf{M}_{view} \mathbf{v}_{wld} $$

The construction of the transformation matrix to transform points from world space to view space needs 3 parameters:

  • $\mathbf{camera}$ a point expressed in world space that defines the location of the point of view (the $\mathbf{camera}$ sits at the origin of the view space)
  • $\mathbf{at}$ a point expressed in world space that the camera is aiming at, so the viewing direction is $\mathbf{at - camera}$
  • $\mathbf{up}$ denotes the upward orientation of the camera (typically coincides with the positive $y$-axis)

Figure: view transform

Note that the camera looks down the negative $z$-axis of the view space. This is a convention rather than a rule: the projection matrix is constructed so that visible points along the $-z$-axis in view space are mapped to the range $[-1,1]$.

Derivation of the view transform matrix

The process of transforming the vertices from world space to view space consists of the following steps

  • Creation of a coordinate frame for the view space
  • Application of the appropriate translation for the camera location (world space -> upright space)
  • Rotation of the translated points into the camera's coordinate frame (upright space -> view space)

Creation of a coordinate frame for the view space

Given $\mathbf{camera}$, $\mathbf{at}$ and $\mathbf{up}$, the coordinate frame with basis vectors $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ is computed with the following steps (note that the basis vectors must be unit vectors and mutually perpendicular)

  • compute $\mathbf{w}$ trivially by normalizing the vector $\mathbf{camera - at}$
$$ \mathbf{w} = \frac{\mathbf{camera - at}}{\norm{\mathbf{camera - at}}} $$

  • next $\mathbf{u}$ can be computed with the cross product of $\mathbf{up}$ and $\mathbf{w}$ (in this order, so that $\mathbf{u}$ points to the right of the camera in a right-handed frame), again the resulting vector must be normalized
$$ \mathbf{u} = \frac{\mathbf{up} \times \mathbf{w}}{\norm{ \mathbf{up} \times \mathbf{w} }} $$

  • finally $\mathbf{v}$ can be computed as
$$ \mathbf{v} = \mathbf{w} \times \mathbf{u} $$
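
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch using plain tuples and the standard library; the helper names (`sub`, `cross`, `normalize`, `view_basis`) are assumptions made for this example and do not come from any particular graphics library. Note that $\mathbf{u}$ is taken as $\mathbf{up} \times \mathbf{w}$, which yields a right-handed frame in which $\mathbf{u}$ points to the camera's right.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product a x b of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    """Return a scaled to unit length; a zero vector has no direction."""
    length = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (a[0] / length, a[1] / length, a[2] / length)

def view_basis(camera, at, up):
    """Hypothetical helper: build the (u, v, w) frame of the view space."""
    w = normalize(sub(camera, at))   # points from the target back to the camera
    u = normalize(cross(up, w))      # points to the camera's right
    v = cross(w, u)                  # already unit length: w and u are orthonormal
    return u, v, w

# Camera on the +z axis looking at the origin, with +y as "up".
u, v, w = view_basis(camera=(0.0, 0.0, 5.0), at=(0.0, 0.0, 0.0), up=(0.0, 1.0, 0.0))
# Here the view frame coincides with the world axes:
# u = (1, 0, 0), v = (0, 1, 0), w = (0, 0, 1)
```

The sketch raises an error when $\mathbf{camera}$ and $\mathbf{at}$ coincide, or when $\mathbf{up}$ is parallel to $\mathbf{w}$, since no frame can be built in those degenerate cases.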

Camera translation

The transformation matrix that moves all the points from world space to view’s upright space is

$$ \mathbf{T} = \begin{bmatrix} 1 & 0 & 0 & -camera_x \\ 0 & 1 & 0 & -camera_y \\ 0 & 0 & 1 & -camera_z \\ 0 & 0 & 0 & 1 \end{bmatrix} $$
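
As a small sanity check of the translation, the sketch below builds $\mathbf{T}$ as a nested list and applies it to a point. The function names `translation_matrix` and `transform_point` are hypothetical, chosen only for this example; the camera position is an arbitrary value.

```python
def translation_matrix(camera):
    """Build the 4x4 matrix T that moves the camera position to the origin."""
    cx, cy, cz = camera
    return [
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 1.0, -cz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(m, p):
    """Apply a 4x4 affine matrix to a 3D point (homogeneous coordinate 1)."""
    column = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[r][c] * column[c] for c in range(4)) for r in range(3))

camera = (2.0, 3.0, 5.0)          # arbitrary camera position in world space
T = translation_matrix(camera)

camera_moved = transform_point(T, camera)           # the camera lands on the origin
point_moved = transform_point(T, (3.0, 3.0, 5.0))   # one unit along +x from the camera
```

After the translation every point is expressed relative to the camera position, but the axes are still those of the world space.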

Transformation of the points from world space to view space

The basis vectors of the camera frame can be encoded as the columns of a matrix

$$ \mathbf{M}_{wld \leftarrow view} = \begin{bmatrix} \mathbf{u}_{3 \times 1} & \mathbf{v}_{3 \times 1} & \mathbf{w}_{3 \times 1} \end{bmatrix} $$

Expressed as a $4 \times 4$ homogeneous matrix, this is

$$ \mathbf{M}_{wld \leftarrow view} = \begin{bmatrix} x_u & x_v & x_w & 0 \\ y_u & y_v & y_w & 0 \\ z_u & z_v & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

This matrix transforms points from view space to world space. The matrix that performs the opposite operation (world space to view space) is therefore its inverse, which equals its transpose because the matrix is orthogonal (its columns are orthonormal)

$$ \mathbf{M}_{view \leftarrow wld} = \mathbf{M}^{-1}_{wld \leftarrow view} = \mathbf{M}^T_{wld \leftarrow view} = \begin{bmatrix} x_u & y_u & z_u & 0 \\ x_v & y_v & z_v & 0 \\ x_w & y_w & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$
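
As a quick check of why the transpose acts as the inverse, consider the $3 \times 3$ form of the matrix. Each entry of the product of the transpose with the matrix is a dot product between two basis vectors, and since $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ are unit length and mutually perpendicular, the result is the identity

$$ \mathbf{M}^T_{wld \leftarrow view} \mathbf{M}_{wld \leftarrow view} = \begin{bmatrix} \mathbf{u} \cdot \mathbf{u} & \mathbf{u} \cdot \mathbf{v} & \mathbf{u} \cdot \mathbf{w} \\ \mathbf{v} \cdot \mathbf{u} & \mathbf{v} \cdot \mathbf{v} & \mathbf{v} \cdot \mathbf{w} \\ \mathbf{w} \cdot \mathbf{u} & \mathbf{w} \cdot \mathbf{v} & \mathbf{w} \cdot \mathbf{w} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$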

The view matrix

We can combine the translation and the rotation matrix into a single matrix called the view matrix. The translation is applied first, so it appears on the right of the product, which has the form

$$ \begin{align*} \mathbf{M}_{view} &= \mathbf{M}_{view \leftarrow wld} \mathbf{T} \\ &= \begin{bmatrix} x_u & y_u & z_u & 0 \\ x_v & y_v & z_v & 0 \\ x_w & y_w & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -camera_x \\ 0 & 1 & 0 & -camera_y \\ 0 & 0 & 1 & -camera_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} x_u & y_u & z_u & -\mathbf{camera \cdot u} \\ x_v & y_v & z_v & -\mathbf{camera \cdot v} \\ x_w & y_w & z_w & -\mathbf{camera \cdot w} \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{align*} $$
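
The closed form above can be turned into a short function. The following is a minimal sketch in plain Python, assuming a right-handed frame with $\mathbf{u} = \mathbf{up} \times \mathbf{w}$ (normalized); the names `view_matrix` and `transform_point` and the small vector helpers are hypothetical and written only for this example.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(dot(a, a))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (a[0] / length, a[1] / length, a[2] / length)

def view_matrix(camera, at, up):
    """Build the 4x4 view matrix: basis vectors as rows, -camera.basis as translation."""
    w = normalize(sub(camera, at))
    u = normalize(cross(up, w))
    v = cross(w, u)
    return [
        [u[0], u[1], u[2], -dot(camera, u)],
        [v[0], v[1], v[2], -dot(camera, v)],
        [w[0], w[1], w[2], -dot(camera, w)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(m, p):
    """Apply a 4x4 affine matrix to a 3D point (homogeneous coordinate 1)."""
    column = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[r][c] * column[c] for c in range(4)) for r in range(3))

camera = (0.0, 0.0, 5.0)
at = (0.0, 0.0, 0.0)
M = view_matrix(camera, at, up=(0.0, 1.0, 0.0))

camera_in_view = transform_point(M, camera)  # the camera maps to the origin
at_in_view = transform_point(M, at)          # the target lies on the -z axis, 5 units away
```

Two properties are worth checking on any implementation: the camera position must map to the origin of the view space, and the target point must end up on the negative $z$-axis at a distance equal to the length of $\mathbf{camera - at}$.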