The canonical view volume is a cube with its extreme points at $[-1,-1,-1]$ and $[1,1,1]$. Coordinates in this view volume are called normalized device coordinates (NDC). The objective of this step is to build a transformation matrix so that the region of space we want to render, called the view volume, is mapped to the canonical view volume

$$\mathbf{v}_{ndc} = \mathbf{M}_{proj}\,\mathbf{v}_{view}$$

Some points expressed in view space won't be part of the view volume and will be discarded after the transformation. This process is called clipping (we only need to check whether any coordinate of a point is outside the range $[-1,1]$ to discard it)

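Later we'll see that the projection can involve a division (the perspective projection does, while the orthographic one leaves w at 1). A neat trick is to use projective geometry to defer that division: any point of the form $(\alpha x, \alpha y, \alpha z, 1)$ can be represented as $(x, y, z, \frac{1}{\alpha})$ in homogeneous coordinates. We can therefore introduce an intermediate step that transforms points to clip coordinates first and then to normalized device coordinates by dividing by the w-coordinate, which is the same as multiplying by $\frac{1}{1/\alpha} = \alpha$

$$\mathbf{v}_{clip} = \mathbf{M}_{proj}\,\mathbf{v}_{view} \qquad \mathbf{v}_{ndc} = \alpha\,\mathbf{v}_{clip}$$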
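As a rough illustration of the clip-to-NDC step and the range check used for clipping, here is a minimal Python sketch. The function names `clip_to_ndc` and `inside_canonical_volume` are hypothetical and chosen only for this example.

```python
def clip_to_ndc(v_clip):
    """Divide (x, y, z, w) by w to get (x/w, y/w, z/w) in NDC."""
    x, y, z, w = v_clip
    return (x / w, y / w, z / w)


def inside_canonical_volume(v_ndc):
    """True when every NDC coordinate lies in [-1, 1]."""
    return all(-1.0 <= c <= 1.0 for c in v_ndc)


# (x, y, z, 1/alpha) with alpha = 0.5 represents the point alpha * (x, y, z).
ndc = clip_to_ndc((1.0, -2.0, 0.5, 2.0))
print(ndc)                                        # (0.5, -1.0, 0.25)
print(inside_canonical_volume(ndc))               # True
print(inside_canonical_volume((1.5, 0.0, 0.0)))   # False, x is out of range
```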

Orthographic projection

An orthographic projection matrix is built with 6 parameters

  • left, right: planes in the x-axis
  • bottom, top: planes in the y-axis
  • near, far: planes in the z-axis

These parameters bound the view volume which is an axis-aligned bounding box

[Interactive figure: Orthographic projection, with controls for l/r, t/b, near and far]

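Since the mapping of the range $[l,r]$ to the range $[-1,1]$ is linear, we could use the equation of the line $y=mx+b$ and solve for $m$ and $b$. However, we can get the same equation more intuitively by composing two functions: $g(x)$ maps the input range $[l,r]$ to $[0,1]$, so that $g(l)=0$ and $g(r)=1$, and $f$ maps $[0,1]$ to $[-1,1]$, so that it yields $-1$ when $g(x)=0$ and $1$ when $g(x)=1$. Then $f(x)$ has the form

$$\begin{align} f(x) &= -1 + 2g(x) \tag{1} \\ g(x) &= \frac{x-l}{r-l} \tag{2} \end{align}$$

Finally $f(x)$ has the form

$$\begin{align} f(x) &= -1 + 2\frac{x-l}{r-l} \\ &= \frac{l-r}{r-l} + \frac{2}{r-l}x - \frac{2l}{r-l} \\ &= \frac{2}{r-l}x + \frac{-l-r}{r-l} \\ &= \frac{2}{r-l}x - \frac{r+l}{r-l} \tag{3} \end{align}$$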
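The two nested functions can be sketched in a few lines of Python to check that the composed form and the closed form (3) agree. The names `remap_to_unit`, `remap_to_ndc` and `remap_closed_form` are hypothetical, used only for this illustration.

```python
def remap_to_unit(x, lo, hi):
    """g(x): maps [lo, hi] linearly onto [0, 1]."""
    return (x - lo) / (hi - lo)


def remap_to_ndc(x, lo, hi):
    """f(x) = -1 + 2 g(x): maps [lo, hi] linearly onto [-1, 1]."""
    return -1.0 + 2.0 * remap_to_unit(x, lo, hi)


def remap_closed_form(x, lo, hi):
    """Equation (3): 2 / (hi - lo) * x - (hi + lo) / (hi - lo)."""
    return 2.0 / (hi - lo) * x - (hi + lo) / (hi - lo)


l, r = -4.0, 12.0
print(remap_to_ndc(l, l, r))         # -1.0
print(remap_to_ndc(r, l, r))         # 1.0
print(remap_to_ndc(4.0, l, r))       # 0.0, the midpoint of [l, r]
print(remap_closed_form(4.0, l, r))  # 0.0, same result from (3)
```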

We can adapt (3) to have a similar form for the y-coordinate using t and b. These equations are transformations from view space to clip space:

$$\begin{align} x_{clip} &= \frac{2}{r-l}x_{view} - \frac{r+l}{r-l} \\ y_{clip} &= \frac{2}{t-b}y_{view} - \frac{t+b}{t-b} \end{align}$$

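The $z_{clip}$ value is different from the ones above because the camera looks down the negative z-axis, so we're mapping $[-n,-f] \to [-1,1]$

$$\begin{align} z_{clip} &= \frac{2}{-f-(-n)}z_{view} - \frac{-f+(-n)}{-f-(-n)} \\ &= \frac{2}{-f+n}z_{view} - \frac{-f-n}{-f+n} \\ &= -\frac{2}{f-n}z_{view} + \frac{-f-n}{f-n} \\ &= -\frac{2}{f-n}z_{view} - \frac{f+n}{f-n} \end{align}$$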

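The w-coordinate is left untouched since this projection doesn't involve a division. The general orthographic projection matrix is

$$\mathbf{M}_{proj} = \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & -\frac{2}{f-n} & -\frac{f+n}{f-n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{4}$$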

The transformation matrix from view space to clip space is

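$$\begin{align} \mathbf{v}_{clip} &= \mathbf{M}_{proj}\,\mathbf{v}_{view} \\ \begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} &= \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & -\frac{2}{f-n} & -\frac{f+n}{f-n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \end{align}$$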

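Finally note that $w_{clip}$ will always have the value of $w_{view}=1$, therefore the division that takes clip coordinates to NDC will not modify them

$$\begin{bmatrix} x_{ndc} \\ y_{ndc} \\ z_{ndc} \end{bmatrix} = \begin{bmatrix} x_{clip}/1 \\ y_{clip}/1 \\ z_{clip}/1 \end{bmatrix}$$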
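To make the matrix concrete, the following Python sketch builds it from the six parameters and checks that two opposite corners of the view volume land on the corners of the canonical view volume. The helper names `orthographic` and `transform` are hypothetical, and the parameter values are arbitrary.

```python
def orthographic(l, r, b, t, n, f):
    """Orthographic projection matrix (4), stored as a list of rows."""
    return [
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, -2.0 / (f - n), -(f + n) / (f - n)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


m = orthographic(-2.0, 6.0, -1.0, 3.0, 1.0, 9.0)

# The near plane is at z = -n and the far plane at z = -f in view space.
print(transform(m, [-2.0, -1.0, -1.0, 1.0]))  # [-1.0, -1.0, -1.0, 1.0]
print(transform(m, [6.0, 3.0, -9.0, 1.0]))    # [1.0, 1.0, 1.0, 1.0]
```

Since $w_{clip}$ stays at 1, the clip coordinates printed here are already the NDC values.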

Building the matrix using combined transformations

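A simpler way to think about this orthographic projection is to split it into three steps

  • translate the bottom left near corner to the origin, i.e. $[l,b,-n] \to [0,0,0]$
  • scale the volume to a cube with sides of length 2 (the z scale is negative, so the far plane ends up on the positive side)
  • translate the bottom left near corner away from the origin, i.e. $[0,0,0] \to [-1,-1,-1]$

$$\begin{align} \mathbf{M}_{proj} &= \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & 0 \\ 0 & \frac{2}{t-b} & 0 & 0 \\ 0 & 0 & -\frac{2}{f-n} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -l \\ 0 & 1 & 0 & -b \\ 0 & 0 & 1 & n \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{2l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{2b}{t-b} \\ 0 & 0 & -\frac{2}{f-n} & -\frac{2n}{f-n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{2l}{r-l} - 1 \\ 0 & \frac{2}{t-b} & 0 & -\frac{2b}{t-b} - 1 \\ 0 & 0 & -\frac{2}{f-n} & -\frac{2n}{f-n} - 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & -\frac{2}{f-n} & -\frac{f+n}{f-n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{align}$$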
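The three steps can be checked numerically by multiplying the individual matrices. This is only a sketch: the helpers `matmul`, `translation`, `scaling` and `orthographic_composed` are hypothetical names, and the parameter values are arbitrary.

```python
def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def orthographic_composed(l, r, b, t, n, f):
    to_origin = translation(-l, -b, n)        # [l, b, -n] -> [0, 0, 0]
    to_two_units = scaling(2.0 / (r - l), 2.0 / (t - b), -2.0 / (f - n))
    to_canonical = translation(-1.0, -1.0, -1.0)
    # The rightmost matrix is applied to the point first.
    return matmul(to_canonical, matmul(to_two_units, to_origin))


for row in orthographic_composed(-2.0, 6.0, -1.0, 3.0, 1.0, 9.0):
    print(row)
# The last column is [-0.5, -0.5, -1.25, 1.0], i.e.
# [-(r+l)/(r-l), -(t+b)/(t-b), -(f+n)/(f-n), 1]
```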

Perspective projection

Projective geometry concepts are used in this type of projection, particularly the fact that objects farther away from the point of view appear smaller after projection. This type of projection mimics how we perceive objects in reality

A perspective projection matrix is built with 6 parameters, left, right, bottom, top, near, far

  • left, right: x-axis bounds for the near plane
  • bottom, top: y-axis bounds for the near plane
  • near, far: planes in the z-axis, located at $z=-n$ and $z=-f$. The intersection of the far plane with the line passing through the origin parallel to the vector $[l,b,-n]$ is the bottom left far corner of the view volume; the same logic gives the other corners on the far plane

These parameters define a truncated pyramid also called a frustum

[Interactive figure: Perspective projection, with controls for fov, near and far]

General perspective projection matrix

The mapping of the range $[l,r]$ to the range $[-1,1]$ can be split into two steps

  • Project all the points onto the near plane; this way the x- and y-coordinates of every point inside the frustum will be inside the range $[l,r] \times [b,t]$
  • Map all the values in the ranges $[l,r]$ and $[b,t]$ to the range $[-1,1]$

[Figure: Top view of the frustum]

[Figure: Side view of the frustum]

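Let $\mathbf{v}_{view}$ be a point in view space that is going to be transformed to clip space. By similar triangles, the values of $x_p$ and $y_p$ (the coordinates projected onto the near plane) are

$$\begin{align} \frac{x_p}{x_{view}} &= \frac{n}{-z_{view}} \quad\Rightarrow\quad x_p = \frac{n\,x_{view}}{-z_{view}} \tag{5} \\ \frac{y_p}{y_{view}} &= \frac{n}{-z_{view}} \quad\Rightarrow\quad y_p = \frac{n\,y_{view}}{-z_{view}} \tag{6} \end{align}$$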

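Note that both quantities are inversely proportional to $-z_{view}$. We can write all three coordinates over this common denominator

$$\begin{bmatrix} \frac{n\,x_{view}}{-z_{view}} & \frac{n\,y_{view}}{-z_{view}} & \frac{n\,z_{view}}{-z_{view}} \end{bmatrix}^T = \frac{\begin{bmatrix} n\,x_{view} & n\,y_{view} & n\,z_{view} \end{bmatrix}^T}{-z_{view}}$$

The point in homogeneous coordinates is

$$\begin{bmatrix} n\,x_{view} & n\,y_{view} & n\,z_{view} & -z_{view} \end{bmatrix}^T$$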
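A small Python sketch of this projection step, assuming the camera looks down the negative z-axis so that visible points have $z_{view} < 0$. The function name `project_to_near_plane` is hypothetical.

```python
def project_to_near_plane(v_view, n):
    """Return the homogeneous form (n*x, n*y, n*z, -z) of a view-space
    point and the 3D point obtained by dividing by w = -z."""
    x, y, z = v_view
    w = -z  # positive for points in front of the camera
    homogeneous = (n * x, n * y, n * z, w)
    projected = tuple(c / w for c in homogeneous[:3])
    return homogeneous, projected


homogeneous, projected = project_to_near_plane((2.0, 1.0, -4.0), n=1.0)
print(homogeneous)  # (2.0, 1.0, -4.0, 4.0)
print(projected)    # (0.5, 0.25, -1.0)
```

The projected z is $-n$ for any input, which is why the depth has to be treated separately from x and y.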

OpenGL will then project any 4D homogeneous coordinate onto the 3D hyperplane $w=1$ by dividing each of the coordinates by $w$. Note that this division isn't done by the application but by OpenGL itself in a later stage of the rendering pipeline

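We can take advantage of this process and use $-z_{view}$ as our $w$. With this in mind we can construct a transformation matrix so that transformed points have $w=-z_{view}$

$$\begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} = \begin{bmatrix} . & . & . & . \\ . & . & . & . \\ . & . & . & . \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \qquad \therefore\; w_{clip} = -z_{view} \tag{7}$$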

Where xclip,yclip,zclip,wclip are expressed in terms of the clip space, when each coordinate is divided by wclip we’ll have NDC

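$$\begin{bmatrix} x_{ndc} \\ y_{ndc} \\ z_{ndc} \end{bmatrix} = \begin{bmatrix} x_{clip}/w_{clip} \\ y_{clip}/w_{clip} \\ z_{clip}/w_{clip} \end{bmatrix}$$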

Next $x_p$ and $y_p$ are mapped linearly to $[-1,1]$ using the linear mapping function (3)

$$\begin{align} x_{ndc} &= \frac{2}{r-l}x_p - \frac{r+l}{r-l} \\ y_{ndc} &= \frac{2}{t-b}y_p - \frac{t+b}{t-b} \tag{8} \end{align}$$

Next we substitute the values of xp (5) in xndc (8)

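$$\begin{align} x_{ndc} &= \frac{2}{r-l} \cdot \frac{n\,x_{view}}{-z_{view}} - \frac{r+l}{r-l} \\ &= \frac{2n}{r-l} \cdot \frac{x_{view}}{-z_{view}} - \frac{r+l}{r-l} \cdot \frac{-z_{view}}{-z_{view}} \\ &= \left( \frac{2n}{r-l}x_{view} + \frac{r+l}{r-l}z_{view} \right) \Big/ -z_{view} \end{align}$$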

Note that the second fraction is manipulated so that it also has $-z_{view}$ as its denominator. Also note that the quantity in the parentheses is in clip space coordinates: it is $x_{clip}$

$$x_{clip} = \frac{2n}{r-l}x_{view} + \frac{r+l}{r-l}z_{view}$$

Similarly the value of yclip is

$$y_{clip} = \frac{2n}{t-b}y_{view} + \frac{t+b}{t-b}z_{view}$$

Then the transformation matrix seen in (7) is now

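$$\begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} = \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ . & . & . & . \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \tag{9}$$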

Next we need to find the value of $z_{clip}$. Note that the z value projected onto the near plane is always the constant $-n$, because $n\,z_{view}$ is divided by $-z_{view}$. We need the depth to be unique for each $z_{view}$ for clipping and the depth test, plus we should be able to unproject it (through an inverse transformation)

Since zndc doesn’t depend on xview or yview we can borrow the w-coordinate to find the relationship between zndc and zview, with that in mind we can make the third row of (9) equal to

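$$\begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} = \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \tag{10}$$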

Then zndc has the form

$$z_{ndc} = \frac{z_{clip}}{w_{clip}} = \frac{A\,z_{view} + B\,w_{view}}{-z_{view}}$$

Since wview=1 in view space

$$z_{ndc} = \frac{A\,z_{view} + B}{-z_{view}}$$

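Note that $z_{ndc}$ is not linear in $z_{view}$, but it still needs to map $[-n,-f] \to [-1,1]$. Substituting the endpoints, $z_{view}=-n$ with $z_{ndc}=-1$ and $z_{view}=-f$ with $z_{ndc}=1$, we have a system of equations

$$\begin{cases} -1 = \frac{-An+B}{n} \\ 1 = \frac{-Af+B}{f} \end{cases} \quad\Rightarrow\quad \begin{cases} -An+B = -n \\ -Af+B = f \end{cases}$$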

Subtracting the second equation from the first

$$\begin{align} -An + B + Af - B &= -n - f \\ A(f-n) &= -n - f \\ A &= -\frac{f+n}{f-n} \end{align}$$

Solving for B given A

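$$\begin{align} \frac{f+n}{f-n}n + B &= -n \\ B &= -n - \frac{f+n}{f-n}n = \frac{-fn + n^2 - fn - n^2}{f-n} = -\frac{2fn}{f-n} \end{align}$$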
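The values of $A$ and $B$ can be sanity-checked numerically. The sketch below uses the hypothetical helpers `depth_coefficients` and `z_ndc` with arbitrary values for $n$ and $f$, and also shows that the mapping is not linear: the point halfway between the planes does not land on 0.

```python
def depth_coefficients(n, f):
    """A and B of the third row: A = -(f+n)/(f-n), B = -2fn/(f-n)."""
    a = -(f + n) / (f - n)
    b = -2.0 * f * n / (f - n)
    return a, b


def z_ndc(z_view, n, f):
    """z_ndc = (A * z_view + B) / -z_view, with w_view = 1."""
    a, b = depth_coefficients(n, f)
    return (a * z_view + b) / (-z_view)


n, f = 1.0, 9.0
print(depth_coefficients(n, f))  # (-1.25, -2.25)
print(z_ndc(-n, n, f))           # -1.0, the near plane
print(z_ndc(-f, n, f))           # 1.0, the far plane
print(z_ndc(-5.0, n, f))         # 0.8, halfway in view space
```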

Substituting the values of A and B in (10) we have the general perspective projection matrix

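$$\mathbf{M}_{proj} = \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \tag{11}$$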
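As a check of the general matrix, this Python sketch builds it for an asymmetric frustum and verifies that the bottom left near corner and the top right far corner map to opposite corners of the canonical view volume after the division by $w_{clip}$. The helper names `frustum`, `transform` and `to_ndc` are hypothetical and the parameter values are arbitrary.

```python
def frustum(l, r, b, t, n, f):
    """General perspective projection matrix (11), as a list of rows."""
    return [
        [2.0 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2.0 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


def to_ndc(v_clip):
    """Divide x, y, z by w_clip."""
    x, y, z, w = v_clip
    return (x / w, y / w, z / w)


l, r, b, t, n, f = -1.0, 3.0, -2.0, 2.0, 1.0, 9.0
m = frustum(l, r, b, t, n, f)

# Bottom left corner on the near plane: [l, b, -n].
print(to_ndc(transform(m, [l, b, -n, 1.0])))  # (-1.0, -1.0, -1.0)

# Top right corner on the far plane: [r, t, -n] scaled by f / n.
s = f / n
print(to_ndc(transform(m, [r * s, t * s, -f, 1.0])))  # (1.0, 1.0, 1.0)
```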

Symmetric perspective projection matrix

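If the viewing volume is symmetric, i.e. $r=-l$ and $t=-b$, then some quantities can be simplified

$$\begin{align} r+l &= 0, \quad r-l = 2r \\ t+b &= 0, \quad t-b = 2t \end{align}$$

Then (11) becomes

$$\mathbf{M}_{proj} = \begin{bmatrix} \frac{n}{r} & 0 & 0 & 0 \\ 0 & \frac{n}{t} & 0 & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \tag{12}$$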
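The simplification can be confirmed by comparing the general matrix evaluated at $l=-r$, $b=-t$ with the symmetric form. In this sketch `frustum` and `symmetric_frustum` are hypothetical helper names and the values are arbitrary.

```python
def frustum(l, r, b, t, n, f):
    """General perspective projection matrix (11)."""
    return [
        [2.0 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2.0 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def symmetric_frustum(r, t, n, f):
    """Symmetric perspective projection matrix (12)."""
    return [
        [n / r, 0.0, 0.0, 0.0],
        [0.0, n / t, 0.0, 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


r, t, n, f = 2.0, 1.5, 1.0, 9.0
general = frustum(-r, r, -t, t, n, f)
symmetric = symmetric_frustum(r, t, n, f)

same = all(abs(general[i][j] - symmetric[i][j]) < 1e-12
           for i in range(4) for j in range(4))
print(same)  # True
```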

Symmetric perspective projection matrix from field of view/aspect

gluPerspective receives two arguments instead of the x and y bounds

  • field of view (fov), which specifies the field of view angle in the y direction
  • aspect (aspect), the aspect ratio that determines the field of view in the x direction, calculated as $\frac{x}{y}$; the value is commonly $\frac{\text{screen width}}{\text{screen height}}$

[Figure: fov]

We see that the value of t (top) is

$$\begin{align} \tan(fov/2) &= \frac{t}{n} \tag{13} \\ t &= n \tan(fov/2) \tag{14} \end{align}$$

We can find the value of r (right) with the aspect ratio

$$\begin{align} aspect &= \frac{2r}{2t} = \frac{r}{t} \tag{15} \\ r &= aspect \cdot t \tag{16} \\ &= aspect \cdot n \tan(fov/2) \tag{17} \end{align}$$

Substituting (14) and (17) in (12)

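$$\mathbf{M}_{proj} = \begin{bmatrix} \frac{1}{aspect \cdot \tan(fov/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(fov/2)} & 0 & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \tag{18}$$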
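A minimal Python sketch of this last matrix follows. The function name `perspective` is hypothetical, and it assumes the angle is given in radians (gluPerspective itself takes the angle in degrees, hence the `math.radians` conversion in the usage). It also recovers $t$ and $r$ from (14) and (17) to confirm that $n/r$ and $n/t$ match the first two diagonal entries.

```python
import math


def perspective(fov_y, aspect, n, f):
    """Symmetric perspective matrix (18); fov_y is in radians."""
    tan_half = math.tan(fov_y / 2.0)
    return [
        [1.0 / (aspect * tan_half), 0.0, 0.0, 0.0],
        [0.0, 1.0 / tan_half, 0.0, 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


fov_y = math.radians(90.0)
aspect, n, f = 2.0, 1.0, 9.0
m = perspective(fov_y, aspect, n, f)

# Recover the near plane bounds from (14) and (17).
t = n * math.tan(fov_y / 2.0)
r = aspect * t

print(math.isclose(m[0][0], n / r))  # True
print(math.isclose(m[1][1], n / t))  # True
```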