# Perspective Transforms

By Andre Yew (andrey@gluttony.ugcs.caltech.edu)

This is how I learned perspective transforms --- it was intuitive and understandable to me, so perhaps it'll be to others as well. It does require knowledge of matrix math and homogeneous coordinates. IMO, if you want to write a serious renderer, you need to know both.

First, let's look at what we're trying to do:

                  S (screen)
                  |    * P (y, z)
                  |   /|
                  |  / |
                  | /  |
                  |/   |
                  * R  |
                 /|    |
                / |    |
               /  |    |
      E (eye) /   |    | W
    ---------*----|----*-------------
             <-d-><-z->

E is the eye, P is the point we're trying to project, and R is its projected position on the screen S (this is the point you want to draw on your monitor). Z goes into the monitor (left-handed coordinates), with X and Y being the width and height of the screen. So let's find where R is:

` R = (xs, ys)`

Using similar triangles (ERS and EPW):

    xs/d = x/(z + d)
    ys/d = y/(z + d)

The diagram shows only the y-z plane; the same argument in the x-z plane gives the relation for xs.

So,

    xs = x*d/(z + d)
    ys = y*d/(z + d)
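As a quick check of the similar-triangles result, here is a minimal Python sketch. The function name `project` and the sample numbers are made up for illustration; z is measured from the screen plane and d is the eye-to-screen distance, as in the diagram.

```python
def project(x, y, z, d):
    """Project the point (x, y, z) onto the screen by similar triangles.

    z is measured from the screen plane and d is the eye-to-screen distance,
    so the depth of the point as seen from the eye is z + d.
    """
    depth = z + d
    if depth <= 0:
        raise ValueError("point is at or behind the eye; the formula does not apply")
    return (x * d / depth, y * d / depth)


# A point 3 units into the screen, with the eye 1 unit in front of the screen.
print(project(2.0, 1.0, 3.0, 1.0))  # (0.5, 0.25)
```

A point lying exactly on the screen (z = 0) projects to itself, which is a handy sanity test.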

Express this homogeneously:

` R = (xs, ys, zs, ws)`

Make

    xs = x*d
    ys = y*d
    zs = 0        (the screen is a flat plane)
    ws = z + d

and express this as a vector transformed by a matrix:

    [x y z 1][ d 0 0 0 ]
             [ 0 d 0 0 ] = R
             [ 0 0 0 1 ]
             [ 0 0 0 d ]
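To see that the matrix form agrees with the similar-triangles formulas, the following Python sketch multiplies a row vector by the matrix and then divides by ws. The helper names `vec_mat` and `perspective_basic` are hypothetical, and matrices are stored as plain lists of rows to match the row-vector convention used here.

```python
def vec_mat(v, m):
    """Multiply a row vector v (length 4) by a 4x4 matrix m given as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def perspective_basic(d):
    """The first perspective matrix: the eye sits a distance d in front of the screen."""
    return [
        [d, 0, 0, 0],
        [0, d, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, d],
    ]


d = 1.0
xs, ys, zs, ws = vec_mat([2.0, 1.0, 3.0, 1.0], perspective_basic(d))
print((xs, ys, zs, ws))    # (2.0, 1.0, 0.0, 4.0)
print((xs / ws, ys / ws))  # (0.5, 0.25), i.e. x*d/(z + d) and y*d/(z + d)
```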

The matrix on the right side can be called a perspective transform. But we aren't done yet. See the zero in the 3rd column, 3rd row of the matrix? Make it a 1 so we retain the z value (perhaps for some kind of Z-buffer). Also, this isn't exactly what we want since we'd also like to have the eye at the origin and we'd like to specify some kind of field-of-view. So, let's translate the matrix (we'll call it M) by -d to move the eye to the origin:

    [ 1 0  0 0 ][ d 0 0 0 ]
    [ 0 1  0 0 ][ 0 d 0 0 ]
    [ 0 0  1 0 ][ 0 0 1 1 ]  <--- Remember, we put a 1 in (3,3) to
    [ 0 0 -d 1 ][ 0 0 0 d ]       retain the z part of the vector.

And we get:

    [ d 0  0 0 ]
    [ 0 d  0 0 ]
    [ 0 0  1 1 ]
    [ 0 0 -d 0 ]
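The product can be checked mechanically. This Python sketch uses a hypothetical `mat_mul` helper and an arbitrary d = 2; it multiplies the translation by the perspective matrix and then pushes one sample point through the result.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


d = 2
translate = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, -d, 1],  # shifts z by -d, moving the eye to the origin
]
persp = [
    [d, 0, 0, 0],
    [0, d, 0, 0],
    [0, 0, 1, 1],   # the 1 in (3,3) keeps the z value
    [0, 0, 0, d],
]
combined = mat_mul(translate, persp)
for row in combined:
    print(row)

# With the eye at the origin, ws comes out as plain z (the distance from the eye).
point = [2, 1, 4, 1]
result = [sum(point[i] * combined[i][j] for i in range(4)) for j in range(4)]
print(result)  # [4, 2, 2, 4]
```

The last row of `combined` comes out as `[0, 0, -d, 0]`, matching the matrix above.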

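Now parametrize d by the angle PEW, taking P to lie on the edge of the view so that the angle is half the field-of-view (FOV/2). With the eye at the origin, such a point has y/z = tan( FOV/2 ) and projects to ys = y*d/z = d*tan( FOV/2 ). We want every point on the edge of the view to land at ys = 1, whatever its depth, and that gives a nice relationship:

` d = cot( FOV/2 )`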
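A small numeric check of this choice of d, as a Python sketch. The name `focal_distance`, the 90-degree field of view, and the depth of 5 are arbitrary illustrations; the eye is assumed to be at the origin, so the projection is ys = y*d/z.

```python
import math


def focal_distance(fov):
    """Return d = cot(FOV/2) for a field of view given in radians."""
    return 1.0 / math.tan(fov / 2.0)


fov = math.radians(90.0)
d = focal_distance(fov)

# A point on the upper edge of the view at depth z has y = z * tan(FOV/2).
z = 5.0
y = z * math.tan(fov / 2.0)
ys = y * d / z
print(ys)  # approximately 1, whatever depth z is chosen
```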

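Replace all the d's in the last perspective matrix with cos/sin and multiply the whole matrix through by sin (scaling a homogeneous matrix by a constant doesn't change the projected point, since the factor cancels in the divide by ws):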

    [ cos  0    0    0  ]
    [  0  cos   0    0  ]
    [  0   0   sin  sin ]
    [  0   0  -cos   0  ]

With all the trig functions taking FOV/2 as their arguments. Let's refine this a little further and add near and far Z-clipping planes. Look at the lower right 2x2 matrix:

    [  sin  sin ]
    [ -cos   0  ]

and replace the first column by a and b:

    [ a  sin ]
    [ b   0  ]

Transform our near and far boundaries, represented homogeneously as (zn, 1) and (zf, 1) respectively, and we get:

` (zn*a + b, zn*sin) and (zf*a + b, zf*sin)`

We want the transformed boundaries to map to 0 and 1, respectively, so divide out the homogeneous parts to get normal coordinates and equate:

    (zn*a + b)/(zn*sin) = 0    (near plane)
    (zf*a + b)/(zf*sin) = 1    (far plane)

Now solve for a and b and we get:

    a = (zf*sin)/(zf - zn)
      = sin/(1 - zn/zf)

    b = -a*zn
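The solution for a and b can be verified numerically. In this Python sketch the names `depth_coeffs` and `mapped_depth`, and the sample values for FOV, zn and zf, are illustrative assumptions only.

```python
import math


def depth_coeffs(fov, zn, zf):
    """Return (a, b, s) with s = sin(FOV/2), a = s/(1 - zn/zf), b = -a*zn."""
    s = math.sin(fov / 2.0)
    a = s / (1.0 - zn / zf)
    b = -a * zn
    return a, b, s


def mapped_depth(z, a, b, s):
    """Transform (z, 1) by the 2x2 block [[a, s], [b, 0]] and divide out w."""
    return (z * a + b) / (z * s)


zn, zf = 1.0, 100.0
a, b, s = depth_coeffs(math.radians(60.0), zn, zf)
print(mapped_depth(zn, a, b, s))  # 0 at the near plane
print(mapped_depth(zf, a, b, s))  # approximately 1 at the far plane
```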

At last we have the familiar looking perspective transform matrix:

    [ cos( FOV/2 )  0             0                         0            ]
    [ 0             cos( FOV/2 )  0                         0            ]
    [ 0             0             sin( FOV/2 )/(1 - zn/zf)  sin( FOV/2 ) ]
    [ 0             0             -a*zn                     0            ]
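Putting the pieces together, here is one way the final matrix might be built and applied in Python. This is only a sketch: `perspective_matrix` and `transform_point` are hypothetical names, the FOV is taken in radians, and points are row vectors [x y z 1] as in the rest of this article.

```python
import math


def perspective_matrix(fov, zn, zf):
    """Build the 4x4 matrix above for row vectors [x y z 1]; fov is in radians."""
    c = math.cos(fov / 2.0)
    s = math.sin(fov / 2.0)
    a = s / (1.0 - zn / zf)
    return [
        [c,   0.0, 0.0,     0.0],
        [0.0, c,   0.0,     0.0],
        [0.0, 0.0, a,       s],
        [0.0, 0.0, -a * zn, 0.0],
    ]


def transform_point(point, m):
    """Apply m to (x, y, z) and divide out the homogeneous coordinate."""
    v = [point[0], point[1], point[2], 1.0]
    xs, ys, zs, ws = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    return (xs / ws, ys / ws, zs / ws)


fov, zn, zf = math.radians(60.0), 1.0, 100.0
m = perspective_matrix(fov, zn, zf)
edge = math.tan(fov / 2.0)  # y/z for a point on the upper edge of the view

near_edge = transform_point((0.0, zn * edge, zn), m)
far_edge = transform_point((0.0, zf * edge, zf), m)
print(near_edge)  # approximately (0, 1, 0)
print(far_edge)   # approximately (0, 1, 1)
```

Points on the edge of the view come out at ys = 1, and the near and far planes come out at depths 0 and 1, as the derivation asked for.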

There are some pretty neat properties of the matrix. Perhaps the most interesting is how it transforms objects that go through the camera plane and how, coupled with a clipper set up the right way, it does everything correctly. What's interesting about this is how it warps space into something called Moebius space, which is kind of like a fortune-cookie except the folds pass through each other to connect the lower folds. You really have to see it to understand it. Try feeding it some vectors that go off to infinity in various directions (w = 0) and see where they come out.
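The suggested experiment might look like the following Python sketch. The helper names and the choice of FOV, zn and zf are assumptions for illustration; a direction "at infinity" is written as a row vector with w = 0.

```python
import math


def perspective_matrix(fov, zn, zf):
    """Build the final perspective matrix for row vectors; fov is in radians."""
    c = math.cos(fov / 2.0)
    s = math.sin(fov / 2.0)
    a = s / (1.0 - zn / zf)
    return [
        [c,   0.0, 0.0,     0.0],
        [0.0, c,   0.0,     0.0],
        [0.0, 0.0, a,       s],
        [0.0, 0.0, -a * zn, 0.0],
    ]


def transform_homogeneous(v, m):
    """Apply m to a row vector [x, y, z, w]; return None if the result has ws = 0."""
    xs, ys, zs, ws = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    if ws == 0.0:
        return None  # still a point at infinity after the transform
    return (xs / ws, ys / ws, zs / ws)


m = perspective_matrix(math.radians(60.0), 1.0, 100.0)
directions = {
    "+z (straight ahead)": [0.0, 0.0, 1.0, 0.0],
    "-z (straight behind)": [0.0, 0.0, -1.0, 0.0],
    "+y (straight up)": [0.0, 1.0, 0.0, 0.0],
}
for name, v in directions.items():
    print(name, "->", transform_homogeneous(v, m))
```

With these inputs, the +z and -z directions both come out at the same finite point, with depth 1/(1 - zn/zf), just past the far-plane value of 1, while the +y direction stays at infinity. That wrap-around, where "infinitely far ahead" and "infinitely far behind" meet, is one small view of the folding described above.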