For a while, a few misplaced minus signs left the camera pointing in the wrong direction. Once I fixed that (and remembered to actually generate the Z-buffer from the Z values), everything worked well.
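For readers unfamiliar with the Z-buffer step, here is a minimal sketch of a depth test. It is not the renderer's actual code: the names `make_zbuffer` and `write_fragment` are hypothetical, and it assumes depth is stored as a positive distance along the view direction, so smaller values are closer to the camera.

```python
import math


def make_zbuffer(width, height):
    # Every pixel starts "infinitely far away" so the first hit always wins.
    return [[math.inf] * width for _ in range(height)]


def write_fragment(zbuffer, image, x, y, z, color):
    """Store color at (x, y) only if z is nearer than the stored depth.

    Assumption: z is a positive distance from the camera (smaller = closer).
    Returns True if the pixel was updated.
    """
    if z < zbuffer[y][x]:
        zbuffer[y][x] = z
        image[y][x] = color
        return True
    return False


# Usage: three hits land on the same pixel; the nearest one should survive.
zbuf = make_zbuffer(2, 2)
img = [[None] * 2 for _ in range(2)]
write_fragment(zbuf, img, 0, 0, 5.0, "red")
write_fragment(zbuf, img, 0, 0, 2.0, "blue")
write_fragment(zbuf, img, 0, 0, 9.0, "green")
```

Under this convention, a stray minus sign on `z` would reverse the ordering, so the farthest surface would win the test instead of the nearest.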
Average 7.68 ms render time on 8 threads
A 4K image with 3x as many spheres renders in an average of 303.63 ms.
MacBook Pro (15-inch, 2016)
2.7 GHz quad-core Intel Core i7 Skylake (6820HQ)
16 GB 2133 MHz LPDDR3 RAM