Parallel Radiosity
Peterson Trethewey
Radiosity Background
Radiosity is a technique in computer graphics that simulates diffuse lighting in a rendered scene. In a basic radiosity algorithm [3], the scene (often the interior of a room) is subdivided into rectangular subsurfaces called patches, and a system of linear equations is built with one equation per patch. Each equation captures the fact that the light leaving a patch is the light it emits itself plus a fixed fraction (its reflectance) of the light incident on it from all other patches in the scene. Solving this system is computationally very expensive, so many algorithms have been developed for placing patches intelligently and for solving the radiosity system efficiently. One such technique is called "shooting radiosity", which iteratively "shoots" light out from the patch with the greatest error.

(Image from the Wikipedia article)
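The shooting scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm as given in [3]: the form factors are assumed to be known already, "greatest error" is approximated by "most unshot energy", and the names (`shoot_radiosity`, `form_factor`, `unshot`) and the two-patch numbers are hypothetical.

```python
def shoot_radiosity(emission, reflectance, area, form_factor,
                    tol=1e-9, max_steps=10_000):
    """Progressive-refinement ("shooting") radiosity on n patches.

    form_factor[i][j] is assumed to be the fraction of the energy leaving
    patch i that arrives at patch j.
    """
    n = len(emission)
    radiosity = list(emission)  # current estimate of the light leaving each patch
    unshot = list(emission)     # light received or emitted but not yet distributed
    for _ in range(max_steps):
        # Assumption: the patch with the most unshot energy stands in for
        # "the patch with the greatest error".
        i = max(range(n), key=lambda k: unshot[k] * area[k])
        if unshot[i] * area[i] < tol:
            break
        shot = unshot[i]
        unshot[i] = 0.0
        for j in range(n):
            if j == i:
                continue
            # Reciprocity (F_ji * A_j = F_ij * A_i) converts the shot energy
            # into radiosity gained per unit area of the receiving patch j.
            delta = reflectance[j] * shot * form_factor[i][j] * area[i] / area[j]
            radiosity[j] += delta
            unshot[j] += delta
    return radiosity


# Hypothetical scene: two equal patches facing each other, only the first emits.
demo = shoot_radiosity(
    emission=[1.0, 0.0],
    reflectance=[0.5, 0.5],
    area=[1.0, 1.0],
    form_factor=[[0.0, 0.5], [0.5, 0.0]],
)
```

For this toy scene the exact solution of the two-equation system is 16/15 for the emitting patch and 4/15 for the other, which the shooting loop approaches geometrically.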
In ordinary ray-tracing, each pixel's color is computed independently, so there is no inherent dependence of pixels on each other. Parallelization is simply a matter of allocating to each processor a piece of the final image to render. With radiosity, on the other hand, every patch may depend on every other patch. The bulk of the calculation is not done per-pixel, so a good parallel scheme is not just a matter of partitioning the image.
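For the ray-tracing case, the image partitioning mentioned above can be as simple as handing each processor a contiguous band of rows. The helper below is only a sketch of that idea; the name `row_bands` and the choice of row bands (rather than tiles or interleaved scanlines) are assumptions made for illustration.

```python
def row_bands(height, workers):
    """Split image rows 0..height-1 into contiguous (start, stop) bands,
    one per worker, with sizes differing by at most one row."""
    base, extra = divmod(height, workers)
    bands = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)  # first `extra` workers get one more row
        bands.append((start, start + size))
        start += size
    return bands


bands = row_bands(10, 3)  # [(0, 4), (4, 7), (7, 10)]
```

No comparable split exists for radiosity, because the unknowns are patch radiosities coupled through the whole scene rather than independent pixels.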
Parallelization Techniques
In [1], Arnaldi et al. introduce the idea of a virtual wall. The premise is to take the scene, subdivide it artificially into rooms, and close off the rooms with temporary separating planes called virtual walls. The virtual walls are divided into patches, and start out emitting nothing and absorbing all light. In the first iteration of the algorithm, each room is given to a different processor to render. When all the rooms are rendered, the patches on each virtual wall become light emitters on the opposite side of the wall, to simulate the light shining through the virtual wall (this requires communication between the processors handling adjacent rooms). In the following iterations, the lighting is refined until an equilibrium is reached.
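The alternation between solving rooms and exchanging light across a virtual wall can be illustrated with a toy model. The sketch below is not the implementation from [1]: it assumes two rooms that each contain one aggregated real patch plus one wall patch, made-up form factors, a fixed number of rounds instead of a convergence test, and hypothetical names (`solve_room`, `render_with_virtual_wall`).

```python
def solve_room(emission, reflectance, F, sweeps=200):
    """Gauss-Seidel solve of B_i = E_i + rho_i * sum_j F[i][j] * B_j for one room."""
    B = list(emission)
    for _ in range(sweeps):
        for i in range(len(B)):
            gathered = sum(F[i][j] * B[j] for j in range(len(B)))
            B[i] = emission[i] + reflectance[i] * gathered
    return B


def render_with_virtual_wall(rooms, rounds=50):
    """rooms: two dicts with 'emission', 'reflectance' and 'F';
    the last patch of each room is its side of the virtual wall."""
    wall_emission = [0.0, 0.0]  # the wall starts out emitting nothing
    solutions = []
    for _ in range(rounds):
        solutions, incident = [], []
        for room, e_wall in zip(rooms, wall_emission):
            # The rooms are independent within a round, so each pass of this
            # loop could be handed to a different processor.
            emission = room["emission"][:-1] + [e_wall]
            B = solve_room(emission, room["reflectance"], room["F"])
            w = len(B) - 1
            # Light arriving at the wall patch from the room's real patches.
            incident.append(sum(room["F"][w][j] * B[j] for j in range(w)))
            solutions.append(B)
        # Exchange step: light that reached one side of the wall is emitted
        # from the other side in the next round.
        wall_emission = [incident[1], incident[0]]
    return solutions


# Hypothetical rooms: patch 0 is the real surface, patch 1 the wall
# (reflectance 0, i.e. it absorbs all light). Only room A has a light source.
F = [[0.6, 0.4], [1.0, 0.0]]
room_a = {"emission": [1.0, 0.0], "reflectance": [0.5, 0.0], "F": F}
room_b = {"emission": [0.0, 0.0], "reflectance": [0.5, 0.0], "F": F}
radiosity_a, radiosity_b = render_with_virtual_wall([room_a, room_b])
```

With these numbers the surface radiosities settle at 14/9 in the lit room and 4/9 in the unlit one, the same values obtained by solving both rooms as a single coupled system. Note that re-emitting the wall's light diffusely discards its direction.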
In [2], an embellishment of this algorithm is proposed in which the virtual walls are replaced by a "virtual interface". Instead of alternating between solving each room and transmitting light between rooms, the transmission between rooms is continuous. A shooting radiosity technique is used in each room and whenever a patch shoots, it not only distributes light around the local room, but also into adjacent rooms. The virtual interface is not subdivided into patches like the virtual wall, so when light traverses an interface it carries with it a bitmap called a visibility mask which encodes how the light is occluded by objects in the source room.
Experimental Results
The first round of tests of the virtual interface method is meant to demonstrate that even on a single-processor machine (a Sun UltraSparc), the technique of virtual interfaces can yield a speed improvement over rendering the entire scene as one radiosity system. The authors use three models of varying complexity: a simple model of an office (Office), a model with many connected rooms (Rooms) and a detailed model of Soda Hall in Berkeley (Building). Although the simplest one, Office, failed to benefit from virtual interfaces, the more complex models showed appreciable improvement:

(Image taken from [2])
The second round of tests was performed on a multiprocessor machine: the Intel Paragon XP/S. In the paper the authors state that their algorithm is designed for a distributed-memory system. The machine they use consists of 56 computing nodes, each of which has its own local memory, a computing processor and a communication processor. The whole machine is connected by Ethernet to an SGI Onyx, which renders the result to the screen. In the multiprocessor tests, the same scenes were rendered, this time allocating rooms to different nodes. Here is a graph from [2] showing their performance results:
Note that even with the simplest scene, Office, for which virtual interfaces failed to improve the running time on a single-processor machine, 56 processors now give a speed-up factor of 10. That's the worst case, and it is still not too shabby. On the other hand, with that particular model, a speed-up of 10 seems to be the limit; beyond that, more processors don't help very much. The more complex the scene, the more efficiently the algorithm uses multiple processors, but even for the complex models the speed-up curves are concave down. Perhaps for any given model there is an optimal number of processors. These results date from 1996, when 56 processors probably seemed like a lot, and it would be interesting to see how the problem scales to machines with thousands of processors. Still, with the Building model, they get a speed-up of 40 with 56 processors. That's about 71% of the theoretical maximum (a speed-up of 56) for the machine they are using.
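The percentage quoted above is the parallel efficiency, that is, the measured speed-up divided by the number of processors. The short calculation below reproduces it from the approximate figures given in the text (speed-ups of 40 and 10 on 56 processors); the function name is hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of the ideal linear speed-up actually achieved."""
    return speedup / processors


building = parallel_efficiency(40, 56)
office = parallel_efficiency(10, 56)
print(f"Building: {building:.1%}")  # Building: 71.4%
print(f"Office:   {office:.1%}")    # Office:   17.9%
```

By the same measure, the Office model uses under a fifth of the machine's capacity, which matches the observation that its speed-up curve flattens out early.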
Sources
[1] B. Arnaldi, X. Pueyo, and J. Vilaplana. On the Division of Environments by Virtual Walls for Radiosity Computation. In F. Jansen and P. Brunet, editors, Photorealistic Rendering in Computer Graphics. Springer Verlag, 1994.
[2] B. Arnaldi, T. Priol, L. Renambot, and X. Pueyo. Visibility Masks for Solving Complex Radiosity Computations on Multiprocessors. Institut National de Recherche en Informatique et en Automatique, Oct. 1996.
[3] A. H. Watt and M. Watt. Advanced Animation and Rendering Techniques: Theory and Practice. ACM Press, New York, NY, 1992.