Imaging techniques for O2 developers
A post authored by César Blecua to the newsgroup comp.sys.sgi.graphics on Saturday 01 Jun 2002:
I'm posting here a collection of techniques which may be helpful for image processing software developers on O2 workstations. Some of these hints come from technical papers. Others come from my own experimentation and benchmarking. Note that I've *no* access to unpublished SGI documentation, so it's perfectly possible that some of the items on this list are inaccurate or wrong. However, they worked for me, and I think other developers may benefit as well. Corrections welcome.
These techniques are being used and tested in an O2-native image compositing application that I've been developing for the last few months (a pre-announcement was posted at comp.sys.sgi.apps about a month ago).
- Get hardware acceleration for as many image processing operations as possible. If you are only interested in CPU-based processing, you can skip the whole message. Otherwise, keep on reading.
- Try to keep the performance between 1M and 7M pixels per second. Below 1M pixels/sec, the interactive experience gets too poor (although it depends on window size, of course).
- Highest performances you can expect from the O2:
  7.8M pixels/sec for scale/bias with glCopyPixels
  7.7M pixels/sec for color matrix with glCopyPixels
  4.6M pixels/sec for 3x3 separable convolution with glCopyPixels
  4.3M pixels/sec for 3x3 non-separable convolution with glCopyPixels
  3.9M pixels/sec for 5x5 separable convolution with glCopyPixels
  2.1M pixels/sec for 5x5 non-separable convolution with glCopyPixels
  2.3M pixels/sec for 7x7 separable convolution with glCopyPixels
  1.2M pixels/sec for 7x7 non-separable convolution with glCopyPixels
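To see why the 1M pixels/sec floor depends on window size, a throughput figure can be converted into a rough frame rate. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes one full-window pass per frame with no other overhead.

```c
/* Rough frames per second for one full-window pass, given a throughput
   in pixels per second and the window size in pixels.
   Illustrative helper, not part of any SGI or OpenGL API. */
double frames_per_second(double pixels_per_second, int width, int height)
{
    return pixels_per_second / ((double)width * (double)height);
}
```

For a 576x576 window, 7.8M pixels/sec works out to roughly 23 frames per second, while 1M pixels/sec gives only about 3.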
API
OpenGL with extensions (color matrix, convolution, histogram, and color table)
How to process pixels
OpenGL performs image processing only when you transfer pixels, so you need to call one of glDrawPixels, glReadPixels, glCopyPixels, glTexImage2D or glGetTexImage. On the O2, the fastest is glCopyPixels, followed by glDrawPixels (in some situations their performance is similar).
Preferred GLX visual
If you don't need destination alpha, choose a GLX visual without it. The figures above were measured in a window without alpha, and they are a bit lower if your window has alpha. However, since the performance penalty is not too noticeable, you may still prefer to have destination alpha, because it's often helpful in image processing rendering pipelines.
Preferred image size
You get slightly better performance if the image width is a multiple of 16. After benchmarking all the OpenGL imaging extensions with GLperf, I noticed that 576x576 is usually the fastest square image size for all operations. There seems to be a size limit near 704x704 where performance can drop to as little as half (!), so it can be convenient to process the pixels in tiles rather than in a single big glCopyPixels. However, note that performance is also low for small image sizes. The best sizes seem to be 480, 512, 576, and 640.
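The tiling idea can be sketched as follows. This is a minimal illustration, not a tested O2 code path: the Tile struct and make_tiles function are hypothetical names, and the tile edge (for example 576) is left to the caller. Tiles at the right and top edges are clipped, so they can end up small or not a multiple of 16 wide, which the benchmarks above suggest is slower.

```c
#include <stddef.h>

/* Hypothetical tile rectangle in window coordinates. */
typedef struct { int x, y, w, h; } Tile;

/* Split a width x height image into tiles at most tile_size on a side.
   Writes up to max_tiles entries into out (row by row) and returns the
   number of tiles needed, which may be larger than max_tiles. */
size_t make_tiles(int width, int height, int tile_size,
                  Tile *out, size_t max_tiles)
{
    size_t n = 0;
    if (tile_size <= 0)
        return 0;
    for (int y = 0; y < height; y += tile_size) {
        for (int x = 0; x < width; x += tile_size) {
            if (n < max_tiles) {
                out[n].x = x;
                out[n].y = y;
                /* Clip the last column and row to the image bounds. */
                out[n].w = (width - x < tile_size) ? width - x : tile_size;
                out[n].h = (height - y < tile_size) ? height - y : tile_size;
            }
            n++;
        }
    }
    return n;
}

/* Each tile would then be processed with its own transfer, e.g.
   glRasterPos2i(dst_x, dst_y) followed by
   glCopyPixels(t.x, t.y, t.w, t.h, GL_COLOR). */
```

A 1280x1024 image with a tile size of 576 yields 6 tiles, the last column being only 128 pixels wide.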
The ICE is the "Imaging and Compression Engine", an O2 ASIC responsible both for image compression and image processing. Your goal is to "convince" the ICE to process your requested operation. If you can't "convince" it, the operation will be processed on the CPU, and the performance can easily drop from 7M pixels/sec down to 300K, 50K, or even less...
How to "convince" the ICE
In general, all coefficients for a given operation must be either integers (2 or greater, or -2 or less), or reals in the [-1.0, 1.0] range. You can't mix integers and reals in the same operation, or the ICE will refuse to process it. Note that '1' and '-1' are *not* integers for the ICE. '2' and '-2' are the first integers.
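The rule above can be written as a small predicate for checking coefficients before handing them to OpenGL. This only encodes the empirical rule as stated in this post, not any documented ICE behaviour, and the function names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* "Integer" in the sense used above: a whole number with |v| >= 2.
   The upper bound only keeps the cast to long well defined. */
bool is_ice_integer(double v)
{
    double a = (v < 0.0) ? -v : v;
    return a >= 2.0 && a <= 1.0e9 && (double)(long)a == a;
}

/* "Real" in the sense used above: any value in [-1.0, 1.0]. */
bool is_ice_real(double v)
{
    return v >= -1.0 && v <= 1.0;
}

/* True if every coefficient is an "integer", or every coefficient is a
   "real". A mix of both, or any value fitting neither class (such as
   1.5), fails the check. */
bool ice_coeffs_ok(const double *coeffs, size_t n)
{
    bool all_int = true, all_real = true;
    for (size_t i = 0; i < n; i++) {
        if (!is_ice_integer(coeffs[i])) all_int = false;
        if (!is_ice_real(coeffs[i]))    all_real = false;
    }
    return all_int || all_real;
}
```

For instance, {0.5, -1.0, 1.0} and {2, 3, -4} pass, while {1, 2} fails because 1 counts as a real and 2 as an integer.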
Examples with scale/bias (useful for brightness-contrast)
The GL_*_BIAS values of glPixelTransfer seem to be unaffected by this limitation (I think you can safely mix reals and integers for the GL_*_BIAS values)
All the GL_*_SCALE values of glPixelTransfer must be either reals in the [-1.0, 1.0] range, or integers of 2 or greater (in absolute value, per the rule above). And don't forget about GL_ALPHA_SCALE, which is 1.0 by default (so you *necessarily* must change GL_ALPHA_SCALE if you want to use integer values for the GL_*_SCALE parameters).
You can implement non-integer GL_*_SCALEs outside the [-1.0, 1.0] range by decomposing the scale into two factors: one integer and the other in the [-1.0, 1.0] range. This can be done in a single pass, because the OpenGL pixel transfer pipeline has several scale/bias stages. Another option that may be useful is to put the integer scale in glPixelTransfer, and the non-integer scale in the Color Matrix.
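One way to compute such a decomposition is to round the magnitude of the scale up to the next integer (at least 2) and put the remainder in the fractional factor. The sketch below is just one possible choice; the struct and function names are made up, and it assumes the scale is small enough to fit in an int.

```c
/* Hypothetical result type: scale == integer_part * fraction_part. */
typedef struct {
    int    integer_part;   /* always >= 2 */
    double fraction_part;  /* always in [-1.0, 1.0], carries the sign */
} ScaleSplit;

/* Split a scale into an integer factor and a factor in [-1.0, 1.0]. */
ScaleSplit split_scale(double scale)
{
    double a = (scale < 0.0) ? -scale : scale;
    int k = (int)a;            /* truncate toward zero */
    if ((double)k < a)
        k++;                   /* round up to the next integer */
    if (k < 2)
        k = 2;                 /* 1 is not an "integer" for the ICE */
    ScaleSplit r = { k, scale / k };
    return r;
}

/* The integer part would go into the glPixelTransfer scales, e.g.
   glPixelTransferf(GL_RED_SCALE, (GLfloat)r.integer_part), and likewise
   for green, blue, and alpha. The fractional part would go into a later
   stage, such as the color matrix diagonal. */
```

A scale of 2.5 becomes 3 times about 0.833, and a scale of -1.5 becomes 2 times -0.75.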
Examples with Color Matrix
Purpose: you can use the color matrix for operations such as hue rotation, luminance conversion, tint, saturation tuning, and color model conversions. Take a look at Paul Haeberli's 'Grafica Obscura' for the details.
All the color matrix elements must belong to the [-1.0, 1.0] range in order to be accelerated by the ICE. That's not a problem for hue rotation or luminance conversion, but there's one case where you always need non-integer elements outside [-1.0, 1.0]: the matrix for increasing saturation. Fortunately, there's an easy workaround: find the smallest integer greater than the largest matrix element (in absolute value) and divide the whole matrix by that integer. Then set all the GL_POST_COLOR_MATRIX_*_SCALE settings to that integer. With this approach, you can increase saturation with the ICE.
In general, if you see that your color matrix performance is too low, dump the matrix and see if any element is outside [-1.0, 1.0]. In that case, apply the previous integer divide technique.
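The integer divide technique can be sketched like this. The function name is hypothetical, the matrix is assumed to be a 16-element float array as passed to glLoadMatrixf, and the exact token names for the post color matrix scales may carry an extension suffix depending on your headers.

```c
/* Bring a 4x4 color matrix into the [-1.0, 1.0] range in place.
   Returns the integer divisor that was applied, or 1 if the matrix was
   already in range and has been left untouched. */
int normalize_color_matrix(float m[16])
{
    float max_abs = 0.0f;
    for (int i = 0; i < 16; i++) {
        float a = (m[i] < 0.0f) ? -m[i] : m[i];
        if (a > max_abs)
            max_abs = a;
    }
    if (max_abs <= 1.0f)
        return 1;

    /* Smallest integer strictly greater than the largest magnitude. */
    int k = (int)max_abs + 1;
    for (int i = 0; i < 16; i++)
        m[i] /= (float)k;
    return k;
}

/* If the result k is greater than 1, it would be set as the post color
   matrix scale for all four channels, e.g.
   glPixelTransferf(GL_POST_COLOR_MATRIX_RED_SCALE, (GLfloat)k). */
```

A matrix whose largest element is 1.5 is divided by 2, so a diagonal entry of 1.5 becomes 0.75 and the post color matrix scales are set to 2.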
Examples with Convolution
All elements in the convolution kernel must belong to the [-1.0, 1.0] range. For blurring it's not a problem (all blur kernels are made from elements in the [0.0, 1.0] range). For sharpening, embossing, and edge detection it *is* a problem, and you will very likely need to use GL_POST_CONVOLUTION_*_SCALE to compensate for a downscaled kernel.
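The compensation works because convolution is linear: dividing the kernel by a constant and multiplying the result by the same constant gives the original output. The CPU-side sketch below checks this for a single output sample. The function names are illustrative, the divisor of 6 for a common sharpen kernel is just an example choice, and any precision loss inside the ICE is not modelled.

```c
/* One output sample of a 3x3 convolution: the sum of kernel[i] times
   the matching pixel in the 3x3 neighbourhood. */
double convolve3x3(const double kernel[9], const double pixels[9])
{
    double sum = 0.0;
    for (int i = 0; i < 9; i++)
        sum += kernel[i] * pixels[i];
    return sum;
}

/* Write kernel / divisor into out, leaving the input untouched. */
void downscale_kernel(const double kernel[9], double divisor, double out[9])
{
    for (int i = 0; i < 9; i++)
        out[i] = kernel[i] / divisor;
}

/* With a sharpen kernel {0,-1,0, -1,5,-1, 0,-1,0} and a divisor of 6,
   every downscaled element lies in [-1.0, 1.0]. The factor of 6 would
   then be restored with the post convolution scales, e.g.
   glPixelTransferf(GL_POST_CONVOLUTION_RED_SCALE, 6.0f). */
```

Multiplying the downscaled result by the divisor reproduces the output of the original kernel, up to floating-point rounding.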
Histogram is *slow* on O2, no matter whether done on the ICE or not. It isn't fast enough for a per-frame histogram, so if possible you should compute it once and store it for later reuse.
The O2 OpenGL image processing pipeline is not 100% bug-free (at least not in the systems I've used), but it's easy to avoid the bugs if you follow these guidelines:
- Large glCopyPixels calls can stop prematurely without finishing the work. It's safer to split them into smaller tiles (as said above, 480, 512, 576, and 640 are safe sizes, and also give the best performance). I've not found this limitation with glDrawPixels.
- Never use convolution with GL_*_BIAS and GL_*_SCALE, because some combinations can distort the image. Use GL_POST_CONVOLUTION_*_BIAS and GL_POST_CONVOLUTION_*_SCALE instead.
- Test, test, and test. If you get any image distortion, try rearranging terms in the pixel transfer pipeline.
Hope somebody finds all this stuff useful,
César Blecua