Imaging techniques for O2 developers<br />
<div>A post authored by César Blecua to the newsgroup comp.sys.sgi.graphics on Saturday, 01 Jun 2002:<br />
<br />
Hi,<br />
<br />
I'm posting here a collection of techniques that may be helpful for<br />
image processing software developers on O2 workstations. Some of these<br />
hints come from technical papers. Others come from my own<br />
experimentation and benchmarking. Note that I've *no* access to<br />
unpublished SGI documentation, so it's perfectly possible that some of<br />
the items on this list are inaccurate or wrong. However, they worked for<br />
me, and I think other developers may benefit as well. Corrections<br />
welcome.<br />
<br />
These techniques are being used and tested in an O2-native image<br />
compositing application that I've been developing for the last few months<br />
(a pre-announcement was posted to comp.sys.sgi.apps about a month ago).<br />
<br />
<br />
Goals<br />
-----<br />
* Get hardware acceleration for as many image processing operations as<br />
possible. If you are only interested in CPU-based processing, you can<br />
skip the whole message. Otherwise, keep on reading.<br />
<br />
* Try to keep the performance between 1M and 7M pixels per second. If<br />
you get less than 1M pix/sec, the interactive experience gets too poor<br />
(although it depends on window size, of course).<br />
<br />
<br />
Examples<br />
--------<br />
* Highest performance you can expect from the O2:<br />
<br />
7.8M pixels/sec for scale/bias with glCopyPixels<br />
7.7M pixels/sec for color matrix with glCopyPixels<br />
4.6M pixels/sec for 3x3 separable convolution with glCopyPixels<br />
4.3M pixels/sec for 3x3 non-separable convolution with glCopyPixels<br />
3.9M pixels/sec for 5x5 separable convolution with glCopyPixels<br />
2.1M pixels/sec for 5x5 non-separable convolution with glCopyPixels<br />
2.3M pixels/sec for 7x7 separable convolution with glCopyPixels<br />
1.2M pixels/sec for 7x7 non-separable convolution with glCopyPixels<br />
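<br />
To relate these rates to interactivity, a minimal sketch that converts a<br />
pixel rate into full-image passes per second for a given image size. The<br />
function name is hypothetical, and it assumes the rate stays constant<br />
for the chosen size.<br />
<br />
```c
/* Full-image passes per second at a given pixel throughput.
 * Returns 0.0 for a degenerate image size. */
double passes_per_second(double pixels_per_second, int width, int height)
{
    if (width <= 0 || height <= 0)
        return 0.0;
    return pixels_per_second / ((double)width * (double)height);
}
```
<br />
For a 576x576 image this gives about 23.5 passes/sec at 7.8M pix/sec,<br />
and about 3 passes/sec at 1M pix/sec.<br />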
<br />
<br />
API<br />
---<br />
OpenGL with extensions (color matrix, convolution, histogram, and color<br />
table)<br />
<br />
<br />
How to process pixels<br />
---------------------<br />
OpenGL performs image processing only when you transfer pixels. So, you<br />
need to call one of glDrawPixels, glReadPixels, glCopyPixels,<br />
glTexImage2D or glGetTexImage. On the O2, the fastest is glCopyPixels.<br />
The next fastest is glDrawPixels (in some situations their performance<br />
is similar).<br />
<br />
<br />
Preferred GLX visual<br />
--------------------<br />
If you don't need destination alpha, choose a GLX visual without it. The<br />
above performances were measured in a window without alpha. The numbers<br />
are a bit smaller if your window has alpha. However, since the<br />
performance penalty is not too noticeable, you may still prefer to have<br />
destination alpha, because it's often helpful in image processing<br />
rendering pipelines.<br />
<br />
<br />
Preferred image size<br />
--------------------<br />
You get slightly better performance if the image width is a multiple of<br />
16. After benchmarking all OpenGL imaging extensions with GLperf, I<br />
noticed that 576x576 is usually the fastest square image size for all<br />
operations. There seems to be a size limit near 704x704 where<br />
performance can drop by as much as half (!), so it can be convenient to<br />
process the pixels in tiles rather than in a single big glCopyPixels.<br />
However, note that performance is also low for small image sizes. The<br />
best sizes seem to be 480, 512, 576, and 640.<br />
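<br />
A minimal sketch of the tiling idea, assuming a simple row-by-row split.<br />
The Tile type and make_tiles function are hypothetical names, and the<br />
edge tiles simply take whatever is left over, so they can be small.<br />
<br />
```c
#include <stddef.h>

/* Hypothetical tile descriptor: origin and size in pixels. */
typedef struct { int x, y, w, h; } Tile;

/* Split a width x height image into tiles of at most tile_size pixels
 * per side (for example 576). Writes up to max_tiles entries into out
 * and returns the number of tiles the image needs, which may be larger
 * than max_tiles. */
size_t make_tiles(int width, int height, int tile_size,
                  Tile *out, size_t max_tiles)
{
    size_t count = 0;

    if (width <= 0 || height <= 0 || tile_size <= 0)
        return 0;

    for (int y = 0; y < height; y += tile_size) {
        for (int x = 0; x < width; x += tile_size) {
            if (count < max_tiles) {
                int rest_w = width - x;
                int rest_h = height - y;
                out[count].x = x;
                out[count].y = y;
                out[count].w = rest_w < tile_size ? rest_w : tile_size;
                out[count].h = rest_h < tile_size ? rest_h : tile_size;
            }
            count++;
        }
    }
    return count;
}
```
<br />
Each tile would then get its own raster position and glCopyPixels call.<br />
For convolution, neighbouring tiles would also need to overlap by the<br />
kernel radius, which this sketch does not handle.<br />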
<br />
<br />
The ICE<br />
-------<br />
The ICE is the "Imaging and Compression Engine", an O2 ASIC responsible<br />
both for image compression and image processing. Your goal is to<br />
"convince" the ICE to process your requested operation. If you can't<br />
"convince" it, the operation will be processed on the CPU, and the<br />
performance can easily drop from 7M down to 300K, 50K, or even less...<br />
<br />
<br />
How to "convince" the ICE<br />
-------------------------<br />
In general, all coefficients for a given operation must be either<br />
integer (greater than 1 or less than -1), or real in the [-1.0, 1.0]<br />
range. You can't mix integers and reals in the same operation, or the<br />
ICE will refuse to process it. Note that '1' and '-1' are *not* integers<br />
for the ICE. '2' and '-2' are the first integers.<br />
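<br />
The rule can be written as a small check to run over a set of<br />
coefficients before handing them to OpenGL. This is only a sketch of the<br />
rule as stated above, with hypothetical function names; it says nothing<br />
about what the driver really tests internally.<br />
<br />
```c
/* 1 if v is a whole number with |v| >= 2, i.e. an "integer" in the
 * sense described above. The upper bound is an arbitrary assumption
 * that keeps the cast to long well defined. */
static int is_ice_integer(float v)
{
    float a = (v < 0.0f) ? -v : v;
    if (a < 2.0f || a > 1.0e6f)
        return 0;
    return (float)(long)a == a;
}

/* 1 if v lies in the [-1.0, 1.0] range. */
static int is_ice_real(float v)
{
    return v >= -1.0f && v <= 1.0f;
}

/* 1 if the n coefficients are either all in [-1.0, 1.0] or all
 * integers with |v| >= 2; 0 if the two kinds are mixed or a value
 * fits neither kind (for example 1.5). */
int ice_coeffs_ok(const float *c, int n)
{
    int all_real = 1, all_int = 1;

    for (int i = 0; i < n; i++) {
        if (!is_ice_real(c[i]))
            all_real = 0;
        if (!is_ice_integer(c[i]))
            all_int = 0;
    }
    return all_real || all_int;
}
```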
<br />
<br />
Examples with scale/bias (useful for brightness-contrast)<br />
---------------------------------------------------------<br />
The GL_*_BIAS values of glPixelTransfer seem to be unaffected by this<br />
limitation (I think you can safely mix reals and integers for the<br />
GL_*_BIAS values)<br />
<br />
All the GL_*_SCALE values of glPixelTransfer must be either real in the<br />
[-1.0, 1.0] range, or integer with an absolute value of 2 or more. And<br />
don't forget about GL_ALPHA_SCALE, which is 1.0 by default (so you<br />
*necessarily* must change GL_ALPHA_SCALE if you want to use integer<br />
values for the GL_*_SCALE parameters).<br />
<br />
You can implement non-integer GL_*_SCALEs outside the [-1.0, 1.0] range<br />
by decomposing the scale into two factors: one integer and the other in<br />
the [-1.0, 1.0] range. This can be done in a single pass, because the<br />
OpenGL pixel transfer pipeline has several scale/bias stages. Another<br />
option that may be useful is to put the integer scale in<br />
glPixelTransfer, and the non-integer scale in the Color Matrix.<br />
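<br />
A minimal sketch of the decomposition, assuming one shared integer<br />
factor for all four channels so that alpha (1.0 by default) is covered<br />
too. The function name is hypothetical, and the scales are assumed to<br />
be modest in magnitude.<br />
<br />
```c
/* Split four RGBA scales into one shared integer factor k >= 2 and
 * four fractional factors in [-1.0, 1.0], so that
 * scale[i] == k * frac[i]. Returns k. */
int split_scales(const float scale[4], float frac[4])
{
    float max_abs = 0.0f;

    for (int i = 0; i < 4; i++) {
        float a = (scale[i] < 0.0f) ? -scale[i] : scale[i];
        if (a > max_abs)
            max_abs = a;
    }

    int k = (int)max_abs;          /* truncate ...                     */
    if ((float)k < max_abs)
        k++;                       /* ... then round up to a ceiling   */
    if (k < 2)
        k = 2;                     /* 1 does not count as an integer   */

    for (int i = 0; i < 4; i++)
        frac[i] = scale[i] / (float)k;

    return k;
}
```
<br />
Here k would go into the four GL_*_SCALE values, and frac[] into a later<br />
stage such as GL_POST_COLOR_MATRIX_*_SCALE or the color matrix diagonal.<br />
Whether intermediate results are clamped between stages on the O2 is not<br />
covered here, so the order of the two factors is worth testing.<br />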
<br />
<br />
Examples with Color Matrix<br />
--------------------------<br />
Purpose: you can use the color matrix for operations such as hue<br />
rotation, luminance conversion, tint, saturation tuning, and color model<br />
conversions. Take a look at Paul Haeberli's 'Grafica Obscura' for the<br />
details.<br />
<br />
All the color matrix elements must belong to the [-1.0, 1.0] range in<br />
order to be accelerated by the ICE. That's not a problem for hue<br />
rotation or luminance conversion, but there's one case where you always<br />
need non-integer elements outside [-1.0, 1.0]: the matrix for<br />
increasing saturation. Fortunately, there's an easy workaround: find the<br />
smallest integer greater than the largest matrix element (in absolute<br />
value) and divide the whole matrix by that integer. Then set all the<br />
GL_POST_COLOR_MATRIX_*_SCALE settings to that integer. With this<br />
approach, you can increase saturation with the ICE.<br />
<br />
In general, if you see that your color matrix performance is too low,<br />
dump it and see if any element is outside [-1.0, 1.0]. In that case,<br />
apply the integer divide technique described above.<br />
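<br />
The integer divide technique could look like the sketch below. The<br />
function name is hypothetical, and it rounds up to the nearest integer<br />
(so a largest element of exactly 2.0 gives a divisor of 2), which still<br />
leaves every element inside [-1.0, 1.0].<br />
<br />
```c
/* Bring a 4x4 color matrix into the [-1.0, 1.0] range in place.
 * Returns the integer divisor used (>= 2), or 1 if the matrix was
 * already in range and has been left untouched. */
int normalize_color_matrix(float m[16])
{
    float max_abs = 0.0f;

    for (int i = 0; i < 16; i++) {
        float a = (m[i] < 0.0f) ? -m[i] : m[i];
        if (a > max_abs)
            max_abs = a;
    }
    if (max_abs <= 1.0f)
        return 1;

    int d = (int)max_abs;          /* truncate, then round up */
    if ((float)d < max_abs)
        d++;

    for (int i = 0; i < 16; i++)
        m[i] /= (float)d;

    return d;
}
```
<br />
When the result is greater than 1, the matrix is loaded as usual and the<br />
returned value is set on all four GL_POST_COLOR_MATRIX_*_SCALE<br />
parameters, alpha included.<br />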
<br />
<br />
Examples with Convolution<br />
-------------------------<br />
All elements in the convolution kernel must belong to the [-1.0, 1.0]<br />
range. For blurring it's not a problem (all blur kernels are made from<br />
elements in the [0.0, 1.0] range). For sharpening, embossing, and edge<br />
detection it *is* a problem, and you will most likely need to downscale<br />
the kernel and use GL_POST_CONVOLUTION_*_SCALE to compensate for the<br />
downscaling.<br />
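<br />
As an illustration for sharpening, the sketch below builds a common 3x3<br />
sharpen kernel (center 1 + 4a, four direct neighbours -a) already<br />
divided by an integer so that it fits the range. The kernel shape, the<br />
function name, and the 'amount' parameter are assumptions chosen for the<br />
example, and amount is assumed to be non-negative.<br />
<br />
```c
/* Fill a row-major 3x3 sharpen kernel, downscaled so that every
 * element lies in [-1.0, 1.0]. Returns the integer by which the
 * kernel was divided (>= 2), or 1 if no downscaling was needed.
 * Assumes amount >= 0, so the center is the largest element. */
int make_sharpen_kernel(float amount, float kernel[9])
{
    const float center = 1.0f + 4.0f * amount;
    int d = 1;

    if (center > 1.0f) {
        d = (int)center;           /* truncate, then round up */
        if ((float)d < center)
            d++;
    }

    for (int i = 0; i < 9; i++)
        kernel[i] = 0.0f;
    kernel[4] = center / (float)d;                    /* center       */
    kernel[1] = kernel[3] = kernel[5] = kernel[7] =
        -amount / (float)d;                           /* 4 neighbours */

    return d;
}
```
<br />
With amount = 1.0 the unscaled center would be 5, so the kernel is<br />
divided by 5 and the four GL_POST_CONVOLUTION_*_SCALE values would be<br />
set to 5 to restore the original gain.<br />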
<br />
<br />
Histogram<br />
---------<br />
Histogram is *slow* on O2, no matter whether done on the ICE or not.<br />
It isn't fast enough for a per-frame histogram, so if possible<br />
you should compute it once and store it for later reuse.<br />
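<br />
One way to store it for reuse is a small cache keyed by an image version<br />
counter. This is a sketch with hypothetical names; it assumes a 256-bin<br />
histogram and an application that increments the version whenever the<br />
image changes.<br />
<br />
```c
#include <string.h>

#define HIST_BINS 256

/* Cached histogram together with the image version it belongs to. */
typedef struct {
    unsigned long version;
    int valid;
    unsigned int bins[HIST_BINS];
} HistCache;

/* Returns the cached bins if they match the given image version,
 * or NULL if the histogram has to be computed again. */
const unsigned int *hist_cache_lookup(const HistCache *c,
                                      unsigned long version)
{
    return (c->valid && c->version == version) ? c->bins : NULL;
}

/* Stores a freshly computed histogram for the given image version. */
void hist_cache_store(HistCache *c, unsigned long version,
                      const unsigned int bins[HIST_BINS])
{
    memcpy(c->bins, bins, sizeof c->bins);
    c->version = version;
    c->valid = 1;
}
```
<br />
The slow histogram readback then only runs when hist_cache_lookup<br />
returns NULL.<br />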
<br />
<br />
Bugs<br />
----<br />
The O2 OpenGL image processing pipeline is not 100% bug-free (at least<br />
not in the systems I've used), but it's easy to avoid the bugs if you<br />
follow these guidelines:<br />
<br />
-Large glCopyPixels can stop prematurely without finishing the work.<br />
It's safer to decompose it into smaller tiles (as said above,<br />
480, 512, ... 640 are safe sizes, and also optimal for best performance).<br />
I haven't found this limitation with glDrawPixels.<br />
<br />
-Never use convolution with GL_*_BIAS and GL_*_SCALE, because some<br />
combinations can distort the image. Use GL_POST_CONVOLUTION_*_BIAS and<br />
GL_POST_CONVOLUTION_*_SCALE instead.<br />
<br />
-Test, test, and test. If you get any image distortion, try to rearrange<br />
terms in the pixel transfer pipeline.<br />
<br />
<br />
Hope somebody finds all this stuff useful,<br />
<br />
César Blecua<br />
cesarble...@ono.com<br />
<br />
[[Category:SGI]]</div>