Showing posts with label speed. Show all posts

Monday, March 1, 2010

Throughput comparison: 1.19 vs. 2.0



Here is the throughput comparison for the code currently in git. I will publish beta2 in a couple of days. As you can see, the throughput improvements are considerable.


1.19
-----

16 bits on CLUT profiles                     : 10.6667 MPixel/sec.
8 bits on CLUT profiles                      : 9.75015 MPixel/sec.
8 bits on Matrix-Shaper profiles             : 3.8638 MPixel/sec.
8 bits on SAME Matrix-Shaper profiles        : 4.28495 MPixel/sec.
8 bits on Matrix-Shaper profiles (AbsCol)    : 10.4507 MPixel/sec.
16 bits on Matrix-Shaper profiles            : 3.92349 MPixel/sec.
16 bits on SAME Matrix-Shaper profiles       : 3.96924 MPixel/sec.
16 bits on Matrix-Shaper profiles (AbsCol)   : 10.6667 MPixel/sec.
8 bits on curves                             : 4.33839 MPixel/sec.
16 bits on curves                            : 4.3944 MPixel/sec.
8 bits on CMYK profiles                      : 4.09626 MPixel/sec.
16 bits on CMYK profiles                     : 3.96924 MPixel/sec.
8 bits on gray-to-gray                       : 24.3902 MPixel/sec.
8 bits on SAME gray-to-gray                  : 24.3902 MPixel/sec.


2.0
----

16 bits on CLUT profiles                     : 10.1394 MPixel/sec.
8 bits on CLUT profiles                      : 10.6667 MPixel/sec.
8 bits on Matrix-Shaper profiles             : 26.2726 MPixel/sec.
8 bits on SAME Matrix-Shaper profiles        : 30.1318 MPixel/sec.
8 bits on Matrix-Shaper profiles (AbsCol)    : 10.5541 MPixel/sec.
16 bits on Matrix-Shaper profiles            : 10.2433 MPixel/sec.
16 bits on SAME Matrix-Shaper profiles       : 9.84615 MPixel/sec.
16 bits on Matrix-Shaper profiles (AbsCol)   : 10.6667 MPixel/sec.
8 bits on curves                             : 30.0752 MPixel/sec.
16 bits on curves                            : 35.3201 MPixel/sec.
8 bits on CMYK profiles                      : 4.19727 MPixel/sec.
16 bits on CMYK profiles                     : 4.26667 MPixel/sec.
8 bits on gray-to-gray                       : 40.9207 MPixel/sec.
8 bits on SAME gray-to-gray                  : 40.9207 MPixel/sec.
8 bits on CMYK profiles                      : 4.3573 MPixel/sec.
16 bits on CMYK profiles                     : 4.55192 MPixel/sec.
8 bits on gray-to-gray                       : 39.312 MPixel/sec.
8 bits on SAME gray-to-gray                  : 40.9207 MPixel/sec.

Monday, February 15, 2010

Speed measurements

Here are some performance numbers, measured on my laptop. This is a pretty old Compaq nc6400 with a dual-core T2300E (1.66 GHz), so it should rock on a modern machine. Note how fast transforms go when both profiles are implemented as matrix-shaper. The measurements use the code in git.

16 bits on CLUT profiles                     : 5.95238 Mpixels/sec.
8 bits on CLUT profiles                      : 10.5541 Mpixels/sec.
8 bits on Matrix-Shaper profiles             : 27.6817 Mpixels/sec.
8 bits on SAME Matrix-Shaper profiles        : 36.5297 Mpixels/sec.
8 bits on curves                             : 29.2505 Mpixels/sec.
8 bits on Matrix-Shaper profiles (AbsCol)    : 27.6817 Mpixels/sec.
16 bits on curves                            : 37.9147 Mpixels/sec.
8 bits on CMYK profiles                      : 8.90373 Mpixels/sec.
16 bits on CMYK profiles                     : 4.45186 Mpixels/sec.
8 bits on gray-to-gray conversions           : 32.9897 Mpixels/sec.
8 bits on same gray-to-gray conversions      : 39.312 Mpixels/sec.

Sunday, November 29, 2009

What is new from lcms 1.x

The first obvious question is “Why should I upgrade to Little CMS 2.0?” Here are some clues:

Little CMS 2.0 is a full v4 CMM that can also accept v2 profiles. Little CMS 1.xx was a v2 CMM that could deal with (some) v4 profiles. The difference is important, as 2.0 handles the PCS differently: definitely better and far more accurate.
  • It accepts and understands floating point profiles (MPE) with DToBxx tags. (Yes, it works!) It has 32-bit precision (lcms 1.xx had 16 bits).
  • It handles float and double formats directly. MPE profiles are evaluated in floating point with no precision loss.
  • It has a plug-in architecture that allows you to change interpolation, add new proprietary tags, add new “smart CMM” intents, etc.
  • It is faster. In some combinations, throughput improves by a factor of about 6.
  • Some new algorithms: incomplete state of adaptation, Jan Morovic’s segment maxima gamut boundary descriptor, better K preservation…
  • Historic issues, like the faulty icc34.h, freeing profiles after creating a transform, etc., are all solved.

Tuesday, July 14, 2009

Same profile on both sides

It is very convenient to detect whether the source and destination profiles are the same, so the CMM can be instructed to do nothing. It seems quite simple, but it is certainly complex.

The issue is with embedded profiles. You can't do a binary compare because embedded profiles may have changed attributes. That is, some fields in the profile header differ to reflect the preferred intent and the fact that the profile is being used embedded.

V4 offers ProfileID, which is an MD5 checksum of the profile that skips those conflicting fields. That is a good thing: if both source and destination profiles have the same ProfileID AND the intent is the same on both profiles, then you can get rid of the whole transform, as it is basically a no-op.
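
As a rough illustration of that check (not the actual lcms code), here is a minimal sketch working directly on two raw 128-byte ICC headers. The offsets come from the ICC header layout: the rendering intent lives in bytes 64 to 67 and the Profile ID in bytes 84 to 99. The function names are hypothetical, and an all-zero ID is treated as "not computed", so such profiles never match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets inside the 128-byte ICC profile header. */
enum {
    ICC_INTENT_OFFSET = 64,  /* rendering intent, 4 bytes  */
    ICC_INTENT_SIZE   = 4,
    ICC_ID_OFFSET     = 84,  /* Profile ID (MD5), 16 bytes */
    ICC_ID_SIZE       = 16
};

/* True when the header carries a computed Profile ID (not all zeros). */
static bool has_profile_id(const uint8_t *header)
{
    for (int i = 0; i < ICC_ID_SIZE; i++) {
        if (header[ICC_ID_OFFSET + i] != 0)
            return true;
    }
    return false;
}

/* Hypothetical helper: true when both headers carry the same non-zero
 * Profile ID and the same rendering intent, so the transform can be skipped. */
bool transform_is_noop(const uint8_t *src_header, const uint8_t *dst_header)
{
    assert(src_header != NULL && dst_header != NULL);

    if (!has_profile_id(src_header) || !has_profile_id(dst_header))
        return false;  /* typical for v2 profiles: no ID to compare */

    if (memcmp(src_header + ICC_ID_OFFSET,
               dst_header + ICC_ID_OFFSET, ICC_ID_SIZE) != 0)
        return false;

    return memcmp(src_header + ICC_INTENT_OFFSET,
                  dst_header + ICC_INTENT_OFFSET, ICC_INTENT_SIZE) == 0;
}
```

Calling `transform_is_noop(a, b)` on two headers returns true only when both IDs are present and identical and the intents agree; any v2 profile with a zeroed ID falls through to the slower checks.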

But sometimes (most of the time, currently) you get AdobeRGB or sRGB embedded, which are v2 profiles. No ProfileID, and a very common case.

So, let's try to do some optimization. If both profiles are matrix-shaper, you can detect whether the resulting matrix is an identity, and then whether the curves cancel out. We have room for improvement in 3 cases:

  • All different
  • Same primaries but different gamma
  • Same primaries and equal tone curves

The last case is a no-op, but it is pretty frequent: untagged images assumed to be sRGB and an uncalibrated monitor assumed to be sRGB. Handling this case separately is a big plus if you care about speed.
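
To make the three cases concrete, here is a minimal sketch under simplifying assumptions: each profile is reduced to an RGB-to-XYZ matrix, its inverse, and a single gamma value (real profiles carry per-channel tone curves that may be tabulated). The types and function names are hypothetical and are not taken from lcms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { double v[3][3]; } Mat3;

/* Simplified matrix-shaper profile: one gamma for all three channels. */
typedef struct {
    Mat3   rgb_to_xyz;
    Mat3   xyz_to_rgb;
    double gamma;
} ShaperProfile;

typedef enum {
    ALL_DIFFERENT,
    SAME_PRIMARIES_DIFFERENT_GAMMA,
    NO_OP
} PairKind;

static Mat3 mat3_mul(const Mat3 *a, const Mat3 *b)
{
    Mat3 r;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            r.v[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                r.v[i][j] += a->v[i][k] * b->v[k][j];
        }
    return r;
}

static bool close_to(double x, double expected, double tol)
{
    double d = x - expected;
    if (d < 0.0) d = -d;
    return d <= tol;
}

static bool mat3_is_identity(const Mat3 *m, double tol)
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            if (!close_to(m->v[i][j], i == j ? 1.0 : 0.0, tol))
                return false;
    return true;
}

/* Classify a source/destination pair into one of the three cases. */
PairKind classify_pair(const ShaperProfile *src, const ShaperProfile *dst)
{
    const double tol = 1e-4;  /* arbitrary tolerance chosen for this sketch */
    assert(src != NULL && dst != NULL);
    assert(src->gamma > 0.0 && dst->gamma > 0.0);

    /* Source RGB -> XYZ -> destination RGB collapses into a single matrix. */
    Mat3 m = mat3_mul(&dst->xyz_to_rgb, &src->rgb_to_xyz);
    if (!mat3_is_identity(&m, tol))
        return ALL_DIFFERENT;

    /* Decoding with one gamma and re-encoding with the same gamma cancels. */
    if (!close_to(src->gamma, dst->gamma, tol))
        return SAME_PRIMARIES_DIFFERENT_GAMMA;

    return NO_OP;
}
```

A caller would pick a code path per result: the full matrix-shaper pipeline, curves only, or a plain copy of the pixels.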

Saturday, July 11, 2009

More on speed

As promised, I have updated the snapshot. The performance numbers for matrix-shaper to matrix-shaper should be close to what lcms2 is going to deliver when released. If you want to run the testbed, you will need to copy the following profiles from a Photoshop distribution, as I'm not allowed to redistribute them:
  • AdobeRGB1998.icc
  • CoatedFOGRA27.icc
  • UncoatedFOGRA29.icc
  • USWebCoatedSWOP.icc
  • USWebUncoated.icc
Put them in the "testbed" folder. OK, now just type

./configure; make; make check


Then take a look at the numbers at the end of the testbed execution.

The tifficc utility should also work to some extent, but there is a 1-pixel cache that may hurt performance. I have to turn the cache off for such profiles, as the cache code takes longer than the transform itself.
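
For readers unfamiliar with the idea, a 1-pixel cache can be sketched roughly as below. This is an illustration with hypothetical names, not the lcms implementation: the last input pixel and its result are remembered, and the transform is evaluated only when the next input differs. The compare and the copies run for every pixel, which pays off for expensive CLUT evaluations but can cost more than a cheap matrix-shaper transform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Per-pixel transform on 8-bit RGB (hypothetical signature). */
typedef void (*PixelFn)(const uint8_t *in, uint8_t *out);

typedef struct {
    PixelFn       eval;
    bool          valid;        /* false until the first pixel is seen   */
    uint8_t       last_in[3];
    uint8_t       last_out[3];
    unsigned long evals;        /* how many times eval really had to run */
} CachedTransform;

/* Apply the transform to n_pixels RGB pixels, reusing the previous
 * result whenever the input pixel repeats. */
void cached_apply(CachedTransform *t, const uint8_t *in, uint8_t *out,
                  size_t n_pixels)
{
    assert(t != NULL && t->eval != NULL);

    for (size_t i = 0; i < n_pixels; i++, in += 3, out += 3) {
        if (!t->valid || memcmp(in, t->last_in, 3) != 0) {
            t->eval(in, t->last_out);   /* cache miss: do the real work */
            memcpy(t->last_in, in, 3);
            t->valid = true;
            t->evals++;
        }
        memcpy(out, t->last_out, 3);    /* hit or miss, copy the result */
    }
}

/* Stand-in transform for the example: inverts each channel. */
void invert_pixel(const uint8_t *in, uint8_t *out)
{
    for (int c = 0; c < 3; c++)
        out[c] = (uint8_t)(255 - in[c]);
}
```

Running three pixels where the first two are identical through `cached_apply` evaluates `invert_pixel` only twice, yet still performs three compares and three result copies.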

It is funny to note that this is pure "C" code, and in some situations it outperforms hand-written SSE2 assembly. That was the case when using the Intel compiler.

64-bit hardware is pretty untested, so if you manage to make it work on such architectures, please drop me a note, thanks!

Thursday, July 9, 2009

about speed

I'm getting outstanding results with lcms2 and matrix-shaper profiles. It is still not in the public preview, so you have to trust me, but here are some numbers. They were measured on my laptop, which has an old 2 GHz dual-core CPU:

lcms 2.0:
8 bits on Matrix-Shaper profiles...done.
[625 tics, 0.625 sec, 25.6 Mpixels/sec.]

lcms 1.18:
lcms is transforming full spectrum in 8 bits...done.
[3984 tics, 3.984 sec, 4.01606 Mpixels/sec.]


That's a boost of about 6.4x (3984 tics versus 625). Please note that this applies only to 8-bit matrix-shaper to matrix-shaper transforms, so RGB only! When the primaries of both profiles are the same, the performance is even better: it reaches about 30 Mpixels/sec. I will make all this code available this weekend.
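
The reported figures can be reproduced with a little arithmetic. The sketch below rests on two assumptions inferred from the numbers rather than stated in the post: the "full spectrum" test image holds 256 x 256 x 256 pixels, and a megapixel is counted as 1024 x 1024 pixels. The tics are taken as milliseconds, which matches "625 tics, 0.625 sec". The function name is hypothetical.

```c
#include <assert.h>

/* Throughput in Mpixels/sec, with one Mpixel counted as 1024*1024 pixels
 * (an assumption that matches the figures quoted in the post). */
double mpixels_per_sec(unsigned long pixels, unsigned long tics,
                       unsigned long tics_per_sec)
{
    assert(tics > 0 && tics_per_sec > 0);

    double seconds = (double)tics / (double)tics_per_sec;
    double mpixels = (double)pixels / (1024.0 * 1024.0);
    return mpixels / seconds;
}
```

With 256 * 256 * 256 pixels and 1000 tics per second, 625 tics gives 25.6 Mpixels/sec and 3984 tics gives about 4.016 Mpixels/sec, so the ratio between the two runs is simply 3984 / 625, roughly 6.37.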