./configure; make; make check
Then take a look at the numbers at the end of the testbed execution.
The tifficc utility should also work to some extent, but it uses a 1-pixel cache that may hurt performance. I have to turn the cache off for such profiles, as the cache code takes longer than the transform itself.
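To illustrate why a 1-pixel cache can cost more than it saves, here is a minimal sketch in plain C. It is not the library's actual code: PixelCache, cheap_transform and transform_cached are hypothetical names, and the channel inversion only stands in for a very cheap transform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: an illustration, not the library's code. */
typedef struct {
    uint8_t  in[3];   /* last input pixel seen */
    uint8_t  out[3];  /* result computed for that pixel */
    int      valid;   /* 0 until the first pixel has been stored */
    unsigned hits;    /* lookups served from the cache */
} PixelCache;

/* Stand-in for a very cheap transform: invert each channel. */
static void cheap_transform(const uint8_t in[3], uint8_t out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = (uint8_t)(255 - in[i]);
}

/* Transform one pixel, reusing the previous result when the input repeats. */
static void transform_cached(PixelCache *c, const uint8_t in[3], uint8_t out[3])
{
    assert(c != NULL);

    /* Cache hit: same pixel as last time, so copy the stored result. */
    if (c->valid && memcmp(c->in, in, 3) == 0) {
        memcpy(out, c->out, 3);
        c->hits++;
        return;
    }

    /* Cache miss: run the transform, then remember input and output. */
    cheap_transform(in, out);
    memcpy(c->in, in, 3);
    memcpy(c->out, out, 3);
    c->valid = 1;
}
```

Every pixel pays for a comparison, and every miss pays for two extra copies. When the transform is as cheap as the stand-in above, that bookkeeping can easily outweigh the work it avoids, which is the situation described for these profiles.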
It is funny to note that this is pure "C" code, and in some situations it outperforms hand-written SSE2 assembly. That was the case when using the Intel compiler.
64-bit hardware is pretty much untested, so if you manage to make it work on such architectures, please drop me a note. Thanks!