Last Updated: February 25, 2016
· ardiyu07

Intel OpenCL Implicit Vectorizer

OpenCL compilers differ from vendor to vendor. Intel optimizes its compiler to auto-vectorize some loops so that they can take advantage of SSE and AVX instructions.

For example, a Black-Scholes computation run as single-threaded C99 code and as single-threaded OpenCL code gives the execution times below:

  • Input: 10 MB of data
  • Calculates both call and put options
  • Both use the -O3 compiler option of gcc-4.4

C99 : 1612.203 ms
OpenCL : 673.248 ms (about 2.4x faster)

This 'hidden' optimization is kinda cool, isn't it?