For years, GPU processing (the use of high-performance video cards for general-purpose computing) has been the preserve of academic and niche commercial research. As the tools have improved, adoption of the technology has steadily accelerated.
However, within the big IT manufacturers, GPUs have mainly remained devices for displaying information, not for processing it.
This changed last week, when IBM announced that it would offer Nvidia Tesla cards in its System x series servers.
The interesting bit (at least for us) is the reason Big Blue decided to go with the Nvidia solution over the comparable offering from ATI. But first, a small bit of background.
During the last few years, the two main competing GPU manufacturers (Nvidia and ATI) each came out with their own method for using graphics cards for general-purpose computing: Nvidia had CUDA, and ATI had Stream. They were in effect the same thing (a C-based API to the card), but were entirely independent of each other.
In the background, a standard was forming, called OpenCL. Initially developed by Apple, it was subsequently embraced by AMD, IBM, Intel and Nvidia. It provides a vendor-independent way of programming highly parallel workloads to be performed on the graphics card.
The OpenCL standard had its first real commercial release in Mac OS X 10.6 (Snow Leopard). It's also something that we've been looking at internally (more to come).
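To give a flavour of what this vendor-independent code looks like, here is a minimal OpenCL kernel that adds two vectors. This is an illustrative sketch, not code from any of the products discussed: the kernel name, parameters, and host-side details are our own invention. The key point is that this same source is compiled at run time by whichever OpenCL implementation is present, so it runs unchanged on Nvidia or ATI hardware (the host-side setup of context, command queue and buffers is omitted here).

```c
// vadd.cl -- a minimal OpenCL device kernel (illustrative sketch).
// The host program passes this source to clBuildProgram() at run time,
// so the identical file works on any vendor's OpenCL implementation.
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c,
                   const unsigned int n)
{
    size_t i = get_global_id(0);  // one work-item per array element
    if (i < n)                    // guard: global size may be padded past n
        c[i] = a[i] + b[i];
}
```

The host then launches this with clEnqueueNDRangeKernel, choosing a global work size of at least n; nothing in the kernel itself is tied to a particular card.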
So, we had two proprietary means of programming GPUs and one emerging standard. Even though IBM was part of the OpenCL group, it has chosen to go with Nvidia, and specifically CUDA.
CUDA is mature, and has been well received and widely implemented by both academic and commercial users. However, at the time of writing, it is limited to Nvidia cards. If you want to use ATI cards to do the same thing, you have to use ATI Stream (or the emerging OpenCL).
Personally, we don't like the separation. If you write for CUDA, you have to rewrite for Stream. For whatever reason, you might have both Nvidia and ATI cards in the same system. If that's the case, you'd need a separate piece of code for each card, rather than a single code base that could run across both (agnostically).
For us, that means focusing on OpenCL. Yes, it's immature compared to CUDA (although maybe not so much compared to ATI Stream). But it is catching up, and it will allow us to deploy our code across different cards, regardless of manufacturer. For us, that's worth more than the benefits that CUDA or Stream bring in isolation. We hope that both companies can instead consolidate around OpenCL in the near future.