Programming on GPUs
From a short introductory course on parallel GPGPU (general-purpose computing on GPUs).
The main source of information on GPGPU programming. That page also has links to tutorials and more!
A main idea seems to be that scientific applications often have large data sets that must be processed by a small set of instructions, like a big for-loop over some vectors. In such cases the instructions can be loaded into the processing units once, and the data can then be streamed through them. The streams of instructions and data from memory are then no longer interleaved, which makes good performance possible. This idea is of course not unique to GPGPU.
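As a rough illustration of the big-for-loop pattern, here is a CPU-side Python sketch (not actual GPU code): one fixed per-element kernel is applied to every element of the input streams. SAXPY (a*x + y) is just an example kernel picked for illustration, and `stream_map` and `make_saxpy` are hypothetical names, not part of any GPGPU library.

```python
from typing import Callable, List


def stream_map(kernel: Callable[..., float], *streams: List[float]) -> List[float]:
    """Apply the same kernel to corresponding elements of all input streams.

    The kernel is fixed up front (the "instructions"), and the data is then
    streamed through it one element at a time.
    """
    if not streams:
        raise ValueError("need at least one input stream")
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all input streams must have the same length")
    return [kernel(*elements) for elements in zip(*streams)]


def make_saxpy(a: float) -> Callable[[float, float], float]:
    # The per-element instruction sequence: a * x + y.
    def saxpy(x: float, y: float) -> float:
        return a * x + y
    return saxpy


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = stream_map(make_saxpy(2.0), x, y)
print(result)  # [12.0, 24.0, 36.0, 48.0]
```

The point is that the kernel never looks at other elements, so all elements could in principle be processed at the same time.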
GPUs can do about 250 GFlops, and the Cell processor (Sony/Toshiba/IBM) about 230 GFlops. There are a few other contenders as well.
A GPU is a fast parallel array processor. This means that computing on an array of data looks like computing on a single element: you write the computation for one element, and the GPU applies it to all of them. If you feed the GPU an array, you can set constraints that make it possible to do different computations on different elements, but that destroys much of the parallel performance. So the “right” thing is to feed in arrays and perform the same operations on all elements in the array.
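When the result does depend on the data, one common way to keep every element on the same instruction stream is predication: compute both candidate results for every element and then select per element, instead of branching. Below is a minimal CPU-side Python sketch of that idea, assuming a toy piecewise function chosen only for illustration; `select` and `piecewise_uniform` are hypothetical names.

```python
from typing import List


def select(mask: List[bool], if_true: List[float], if_false: List[float]) -> List[float]:
    """Element-wise select: mask[i] ? if_true[i] : if_false[i].

    Python's conditional expression stands in for a hardware select
    instruction here.
    """
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def piecewise_uniform(xs: List[float]) -> List[float]:
    # Every element executes exactly the same instructions:
    # both "branches" are evaluated for all elements ...
    negative_branch = [0.5 * x for x in xs]
    positive_branch = [2.0 * x for x in xs]
    mask = [x < 0.0 for x in xs]
    # ... and a per-element select picks the result, instead of control flow
    # that would send different elements down different instruction paths.
    return select(mask, negative_branch, positive_branch)


print(piecewise_uniform([-4.0, -1.0, 0.0, 3.0]))  # [-2.0, -0.5, 0.0, 6.0]
```

The trade-off is wasted work on the branch that is thrown away, which only pays off when both branches are cheap.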
Since the GPU is mainly concerned with graphics, subsets of the data arrays are specified as geometric primitives, like quads and triangles. Lines can also be used, but they yield slower computations, and point clouds are slower still. We are talking about “output regions” here: the primitive selects which part of the output array gets computed.
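A small CPU-side Python emulation of output regions, assuming axis-aligned quads over a 2D output buffer: "drawing" a quad runs the kernel once per covered cell and leaves everything else untouched. Splitting the domain into an interior quad and thin boundary quads is an illustrative assumption (it is one typical use of different output regions), and `draw_quad`, `interior` and `boundary` are hypothetical names.

```python
from typing import Callable, List

Grid = List[List[float]]


def draw_quad(out: Grid, x0: int, y0: int, x1: int, y1: int,
              kernel: Callable[[int, int], float]) -> None:
    """Run `kernel` once per cell with x0 <= x < x1 and y0 <= y < y1.

    Emulates rasterizing an axis-aligned quad over the output buffer: the
    quad selects the output region, and the kernel computes one value per
    covered cell. Cells outside the quad are left untouched.
    """
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = kernel(x, y)


def interior(x: int, y: int) -> float:
    return float(x + y)  # placeholder "interior" computation


def boundary(x: int, y: int) -> float:
    return -1.0  # placeholder boundary value


width, height = 5, 4
out: Grid = [[0.0] * width for _ in range(height)]

# One quad for the interior (everything except a 1-cell border) ...
draw_quad(out, 1, 1, width - 1, height - 1, interior)
# ... and thin 1-cell-wide quads for the four edges. On a GPU the edges
# could also be drawn as lines, at lower throughput.
draw_quad(out, 0, 0, width, 1, boundary)                # top row
draw_quad(out, 0, height - 1, width, height, boundary)  # bottom row
draw_quad(out, 0, 0, 1, height, boundary)               # left column
draw_quad(out, width - 1, 0, width, height, boundary)   # right column

for row in out:
    print(row)
```

Each region gets one uniform kernel, so the per-element work inside a region stays identical.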
High-level language: float (16-bit/32-bit), vectors, structs and arrays. There are basic arithmetic and logic operators, trigonometric functions and exponentials. There are basic control-flow constructs like if, for and while, and you can create user-defined functions.
On a CPU, input and output arrays may overlap; on a GPU this is not possible, so the output cannot be written into the input array. CPUs can do arbitrary gather (reading from computed addresses) and scatter (writing to computed addresses); GPUs can only do arbitrary gather, while scatter is restricted (both in what you can do and in speed). There are some other differences as well.
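A Python sketch (CPU emulation, not GPU code) of how these two restrictions are usually worked around: an iterative computation alternates between two buffers ("ping-pong") instead of updating one array in place, and a scatter is rewritten as a gather by inverting the index map. The 3-point smoothing kernel and the permutation are illustrative choices, and all function names are hypothetical.

```python
from typing import List


def smooth_step(src: List[float], dst: List[float]) -> None:
    """One 3-point smoothing pass written as a gather.

    Each output element dst[i] reads (gathers) its neighbours from src.
    src and dst must be different buffers, as on a GPU.
    """
    if src is dst:
        raise ValueError("output buffer must differ from input buffer")
    n = len(src)
    for i in range(n):
        left = src[max(i - 1, 0)]       # clamp at the edges
        right = src[min(i + 1, n - 1)]
        dst[i] = (left + src[i] + right) / 3.0


def smooth(data: List[float], iterations: int) -> List[float]:
    # Ping-pong: swap the roles of the two buffers after every pass.
    a, b = list(data), [0.0] * len(data)
    for _ in range(iterations):
        smooth_step(a, b)
        a, b = b, a
    return a  # after the swap, `a` holds the latest result


def permute_scatter(src: List[float], dest_index: List[int]) -> List[float]:
    # Scatter (CPU style): element i writes to a computed address.
    out = [0.0] * len(src)
    for i, j in enumerate(dest_index):
        out[j] = src[i]
    return out


def permute_gather(src: List[float], dest_index: List[int]) -> List[float]:
    # Same permutation as a gather. The inverse index map is built once
    # up front (here on the CPU); then each output element only reads.
    source_index = [0] * len(dest_index)
    for i, j in enumerate(dest_index):
        source_index[j] = i
    return [src[source_index[k]] for k in range(len(src))]


print(smooth([0.0, 0.0, 3.0, 0.0, 0.0], 1))            # [0.0, 1.0, 1.0, 1.0, 0.0]
print(permute_gather([10.0, 20.0, 30.0], [2, 0, 1]))   # [20.0, 30.0, 10.0]
```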
CPUs have large caches and few processing elements, and are optimized for spatial and temporal data reuse. GPUs have small caches and many processing elements, and are optimized for sequential (streaming) data access. This of course has implications for how to program the two architectures. To exploit the performance of the GPU, you configure it with instructions and then compute on a large chunk of data. Since the configuration phase is relatively costly, the more data you compute on, the more you benefit.
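The amortization argument can be written as a simple cost model, under the simplifying assumption that one pass costs a fixed setup time plus a constant time per element (the symbols are introduced here for illustration):

```latex
T(n) = T_{\text{setup}} + n\,t_{\text{elem}}, \qquad
\frac{T(n)}{n} = \frac{T_{\text{setup}}}{n} + t_{\text{elem}}
\;\longrightarrow\; t_{\text{elem}} \quad (n \to \infty).
```

The setup share drops below the per-element cost once n > T_setup / t_elem. With a hypothetical setup cost equal to 10^4 element-times, for example, a pass over 10^6 elements spends 10^4 / (10^4 + 10^6), or about 1%, of its time on setup.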
Glift: Generic, Efficient, Random-access GPU Data Structures. The idea is to create STL-like abstractions that separate data containers from algorithms on the GPU, so that users do not have to deal as much with the specifics of GPU programming. We saw some really impressive demos of this (and of a lot of other things :) You can do fun things with adaptivity, like creating a finer or coarser mesh as you zoom in and out.
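Glift's actual API is not reproduced here. The class below is a hypothetical Python sketch, loosely modeled on the idea of separating a container's virtual addresses from its physical storage: user code indexes a 1D array with one integer, and an address translator maps it to a (row, column) position in a 2D buffer. Packing long 1D arrays into 2D like this was a common trick because GPU textures had a limited maximum width; the name `Array1DOn2D` and the `max_width` parameter are assumptions for illustration.

```python
from typing import List, Tuple


class Array1DOn2D:
    """A 1D 'virtual' array backed by a 2D 'physical' buffer."""

    def __init__(self, size: int, max_width: int) -> None:
        if size <= 0 or max_width <= 0:
            raise ValueError("size and max_width must be positive")
        self.size = size
        self.width = min(size, max_width)
        self.height = (size + self.width - 1) // self.width  # ceiling division
        self.physical: List[List[float]] = [
            [0.0] * self.width for _ in range(self.height)
        ]

    def translate(self, index: int) -> Tuple[int, int]:
        # Virtual-to-physical address translation: index -> (row, column).
        if not 0 <= index < self.size:
            raise IndexError(index)
        return divmod(index, self.width)

    def __getitem__(self, index: int) -> float:
        row, col = self.translate(index)
        return self.physical[row][col]

    def __setitem__(self, index: int, value: float) -> None:
        row, col = self.translate(index)
        self.physical[row][col] = value

    def __len__(self) -> int:
        return self.size


arr = Array1DOn2D(size=10, max_width=4)
for i in range(len(arr)):
    arr[i] = float(i)
print(arr.translate(9))  # (2, 1)
print(arr.physical)      # [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 0.0, 0.0]]
```

Algorithms written against `arr[i]` never see the 2D layout, so the storage scheme could change without touching them.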
How do you program the GPU, then? There is a lot to take into account, since we are really talking about the graphics system of the computer. In the beginning, things like OpenGL and the window system had to be dealt with, and the “Configware” and “Flowware”, the different parts of the setup of computations on GPUs, had to be specified explicitly. Now there are metalanguages like BrookGPU, Sh and Scout that hide many of these specifics, so that a program written in a standard language like C, Fortran, Java or Perl can more easily be turned into a GPU program through their APIs. BrookGPU, for instance, is available at SourceForge. During the talk I searched a bit and also found http://www.cs.lth.se/home/Calle_Lejdfors/pygpu/, which seems to enable GPU programming in Python! There are also device-independent languages like Accelerator from Microsoft Research, PeakStream (www.peakstreaminc.com) and RapidMind (www.rapidmind.net), and vendor-specific ones from ATI/AMD and NVIDIA (CTM and CUDA, respectively).
There is lots of nifty stuff here; I have to check it out more :)