The Ikaros Math Library

Christian Balkenius

This article was updated 2012 and a newer version is available.

Introduction

This article describes the math library that was developed for Ikaros version 1.0. By using these function the execution speed of an Ikaros model can be greatly increased. In some cases, like matrix multiplication, the math library may be up to 100 times faster than plain scalar code as the library automatically takes advantage of vector processing hardware that may be available. This is done by placing the math library on top of a number of highly optimized libraries including BLAS/ATLAS. However, scalar code is also included so any model using the libary can execute on any hardware even if it does not support vector operations. The library also tries to used other available hardware optimized libraries. For example, on OS X, it uses functions within the Accelerate framework such as vImage, vForce and vDSP.

The math library is adapted to the data types used by Ikaros which means that all functions applies to scalars, arrays or matrices of floats. The reason for standardizing on floats and not doubles is that for the type of models that Ikaros is intended, the resolution of floats is usually more than sufficient. In addition, the currently available hardware acceleration usually only supports floats.

The math library is automatically included though IKAROS.h. All that is needed is to define the namespace ikaros in the beginning of a module. It is possible to use IKAROS_math.h instead of math.h in most cases, although IKAROS_Math does not include all the functions in math.h. One major difference between IKAROS_math and math.h is that function automatically use floats. This means that you should use, for example, sin(x) instead of sinf(x) even if x is a float. There are no double precision functions in the Ikaros math library.

Most functions are overloaded so that they can operate on either scalar, array or matrix objects. For example, a function f that returns a float may have the following different forms:

float f(float x);                          // scalar version
float f(float * x, int size);              // array version
float f(float ** x, int sizex, int sizey); // matrix version

In other cases, functions performs an operation on the individual elements of an array or matrix. In the case where the result is assigned to a new varable r, the above pattern translates into the following:

float    f(float x);                                      // scalar version
float *  f(float * r, float * x, int size);               // array version
float ** f(float ** r, float ** x, int sizex, int sizey); // matrix version

The above functions, return a pointer to the result to allow easy cascading of function calls. This is only for conveience, it is not necessary to assign the returned pointer to a variable. It is also possible for some functions to operate in place. In this case, the function pattern is the following:

float *  f(float * x, int size);              // array version
float ** f(float ** x, int sizex, int sizey); // matrix version

More complex functions follow the same pattern as above but may include additional arguments. To see exactly which versions are available of a function, you need to consult IKAROS_Math.h or the documentation.

Examples

The following example shows how to create some arrays and a matrix and perform some operations on them:

int i;
float x, y, z;

float * a = create_array(4);
float * b = create_array(4);
float * c = create_array(4);
float ** d = create_matrix(4, 4);

a[0] = 1; a[1] = 2; a[2] = 3; a[3] = 4;
b[0] = -1; b[1] = -4; b[2] = -2; b[3] = -1;

add(c, a, b, 4);        // add a and b and store results in c

x = add(a, 4);          // add all values in a
x = dot(a, b, 4);       // the dot product between a and b

x = min(a, 4);          // find the minimum element of a

i = arg_max(a, 4);      // get the index of the largets element in a

minmax(y, z, a, 4);     // simultaneously find minimum and maximum of a

outer(d, a, b, 4, 4)    // calculate the outer product of a and b
                        // and store it in d

add(c, 0.5, a, -1.0, 4);    // multiply each element in a with 0.5;
                            // subtract 1 and store the results in c

multiply(c, a, d, 4, 4, 4); // multiply the vector a with the matrix d
                            // and store the result in c

Since all array and matrix functions return return a pointer to the result, it is possible to combine functions in a compact way. The following example shows how to calculate the sum of squares by combining functions. In one case, the calculations are mad in place and the original data is destroyed. In the second ase, an intemediate buffer is used. The final example shows the calculation of the standard deviation of the array using a combination of four math functions. Some calulcations are performed in place which means that the values in the array a are destoyed during the calculation.

float a[4] = { 1.0, 2.3, 0.1, 8.9 };
float r[4];

// calculate the sum of squares in two ways

float sum_of_squares;

sum_of_squares = add(sqr(r, a, 4)); // non-destructive
sum_of_squares = add(sqr(a, 4)); // destoys the values in a

// calculate the standard deviation

float stdev = sqrt(1/(4-1)*add(sqr(subtract(a, mean(a, 4), 4)), 4), 4));

The above example also illustrates that the Ikaros math functions can operate on standard C arrays. The matrix operations are not compatible with C-matrices that are not allocated with create_matrix().

Add & Subtract

Most Ikaros modules uses many additions and subtractions so it is important that these are fast. The add and subtract functions of the math library makes sure that this is the case. There are fourteen different versions fo the add function depending on the exact operation and the data types involved.

All the add function are called add but they differ in the number and types of arguments. The different add function can be divided into a number of types:

The reason for the more complex addition functions is that they increase the execution speed when several operations can be done more or less simultaneously without additional cost. For example, to scale all elements of an array while adding it to another is much faster than first scaling and then adding.

The functions for subtractions are similar to the addition functions but they are fewer since in some cases you can get subtraction by changing the sign of a factor in the add function.

Multiplication and Division

Multiplication is typically one of the most costly operations and the Ikaros math library performs this task very fast. There are large number of different multiplication functions that should be sufficient for most needs. The functions are called multiply, dot, and outer. Multiply is used for element-wise multiplication, matrix x vector multiplication and matrix x matrix multiplication, dot calculates the dot product, and outer calculates the outer product of two vectors.

The single division function is called divide is used for element-wise division of the elements in an array or matrix.

Miscellaneous Functions

In addition to the functions decribed above, there are a number of useful mathematical functions. They work on scalar as well as arrays and matrices:

Conversion

The math libray contains a number of functions to convert between different data types. Some of these functions operate on whole arrays or matrices and are typically much faster than converting each element individually.

Use float_to_byte and byte_to_float to convert between floats and bytes. These functions are for example used internally by Ikaros when images are read or written in raw format. You can also convert from strings into integers or floats using string_to_int and string_to_float. These functions recognize NULL and will produce the supplied default value in this case.

Image Processing

There is currently only two image processing functions in the library: convolve and box_filter. The most useful function is convolve which implements two dimensional convolution and can be extermely fast on systems where it can use hardware acceleration. The following example illustrates how the function is used:

float ** I = create_matrix(9, 9);

I[3][3] = 1; I[3][4] = 1; I[3][5] = 1;
I[4][3] = 1; I[4][4] = 1; I[4][5] = 1;
I[5][3] = 1; I[5][4] = 1; I[5][5] = 1;

float ** k = create_matrix(3, 3);

k[0][0] = 1; k[0][1] = 0; k[0][2] = -1;
k[1][0] = 2; k[1][1] = 0; k[1][2] = -2;
k[2][0] = 1; k[2][1] = 0; k[2][2] = -1;

float ** J = create_matrix(7, 7);

convolve(J, I, k, 7, 7, 3, 3);

In this example, one image matrix I is created together with a kernel matrix k. A matrix to hold the results is also created and called J. The elements of the two matrices are fillled manually before the convolve function is called to calculate J. Note the different matrix sizes. The width of the resulting image J is calculated as: width(J) = width(I) - width(k) + 1, and similarily for the height. This means that the image size will be clipped through this operation.

Performance Issues

The array and matrix functions are optmized for use with large data sets. The overhead of calling the vector optimized functions is neglible when the arrays or matrices are large enough. If smaller data structures are used and performance is cirtical, it is possible to tune the math library by changing the settings in the IKAROS_System.h file.

By commenting the defines, USE_VDSP, USE_VIMAGE, USE_VFORCE, USE_BLAS, scalar code will be used instead. In some cases, performance will increase when only some of the math libraries are used. For example, if USE_BLAS is commented, the OS X version of Ikaros will use vDSP and VForce functions instead, which on some hardware may be faster.

When running on a multiprocessor system, many of the math functions already take advantage of several processors, but performance can often be increased even further by running Ikaros itself in threaded mode by setting the flag -t at start-up.

Another factor that influences performance is how complex the math functions are. In general, the fewest number of functions that produces the desired result should be used. The following example shows two ways to calculate the euclidean norm of an array:

float enorm = norm(a, 4);               // faster
float enorm = sqrt(add(sqr(a, 4), 4));  // slower

The first alternative is generally much faster since it calculates the square and sums at the same time while the second alternative must iterate through the elements two times, first during the sqt operation and then a second time during the add operation. Because the most time consumming task is to load and store data from memory, it is important to perform as many operations as possible while the data is in the cache. This is automatically taken care of if the more complex functions are used.

In some cases, this can lead to more complex functions being more efficient even though they do not do exectly what is needed. For example, to calculate the sum of squares of a large array, it is faster to use the norm function and square the result, than to calculate the sum of squares directly although the former calculates an unnecessary sqr and sqrt.

sum_of_squares = add(sqr(a, 4000));  // slower
sum_of_squares = sqr(norm(a, 4000)); // faster

It is important to realize that the relative merit of different ways to calculate a value depends on many things including the size of the array or matrix as well as the particular hardware and libraries used. It is always best to profile the module to test which methods is fastest. Automatic profiling is turned on using the -p switch when starting Ikaros.