2. Compile OmpSs programs

Once OmpSs is installed, you can start compiling OmpSs programs. To that end, use the Mercurium compiler (which you already installed as described in Installation of Mercurium C/C++/Fortran source-to-source compiler).

See sections Compiling an OmpSs program with CUDA tasks and Compiling an OmpSs program with OpenCL tasks for more information.

2.1. Drivers

The list of available drivers can be found here: https://github.com/bsc-pm/mcxx/blob/master/doc/md_pages/profiles.md

2.1.1. Common flags accepted by Mercurium drivers

Usual flags like -O, -O1, -O2, -O3, -D, -c, -o, … are recognized by Mercurium.

Almost every Mercurium-specific flag is of the form --xxx.

Mercurium drivers are deliberately compatible with gcc. This means that flags of the form -fXXX, -mXXX and -WXXX are accepted and passed on to the backend compiler without interpretation by Mercurium drivers.

Warning

In GCC a flag of the form -fXXX is equivalent to a flag of the form --XXX. This is not the case in Mercurium.

2.1.2. Help

You can get a summary of all the flags accepted by Mercurium using --help with any of the drivers:

$ mcc --help
Usage: mcc options file [file..]
Options:
  -h, --help               Shows this help and quits
  --version                Shows version and quits
  --v, --verbose           Runs verbosely, displaying the programs
                           invoked by the compiler
...

2.1.3. Passing vendor-specific flags

While almost every gcc flag of the form -fXXX or -mXXX can be passed directly to a Mercurium driver, some other vendor-specific flags may be unknown to Mercurium or misunderstood by it. For these cases, Mercurium has a generic way to pass parameters to the backend compiler, the linker and the preprocessor.

--Wn,<comma-separated-list-of-flags>
Passes comma-separated flags to the native compiler. These flags are used when Mercurium invokes the backend compiler to generate the object file (.o).
--Wl,<comma-separated-list-of-flags>
Passes comma-separated flags to the linker. These flags are used when Mercurium invokes the linker.
--Wp,<comma-separated-list-of-flags>
Passes comma-separated flags to the C/Fortran preprocessor. These flags are used when Mercurium invokes the preprocessor on a C or Fortran file.

These flags can be combined. Flags --Wp,a --Wp,b are equivalent to --Wp,a,b. Flag --Wnp,a is equivalent to --Wn,a --Wp,a.
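As a minimal sketch of the first combining rule, a build script could keep its flags in a list and fold them into a single option. The helper name join_flags and the flags -DFOO and -DBAR are hypothetical and chosen only for illustration; the helper just builds the option string and does not invoke Mercurium.

```shell
# join_flags PREFIX FLAG...
# Prints PREFIX,FLAG1,FLAG2,... (hypothetical helper, not part of Mercurium).
join_flags() {
    joined=$1
    shift
    for flag in "$@"; do
        joined="$joined,$flag"
    done
    printf '%s\n' "$joined"
}

# Two preprocessor flags folded into one --Wp option:
join_flags --Wp -DFOO -DBAR    # prints --Wp,-DFOO,-DBAR
```

The printed option can then be placed on a driver command line in place of the separate --Wp,-DFOO --Wp,-DBAR options.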

Important

Do not confuse --Wl and --Wp with the similar gcc flags -Wl and -Wp (note that the gcc ones have a single -). The gcc flags can be passed through the Mercurium ones, as in --Wl,-Wl,muldefs. That said, Mercurium supports -Wl and -Wp directly, so -Wl,muldefs should be enough.

2.2. Compiling an OmpSs program for shared-memory

For OmpSs programs that run on SMP or NUMA systems, no special setup is required. Just pick one of the drivers above.

2.2.1. Your first OmpSs program

The following is a very simple OmpSs program in C:

/* test.c */
#include <stdio.h>

int main(int argc, char *argv[])
{
      int x = argc;
#pragma omp task inout(x)
      {
          x++;
      }
#pragma omp task in(x)
      {
          printf("argc + 1 == %d\n", x);
      }
#pragma omp taskwait
      return 0;
}

Compile it using mcc:

$ mcc -o test --ompss test.c
Nanos++ prerun
Nanos++ phase

Important

Do not forget the --ompss flag; otherwise the OmpSs directives will be ignored.

And run it:

$ ./test
argc + 1 == 2

2.3. Compiling an OmpSs program with CUDA tasks

To compile an OmpSs + CUDA program you must use one of the OmpSs/OmpSs-2 profiles (see Drivers) and the --cuda flag. Note that the Nanos++ installation that was provided to Mercurium must have been configured with CUDA (see Nanos++ configure flags).

The usual structure of an OmpSs program with CUDA tasks involves a set of C/C++ files (usually .c or .cpp) and a set of CUDA files (.cu). You can use one of our profiles to compile the CUDA files, but the driver will simply call the NVIDIA CUDA Compiler, nvcc. We discourage mixing C/C++ code and CUDA code in the same file when using Mercurium. While this was supported in the past, this approach is now regarded as deprecated. The recommended approach is to move the CUDA code to a .cu file.

Note

When calling kernels from C, make sure the kernels in the .cu file are declared inside an extern "C" block; otherwise C++ name mangling of the kernels will cause the link step to fail.

Warning

Fortran support for CUDA is experimental. Currently you cannot pass a .cu file to Mercurium: compile such files separately using nvcc.

Consider the following CUDA kernels:

/* cuda-kernels.cu */

extern "C" {  // We specify extern "C" because we will call them from a C code

  __global__ void init(int n, int *x)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i >= n)
          return;
      x[i] = i;
  }

  __global__ void increment(int n, int *x)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i >= n)
          return;
      x[i]++;
  }

} /* extern "C" */

You can use the following OmpSs program in C to call them:

/* cuda-test.c */

#include <stdio.h>
#include <assert.h>

#pragma omp target device(cuda) copy_deps ndrange(1, n, 1)
#pragma omp task out(x[0 : n-1])
__global__ void init(int n, int *x);

#pragma omp target device(cuda) copy_deps ndrange(1, n, 1)
#pragma omp task inout(x[0 : n-1])
__global__ void increment(int n, int *x);

int main(int argc, char *argv[])
{
    int x[10];

    init(10, x);
    increment(10, x);

#pragma omp taskwait

    int i;
    for (i = 0; i < 10; i++)
    {
        fprintf(stderr, "x[%d] == %d\n", i, x[i]);
        assert(x[i] == (i+1));
    }

    return 0;
}

Compile these files using mcc:

$ mcc -o cuda-test cuda-test.c  cuda-kernels.cu  --ompss --cuda
cuda-test.c:5: note: adding task function 'init'
cuda-test.c:9: note: adding task function 'increment'
cuda-test.c:16: note: call to task function 'init'
cuda-test.c:17: note: call to task function 'increment'
cudacc_cuda-test.cu: info: enabling experimental CUDA support

Note

You can also compile this example separately, as usual: use -c and then link the intermediate .o files.
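A minimal sketch of such a separate build follows. It assumes that the same --ompss --cuda flags are passed at every step, including the link step, and that the object files are named after the sources. MCC is a hypothetical variable naming the driver, and build_cuda_test_separately is a hypothetical helper name.

```shell
# MCC is a hypothetical variable naming the driver; it defaults to mcc.
MCC=${MCC:-mcc}

# Compile each source to an object file with -c, then link the objects.
# Assumption: the same --ompss --cuda flags are given at every step.
build_cuda_test_separately() {
    "$MCC" --ompss --cuda -c cuda-test.c -o cuda-test.o &&
    "$MCC" --ompss --cuda -c cuda-kernels.cu -o cuda-kernels.o &&
    "$MCC" --ompss --cuda -o cuda-test cuda-test.o cuda-kernels.o
}
```

Call build_cuda_test_separately from the directory that contains both source files. Setting MCC=echo beforehand prints the three command lines instead of running them, which is handy for a dry run.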

Now you can run it:

$ ./cuda-test
x[0] == 1
x[1] == 2
x[2] == 3
x[3] == 4
x[4] == 5
x[5] == 6
x[6] == 7
x[7] == 8
x[8] == 9
x[9] == 10
MSG: [12] GPU 0 TRANSFER STATISTICS
MSG: [12]     Total input transfers: 0 B
MSG: [12]     Total output transfers: 0 B
MSG: [12]     Total device transfers: 0 B
MSG: [13] GPU 1 TRANSFER STATISTICS
MSG: [13]     Total input transfers: 0 B
MSG: [13]     Total output transfers: 40 B
MSG: [13]     Total device transfers: 0 B

Note that due to the caching mechanism of Nanos++ only 40 bytes (10 integers) have been transferred from the GPU to the host. See Running OmpSs Programs for detailed information about options affecting the execution of OmpSs programs.

2.4. Compiling an OmpSs program with OpenCL tasks

To compile an OmpSs + OpenCL program you must use one of the OmpSs/OmpSs-2 profiles (see Drivers) and the --opencl flag. Note that the Nanos++ installation that was provided to Mercurium must have been configured with OpenCL (see Nanos++ configure flags).

Consider this OpenCL file with kernels similar to the CUDA example above:

/* ocl_kernels.cl */
__kernel void init(int n, int  __global * x)
{
    int i = get_global_id(0);
    if (i >= n)
        return;
    x[i] = i;
}

__kernel void increment(int n, int  __global * x)
{
    int i = get_global_id(0);
    if (i >= n)
        return;
    x[i]++;
}

We can call these kernels from the OmpSs program:

#include <stdio.h>
#include <assert.h>

#pragma omp target device(opencl) copy_deps ndrange(1, n, 8) file(ocl_kernels.cl)
#pragma omp task out(x[0 : n-1])
void init(int n, int *x);

#pragma omp target device(opencl) copy_deps ndrange(1, n, 8) file(ocl_kernels.cl)
#pragma omp task inout(x[0 : n-1])
void increment(int n, int *x);

int main(int argc, char *argv[])
{
    int x[10];

    init(10, x);
    increment(10, x);

#pragma omp taskwait

    int i;
    for (i = 0; i < 10; i++)
    {
        fprintf(stderr, "x[%d] == %d\n", i, x[i]);
        assert(x[i] == (i+1));
    }

    return 0;
}

Note

OpenCL files must be passed to the driver if they appear in file directives. This also applies to separate compilation using -c.

Compile the OmpSs program using mcc:

$ mcc -o ocl_test ocl_test.c ocl_kernels.cl --ompss --opencl
ocl_test.c:5: note: adding task function 'init'
ocl_test.c:9: note: adding task function 'increment'
ocl_test.c:18: note: call to task function 'init'
ocl_test.c:19: note: call to task function 'increment'

Now you can execute it:

$ ./ocl_test
x[0] == 1
x[1] == 2
x[2] == 3
x[3] == 4
x[4] == 5
x[5] == 6
x[6] == 7
x[7] == 8
x[8] == 9
x[9] == 10
MSG: [12] OpenCL dev0 TRANSFER STATISTICS
MSG: [12]     Total input transfers: 0 B
MSG: [12]     Total output transfers: 0 B
MSG: [12]     Total dev2dev(in) transfers: 0 B
MSG: [13] OpenCL dev1 TRANSFER STATISTICS
MSG: [13]     Total input transfers: 0 B
MSG: [13]     Total output transfers: 40 B
MSG: [13]     Total dev2dev(in) transfers: 0 B

Again, due to the caching mechanism of Nanos++, only 40 bytes (10 integers) have been transferred from the OpenCL device (in the example above it is a GPU) to the host.

2.5. Problems during compilation

Although we put considerable effort into making the compiler reasonably robust, you may encounter a bug or problem with Mercurium.

There are several kinds of errors that you may run into.

  • Mercurium ends abnormally with an internal error telling you to open a ticket.
  • Mercurium does not crash but gives an error on your input code and compilation stops, as if your code were not valid.
  • Mercurium does not crash, but gives an error involving an internal-source.
  • Mercurium generates wrong code and native compilation fails on an intermediate file.
  • Mercurium forgets something in the generated code and linking fails.

2.5.1. How can you help us to solve the problem quicker?

In order for us to fix your problem we need the preprocessed file.

If your program is C/C++, please do the following:

  1. Figure out the compilation command of the file that fails to compile. Make sure you can replicate the problem using that compilation command alone.
  2. If your compilation command includes -c, replace it with -E. If it does not include -c, simply add -E.
  3. If your compilation command includes -o file (or -o file.o), replace it with -o file.ii. If it does not include -o, simply add -o file.ii.
  4. Now run the compiler with this modified compilation command. It should generate file.ii.
  5. These files are usually very large. Please compress them with gzip (or bzip2 or any similar tool).
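Steps 2 and 3 above can be sketched as a small shell helper that rewrites a compilation command. The name to_preprocess_cmd is hypothetical, and the sketch assumes that no argument contains spaces and that -o is always followed by a file name. It only prints the modified command; it does not run the compiler.

```shell
# to_preprocess_cmd CMD ARG...
# Prints the command with -c replaced by -E (or -E added) and the output
# redirected to a .ii file. Hypothetical helper, not part of Mercurium.
to_preprocess_cmd() {
    out_file=file.ii    # default name used when the command has no -o
    seen_c=no
    result=
    while [ $# -gt 0 ]; do
        case $1 in
            -c) result="$result -E"; seen_c=yes ;;
            -o) out_file="${2%.o}.ii"; shift ;;   # file or file.o -> file.ii
            *)  result="$result $1" ;;
        esac
        shift
    done
    # No -c in the original command: simply add -E.
    [ "$seen_c" = yes ] || result="$result -E"
    printf '%s\n' "${result# } -o $out_file"
}

to_preprocess_cmd mcc --ompss -c test.c -o test.o
# prints: mcc --ompss -E test.c -o test.ii
```

After running the printed command, compress the result as in step 5, for example with gzip test.ii.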

Send an email to pm-tools@bsc.es with the error message you are experiencing and the (compressed) preprocessed file attached.

If your program is Fortran, the input file alone may be enough, but you may have to add all the INCLUDEd files and modules.