.. index::
    double: compile; OmpSs

.. _compile-ompss-programs:

Compile OmpSs programs
======================

After completing the :ref:`installation-ompss`, you can start compiling OmpSs programs. To do so, use the Mercurium compiler (installed as described in
:ref:`mercurium-installation`).

See :ref:`compile-ompss-cuda` and :ref:`compile-ompss-opencl` for information about compiling programs with CUDA and OpenCL tasks.

.. index::
    double: Mercurium; drivers

.. _mercurium-profiles:

Drivers
-------

The list of available drivers can be found here:
https://github.com/bsc-pm/mcxx/blob/master/doc/md_pages/profiles.md

.. index::
    single: Mercurium; common flags

Common flags accepted by Mercurium drivers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Usual compiler flags such as ``-O``, ``-O1``, ``-O2``, ``-O3``, ``-D``, ``-c`` and ``-o`` are recognized by Mercurium.

Almost every Mercurium-specific flag is of the form ``--xxx``.

Mercurium drivers are deliberately compatible with ``gcc``. This means that flags
of the form ``-fXXX``, ``-mXXX`` and ``-WXXX`` are accepted and passed on to the backend
compiler without being interpreted by Mercurium drivers.

.. warning::
  In GCC a flag of the form ``-fXXX`` is equivalent to a flag of the form ``--XXX``. This is **not** the case in Mercurium.

.. index::
    single: Mercurium; help

Help
^^^^

You can get a summary of all the flags accepted by Mercurium using ``--help`` with any of the drivers::

  $ mcc --help
  Usage: mcc options file [file..]
  Options:
    -h, --help               Shows this help and quits
    --version                Shows version and quits
    --v, --verbose           Runs verbosely, displaying the programs
                             invoked by the compiler
  ...

.. index::
    single: Mercurium; vendor-specific flags

Passing vendor-specific flags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While almost every ``gcc`` flag of the form ``-fXXX`` or ``-mXXX`` can be passed
directly to a Mercurium driver, some other vendor-specific flags may be unknown
to Mercurium or misinterpreted by it. For these cases, Mercurium provides a
generic way to pass parameters to the backend compiler, the linker and the preprocessor.

``--Wn,<comma-separated-list-of-flags>``
    Passes comma-separated flags to the native compiler. These flags are used when
    Mercurium invokes the backend compiler to generate the object file (``.o``).

``--Wl,<comma-separated-list-of-flags>``
    Passes comma-separated flags to the linker. These flags are used when Mercurium
    invokes the linker.

``--Wp,<comma-separated-list-of-flags>``
    Passes comma-separated flags to the C/Fortran preprocessor. These flags are used
    when Mercurium invokes the preprocessor on a C or Fortran file.

These flags can be repeated and combined: ``--Wp,a --Wp,b`` is equivalent to
``--Wp,a,b``, and ``--Wnp,a`` is equivalent to ``--Wn,a --Wp,a``.
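
The combination rule can be made concrete with a small C model of how a ``--W<tools>,<flags>`` option might be split. This is a hypothetical sketch, not Mercurium's option parser: ``distribute``, ``struct tool_flags`` and the fixed-size buffers are made up for illustration, and the sketch assumes that any combination of the letters ``n``, ``l`` and ``p`` is accepted, which the text only confirms for ``--Wnp``.

.. code-block:: c

   /* Toy model of --W<tools>,<flags> (hypothetical, NOT Mercurium's parser).
    * Each tool letter ('n' native compiler, 'l' linker, 'p' preprocessor)
    * receives every comma-separated flag that follows the first comma. */
   #include <assert.h>
   #include <string.h>

   #define MAX_FLAGS 16
   #define MAX_LEN 64

   struct flag_list {
       int count;
       char flags[MAX_FLAGS][MAX_LEN];
   };

   struct tool_flags {
       struct flag_list native, linker, preproc;
   };

   /* Copies one flag of length len into list; silently drops it if full. */
   static void append(struct flag_list *list, const char *flag, size_t len)
   {
       if (list->count < MAX_FLAGS && len < MAX_LEN) {
           memcpy(list->flags[list->count], flag, len);
           list->flags[list->count][len] = '\0';
           list->count++;
       }
   }

   /* Maps a tool letter to its flag list, or NULL for an unknown letter. */
   static struct flag_list *select_list(struct tool_flags *tf, char tool)
   {
       switch (tool) {
       case 'n': return &tf->native;
       case 'l': return &tf->linker;
       case 'p': return &tf->preproc;
       default:  return NULL;
       }
   }

   /* Returns 0 on success, -1 if opt is not of the form --W<tools>,<flags>. */
   int distribute(struct tool_flags *tf, const char *opt)
   {
       assert(tf != NULL && opt != NULL);
       if (strncmp(opt, "--W", 3) != 0)
           return -1;
       const char *tools = opt + 3;
       const char *comma = strchr(tools, ',');
       if (comma == NULL || comma == tools)
           return -1;
       /* Validate every tool letter before touching any list. */
       for (const char *t = tools; t < comma; t++)
           if (select_list(tf, *t) == NULL)
               return -1;
       /* Give every flag after the first comma to every selected tool. */
       for (const char *t = tools; t < comma; t++) {
           struct flag_list *list = select_list(tf, *t);
           const char *start = comma + 1;
           for (;;) {
               const char *end = strchr(start, ',');
               size_t len = end ? (size_t)(end - start) : strlen(start);
               append(list, start, len);
               if (end == NULL)
                   break;
               start = end + 1;
           }
       }
       return 0;
   }

Calling ``distribute`` once with ``"--Wp,a,b"``, or twice with ``"--Wp,a"`` and ``"--Wp,b"``, leaves the same two entries in ``preproc``, and ``"--Wnp,a"`` adds ``a`` to both ``native`` and ``preproc``, mirroring the equivalences stated above.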

.. important::
  Do not confuse ``--Wl`` and ``--Wp`` with the similar ``gcc`` flags ``-Wl``
  and ``-Wp`` (note that the ``gcc`` ones have a single ``-``). The ``gcc`` flags
  can be passed through the Mercurium ones, as in ``--Wl,-Wl,muldefs``. That said,
  Mercurium supports ``-Wl`` and ``-Wp`` directly, so ``-Wl,muldefs`` should be enough.

.. index::
    triple: compile; OmpSs; shared-memory

Compiling an OmpSs program for shared-memory
--------------------------------------------

OmpSs programs that run on SMP or NUMA systems do not require any special setup: just pick one of the drivers described in :ref:`mercurium-profiles`.

.. index::
    single: OmpSs; first program

Your first OmpSs program
^^^^^^^^^^^^^^^^^^^^^^^^

.. highlight:: c

Following is a very simple OmpSs program in C::

  /* test.c */
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
        int x = argc;
  #pragma omp task inout(x)
        {
            x++;
        }
  #pragma omp task in(x)
        {
            printf("argc + 1 == %d\n", x);
        }
  #pragma omp taskwait
        return 0;
  }

.. highlight:: none

Compile it using ``mcc``::

  $ mcc -o test --ompss test.c
  Nanos++ prerun
  Nanos++ phase

.. important::
   Do not forget the ``--ompss`` flag; otherwise, the OmpSs directives will be ignored.

And run it::

  $ ./test
  argc + 1 == 2
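
The ``task`` pragma can also be placed on a function, so that every call to it creates a task; the CUDA and OpenCL examples in this chapter use that form on function declarations. The sketch below rewrites ``test.c`` in that style, assuming the pragma is accepted on function definitions as well. The file name ``test-functions.c`` and the functions ``increment``, ``report`` and ``run`` are hypothetical; ``run`` is meant to be called from ``main``, for example ``return run(argc) == argc + 1 ? 0 : 1;``.

.. code-block:: c

   /* test-functions.c: hypothetical variant of test.c using task functions */
   #include <assert.h>
   #include <stdio.h>

   /* Every call to increment() becomes a task that reads and writes *x. */
   #pragma omp task inout(*x)
   void increment(int *x)
   {
       (*x)++;
   }

   /* Every call to report() becomes a task that only reads *x, so it runs
    * after the preceding increment() task on the same variable. */
   #pragma omp task in(*x)
   void report(const int *x)
   {
       printf("argc + 1 == %d\n", *x);
   }

   int run(int argc)
   {
       int x = argc;
       increment(&x);
       report(&x);
   #pragma omp taskwait
       /* After the taskwait both tasks have finished. */
       assert(x == argc + 1);
       return x;
   }

Compile it with ``mcc --ompss`` as above. If the directives are ignored (for example, when ``--ompss`` is omitted), the calls simply run in program order, which in this particular example prints the same result.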

.. index::
    triple: compile; OmpSs; CUDA

.. _compile-ompss-cuda:

Compiling an OmpSs program with CUDA tasks
------------------------------------------

To compile an OmpSs + CUDA program you must use one of the OmpSs/OmpSs-2 profiles
(see :ref:`mercurium-profiles`) and the ``--cuda`` flag. Note that the Nanos++
installation that was provided to Mercurium must have been configured with CUDA
(see :ref:`nanos-configure-flags`).

The usual structure of an OmpSs program with CUDA tasks involves a set of C/C++
files (usually ``.c`` or ``.cpp``) and a set of CUDA files (``.cu``). You can
use one of our profiles to compile the CUDA files, but the driver will simply
call the NVIDIA CUDA compiler, ``nvcc``. We discourage mixing C/C++ code and
CUDA code in the same file when using Mercurium: while this was supported in
the past, that approach is now deprecated. Instead, move the CUDA code to a
``.cu`` file.

.. note::
    When calling kernels from C, make sure they are declared inside an ``extern "C"`` block in the ``.cu`` file;
    otherwise, C++ name mangling of the kernel names will cause the link step to fail.

.. warning::
    Fortran support for CUDA is experimental. Currently you cannot pass a ``.cu`` file to Mercurium when compiling
    Fortran code: compile such files separately using ``nvcc``.

.. highlight:: cpp

Consider the following CUDA kernels::

   /* cuda-kernels.cu */

   extern "C" {  // Use extern "C" because these kernels are called from C code

     __global__ void init(int n, int *x)
     {
         int i = blockIdx.x * blockDim.x + threadIdx.x;
         if (i >= n)
             return;
         x[i] = i;
     }

     __global__ void increment(int n, int *x)
     {
         int i = blockIdx.x * blockDim.x + threadIdx.x;
         if (i >= n)
             return;
         x[i]++;
     }

   } /* extern "C" */

.. highlight:: c

You can use the following OmpSs program in C to call them::

    /* cuda-test.c */

    #include <stdio.h>
    #include <assert.h>

    #pragma omp target device(cuda) copy_deps ndrange(1, n, 1)
    #pragma omp task out(x[0 : n-1])
    __global__ void init(int n, int *x);

    #pragma omp target device(cuda) copy_deps ndrange(1, n, 1)
    #pragma omp task inout(x[0 : n-1])
    __global__ void increment(int n, int *x);

    int main(int argc, char *argv[])
    {
        int x[10];

        init(10, x);
        increment(10, x);

    #pragma omp taskwait

        int i;
        for (i = 0; i < 10; i++)
        {
            fprintf(stderr, "x[%d] == %d\n", i, x[i]);
            assert(x[i] == (i+1));
        }

        return 0;
    }

.. highlight:: none

Compile these files using ``mcc``::

    $ mcc -o cuda-test cuda-test.c  cuda-kernels.cu  --ompss --cuda
    cuda-test.c:5: note: adding task function 'init'
    cuda-test.c:9: note: adding task function 'increment'
    cuda-test.c:16: note: call to task function 'init'
    cuda-test.c:17: note: call to task function 'increment'
    cudacc_cuda-test.cu: info: enabling experimental CUDA support

.. note::
    You can also compile this example separately, as usual: compile each file with ``-c`` and link the resulting ``.o`` files.

Now you can run it::

    $ ./cuda-test
    x[0] == 1
    x[1] == 2
    x[2] == 3
    x[3] == 4
    x[4] == 5
    x[5] == 6
    x[6] == 7
    x[7] == 8
    x[8] == 9
    x[9] == 10
    MSG: [12] GPU 0 TRANSFER STATISTICS
    MSG: [12]     Total input transfers: 0 B
    MSG: [12]     Total output transfers: 0 B
    MSG: [12]     Total device transfers: 0 B
    MSG: [13] GPU 1 TRANSFER STATISTICS
    MSG: [13]     Total input transfers: 0 B
    MSG: [13]     Total output transfers: 40 B
    MSG: [13]     Total device transfers: 0 B

Note that, due to the caching mechanism of Nanos++, only 40 bytes (10 integers)
have been transferred from the GPU to the host. See
:ref:`running-ompss-programs` for detailed information about options affecting
the execution of OmpSs programs.
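
The 40 B figure follows from the dependence ``inout(x[0 : n-1])``. In this OmpSs array-section notation both bounds are inclusive (as the example itself implies, since ``x[0 : n-1]`` is meant to cover the whole array), so with ``n`` = 10 the section spans 10 elements, that is, 40 bytes on a platform where ``int`` is 4 bytes. The helper below, whose name ``section_bytes`` is hypothetical, just spells out that arithmetic.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Number of bytes covered by the OmpSs array section x[lower : upper].
    * Both bounds are inclusive, so the section has upper - lower + 1 elements. */
   size_t section_bytes(long lower, long upper, size_t elem_size)
   {
       assert(upper >= lower);
       return (size_t)(upper - lower + 1) * elem_size;
   }

With ``sizeof(int)`` equal to 4, ``section_bytes(0, 9, sizeof(int))`` returns 40, matching the ``Total output transfers: 40 B`` line above.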

.. See :ref:`ompss-language-syntax` for detailed information about the OmpSs
.. syntax used in the example above.

.. index::
    triple: compile; OmpSs; OpenCL

.. _compile-ompss-opencl:

Compiling an OmpSs program with OpenCL tasks
--------------------------------------------

To compile an OmpSs + OpenCL program you must use one of the OmpSs/OmpSs-2
profiles (see :ref:`mercurium-profiles`) and the ``--opencl`` flag. Note that
the Nanos++ installation that was provided to Mercurium must have been
configured with OpenCL (see :ref:`nanos-configure-flags`).

.. highlight:: cpp

Consider this OpenCL file with kernels similar to the CUDA example above::

    /* ocl_kernels.cl */
    __kernel void init(int n, int  __global * x)
    {
        int i = get_global_id(0);
        if (i >= n)
            return;
        x[i] = i;
    }

    __kernel void increment(int n, int  __global * x)
    {
        int i = get_global_id(0);
        if (i >= n)
            return;
        x[i]++;
    }


We can call these kernels from the following OmpSs program, ``ocl_test.c``::

    #include <stdio.h>
    #include <assert.h>

    #pragma omp target device(opencl) copy_deps ndrange(1, n, 8) file(ocl_kernels.cl)
    #pragma omp task out(x[0 : n-1])
    void init(int n, int *x);

    #pragma omp target device(opencl) copy_deps ndrange(1, n, 8) file(ocl_kernels.cl)
    #pragma omp task inout(x[0 : n-1])
    void increment(int n, int *x);

    int main(int argc, char *argv[])
    {
        int x[10];

        init(10, x);
        increment(10, x);

    #pragma omp taskwait

        int i;
        for (i = 0; i < 10; i++)
        {
            fprintf(stderr, "x[%d] == %d\n", i, x[i]);
            assert(x[i] == (i+1));
        }

        return 0;
    }

.. note::
    OpenCL files must be passed to the driver if they are referenced in ``file`` clauses. This also
    applies to separate compilation using ``-c``.
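
In ``ndrange(1, n, 8)`` the arguments appear to be the number of dimensions, the global size and the work-group (local) size. With ``n`` = 10 and a local size of 8, the global size is not a multiple of the work-group size, which OpenCL 1.x requires. The sketch below assumes the range is padded up to the next multiple (16 work-items); we have not checked how Nanos++ actually handles this. Under that assumption, the ``if (i >= n) return;`` guard in both kernels is what stops the six extra work-items from writing past the end of ``x``. The code emulates the kernels serially in plain C, with a loop index standing in for ``get_global_id(0)``; ``padded_global_size``, ``init_item``, ``increment_item`` and ``emulate`` are hypothetical names.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Smallest multiple of local_size that is >= n (assumed padding rule). */
   size_t padded_global_size(size_t n, size_t local_size)
   {
       assert(local_size > 0);
       return (n + local_size - 1) / local_size * local_size;
   }

   /* Serial stand-ins for the kernels in ocl_kernels.cl: gid plays the role
    * of get_global_id(0), and work-items with gid >= n exit early. */
   static void init_item(size_t gid, int n, int *x)
   {
       int i = (int)gid;
       if (i >= n)
           return;
       x[i] = i;
   }

   static void increment_item(size_t gid, int n, int *x)
   {
       int i = (int)gid;
       if (i >= n)
           return;
       x[i]++;
   }

   /* Runs both "kernels" over a padded 1-D range; returns the global size used. */
   size_t emulate(int n, size_t local_size, int *x)
   {
       size_t global = padded_global_size((size_t)n, local_size);
       for (size_t gid = 0; gid < global; gid++)
           init_item(gid, n, x);
       for (size_t gid = 0; gid < global; gid++)
           increment_item(gid, n, x);
       return global;
   }

``emulate(10, 8, x)`` uses a range of 16 work-items and leaves ``x[i] == i + 1`` for every ``i`` below 10, the same values printed by ``ocl_test``.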

.. highlight:: none

Compile the OmpSs program using ``mcc``::

    $ mcc -o ocl_test ocl_test.c ocl_kernels.cl --ompss --opencl
    ocl_test.c:5: note: adding task function 'init'
    ocl_test.c:9: note: adding task function 'increment'
    ocl_test.c:18: note: call to task function 'init'
    ocl_test.c:19: note: call to task function 'increment'

Now you can execute it::

    $ ./ocl_test
    x[0] == 1
    x[1] == 2
    x[2] == 3
    x[3] == 4
    x[4] == 5
    x[5] == 6
    x[6] == 7
    x[7] == 8
    x[8] == 9
    x[9] == 10
    MSG: [12] OpenCL dev0 TRANSFER STATISTICS
    MSG: [12]     Total input transfers: 0 B
    MSG: [12]     Total output transfers: 0 B
    MSG: [12]     Total dev2dev(in) transfers: 0 B
    MSG: [13] OpenCL dev1 TRANSFER STATISTICS
    MSG: [13]     Total input transfers: 0 B
    MSG: [13]     Total output transfers: 40 B
    MSG: [13]     Total dev2dev(in) transfers: 0 B


Again, due to the caching mechanism of Nanos++, only 40 bytes (10 integers)
have been transferred from the OpenCL device (in the example above it is a GPU)
to the host.

.. index::
    double: compilation; problems

Problems during compilation
---------------------------

Although we put great effort into making Mercurium a reasonably robust compiler, you may still encounter bugs or other problems with it.

There are several kinds of errors that you may run into:

* Mercurium ends abnormally with an internal error telling you to open a ticket.
* Mercurium does not crash but gives an error on your input code and compilation stops, as if your code were not valid.
* Mercurium does not crash, but gives an error involving an ``internal-source``.
* Mercurium generates wrong code and native compilation fails on an intermediate file.
* Mercurium forgets something in the generated code and linking fails.

How can you help us to solve the problem quicker?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order for us to fix your problem we need the *preprocessed* file.

If your program is written in C/C++, please do the following:

#. Figure out the compilation command of the file that fails to compile. Make sure you can replicate the problem using that compilation command alone.
#. If your compilation command includes ``-c``, replace it with ``-E``. If it does not include ``-c``, simply add ``-E``.
#. If your compilation command includes ``-o file`` (or ``-o file.o``), replace it with ``-o file.ii``. If it does not include ``-o``, simply add ``-o file.ii``.
#. Now run the compiler with this modified compilation command. It should have generated a ``file.ii``.
#. These files are usually very large. Please compress them with ``gzip`` (or ``bzip2`` or any similar tool).

Send an email to pm-tools@bsc.es with the error message you are getting and the (compressed) preprocessed file attached.

If your program is written in Fortran, the input file alone may be enough, but you may also need to send all the ``INCLUDE``\ d files and modules.