.. index::
   double: compile; OmpSs

.. _compile-ompss-programs:

Compile OmpSs programs
======================

After the :ref:`installation-ompss` you can start compiling OmpSs programs. To do so, use the Mercurium compiler (which you already installed as described in :ref:`mercurium-installation`). See sections :ref:`compile-ompss-cuda` and :ref:`compile-ompss-opencl` for details on compiling programs with CUDA and OpenCL tasks.

.. index::
   double: Mercurium; drivers

.. _mercurium-profiles:

Drivers
-------

The list of available drivers can be found here: https://github.com/bsc-pm/mcxx/blob/master/doc/md_pages/profiles.md

.. index::
   single: Mercurium; common flags

Common flags accepted by Mercurium drivers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Usual flags like ``-O``, ``-O1``, ``-O2``, ``-O3``, ``-D``, ``-c``, ``-o``, ... are recognized by Mercurium.

Almost every Mercurium-specific flag is of the form ``--xxx``.

Mercurium drivers are deliberately compatible with ``gcc``. This means that flags of the form ``-fXXX``, ``-mXXX`` and ``-Wxxx`` are accepted and passed on to the backend compiler without interpretation by Mercurium drivers.

.. warning::

   In GCC a flag of the form ``-fXXX`` is equivalent to a flag of the form ``--XXX``. This is **not** the case in Mercurium.

.. index::
   single: Mercurium; help

Help
^^^^

You can get a summary of all the flags accepted by Mercurium using ``--help`` with any of the drivers::

    $ mcc --help
    Usage: mcc options file [file..]
    Options:
      -h, --help               Shows this help and quits
      --version                Shows version and quits
      --v, --verbose           Runs verbosely, displaying the programs
                               invoked by the compiler
      ...

.. index::
   single: Mercurium; vendor-specific flags

Passing vendor-specific flags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While almost every ``gcc`` flag of the form ``-fXXX`` or ``-mXXX`` can be passed directly to a Mercurium driver, some other vendor-specific flags may not be known to Mercurium or may be misunderstood by it.
When this happens, you can use Mercurium's generic mechanism to pass flags to the preprocessor, the backend compiler and the linker.

``--Wn,``
    Passes comma-separated flags to the native compiler. These flags are used when Mercurium invokes the backend compiler to generate the object file (``.o``).

``--Wl,``
    Passes comma-separated flags to the linker. These flags are used when Mercurium invokes the linker.

``--Wp,``
    Passes comma-separated flags to the C/Fortran preprocessor. These flags are used when Mercurium invokes the preprocessor on a C or Fortran file.

These flags can be combined. Flags ``--Wp,a --Wp,b`` are equivalent to ``--Wp,a,b``. Flag ``--Wnp,a`` is equivalent to ``--Wn,a --Wp,a``.

.. important::

   Do not confuse ``--Wl`` and ``--Wp`` with the similar ``gcc`` flags ``-Wl`` and ``-Wp`` (note that the ``gcc`` ones have a single ``-``). The latter can be used with the former, as in ``--Wl,-Wl,muldefs``. That said, Mercurium supports ``-Wl`` and ``-Wp`` directly, so ``-Wl,muldefs`` should be enough.

.. index::
   triple: compile; OmpSs; shared-memory

Compiling an OmpSs program for shared-memory
--------------------------------------------

For OmpSs programs that run on SMP or NUMA systems, you do not need to do anything special. Just pick one of the drivers above.

.. index::
   single: OmpSs; first program

Your first OmpSs program
^^^^^^^^^^^^^^^^^^^^^^^^

.. highlight:: c

The following is a very simple OmpSs program in C::

    /* test.c */
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int x = argc;

        #pragma omp task inout(x)
        {
            x++;
        }

        #pragma omp task in(x)
        {
            printf("argc + 1 == %d\n", x);
        }

        #pragma omp taskwait

        return 0;
    }

.. highlight:: none

Compile it using ``mcc``::

    $ mcc -o test --ompss test.c
    Nanos++ prerun
    Nanos++ phase

.. important::

   Do not forget the ``--ompss`` flag; otherwise the OmpSs directives will be ignored.

And run it::

    $ ./test
    argc + 1 == 2

.. index::
   triple: compile; OmpSs; CUDA

.. _compile-ompss-cuda:

Compiling an OmpSs program with CUDA tasks
------------------------------------------

To compile an OmpSs + CUDA program you must use one of the OmpSs/OmpSs-2 profiles (see :ref:`mercurium-profiles`) and the ``--cuda`` flag. Note that the Nanos++ installation that was provided to Mercurium must have been configured with CUDA (see :ref:`nanos-configure-flags`).

The usual structure of an OmpSs program with CUDA tasks involves a set of C/C++ files (usually ``.c`` or ``.cpp``) and a set of CUDA files (``.cu``). You can use one of our profiles to compile the CUDA files, but the driver will simply call the NVIDIA CUDA Compiler ``nvcc``.

We discourage mixing C/C++ code and CUDA code in the same file when using Mercurium. While this was supported in the past, this approach is now regarded as deprecated. The recommended approach is to move the CUDA code to a ``.cu`` file.

.. note::

   When calling kernels from C, make sure the ``.cu`` file declares them inside an ``extern "C"`` block; otherwise C++ mangling of the kernel names will cause the link step to fail.

.. warning::

   Fortran support for CUDA is experimental. Currently you cannot pass a ``.cu`` file to Mercurium: compile such files separately using ``nvcc``.

.. highlight:: cpp

Consider the following CUDA kernels::

    /* cuda-kernels.cu */
    extern "C" {

    // We specify extern "C" because we will call them from a C code
    __global__ void init(int n, int *x)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;

        x[i] = i;
    }

    __global__ void increment(int n, int *x)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;

        x[i]++;
    }

    } /* extern "C" */

.. highlight:: c

You can use the following OmpSs program in C to call them::

    /* cuda-test.c */
    #include <stdio.h>
    #include <assert.h>

    #pragma omp target device(cuda) copy_deps ndrange(1, n, 1)
    #pragma omp task out(x[0 : n-1])
    __global__ void init(int n, int *x);

    #pragma omp target device(cuda) copy_deps ndrange(1, n, 1)
    #pragma omp task inout(x[0 : n-1])
    __global__ void increment(int n, int *x);

    int main(int argc, char *argv[])
    {
        int x[10];
        init(10, x);
        increment(10, x);
        #pragma omp taskwait

        int i;
        for (i = 0; i < 10; i++)
        {
            fprintf(stderr, "x[%d] == %d\n", i, x[i]);
            assert(x[i] == (i+1));
        }

        return 0;
    }

.. highlight:: none

Compile these files using ``mcc``::

    $ mcc -o cuda-test cuda-test.c cuda-kernels.cu --ompss --cuda
    cuda-test.c:5: note: adding task function 'init'
    cuda-test.c:9: note: adding task function 'increment'
    cuda-test.c:16: note: call to task function 'init'
    cuda-test.c:17: note: call to task function 'increment'
    cudacc_cuda-test.cu: info: enabling experimental CUDA support

.. note::

   You can also compile the files of this example separately, as usual, using ``-c`` and then linking the intermediate ``.o`` files.

Now you can run it::

    $ ./cuda-test
    x[0] == 1
    x[1] == 2
    x[2] == 3
    x[3] == 4
    x[4] == 5
    x[5] == 6
    x[6] == 7
    x[7] == 8
    x[8] == 9
    x[9] == 10
    MSG: [12] GPU 0 TRANSFER STATISTICS
    MSG: [12] Total input transfers: 0 B
    MSG: [12] Total output transfers: 0 B
    MSG: [12] Total device transfers: 0 B
    MSG: [13] GPU 1 TRANSFER STATISTICS
    MSG: [13] Total input transfers: 0 B
    MSG: [13] Total output transfers: 40 B
    MSG: [13] Total device transfers: 0 B

Note that, due to the caching mechanism of Nanos++, only 40 bytes (10 integers) have been transferred from the GPU to the host. See :ref:`running-ompss-programs` for detailed information about options affecting the execution of OmpSs programs.

.. See :ref:`ompss-language-syntax` for detailed information about the OmpSs
.. syntax used in the example above.

.. index::
   triple: compile; OmpSs; OpenCL

.. _compile-ompss-opencl:

Compiling an OmpSs program with OpenCL tasks
--------------------------------------------

To compile an OmpSs + OpenCL program you must use one of the OmpSs/OmpSs-2 profiles (see :ref:`mercurium-profiles`) and the ``--opencl`` flag. Note that the Nanos++ installation that was provided to Mercurium must have been configured with OpenCL (see :ref:`nanos-configure-flags`).

.. highlight:: cpp

Consider this OpenCL file with kernels similar to the CUDA example above::

    /* ocl_kernels.cl */
    __kernel void init(int n, int __global * x)
    {
        int i = get_global_id(0);
        if (i >= n)
            return;

        x[i] = i;
    }

    __kernel void increment(int n, int __global * x)
    {
        int i = get_global_id(0);
        if (i >= n)
            return;

        x[i]++;
    }

We can call these kernels from the following OmpSs program (``ocl_test.c``)::

    #include <stdio.h>
    #include <assert.h>

    #pragma omp target device(opencl) copy_deps ndrange(1, n, 8) file(ocl_kernels.cl)
    #pragma omp task out(x[0 : n-1])
    void init(int n, int *x);

    #pragma omp target device(opencl) copy_deps ndrange(1, n, 8) file(ocl_kernels.cl)
    #pragma omp task inout(x[0 : n-1])
    void increment(int n, int *x);

    int main(int argc, char *argv[])
    {
        int x[10];
        init(10, x);
        increment(10, x);
        #pragma omp taskwait

        int i;
        for (i = 0; i < 10; i++)
        {
            fprintf(stderr, "x[%d] == %d\n", i, x[i]);
            assert(x[i] == (i+1));
        }

        return 0;
    }

.. note::

   OpenCL files must be passed to the driver if they appear in ``file`` clauses. This also applies to separate compilation using ``-c``.

.. highlight:: none

Compile the OmpSs program using ``mcc``::

    $ mcc -o ocl_test ocl_test.c ocl_kernels.cl --ompss --opencl
    ocl_test.c:5: note: adding task function 'init'
    ocl_test.c:9: note: adding task function 'increment'
    ocl_test.c:18: note: call to task function 'init'
    ocl_test.c:19: note: call to task function 'increment'

Now you can execute it::

    $ ./ocl_test
    x[0] == 1
    x[1] == 2
    x[2] == 3
    x[3] == 4
    x[4] == 5
    x[5] == 6
    x[6] == 7
    x[7] == 8
    x[8] == 9
    x[9] == 10
    MSG: [12] OpenCL dev0 TRANSFER STATISTICS
    MSG: [12] Total input transfers: 0 B
    MSG: [12] Total output transfers: 0 B
    MSG: [12] Total dev2dev(in) transfers: 0 B
    MSG: [13] OpenCL dev1 TRANSFER STATISTICS
    MSG: [13] Total input transfers: 0 B
    MSG: [13] Total output transfers: 40 B
    MSG: [13] Total dev2dev(in) transfers: 0 B

Again, due to the caching mechanism of Nanos++, only 40 bytes (10 integers) have been transferred from the OpenCL device (in the example above it is a GPU) to the host.

.. index::
   double: compilation; problems

Problems during compilation
---------------------------

Although we put considerable effort into making Mercurium a reasonably robust compiler, you may still encounter a bug or problem. The errors you may run into are of several different kinds:

* Mercurium ends abnormally with an internal error telling you to open a ticket.
* Mercurium does not crash but gives an error on your input code and compilation stops, as if your code were not valid.
* Mercurium does not crash, but gives an error involving an ``internal-source``.
* Mercurium generates wrong code and native compilation fails on an intermediate file.
* Mercurium forgets something in the generated code and linking fails.

How can you help us to solve the problem quicker?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To fix your problem we need the *preprocessed* file. If your program is C/C++, please do the following:

#. Figure out the compilation command of the file that fails to compile. Make sure you can replicate the problem using that compilation command alone.
#. If your compilation command includes ``-c``, replace it by ``-E``. If it does not include ``-c``, simply add ``-E``.
#. If your compilation command includes ``-o file`` (or ``-o file.o``), replace it by ``-o file.ii``. If it does not include ``-o``, simply add ``-o file.ii``.
#. Now run the compiler with this modified compilation command. It should have generated a ``file.ii``.
#. These files are usually very large. Please compress them with ``gzip`` (or ``bzip2`` or any similar tool).

Send an email to pm-tools@bsc.es with the error message you are experiencing and the (compressed) preprocessed file attached.

If your program is Fortran, the input file alone may be enough, but you may have to add all the ``INCLUDE``\ d files and modules.
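
For a C/C++ file, the command-line changes described in the steps above can be sketched as follows. This is only an illustration under stated assumptions: ``preprocess_command`` is a hypothetical helper (it is not part of Mercurium), the file names are placeholders, and it only recognizes ``-o`` when the output name is given as a separate argument. It prints the rewritten command instead of running it.

.. code-block:: sh

   # Hypothetical helper, not part of Mercurium: prints the preprocessing
   # variant of a compilation command (display only, quoting is not preserved).
   preprocess_command() {
       result=""
       skip_next=0
       for arg in "$@"; do
           if [ "$skip_next" -eq 1 ]; then
               skip_next=0              # drop the file name that followed -o
               continue
           fi
           case "$arg" in
               -c|-E) ;;                # -c is replaced by -E (appended below)
               -o) skip_next=1 ;;       # the original -o and its file name are dropped
               *) result="$result$arg " ;;
           esac
       done
       # Append -E and the new output name for the preprocessed file
       printf '%s-E -o file.ii\n' "$result"
   }

   # Example with placeholder file names
   preprocess_command mcc --ompss -c -o file.o file.c

Run the printed command, check that ``file.ii`` has been generated and compress it, for instance with ``gzip file.ii``.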