<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<channel>
		<title>Programming Models @ BSC</title>
		<link>https://pm.bsc.es</link>
		<atom:link href="https://pm.bsc.es/feed.xml" rel="self" type="application/rss+xml" />
		
			<item>
				<title>Tutorial at HiPEAC2019: Heterogeneous Parallel Programming with OmpSs</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Place&lt;/em&gt;: Valencia, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: January 23rd, 2019 (associated with the HiPEAC 2019 Conference)&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Martorell and Xavier Teruel&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;abstract&quot;&gt;Abstract&lt;/h4&gt;

&lt;p&gt;OmpSs is a task-based programming model developed at BSC that we use as a forerunner for OpenMP.
Like OpenMP, it is based on compiler directives.
It is the base platform on which we have developed OpenMP tasking, support for dependences, priorities and task reductions, and support for heterogeneous devices; our latest addition is support for application acceleration on FPGAs.&lt;/p&gt;

&lt;p&gt;In this tutorial attendees will learn how to program with OmpSs and how to use its support for heterogeneous architectures.
We will introduce the basic OmpSs concepts of task-based parallelism on SMP cores and then quickly move to the support for heterogeneous devices.
OmpSs supports offloading tasks to a variety of accelerators, including CUDA and OpenCL GPUs, as well as FPGAs through vendor High-Level Synthesis (HLS) tools.
OmpSs eases programming because it leverages existing OpenCL and CUDA kernels without the burden of dealing with data copies to/from the devices.
Data copies are triggered automatically by the OmpSs runtime, based on the task data dependence annotations.
In the FPGA environment with HLS, plain C/C++ applications can offload kernels to the FPGA.&lt;/p&gt;

&lt;p&gt;OmpSs for FPGA devices is the result of our work in the AXIOM, EuroEXA and LEGaTO European projects.
We will also show how the same directives are used to outline code that can be compiled for and run on FPGA devices, and analyzed with the BSC analysis tool Paraver thanks to the internal FPGA tracing facilities.&lt;/p&gt;

&lt;p&gt;The tutorial will include two laboratory sessions.
Attendees will receive student accounts on our Minotauro machine (Intel-based, with NVIDIA GPUs) and several exercises to complete online (Cholesky, matrix multiplication, n-body, 3D stencil, merge sort, histogram…), so as to learn in detail the OmpSs support for both SMP and heterogeneous architectures.&lt;/p&gt;

&lt;h4 id=&quot;agenda&quot;&gt;Agenda&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;08:30h – Introduction to the OmpSs Programming Model: basic directives and support for heterogeneous systems&lt;/li&gt;
  &lt;li&gt;10:00h – Coffee Break&lt;/li&gt;
  &lt;li&gt;10:30h – Hands-on: OmpSs with CUDA and OpenCL support&lt;/li&gt;
  &lt;li&gt;12:00h – Lunch&lt;/li&gt;
  &lt;li&gt;13:30h – OmpSs with support for FPGA devices&lt;/li&gt;
  &lt;li&gt;15:00h – Coffee Break&lt;/li&gt;
  &lt;li&gt;15:30h – Hands-on: OmpSs@FPGA&lt;/li&gt;
  &lt;li&gt;17:00h – End of the tutorial&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Tue, 09 Oct 2018 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2018/10/09/tutorial-hipeac2019.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2018/10/09/tutorial-hipeac2019.html</guid>
			</item>
		
			<item>
				<title>Parallel Programming Workshop</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Venue&lt;/em&gt;: Barcelona, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: October 18-20th, 2018&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Teruel &amp;amp; Xavier Martorell&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;description&quot;&gt;Description&lt;/h2&gt;

&lt;p&gt;The objectives of this course are to understand the fundamental concepts
underpinning the message-passing and shared-memory programming models. The course
covers the two most widely used programming models: MPI for distributed-memory
environments and OpenMP for shared-memory architectures. It also presents
the main tools developed at BSC to collect information on and analyze the execution of
parallel applications: Paraver and Extrae. Moreover, it lays the basic
foundations of task decomposition and parallelization inhibitors,
using Tareador, a tool to analyze potential parallelism and dependences.&lt;/p&gt;

&lt;h2 id=&quot;agenda&quot;&gt;Agenda&lt;/h2&gt;

&lt;p&gt;Day 1 (Wednesday) 2:00 pm – 5:30 pm:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Shared-memory programming models, OpenMP fundamentals&lt;/li&gt;
  &lt;li&gt;Parallel regions and work sharing constructs&lt;/li&gt;
  &lt;li&gt;Synchronization mechanisms in OpenMP&lt;/li&gt;
  &lt;li&gt;Practical: heat diffusion in OpenMP&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Day 2 (Thursday) 9:30 am – 1:00 pm:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Tasking in OpenMP 3.0/4.0/4.5&lt;/li&gt;
  &lt;li&gt;Programming using a hybrid MPI/OpenMP approach&lt;/li&gt;
  &lt;li&gt;Practical: multisort in OpenMP and hybrid MPI/OpenMP&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Day 2 (Thursday) 2:00 pm – 5:30 pm:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Parallware: guided parallelization&lt;/li&gt;
  &lt;li&gt;Practical session with Parallware examples&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Day 3 (Friday) 9:30 am – 1:00 pm:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Introduction to the OmpSs programming model&lt;/li&gt;
  &lt;li&gt;Practical: heat equation example and divide-and-conquer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Day 3 (Friday) 2:00 pm – 5:30 pm:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Programming using a hybrid MPI/OmpSs approach&lt;/li&gt;
  &lt;li&gt;Practical: heat equation example and divide-and-conquer&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;external-references&quot;&gt;External references&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.bsc.es/education/training/patc-courses&quot;&gt;https://www.bsc.es/education/training/patc-courses&lt;/a&gt;&lt;/p&gt;
</description>
				<pubDate>Mon, 27 Aug 2018 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2018/08/27/parallel-programming.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2018/08/27/parallel-programming.html</guid>
			</item>
		
			<item>
				<title>Heterogeneous Parallel Programming with OmpSs (at PACT 2018, Cyprus)</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Place&lt;/em&gt;: Limassol, CYPRUS&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: November 4th, 2018 (associated with PACT 2018)&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Martorell&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;abstract&quot;&gt;Abstract&lt;/h4&gt;

&lt;p&gt;OmpSs is a task-based programming model developed at BSC that we use as a forerunner for OpenMP.
Like OpenMP, it is based on compiler directives.
It is the base platform on which we have developed OpenMP tasking, support for dependences, priorities and task reductions, and support for heterogeneous devices; our latest addition is support for application acceleration on FPGAs.&lt;/p&gt;

&lt;p&gt;In this tutorial attendees will learn how to program with OmpSs and how to use its support for heterogeneous architectures.
We will introduce the basic OmpSs concepts of task-based parallelism on SMP cores and then quickly move to the support for heterogeneous devices.
OmpSs supports offloading tasks to a variety of accelerators, including CUDA and OpenCL GPUs, as well as FPGAs through vendor High-Level Synthesis (HLS) tools.
OmpSs eases programming because it leverages existing OpenCL and CUDA kernels without the burden of dealing with data copies to/from the devices.
Data copies are triggered automatically by the OmpSs runtime, based on the task data dependence annotations.
In the FPGA environment with HLS, plain C/C++ applications with OmpSs annotations can offload kernels to the FPGA.&lt;/p&gt;

&lt;p&gt;OmpSs for FPGA devices is the result of our work in the AXIOM, EuroEXA and LEGaTO European projects.
We will also show how the same directives are used to outline code that can be compiled for and run on FPGA devices, and analyzed with the BSC analysis tool Paraver thanks to the internal FPGA tracing facilities.&lt;/p&gt;

&lt;p&gt;The tutorial will include two laboratory sessions.
Attendees will receive student accounts on our Minotauro machine (Intel-based, with NVIDIA GPUs) and several exercises to complete online (Cholesky, matrix multiplication, n-body, 3D stencil, merge sort, histogram…), so as to learn in detail the OmpSs support for both SMP and heterogeneous architectures.&lt;/p&gt;
</description>
				<pubDate>Thu, 21 Jun 2018 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2018/06/21/heterogeneous-pact-2018.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2018/06/21/heterogeneous-pact-2018.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at PUMPS 2018</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Place&lt;/em&gt;: Barcelona, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: July 20th, 2018&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Martorell and Xavier Teruel&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;

&lt;p&gt;The ninth edition of the Programming and Tuning Massively Parallel Systems + Artificial Intelligence summer school (PUMPS+AI) is aimed at enriching the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources, such as GPU accelerators.&lt;/p&gt;

&lt;h4 id=&quot;important-dates&quot;&gt;Important dates&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Applications due: May 31, 2018&lt;/li&gt;
  &lt;li&gt;Notification of acceptance: June 12, 2018&lt;/li&gt;
  &lt;li&gt;Hackathon day: July 15, 2018 (only for selected applicants)&lt;/li&gt;
  &lt;li&gt;Summer school: July 16-20, 2018&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More info&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://pumps.bsc.es/2018/&quot;&gt;PUMPS Website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Thu, 10 May 2018 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2018/05/10/tutorial-pumps.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2018/05/10/tutorial-pumps.html</guid>
			</item>
		
			<item>
				<title>Heterogeneous Programming on GPUs with MPI + OmpSs</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Venue&lt;/em&gt;: BSC, Barcelona, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: May 9-10th, 2018&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Martorell and Xavier Teruel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tutorial will motivate the need for portable, efficient programming models that put less pressure on program developers while still delivering good performance on clusters, both with and without GPUs.&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;

&lt;p&gt;More specifically, the tutorial will introduce the hybrid MPI/OmpSs parallel programming model for future exascale systems, and it will demonstrate how to use MPI/OmpSs to incrementally parallelize and optimize: 1) MPI applications on clusters of SMPs, and 2) CUDA kernels leveraged with OmpSs on clusters of GPUs.&lt;/p&gt;

&lt;h4 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Good knowledge of C/C++&lt;/li&gt;
  &lt;li&gt;Basic knowledge of CUDA/OpenCL&lt;/li&gt;
  &lt;li&gt;Basic knowledge of Paraver/Extrae&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;learning-outcomes&quot;&gt;Learning Outcomes&lt;/h4&gt;

&lt;p&gt;The students who finish this course will be able to develop benchmarks and simple applications with the MPI/OmpSs programming model to be executed in clusters of GPUs.&lt;/p&gt;

&lt;h4 id=&quot;agenda&quot;&gt;Agenda&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;Day 1
    &lt;ul&gt;
      &lt;li&gt;09.00h – Introduction to OmpSs&lt;/li&gt;
      &lt;li&gt;11.30h – OmpSs single node programming hands-on&lt;/li&gt;
      &lt;li&gt;13.00h – Lunch Break&lt;/li&gt;
      &lt;li&gt;14.00h – More on OmpSs: GPU/CUDA programming&lt;/li&gt;
      &lt;li&gt;15.00h – OmpSs single node programming hands-on with GPUs&lt;/li&gt;
      &lt;li&gt;17.30h – Adjourn&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Day 2
    &lt;ul&gt;
      &lt;li&gt;09.00h – Introduction to MPI/OmpSs&lt;/li&gt;
      &lt;li&gt;10.00h – MPI/OmpSs hands-on&lt;/li&gt;
      &lt;li&gt;13.00h – Lunch Break&lt;/li&gt;
      &lt;li&gt;14.00h – Free hands-on session&lt;/li&gt;
      &lt;li&gt;17.30h – Adjourn&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;external-links&quot;&gt;External links&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.bsc.es/education/training/patc-courses/patc-heterogeneous-programming-gpus-mpi-ompss&quot;&gt;BSC Website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Tue, 03 Apr 2018 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2018/04/03/heterogeneous-programming.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2018/04/03/heterogeneous-programming.html</guid>
			</item>
		
			<item>
				<title>Release DLB 2.0</title>
				<description>&lt;p&gt;New release of the DLB library (2.0) offering a new asynchronous API for runtime systems and the novel DROM module that provides a public API to manage the computational resources assigned to running processes&lt;/p&gt;

&lt;figure class=&quot;post-figure-small post-figure-centered&quot;&gt;
  &lt;img src=&quot;/files/dlb/logo.png&quot; alt=&quot;DLB logo&quot; /&gt;
&lt;/figure&gt;

&lt;p&gt;The Computer Sciences department at BSC is proud to announce the release of DLB 2.0. The Dynamic Load Balance (DLB) library improves the load balance of parallel applications at runtime. “Fixing load imbalance in applications is not only important to improve a single application’s performance; it is also key to boosting the utilization of supercomputing systems,” says Marta Garcia, lead scientist of the DLB tool.&lt;/p&gt;

&lt;p&gt;“DLB has a tremendous potential to address for free the imbalance issues in hybrid applications that would otherwise require significant refactoring efforts,” says Jesús Labarta, Computer Sciences department director.&lt;/p&gt;

&lt;p&gt;DLB is already helping to improve load balance in several European projects, such as the Human Brain Project, HPC-Europa3 and Mont-Blanc 3, and it is used by applications from a wide range of domains, including neuroscience, computational mechanics, molecular dynamics, cosmological simulations and climate modeling.&lt;/p&gt;

&lt;p&gt;“DLB is our preferred tool to mitigate the imbalances that occur in Alya executions. These imbalances appear spontaneously or come from inaccurate load distributions. DLB solves both problems at runtime, acting only when necessary, making our code much more resilient on modern HPC systems. We save millions of CPU hours every year by using DLB,” says Ricard Borrell, senior researcher at BSC’s CASE department.&lt;/p&gt;

&lt;p&gt;The newest version of DLB (2.0) offers a new module, DROM, which allows external entities (e.g. a job scheduler or resource manager) to request a change of resources for a running process. The DLB library is thus now organized in two modules, LeWI and DROM, which are independent of each other but can work in coordination.&lt;/p&gt;

&lt;p&gt;More specifically, in this new version, apart from several bug fixes, we have introduced the following new features:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;DROM (Dynamic Resource Ownership Management) module.&lt;/strong&gt;&lt;br /&gt;
    DROM offers an API for external entities (e.g. a job scheduler or resource manager…) that allows removing CPUs from a running process in order to assign them to a new or an existing process.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Asynchronous version of LeWI (Lend When Idle) load balancing algorithm.&lt;/strong&gt;&lt;br /&gt;
    The LeWI load balancing algorithm can now work in either synchronous or asynchronous mode. The new asynchronous mode allows the runtime and DLB to interact without polling.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;New DLB public API&lt;/strong&gt;&lt;br /&gt;
    - Clearer, with unified naming&lt;br /&gt;
    - More exhaustive, supporting more use cases&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Callback system for parallel runtimes&lt;/strong&gt;&lt;br /&gt;
    The callback system allows registering functions as callbacks for DLB actions, providing a friendly interface for integrating new parallel runtimes with DLB.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Support for interoperability of multiple runtimes&lt;/strong&gt;&lt;br /&gt;
    DLB provides support for several parallel runtimes within the same process sharing computational resources.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;New mechanism to set DLB options through the DLB_ARGS environment variable&lt;/strong&gt;&lt;br /&gt;
    All options passed to DLB are now contained in a single environment variable, which simplifies the configuration of DLB and the detection of errors when setting options.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;
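&lt;p&gt;As an illustrative sketch of the last point (the option names --lewi and --drom follow the DLB documentation, but the exact set available in your installation should be checked against the DLB docs; the preload path and application name below are placeholders), configuring DLB now amounts to setting a single variable before launching the application:&lt;/p&gt;

```shell
# As of DLB 2.0, all DLB options travel in one environment variable.
# Option names here (--lewi, --drom) follow the DLB documentation;
# the preload path and application name are placeholders.
export DLB_ARGS="--lewi --drom"

# DLB reads DLB_ARGS at startup, e.g. when the library is preloaded:
# LD_PRELOAD=/path/to/libdlb.so mpirun -np 4 ./my_hybrid_app
echo "$DLB_ARGS"
```

&lt;p&gt;Keeping every option in one variable also means a typo is reported once, at startup, instead of surfacing as scattered per-option failures.&lt;/p&gt;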

&lt;p&gt;You can freely download DLB (distributed as open source under the LGPL-3.0 license) and get more information on &lt;a href=&quot;/dlb&quot;&gt;DLB’s website&lt;/a&gt;.&lt;/p&gt;
</description>
				<pubDate>Thu, 21 Dec 2017 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2017/12/21/release-dlb.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2017/12/21/release-dlb.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at PUMPS 2017</title>
				<description>&lt;p&gt;&lt;strong&gt;Tutorial:&lt;/strong&gt; Barcelona, SPAIN&lt;br /&gt;
&lt;strong&gt;Event date:&lt;/strong&gt; June 30th, 2017&lt;br /&gt;
&lt;strong&gt;Speakers:&lt;/strong&gt; Xavier Martorell &amp;amp; Xavier Teruel&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;CONTENTS&lt;/h4&gt;
&lt;p&gt;The eighth edition of the Programming and Tuning Massively Parallel Systems summer school (PUMPS) is aimed at enriching the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources, such as GPU accelerators.&lt;/p&gt;

&lt;h4 id=&quot;important-dates&quot;&gt;IMPORTANT DATES&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;Applications due: April 30, 2017 (Due to space limitations, early application is strongly recommended. You may also be asked to attend an online prerequisite training on basic CUDA programming before joining PUMPS.)&lt;/li&gt;
  &lt;li&gt;Notification of acceptance: May 15, 2017&lt;/li&gt;
  &lt;li&gt;Summer school: June 26-30, 2017&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;more-info&quot;&gt;MORE INFO&lt;/h4&gt;
&lt;p&gt;Summer School &lt;a href=&quot;http://bcw.ac.upc.edu/PUMPS2017/&quot;&gt;website&lt;/a&gt;&lt;/p&gt;
</description>
				<pubDate>Wed, 19 Apr 2017 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2017/04/19/ompss-tutorial-at-pumps-2017.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2017/04/19/ompss-tutorial-at-pumps-2017.html</guid>
			</item>
		
			<item>
				<title>Basic Programming of Multicore and Many-Core Processors for Image and Video Processing</title>
				<description>&lt;p&gt;&lt;strong&gt;Tutorial:&lt;/strong&gt; Barcelona, SPAIN&lt;br /&gt;
&lt;strong&gt;Event date:&lt;/strong&gt; June 22-23th, 2017&lt;br /&gt;
&lt;strong&gt;Speakers:&lt;/strong&gt; Juan Gómez Luna&lt;/p&gt;

&lt;h4 id=&quot;objectives&quot;&gt;OBJECTIVES&lt;/h4&gt;
&lt;p&gt;This course is delivered by the GPU Center of Excellence (GCOE) awarded by NVIDIA to the Barcelona Supercomputing Center (BSC) in association with the Universitat Politecnica de Catalunya (UPC) as a Severo Ochoa workshop.&lt;br /&gt;
The course will present the parallelization of several widely known image and video processing algorithms, such as color space conversion, Gaussian filtering and histogram calculation.&lt;br /&gt;
Current processors can be classified into multicore and many-core processors, depending on the number of available cores. Among many-core processors, GPUs are the most popular. Both multicores and many-cores are suitable for exploiting the parallelism inherent in many applications, and can thereby speed them up to meet requirements such as real-time performance in image and video processing. The aim of this course is to serve as an initial approach to parallel programming for those who may be interested in parallelizing the applications they work with. By taking advantage of the data parallelism available in the case-study algorithms, we will introduce OpenMP for programming multicore processors and CUDA for GPUs. The course will be eminently practical, with seven hands-on labs.&lt;/p&gt;

&lt;h4 id=&quot;agenda&quot;&gt;AGENDA&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;Day 1
    &lt;ul&gt;
      &lt;li&gt;09:00 Parallel computing: OpenMP and CUDA&lt;/li&gt;
      &lt;li&gt;10:45 Coffee break&lt;/li&gt;
      &lt;li&gt;11:15 Hands-on lab 1: Brightness adjustment&lt;/li&gt;
      &lt;li&gt;13:00 Lunch break&lt;/li&gt;
      &lt;li&gt;14:00 Hands-on lab 2: RGB to YUV conversion&lt;/li&gt;
      &lt;li&gt;15:45 Coffee break&lt;/li&gt;
      &lt;li&gt;16:15 Hands-on lab 3: Gaussian filter&lt;/li&gt;
      &lt;li&gt;18:00 Adjourn&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Day 2
    &lt;ul&gt;
      &lt;li&gt;09:00 Hands-on lab 4: Your own filter&lt;/li&gt;
      &lt;li&gt;10:45 Coffee break&lt;/li&gt;
      &lt;li&gt;11:15 Hands-on lab 5: Histogram calculation&lt;/li&gt;
      &lt;li&gt;13:00 Lunch break&lt;/li&gt;
      &lt;li&gt;14:00 Hands-on lab 6: Edge Detection&lt;/li&gt;
      &lt;li&gt;15:45 Coffee break&lt;/li&gt;
      &lt;li&gt;16:15 Hands-on lab 7: Asynchronous transfers&lt;/li&gt;
      &lt;li&gt;18:00 Adjourn&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;more-info&quot;&gt;MORE INFO&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://www.bsc.es/education/training/cuda-training/basic-programming-multicore-and-many-core-processors-image-and-video-processing&quot;&gt;https://www.bsc.es/education/training/cuda-training/basic-programming-multicore-and-many-core-processors-image-and-video-processing&lt;/a&gt;&lt;/p&gt;
</description>
				<pubDate>Tue, 18 Apr 2017 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2017/04/18/basic-programming-of-multicore-and-many-core-processors-for-image-and-video-processing.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2017/04/18/basic-programming-of-multicore-and-many-core-processors-for-image-and-video-processing.html</guid>
			</item>
		
			<item>
				<title>Heterogeneous Programming on GPUs with MPI + OmpSs</title>
				<description>&lt;p&gt;&lt;strong&gt;Tutorial:&lt;/strong&gt; Barcelona, SPAIN&lt;br /&gt;
&lt;strong&gt;Event date:&lt;/strong&gt; May 10-11th, 2017&lt;br /&gt;
&lt;strong&gt;Speakers:&lt;/strong&gt; Xavier Martorell &amp;amp; Xavier Teruel&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;CONTENTS&lt;/h4&gt;
&lt;p&gt;The tutorial will motivate the need for portable, efficient programming models that put less pressure on program developers while still delivering good performance on clusters, both with and without GPUs.&lt;br /&gt;
More specifically, the tutorial will:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Introduce the hybrid MPI/OmpSs parallel programming model for future exascale systems&lt;/li&gt;
  &lt;li&gt;Demonstrate how to use MPI/OmpSs to incrementally parallelize/optimize:
    &lt;ul&gt;
      &lt;li&gt;MPI applications on clusters of SMPs, and&lt;/li&gt;
      &lt;li&gt;CUDA kernels leveraged with OmpSs on clusters of GPUs&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;more-info&quot;&gt;MORE INFO&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://events.prace-ri.eu/event/540/&quot;&gt;https://events.prace-ri.eu/event/540/&lt;/a&gt;&lt;/p&gt;
</description>
				<pubDate>Mon, 10 Apr 2017 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2017/04/10/heterogeneous-programming-on-gpus-with-mpi-ompss.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2017/04/10/heterogeneous-programming-on-gpus-with-mpi-ompss.html</guid>
			</item>
		
			<item>
				<title>Heterogeneous Programming on GPUs with MPI + OmpSs</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Venue&lt;/em&gt;: Barcelona, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: May 10-11th, 2017&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Martorell&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;

&lt;p&gt;The tutorial will motivate the need for portable, efficient programming models that put less pressure on program developers while still delivering good performance on clusters, both with and without GPUs.&lt;/p&gt;

&lt;p&gt;More specifically, the tutorial will:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Introduce the hybrid MPI/OmpSs parallel programming model for future exascale systems&lt;/li&gt;
  &lt;li&gt;Demonstrate how to use MPI/OmpSs to incrementally parallelize/optimize:
    &lt;ul&gt;
      &lt;li&gt;MPI applications on clusters of SMPs, and&lt;/li&gt;
      &lt;li&gt;CUDA kernels leveraged with OmpSs on clusters of GPUs&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;agenda&quot;&gt;Agenda&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Day 1
    &lt;ul&gt;
      &lt;li&gt;09.00h – Introduction to OmpSs&lt;/li&gt;
      &lt;li&gt;11.30h – OmpSs single node programming hands-on&lt;/li&gt;
      &lt;li&gt;13.00h – Lunch Break&lt;/li&gt;
      &lt;li&gt;14.00h – More on OmpSs: GPU/CUDA programming&lt;/li&gt;
      &lt;li&gt;15.00h – OmpSs single node programming hands-on with GPUs&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Day 2
    &lt;ul&gt;
      &lt;li&gt;09.00h – Introduction to MPI/OmpSs&lt;/li&gt;
      &lt;li&gt;10.00h – MPI/OmpSs hands-on&lt;/li&gt;
      &lt;li&gt;13.00h – Lunch Break&lt;/li&gt;
      &lt;li&gt;14.00h – Free hands-on session&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;external-references&quot;&gt;External references&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.bsc.es/education/training/patc-courses/patc-course-heterogeneous-programming-gpus-mpi-ompss-2&quot;&gt;BSC Website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Fri, 10 Mar 2017 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2017/03/10/heterogeneous-programming.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2017/03/10/heterogeneous-programming.html</guid>
			</item>
		
			<item>
				<title>PATC Introduction to OpenACC</title>
				<description>&lt;p&gt;&lt;strong&gt;Venue:&lt;/strong&gt; Barcelona, SPAIN&lt;br /&gt;
&lt;strong&gt;Event date:&lt;/strong&gt; April 27-28, 2017&lt;br /&gt;
&lt;strong&gt;Speaker:&lt;/strong&gt; Antonio J. Peña&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;
&lt;p&gt;This is an expansion of the topic “OpenACC and other approaches to GPU computing” covered in this year’s and last year’s editions of the Introduction to CUDA Programming course. This course is delivered by the GPU Center of Excellence (GCOE) awarded by NVIDIA to the Barcelona Supercomputing Center (BSC) in association with the Universitat Politecnica de Catalunya (UPC). It provides a very good introduction to the PUMPS Summer School, run jointly with NVIDIA on June 26-30, also at Campus Nord, Barcelona; for further information visit the school website. You may also be interested in our new course: Basic Programming of Multicore and Many-Core Processors for Image and Video Processing.&lt;/p&gt;

&lt;h4 id=&quot;agenda&quot;&gt;Agenda&lt;/h4&gt;
&lt;p&gt;Agenda to be announced shortly: &lt;a href=&quot;https://www.bsc.es/education/training/patc-courses/patc-introduction-openacc/agenda&quot;&gt;https://www.bsc.es/education/training/patc-courses/patc-introduction-openacc/agenda&lt;/a&gt;&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More Info&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://www.bsc.es/education/training/patc-courses/patc-introduction-openacc&quot;&gt;https://www.bsc.es/education/training/patc-courses/patc-introduction-openacc&lt;/a&gt;&lt;/p&gt;
</description>
				<pubDate>Tue, 07 Mar 2017 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2017/03/07/patc-introduction-to-openacc.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2017/03/07/patc-introduction-to-openacc.html</guid>
			</item>
		
			<item>
				<title>Introduction to CUDA Programming</title>
				<description>&lt;div&gt;&lt;strong&gt;Venue&lt;/strong&gt;: Barcelona, SPAIN&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Event date&lt;/strong&gt;: April 18-21, 2017&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Speaker&lt;/strong&gt;: Manuel Ujaldon&lt;/div&gt;
&lt;div&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;CONTENTS&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;The aim of this course is to provide students with knowledge and hands-on experience in developing applications software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it has the ability to complete more than 64 arithmetic operations per clock cycle. Many commercial offerings from NVIDIA, AMD, and Intel already offer such levels of concurrency. Effectively programming these processors will require in-depth knowledge about parallel programming principles, as well as the parallelism models, communication models, and resource limitations of these processors.&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;AGENDA&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;Day 1 (April 18th)&lt;/div&gt;
&lt;div&gt;09:00 The GPU hardware: many-core NVIDIA developments&lt;/div&gt;
&lt;div&gt;10:45 Coffee break&lt;/div&gt;
&lt;div&gt;11:15 CUDA Programming: Threads, blocks, kernels, grids&lt;/div&gt;
&lt;div&gt;13:00 Lunch break&lt;/div&gt;
&lt;div&gt;14:00 CUDA Tools: Compiling, debugging, profiling, occupancy calculator&lt;/div&gt;
&lt;div&gt;15:45 Coffee break&lt;/div&gt;
&lt;div&gt;16:15 CUDA Examples (1): VectorAdd, Stencil&lt;/div&gt;
&lt;div&gt;18:00 Adjourn&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;Day 2 (April 19th)&lt;/div&gt;
&lt;div&gt;09:00 CUDA Examples (2): Matrix Multiply. Assorted optimizations&lt;/div&gt;
&lt;div&gt;10:45 Coffee break&lt;/div&gt;
&lt;div&gt;11:15 CUDA Examples (3): Dynamic parallelism, Hyper-Q, unified memory&lt;/div&gt;
&lt;div&gt;13:00 Lunch break&lt;/div&gt;
&lt;div&gt;14:00 Hands-on Lab 1&lt;/div&gt;
&lt;div&gt;15:45 Coffee break&lt;/div&gt;
&lt;div&gt;16:15 Hands-on Lab 2&lt;/div&gt;
&lt;div&gt;18:00 Adjourn&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;Day 3 (April 20th)&lt;/div&gt;
&lt;div&gt;09:00 Inside Pascal: Multiprocessors, stacked memory, NV-link&lt;/div&gt;
&lt;div&gt;10:45 Coffee break&lt;/div&gt;
&lt;div&gt;11:15 OpenACC and other approaches to GPU computing&lt;/div&gt;
&lt;div&gt;13:00 Lunch break&lt;/div&gt;
&lt;div&gt;14:00 Hands-on Lab 3&lt;/div&gt;
&lt;div&gt;15:45 Coffee break&lt;/div&gt;
&lt;div&gt;16:15 Hands-on Lab 4&lt;/div&gt;
&lt;div&gt;18:00 Adjourn&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;Day 4 (April 21st)&lt;/div&gt;
&lt;div&gt;09:00 Hands-on Lab 5&lt;/div&gt;
&lt;div&gt;10:45 Coffee break&lt;/div&gt;
&lt;div&gt;11:15 Hands-on Lab 6&lt;/div&gt;
&lt;div&gt;13:00 Lunch break&lt;/div&gt;
&lt;div&gt;14:00 Hands-on Lab 7&lt;/div&gt;
&lt;div&gt;15:45 Coffee break&lt;/div&gt;
&lt;div&gt;16:15 Free Hands-on Lab&lt;/div&gt;
&lt;div&gt;18:00 Adjourn&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;MORE INFO&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;https://www.bsc.es/education/training/patc-courses/introduction-cuda-programming&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
</description>
				<pubDate>Tue, 07 Mar 2017 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2017/03/07/introduction-to-cuda-programming.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2017/03/07/introduction-to-cuda-programming.html</guid>
			</item>
		
			<item>
				<title>GPU Programming Models and their Combinations</title>
				<description>&lt;div&gt;&lt;strong style=&quot;font-size: 16.26px;&quot;&gt;Venue&lt;/strong&gt;&lt;span style=&quot;font-size: 16.26px;&quot;&gt;: Cordoba, SPAIN&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Event date&lt;/strong&gt;: April 21, 2017&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Speaker&lt;/strong&gt;: Antonio J. Peña&lt;/div&gt;
&lt;div&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;CONTENTS&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;The aim of this course is to provide students with knowledge of, and hands-on experience in, developing application software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it can complete more than 64 arithmetic operations per clock cycle. Many commercial offerings from NVIDIA, AMD, and Intel already provide such levels of concurrency. Programming these processors effectively requires in-depth knowledge of parallel programming principles, as well as of the parallelism models, communication models, and resource limitations of these processors. The target audience of the course is students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for them.&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;MORE INFO&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;While OpenACC focuses on coding productivity and portability, CUDA enables extracting the maximum performance from NVIDIA GPUs. OmpSs, on the other hand, is a GPU-aware task-based programming model that can be combined with CUDA and, recently, with OpenACC as well. Using OpenACC we can start benefiting from GPU computing, obtaining high coding productivity and solid performance improvements. We can then fine-tune the critical application parts by developing CUDA kernels to hand-optimize the problem. OmpSs combined with either OpenACC or CUDA enables seamless task parallelism that leverages all the devices in the system.&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;Web site: http://www.uco.es/~el1goluj/cuda_teaching_center.html&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
</description>
				<pubDate>Tue, 07 Mar 2017 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2017/03/07/gpu-programming-models-and-their-combinations.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2017/03/07/gpu-programming-models-and-their-combinations.html</guid>
			</item>
		
			<item>
				<title>UAM Course: Parallel Programming</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Venue&lt;/em&gt;: Madrid, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: November 2-4th, 2016&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Teruel &amp;amp; Xavier Martorell&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;contents&quot;&gt;CONTENTS&lt;/h4&gt;

&lt;p&gt;The objectives of the course are to understand and practice fundamental
concepts of parallel and distributed programming with message passing (MPI) and
shared memory (OpenMP). Some tools useful for debugging, such as Valgrind,
Paraver, and Tareador, will also be presented.&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;MORE INFO&lt;/h4&gt;
</description>
				<pubDate>Sun, 02 Oct 2016 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2016/10/02/uam-tutorial.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2016/10/02/uam-tutorial.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at Splash 2016</title>
				<description>&lt;div&gt;&lt;span style=&quot;font-size: 16.26px;&quot;&gt;&lt;strong&gt;Tutorial&lt;/strong&gt;: Amsterdam, NETHERLANDS&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Event date&lt;/strong&gt;: November 1, 2016&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Speaker&lt;/strong&gt;: Jaume Bosch&lt;/div&gt;
&lt;div&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;CONTENTS&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;This tutorial presents task-based programming models such as OmpSs and OpenMP 4.0. OmpSs is a programming model developed at the Barcelona Supercomputing Center (BSC). Like OpenMP, it is based on compiler directives. It is the base platform where BSC has developed OpenMP tasking, support for dependences, priorities, and task reductions, and it also includes support for heterogeneous devices.&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;MORE INFO&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;http://2016.splashcon.org/event/seps2016-tutorial-task-based-programming-for-embedded-multicore-systems&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
</description>
				<pubDate>Sat, 01 Oct 2016 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2016/10/01/ompss-tutorial-at-splash-2016.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2016/10/01/ompss-tutorial-at-splash-2016.html</guid>
			</item>
		
			<item>
				<title>Parallel Programming Workshop</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Venue&lt;/em&gt;: Barcelona, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: October 26-28th, 2016&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Teruel &amp;amp; Xavier Martorell&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;

&lt;p&gt;The objectives of this course are to understand the fundamental concepts
supporting message-passing and shared-memory programming models. The course
covers the two widely used programming models: MPI for distributed-memory
environments and OpenMP for shared-memory architectures. It also presents
the main tools developed at BSC to obtain information about and analyze the
execution of parallel applications, Paraver and Extrae. Moreover, it sets the
basic foundations of task decomposition and parallelization inhibitors, using
Tareador, a tool to analyze potential parallelism and dependences.&lt;/p&gt;

&lt;h4 id=&quot;external-references&quot;&gt;External references&lt;/h4&gt;
</description>
				<pubDate>Fri, 26 Aug 2016 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2016/08/26/parallel-programming.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2016/08/26/parallel-programming.html</guid>
			</item>
		
			<item>
				<title>Heterogeneous Parallel Programming with OmpSs</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Venue&lt;/em&gt;: Haifa, ISRAEL&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: September 15th, 2016&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Martorell &amp;amp; Xavier Teruel&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;

&lt;p&gt;This tutorial will present the OmpSs programming model, based on both
teaching and laboratory sessions. OmpSs is a programming model developed at BSC
that we use as a forerunner for OpenMP. Like OpenMP, it is based on compiler
directives. It is the base platform where we have developed OpenMP tasking,
support for dependences, priorities, and task reductions, and it also includes
support for heterogeneous devices.&lt;/p&gt;

&lt;p&gt;We will introduce the OmpSs basic concepts related to task-based parallelism
on the SMP cores and then quickly move to the support for heterogeneous
devices. OmpSs leverages existing OpenCL and CUDA kernels without the burden of
dealing with data copies to/from the devices. Data copies are triggered
automatically by the OmpSs runtime, based on the task dependence
annotations.&lt;/p&gt;

&lt;p&gt;OmpSs is currently being extended for FPGA devices, in the context of the AXIOM
European Project. We will also show how the same directives are being used to
outline code that can be compiled and run on FPGA devices.&lt;/p&gt;

&lt;p&gt;The tutorial will include two laboratory sessions. We will provide attendees
with student accounts on our MinoTauro machine (Intel-based with NVIDIA GPUs)
and several exercises to be completed online (Cholesky, matrix multiplication,
n-body, 3D stencil, merge sort, histogram…), to better learn the details of the
OmpSs support for both SMP and heterogeneous architectures.&lt;/p&gt;

&lt;h4 id=&quot;agenda&quot;&gt;Agenda&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Session 1. Introduction to OmpSs (8.00 - 10:00)
    &lt;ul&gt;
      &lt;li&gt;OmpSs tasking (fundamentals of OmpSs)&lt;/li&gt;
      &lt;li&gt;Task dependences (execution model)&lt;/li&gt;
      &lt;li&gt;Additional concurrent, commutative clauses&lt;/li&gt;
      &lt;li&gt;Development environment: Mercurium compiler and Nanos++&lt;/li&gt;
      &lt;li&gt;Hands-on on simple benchmarks: how to compile and execute applications&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Session 2. OmpSs support for heterogeneous architectures (10:30 - 12:15)
    &lt;ul&gt;
      &lt;li&gt;OmpSs target extensions&lt;/li&gt;
      &lt;li&gt;Automatic data transfers, software cache&lt;/li&gt;
      &lt;li&gt;Leveraging CUDA and OpenCL kernels&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Session 3. Hands-on (13:50 - 15:35)
    &lt;ul&gt;
      &lt;li&gt;Parallelizing applications on heterogeneous architectures with OmpSs&lt;/li&gt;
      &lt;li&gt;Cholesky, matrix multiplication, nbody, 3d-stencil, merge-sort, histogram…&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Session 4. FPGA support in OmpSs (15:50 - 17:00)
    &lt;ul&gt;
      &lt;li&gt;Exploiting parallelism on FPGA devices&lt;/li&gt;
      &lt;li&gt;Integrating the development environment with support for FPGAs&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;external-references&quot;&gt;External references&lt;/h4&gt;
</description>
				<pubDate>Fri, 15 Jul 2016 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2016/07/15/heterogeneous-programming.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2016/07/15/heterogeneous-programming.html</guid>
			</item>
		
			<item>
				<title>New OmpSs release 16.06</title>
				<description>&lt;p&gt;&lt;strong&gt;Mercurium compiler:&lt;/strong&gt; 2.0.0&lt;br /&gt;
&lt;strong&gt;Nanos++ RT Library:&lt;/strong&gt; 0.10&lt;br /&gt;
&lt;strong&gt;Download this version &lt;a href=&quot;/ompss-downloads&quot;&gt;here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Programming Models team is glad to announce the release of the new stable version of OmpSs, based on the latest Mercurium and on Nanos++ 0.10.&lt;/p&gt;

&lt;p&gt;Apart from several bug fixes in both tools, the main features introduced in this version are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;New cluster support
    &lt;ul&gt;
      &lt;li&gt;Execute OmpSs programs transparently on top of a distributed memory system (CUDA &amp;amp; OpenCL devices are also supported).&lt;/li&gt;
      &lt;li&gt;Network communication is implemented using GASNet, which provides support for modern High Performance networking technologies.&lt;/li&gt;
      &lt;li&gt;Several optimization mechanisms maximize the performance of applications running on a cluster: a data-affinity scheduling policy distributes work so as to minimize network activity, and task pre-send allows the OmpSs runtime to overlap communication with computation.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Support for non-contiguous data
    &lt;ul&gt;
      &lt;li&gt;Tasks can reference non-contiguous, multi-dimensional data, which eases the implementation of some applications.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Thread manager
    &lt;ul&gt;
      &lt;li&gt;The Thread Manager module dynamically controls the number of worker threads needed for a given workload &lt;a href=&quot;https://pm.bsc.es/ftp/ompss/doc/user-guide/run-programs-plugin-threadmanager.html&quot;&gt;(info)&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Task reductions
    &lt;ul&gt;
      &lt;li&gt;The task construct has been extended with support for the reduction clause &lt;a href=&quot;https://pm.bsc.es/ftp/ompss/doc/spec/programming_model.html#task-reductions&quot;&gt;(info)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Improved support for user-defined reductions &lt;a href=&quot;https://pm.bsc.es/ftp/ompss/doc/spec/language.html#declare-reduction-construct&quot;&gt;(info)&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Enhanced support for &lt;a href=&quot;/dlb&quot;&gt;Dynamic Load Balancing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any doubt or question, feel free to contact us at: pm-tools [at] bsc.es&lt;/p&gt;
</description>
				<pubDate>Thu, 16 Jun 2016 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2016/06/16/new-ompss-release-16-06.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2016/06/16/new-ompss-release-16-06.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at PUMPS</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Venue&lt;/em&gt;: Barcelona, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: July 15th, 2016&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Martorell &amp;amp; Xavier Teruel&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;

&lt;p&gt;The seventh edition of the Programming and Tuning Massively Parallel Systems
summer school (PUMPS) is aimed at enriching the skills of researchers, graduate
students and teachers with cutting-edge techniques and hands-on experience in
developing applications for many-core processors with massively parallel
computing resources, such as GPU accelerators.&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More info&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://bcw.ac.upc.edu/PUMPS2016/&quot;&gt;PUMPS Website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Sun, 15 May 2016 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2016/05/15/pumps.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2016/05/15/pumps.html</guid>
			</item>
		
			<item>
				<title>OmpSs Workshop at AXIOM project</title>
				<description>&lt;div&gt;&lt;strong style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;Workshop&lt;/strong&gt;&lt;span style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;: Siena, ITALY&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Event date&lt;/strong&gt;: May 31st, 2016&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Speaker&lt;/strong&gt;: Javier Bueno&lt;/div&gt;
&lt;div&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;&lt;span style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;DESCRIPTION&lt;/span&gt;&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&lt;span style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;Build your own supercomputer with OmpSs, UDOO and Arduino.&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;Makers are revolutionary people who consider the physical world just another brick of a house always under construction. With this workshop we aim to guide these hackers-at-heart through the setup and configuration of a cluster of UDOO QUAD boards powered by AXIOM’s OmpSs, a programming model developed to build clusters in a simple way and take them to the supercomputer level.&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;MORE INFO&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&lt;span style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;Project&apos;s &lt;a href=&quot;http://www.axiom-project.eu/workshop-build-your-own-supercomputer-with-ompss-udoo-and-arduino/&quot;&gt;website&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
</description>
				<pubDate>Fri, 29 Apr 2016 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2016/04/29/ompss-workshop-at-axiom-project.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2016/04/29/ompss-workshop-at-axiom-project.html</guid>
			</item>
		
			<item>
				<title>Heterogeneous Programming on GPUs with MPI + OmpSs</title>
				<description>&lt;p&gt;&lt;strong&gt;Tutorial:&lt;/strong&gt; Barcelona, SPAIN&lt;br /&gt;
&lt;strong&gt;Event date:&lt;/strong&gt; May 11-12th, 2016&lt;br /&gt;
&lt;strong&gt;Speaker:&lt;/strong&gt; Xavier Martorell&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;
&lt;p&gt;The tutorial will motivate the audience on the need for portable, efficient programming models that put less pressure on program developers while still getting good performance for clusters and clusters with GPUs.&lt;/p&gt;

&lt;p&gt;More specifically, the tutorial will:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Introduce the hybrid MPI/OmpSs parallel programming model for future exascale systems&lt;/li&gt;
  &lt;li&gt;Demonstrate how to use MPI/OmpSs to incrementally parallelize/optimize:
    &lt;ul&gt;
      &lt;li&gt;MPI applications on clusters of SMPs, and&lt;/li&gt;
      &lt;li&gt;MPI applications leveraging CUDA kernels with OmpSs on clusters of GPUs&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More Info&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://events.prace-ri.eu/event/424&quot;&gt;https://events.prace-ri.eu/event/424&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Mon, 18 Apr 2016 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2016/04/18/heterogeneous-programming-on-gpus-with-mpi-ompss.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2016/04/18/heterogeneous-programming-on-gpus-with-mpi-ompss.html</guid>
			</item>
		
			<item>
				<title>BigStorage Initial Training School</title>
				<description>&lt;div&gt;&lt;strong&gt;Tutorial&lt;/strong&gt;: Barcelona, SPAIN&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Event date&lt;/strong&gt;: March 3-9, 2016&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Speaker&lt;/strong&gt;: Xavier Martorell&lt;/div&gt;
&lt;div&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;CONTENTS&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;BigStorage is a European Training Network (ETN) whose main goal is to train future data scientists, enabling them to apply holistic and interdisciplinary approaches to take advantage of a data-overwhelmed world. This requires HPC and Cloud infrastructures, and a redefinition of the storage architectures underpinning them, focused on meeting highly ambitious performance and energy-usage objectives.&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;MORE INFO&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&lt;a href=&quot;http://www.bigstorage-project.eu/index.php/events&quot;&gt;http://www.bigstorage-project.eu/index.php/events&lt;/a&gt;&lt;/div&gt;
</description>
				<pubDate>Thu, 11 Feb 2016 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2016/02/11/bigstorage-initial-training-school.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2016/02/11/bigstorage-initial-training-school.html</guid>
			</item>
		
			<item>
				<title>ITN Course: Parallel Programming Workshop</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Venue&lt;/em&gt;: Barcelona, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: January 18th-22nd, 2016&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Teruel &amp;amp; Xavier Martorell&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;

&lt;p&gt;The objectives of this course are to understand the fundamental concepts
supporting message-passing and shared-memory programming models. The course
covers the two widely used programming models: MPI for distributed-memory
environments and OpenMP for shared-memory architectures. It also presents
the main tools developed at BSC to obtain information about and analyze the
execution of parallel applications, Paraver and Extrae. Moreover, it sets the
basic foundations of task decomposition and parallelization inhibitors, using
Tareador, a tool to analyze potential parallelism and dependences.&lt;/p&gt;

&lt;h4 id=&quot;external-references&quot;&gt;External references&lt;/h4&gt;

&lt;p&gt;https://www.bsc.es/marenostrum-support-services/hpc-education-and-training/2015-16-workshops-and-seasonal-schools/tccm&lt;/p&gt;
</description>
				<pubDate>Wed, 18 Nov 2015 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2015/11/18/parallel-programming.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2015/11/18/parallel-programming.html</guid>
			</item>
		
			<item>
				<title>Parallel Programming Workshop</title>
				<description>&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;&lt;strong style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;Tutorial&lt;/strong&gt;&lt;span style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;: Barcelona, SPAIN&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;&lt;strong&gt;Event date&lt;/strong&gt;: November 23-27, 2015&lt;/div&gt;
&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;&lt;strong style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;Speaker&lt;/strong&gt;&lt;span style=&quot;font-size: 16.26px; line-height: 1.538em;&quot;&gt;: Xavier Martorell&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;&lt;strong&gt;CONTENTS&lt;/strong&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;The objectives of this course are to understand the fundamental concepts supporting message-passing and shared-memory programming models. The course covers the two widely used programming models: MPI for distributed-memory environments and OpenMP for shared-memory architectures. It also presents the main tools developed at BSC to obtain information about and analyze the execution of parallel applications, Paraver and Extrae. Moreover, it sets the basic foundations of task decomposition and parallelization inhibitors, using Tareador, a tool to analyze potential parallelism and dependences.&lt;/div&gt;
&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;&lt;strong&gt;MORE INFO&lt;/strong&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.26px; line-height: 25.0079px;&quot;&gt;&lt;a href=&quot;http://www.bsc.es/marenostrum-support-services/hpc-education-and-training/patc-training/2014-13-17-oct-patc-parallel&quot;&gt;PATC Course: Parallel Programming Workshop&lt;/a&gt;&lt;/div&gt;
</description>
				<pubDate>Thu, 15 Oct 2015 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2015/10/15/parallel-programming-workshop.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2015/10/15/parallel-programming-workshop.html</guid>
			</item>
		
			<item>
				<title>UAM Course: Parallel Programming</title>
				<description>&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Venue&lt;/em&gt;: Madrid, SPAIN&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Event date&lt;/em&gt;: November 30th, 2015 - December 4th, 2015&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Speakers&lt;/em&gt;: Xavier Teruel &amp;amp; Xavier Martorell&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;

&lt;p&gt;The objectives of the course are to understand and practice fundamental
concepts of parallel and distributed programming with message passing (MPI) and
shared memory (OpenMP). Some tools useful for debugging, such as Valgrind,
Paraver, and Tareador, will also be presented.&lt;/p&gt;

&lt;h4 id=&quot;external-references&quot;&gt;External References&lt;/h4&gt;

&lt;p&gt;http://www.uam.es/ss/Satellite/es/1234886350331/1242691057233/evento/evento/1242691057233.htm&lt;/p&gt;
</description>
				<pubDate>Wed, 30 Sep 2015 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2015/09/30/uam-tutorial.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2015/09/30/uam-tutorial.html</guid>
			</item>
		
			<item>
				<title>New OmpSs release 15.06</title>
				<description>&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;strong style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;Me&lt;/strong&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;strong&gt;rcurium compiler:&lt;/strong&gt;&amp;nbsp;1.99.7&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;strong&gt;Nanos++ RT Library:&lt;/strong&gt;&amp;nbsp;0.7.10&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 1.538em;&quot;&gt;&lt;strong&gt;Download this version&lt;/strong&gt;&amp;nbsp;&lt;strong&gt;&lt;a href=&quot;/ompss-downloads&quot;&gt;here&lt;/a&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
</description>
				<pubDate>Thu, 04 Jun 2015 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2015/06/04/new-ompss-release-15-06.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2015/06/04/new-ompss-release-15-06.html</guid>
			</item>
		
			<item>
				<title>New OmpSs release 15.04</title>
				<description>&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;strong style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;Me&lt;/strong&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;strong&gt;rcurium compiler:&lt;/strong&gt; 1.99.7&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;strong&gt;Nanos++ RT Library:&lt;/strong&gt; 0.7.9&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 1.538em;&quot;&gt;&lt;strong&gt;Download this version&lt;/strong&gt;&amp;nbsp;&lt;strong&gt;&lt;a href=&quot;ompss-downloads&quot;&gt;here&lt;/a&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;!--break--&gt;&lt;/div&gt;
</description>
				<pubDate>Wed, 15 Apr 2015 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2015/04/15/new-ompss-release-15-04.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2015/04/15/new-ompss-release-15-04.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at PUMPS 2015</title>
				<description>&lt;div&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 1.538em;&quot;&gt;&lt;strong&gt;Tutorial&lt;/strong&gt;: Barcelona, SPAIN&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Event date&lt;/strong&gt;: July 6-10th, 2015&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Speakers&lt;/strong&gt;: TBD&lt;/div&gt;
&lt;div&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;CONTENTS&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;The sixth edition of the Programming and Tuning Massively Parallel Systems summer school (PUMPS) is aimed at enriching the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources, such as GPU accelerators.&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;MORE INFO&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&lt;a href=&quot;http://bcw.ac.upc.edu/PUMPS2015/start&quot;&gt;PUMPS course website&lt;/a&gt;&lt;/div&gt;
</description>
				<pubDate>Tue, 24 Mar 2015 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2015/03/24/ompss-tutorial-at-pumps-2015.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2015/03/24/ompss-tutorial-at-pumps-2015.html</guid>
			</item>
		
			<item>
				<title>Course on OmpSs PATC</title>
				<description>&lt;div&gt;&lt;strong style=&quot;font-size: 16.2600002288818px; line-height: 1.538em;&quot;&gt;Tutorial&lt;/strong&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 1.538em;&quot;&gt;: Barcelona, SPAIN&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;Event date&lt;/strong&gt;: May 13-14th, 2015&lt;/div&gt;
&lt;div&gt;&lt;strong style=&quot;font-size: 16.2600002288818px; line-height: 1.538em;&quot;&gt;Speaker&lt;/strong&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 1.538em;&quot;&gt;: Xavier Martorell&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;CONTENTS&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;The tutorial will motivate the audience on the need for portable, efficient programming models that put less pressure on program developers while still getting good performance for clusters and clusters with GPUs.&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;MORE INFO&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&lt;a href=&quot;http://www.bsc.es/patc-programming-2015&quot;&gt;BSC PATC courses website&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
</description>
				<pubDate>Tue, 24 Mar 2015 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2015/03/24/course-on-ompss-patc.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2015/03/24/course-on-ompss-patc.html</guid>
			</item>
		
			<item>
				<title>New OmpSs release 15.02</title>
				<description>&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;strong style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;Me&lt;/strong&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;strong&gt;rcurium compiler:&lt;/strong&gt;&amp;nbsp;1.99.6&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;strong&gt;Nanos++ RT Library:&lt;/strong&gt;&amp;nbsp;0.7.6&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;span style=&quot;font-size: 16.2600002288818px; line-height: 1.538em;&quot;&gt;&lt;strong&gt;Download this version&lt;/strong&gt;&amp;nbsp;&lt;strong&gt;&lt;a href=&quot;/ompss-downloads&quot;&gt;here&lt;/a&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div style=&quot;font-size: 16.2600002288818px; line-height: 25.0078811645508px;&quot;&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
</description>
				<pubDate>Sun, 15 Feb 2015 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2015/02/15/new-ompss-release-15-02.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2015/02/15/new-ompss-release-15-02.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at PUMPS 2014</title>
				<description>&lt;p&gt;&lt;strong&gt;Tutorial:&lt;/strong&gt; Barcelona, SPAIN&lt;br /&gt;
&lt;strong&gt;Event date:&lt;/strong&gt; July 7-11th, 2014&lt;br /&gt;
&lt;strong&gt;Speakers:&lt;/strong&gt; Rosa M. Badia and Xavier Martorell&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;
&lt;p&gt;The fifth edition of the Programming and Tuning Massively Parallel Systems summer school (PUMPS) is aimed at enriching the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources such as GPU accelerators.&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More Info&lt;/h4&gt;
&lt;p&gt;Summer School website&lt;/p&gt;
</description>
				<pubDate>Wed, 02 Jul 2014 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2014/07/02/ompss-tutorial-at-pumps-2014.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2014/07/02/ompss-tutorial-at-pumps-2014.html</guid>
			</item>
		
			<item>
				<title>Contributions at Joint laboratory for Petascale computing</title>
				<description>&lt;p&gt;&lt;strong&gt;Workshop:&lt;/strong&gt; Sophia Antipolis, FRANCE&lt;br /&gt;
&lt;strong&gt;Event Date:&lt;/strong&gt; June 9-11, 2014&lt;br /&gt;
&lt;strong&gt;Speakers:&lt;/strong&gt; Jesus Labarta, Victor Lopez and Florentino Sainz&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;Presentation of BSC activities (Jesus Labarta)&lt;/li&gt;
  &lt;li&gt;DLB: Dynamic Load Balancing Library (Victor Lopez)&lt;/li&gt;
  &lt;li&gt;DEEP Collective offload (Florentino Sainz)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;abstracts&quot;&gt;Abstracts&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Load Balancing Library:&lt;/strong&gt; DLB is a dynamic library designed to speed up hybrid applications by improving their load balance with little or no intervention from the user. The idea behind the library is to redistribute the computational resources of the second level of parallelism (OpenMP, OmpSs) to improve the load balance of the outer level of parallelism (MPI). The DLB library uses an interposition technique at run time, so no prior analysis or modification of the application is necessary, although finer control is also supported through an API. Finally, we also present a case study with CESM (Community Earth System Model), a global climate model that provides computer simulations of the Earth&amp;#39;s climate states. The application already uses a hybrid parallel programming model (MPI+OpenMP), so with few modifications to the source code we have compiled it to use the OmpSs programming model, where DLB benefits from its high malleability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DEEP Collective offload:&lt;/strong&gt; We present a new extension of the OmpSs programming model which allows users to dynamically offload C/C++ or Fortran code from one or many nodes to a group of remote nodes. Communication between remote nodes executing offloaded code is possible through MPI. It aims to improve the programmability of exascale and current supercomputers, which combine different types of processors and interconnection networks that have to work together in order to obtain the best performance. A good example of these architectures is the DEEP project, which has two separate clusters (CPUs and Xeon Phis). With our technology, which works on any architecture that fully supports MPI, users can easily offload work from the CPU cluster to the accelerator cluster without the constraint of falling back to the CPU cluster in order to perform MPI communications.&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More Info&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://jointlab.ncsa.illinois.edu/events/workshop11&quot;&gt;Event information&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Fri, 06 Jun 2014 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2014/06/06/contributions-at-joint-laboratory-for-petascale-computing.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2014/06/06/contributions-at-joint-laboratory-for-petascale-computing.html</guid>
			</item>
		
			<item>
				<title>Course on programming models using OmpSs</title>
				<description>&lt;p&gt;&lt;strong&gt;Tutorial:&lt;/strong&gt; Bucaramanga, COLOMBIA&lt;br /&gt;
&lt;strong&gt;Event date:&lt;/strong&gt; June 3-6th, 2014&lt;br /&gt;
&lt;strong&gt;Speakers:&lt;/strong&gt; Vicenç Beltran and Florentino Sainz&lt;/p&gt;

&lt;h4 id=&quot;abstract&quot;&gt;Abstract&lt;/h4&gt;
&lt;p&gt;OmpSs is an effort to integrate features from the StarSs programming model developed by BSC into a single programming model. In particular, our objective is to extend OpenMP with new directives to support asynchronous parallelism and heterogeneity (devices like GPUs). However, it can also be understood as new directives extending other accelerator based APIs like CUDA or OpenCL. Our OmpSs environment is built on top of our Mercurium compiler and Nanos++ runtime system.&lt;/p&gt;

&lt;h4 id=&quot;place&quot;&gt;Place&lt;/h4&gt;
&lt;p&gt;Universidad Industrial de Santander - Campus Principal&lt;br /&gt;
Bucaramanga, Santander, Colombia&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More Info&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://sc3.uis.edu.co/ompss14&quot;&gt;News at Universidad Industrial de Santander&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.redclara.net/index.php?option=com_wrapper&amp;amp;view=wrapper&amp;amp;Itemid=669&amp;amp;url=eventos.redclara.net/indico/events.py?tag=311&amp;amp;lang=es&quot;&gt;Course web site&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Fri, 30 May 2014 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2014/05/30/course-on-programming-models-using-ompss.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2014/05/30/course-on-programming-models-using-ompss.html</guid>
			</item>
		
			<item>
				<title>PATC Course: Heterogeneous Programming on GPUs with MPI + OmpSs</title>
				<description>&lt;p&gt;&lt;strong&gt;Tutorial:&lt;/strong&gt; Barcelona, SPAIN&lt;br /&gt;
&lt;strong&gt;Event Date:&lt;/strong&gt; May 14-15, 2014&lt;br /&gt;
&lt;strong&gt;Speakers:&lt;/strong&gt; Rosa M. Badia and Xavier Martorell&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;
&lt;p&gt;The tutorial will motivate the audience on the need for portable, efficient programming models that put less pressure on program developers while still getting good performance for clusters and clusters with GPUs. More specifically, the tutorial will introduce the hybrid MPI/OmpSs parallel programming model for future exascale systems. It will also demonstrate how to use MPI/OmpSs to incrementally parallelize/optimize: first MPI applications on clusters of SMPs, and then leveraging CUDA kernels with OmpSs on clusters of GPUs.&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More Info&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.bsc.es/marenostrum-support-services/hpc-events-trainings/prace-trainings/clone-patc-course-23-24-may12&quot;&gt;Course web site&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Mon, 14 Apr 2014 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2014/04/14/patc-course-heterogeneous-programming-on-gpus-with-mpi-ompss.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2014/04/14/patc-course-heterogeneous-programming-on-gpus-with-mpi-ompss.html</guid>
			</item>
		
			<item>
				<title>Public Release DLB library version 1.0</title>
				<description>&lt;p&gt;&lt;img alt=&quot;&quot; src=&quot;/files/dlb/logo.png&quot; height=&quot;50&quot; width=&quot;162&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The Programming Models group at &lt;a href=&quot;https://www.bsc.es&quot;&gt;Barcelona Supercomputing Center&lt;/a&gt; is proud to announce the first official release of the Dynamic Load Balancing Library &lt;a href=&quot;/dlb&quot;&gt;DLB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;DLB is a dynamic library that aims to improve the performance of hybrid applications by decreasing the load imbalance of the outer level of parallelism (usually MPI) by redistributing the computational resources in the inner level (shared-memory parallelism).&lt;/p&gt;

&lt;p&gt;For more information please visit our webpage: &lt;a href=&quot;/dlb&quot;&gt;https://pm.bsc.es/dlb&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The latest releases of DLB, detailed documentation and source code are available on our &lt;a href=&quot;//github.com/bsc-pm/dlb/&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
</description>
				<pubDate>Wed, 11 Dec 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/12/11/public-release-dlb-library-version-1-0.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/12/11/public-release-dlb-library-version-1-0.html</guid>
			</item>
		
			<item>
				<title>BSC @ SC13: Tutorial and HPC Educators Session</title>
				<description>&lt;p&gt;BSC is contributing to SC13 with a tutorial and an HPC Educators session on the OmpSs task-based programming model and its use.&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More Info&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://sc13.supercomputing.org/schedule/event_detail.php?evid=tut117&quot;&gt;OmpSs tutorial&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://sc13.supercomputing.org/schedule/event_detail.php?evid=eps104&quot;&gt;HPC Educators session&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Tue, 29 Oct 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/10/29/bsc-sc13-tutorial-and-hpc-educators-session.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/10/29/bsc-sc13-tutorial-and-hpc-educators-session.html</guid>
			</item>
		
			<item>
				<title>Heterogeneous Programming on GPUs with MPI + OmpSs (SBAC-PAD 2013)</title>
				<description>&lt;p&gt;&lt;strong&gt;Tutorial:&lt;/strong&gt; Porto de Galinhas, BRAZIL&lt;br /&gt;
&lt;strong&gt;Event date:&lt;/strong&gt; October 23, 2013&lt;br /&gt;
&lt;strong&gt;Speaker:&lt;/strong&gt; Rosa Maria Badia&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;
&lt;p&gt;The course starts with the objective of setting up the basic foundations related to task decomposition and parallelization inhibitors, using a tool to analyze potential parallelism and dependences. The course follows with the objective of understanding the fundamental concepts supporting shared-memory programming and efficient programming using GPUs.&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More Info&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.cin.ufpe.br/~sbac2013/sbac/overall_program_new.php&quot;&gt;Course web site&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Mon, 30 Sep 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/09/30/heterogeneous-programming-on-gpus-with-mpi-ompss-sbac-pad-2013.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/09/30/heterogeneous-programming-on-gpus-with-mpi-ompss-sbac-pad-2013.html</guid>
			</item>
		
			<item>
				<title>Parallel Programming Workshop (PATC Course)</title>
				<description>&lt;p&gt;&lt;strong&gt;Tutorial:&lt;/strong&gt; Barcelona, SPAIN - October 14-18, 2013&lt;br /&gt;
&lt;strong&gt;Speakers:&lt;/strong&gt; Rosa M. Badia and Xavier Martorell&lt;/p&gt;

&lt;h4 id=&quot;objectives&quot;&gt;Objectives&lt;/h4&gt;
&lt;p&gt;The course starts with the objective of setting up the basic foundations related to task decomposition and parallelization inhibitors, using a tool to analyze potential parallelism and dependences. The course follows with the objective of understanding the fundamental concepts supporting shared-memory and message-passing programming models. The course is taught using formal lectures and practical/programming sessions to reinforce the key concepts and set up the compilation/execution environment. The course covers the two widely used programming models: OpenMP for the shared-memory architectures and MPI for the distributed-memory counterparts. The use of OpenMP in conjunction with MPI to better exploit the shared-memory capabilities of current compute nodes in clustered architectures is also considered. Paraver will be used along the course as the tool to understand the behavior and performance of parallelized codes.&lt;/p&gt;

&lt;h4 id=&quot;more-info&quot;&gt;More Info&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.bsc.es/marenostrum-support-services/hpc-education-and-training/patc-training/about-patc-bsc/patc-parallel&quot;&gt;Course web site&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Fri, 20 Sep 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/09/20/parallel-programming-workshop-patc-course.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/09/20/parallel-programming-workshop-patc-course.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at PUMPS 2013</title>
				<description>&lt;div&gt;&lt;strong style=&quot;line-height: 1.538em;&quot;&gt;OmpSs: Leveraging GPU/CUDA Programming&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;Barcelona, SPAIN -- July 8-12, 2013&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;CONTENTS&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;The fourth edition of the Programming and Tuning Massively Parallel Systems summer school (PUMPS) is aimed at enriching the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources such as GPU accelerators.&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;More information at:&lt;/div&gt;
&lt;div&gt;&lt;a href=&quot;http://bcw.ac.upc.edu/PUMPS2013/start&quot;&gt;Summer School website&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
</description>
				<pubDate>Sat, 15 Jun 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/06/15/ompss-tutorial-at-pumps-2013.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/06/15/ompss-tutorial-at-pumps-2013.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at Colombia</title>
				<description>&lt;div&gt;&lt;strong&gt;&lt;span style=&quot;line-height: 1.538em;&quot;&gt;Asynchronous Hybrid and Heterogeneous Parallel Programming with MPI/OmpSs Course&lt;/span&gt;&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;July 1-5, 2013&lt;/div&gt;
&lt;div&gt;&lt;!--break--&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;ABSTRACT&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&lt;span style=&quot;line-height: 1.538em;&quot;&gt;OmpSs is an effort to integrate features from the StarSs programming model developed by BSC into a single programming model. In particular, our objective is to extend OpenMP with new directives to support asynchronous parallelism and heterogeneity (devices like GPUs). However, it can also be understood as new directives extending other accelerator based APIs like CUDA or OpenCL. Our OmpSs environment is built on top of our Mercurium compiler and Nanos++ runtime system.&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;PLACE&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&lt;span style=&quot;line-height: 1.538em;&quot;&gt;Universidad Industrial de Santander - Campus Principal&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;Bucaramanga, Santander, Colombia&lt;/div&gt;
&lt;div&gt;Room: Sala de conferencias EISI, Facultad de Ingenierías Fisicomecánicas&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;strong&gt;More information at:&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&lt;a href=&quot;http://grid.uis.edu.co/index.php/Asynchronous_Hybrid_and_Heterogeneous_Parallel_Programming_with_MPI/OmpSs_Course&quot;&gt;Course website&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
</description>
				<pubDate>Sat, 01 Jun 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/06/01/ompss-tutorial-at-colombia.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/06/01/ompss-tutorial-at-colombia.html</guid>
			</item>
		
			<item>
				<title>Ompss tutorial at XSEDE project</title>
				<description>&lt;h3 id=&quot;parallel-cpu-programming-ompss-at-the-university-of-new-york&quot;&gt;Parallel CPU programming (OmpSs) at the University of New York&lt;/h3&gt;
&lt;p&gt;June 24, 2013&lt;/p&gt;

&lt;h4 id=&quot;contents&quot;&gt;Contents&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;BLOCK 1: OmpSs Quick Overview
    &lt;ul&gt;
      &lt;li&gt;High Performance Computing
        &lt;ul&gt;
          &lt;li&gt;Some basic concepts&lt;/li&gt;
          &lt;li&gt;Supercomputers nowadays&lt;/li&gt;
          &lt;li&gt;Parallel programming models&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;OmpSs Introduction
        &lt;ul&gt;
          &lt;li&gt;OmpSs main features&lt;/li&gt;
          &lt;li&gt;A Practical Example: Cholesky factorization&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;BSC’s Implementation
        &lt;ul&gt;
          &lt;li&gt;Mercurium Compiler&lt;/li&gt;
          &lt;li&gt;Nanos++ Runtime Library&lt;/li&gt;
          &lt;li&gt;Visualization Tools&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;BLOCK 2: Basics of OmpSs
    &lt;ul&gt;
      &lt;li&gt;Tasking and Synchronization
        &lt;ul&gt;
          &lt;li&gt;Data Sharing Attributes&lt;/li&gt;
          &lt;li&gt;Dependence Model&lt;/li&gt;
          &lt;li&gt;Other Tasking Directive Clauses&lt;/li&gt;
          &lt;li&gt;Taskwait&lt;/li&gt;
          &lt;li&gt;Synchronization&lt;/li&gt;
          &lt;li&gt;Outlined Task Syntax&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;Memory Regions
        &lt;ul&gt;
          &lt;li&gt;Introduction&lt;/li&gt;
          &lt;li&gt;Syntax&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;Nesting and Dependences
        &lt;ul&gt;
          &lt;li&gt;Memory regions and dependences&lt;/li&gt;
          &lt;li&gt;Nested tasks and dependences&lt;/li&gt;
          &lt;li&gt;Using dependence sentinels&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;Programming Methodology&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More information at: &lt;a href=&quot;https://www.xsede.org/web/international-hpc-summer-school/2013-wiki/-/wikid/0qDR/2013+Wiki/Abstracts&quot;&gt;Course website&lt;/a&gt;&lt;/p&gt;
</description>
				<pubDate>Wed, 29 May 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/05/29/ompss-tutorial-at-xsede-project.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/05/29/ompss-tutorial-at-xsede-project.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at ISCA 2013</title>
				<description>&lt;h3 id=&quot;hybrid-and-heterogeneous-parallel-programming-with-mpiompss-for-exascale-systems&quot;&gt;Hybrid and Heterogeneous Parallel Programming with MPI/OmpSs for Exascale Systems&lt;/h3&gt;

&lt;p&gt;June 24, 2013&lt;/p&gt;

&lt;h4 id=&quot;abstract&quot;&gt;Abstract&lt;/h4&gt;
&lt;p&gt;Due to its asynchronous nature and look-ahead capabilities, MPI/OmpSs is a promising programming model approach for future exascale systems, with the potential to exploit unprecedented amounts of parallelism, while coping with memory latency, network latency and load imbalance. Many large-scale applications are already seeing very positive results from their ports to MPI/OmpSs (see EU projects Montblanc, TEXT). We will first cover the basic concepts of the programming model. OmpSs can be seen as an extension of the OpenMP model. Unlike OpenMP, however, task dependencies are determined at runtime thanks to the directionality of data arguments. The OmpSs runtime supports asynchronous execution of tasks on heterogeneous systems such as SMPs, GPUs and clusters thereof. The integration of OmpSs with MPI facilitates the migration of current MPI applications and improves, automatically, the performance of these applications by overlapping computation with communication between tasks on remote nodes. The tutorial will also cover the constellation of development and performance tools available for the MPI/OmpSs programming model: the methodology to determine OmpSs tasks, the Tareador tool, and the Paraver performance analysis tools. The tutorial will also include practical sessions on application development and analysis on single many-core nodes, heterogeneous environments with GPUs, and cluster environments with MPI/OmpSs.&lt;/p&gt;

&lt;p&gt;More information at:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://pm.bsc.es/content/ompss-tutorial-isca-2013&quot;&gt;Tutorial website&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://isca2013.eew.technion.ac.il/&quot;&gt;ISCA 2013 website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Fri, 24 May 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/05/24/ompss-tutorial-at-isca-2013.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/05/24/ompss-tutorial-at-isca-2013.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at PRACE project</title>
				<description>&lt;h3 id=&quot;patc-course-heterogeneous-programming-on-gpus-with-mpi--ompss&quot;&gt;PATC Course: Heterogeneous Programming on GPUs with MPI + OmpSs&lt;/h3&gt;

&lt;p&gt;May 15-16, 2013&lt;/p&gt;

&lt;h3 id=&quot;objectives&quot;&gt;Objectives&lt;/h3&gt;
&lt;p&gt;The tutorial will motivate the audience on the need for portable, efficient programming models that put less pressure on program developers while still getting good performance for clusters and clusters with GPUs. More specifically, the tutorial will:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Introduce the hybrid MPI/OmpSs parallel programming model for future exascale systems&lt;/li&gt;
  &lt;li&gt;Demonstrate how to use MPI/OmpSs to incrementally parallelize/optimize:
    &lt;ul&gt;
      &lt;li&gt;MPI applications on clusters of SMPs, and&lt;/li&gt;
      &lt;li&gt;Leverage CUDA kernels with OmpSs on clusters of GPUs&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More information at:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.bsc.es/marenostrum-support-services/hpc-events-trainings/prace-trainings/clone-patc-course-23-24-may12&quot;&gt;Course website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Mon, 15 Apr 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/04/15/ompss-tutorial-at-prace-project.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/04/15/ompss-tutorial-at-prace-project.html</guid>
			</item>
		
			<item>
				<title>Ompss tutorial at CAPAP-H</title>
				<description>&lt;h3 id=&quot;programación-de-aplicaciones-con-mpi--ompss&quot;&gt;Application Programming with MPI + OmpSs&lt;/h3&gt;

&lt;p&gt;March 15, 2013&lt;/p&gt;

&lt;h4 id=&quot;abstract&quot;&gt;Abstract&lt;/h4&gt;
&lt;p&gt;Due to its asynchronous nature and its ability to look ahead at the tasks to be executed, MPI/OmpSs is a very promising parallel programming model for exascale systems. The model has great potential to exploit the inherent parallelism of applications, while hiding memory and network latency and improving the load balance among processes. A significant number of applications see an important performance improvement when adapted to the MPI/OmpSs model (for example, applications from the Montblanc and TEXT projects). The course will first describe the basic concepts of the programming model. OmpSs can be regarded as an extension of the OpenMP standard. However, unlike OpenMP, data dependencies between tasks are determined at run time, taking into account the directionality of the task arguments. The OmpSs runtime supports heterogeneous systems composed of general-purpose processors (multicores), GPUs, and clusters. The integration of OmpSs with MPI eases the migration of existing applications and improves their behavior by overlapping computation tasks with communication.&lt;/p&gt;

&lt;p&gt;Speakers: Xavier Martorell and Rosa M. Badía.&lt;/p&gt;

&lt;p&gt;More information at:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://capap-h.uji.es/?q=node/142&quot;&gt;Tutorial website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
				<pubDate>Fri, 15 Feb 2013 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2013/02/15/ompss-tutorial-at-capap-h.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2013/02/15/ompss-tutorial-at-capap-h.html</guid>
			</item>
		
			<item>
				<title>OmpSs tutorial at Supercomputing 2012</title>
				<description>&lt;p&gt;&lt;strong&gt;Asynchronous Hybrid and Heterogeneous Parallel Programming with MPI/OmpSs for Exascale Systems&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Nov 12, 2012&lt;/p&gt;
&lt;p&gt;&lt;!--break--&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;TIME: 1:30PM - 5:00PM&lt;/li&gt;
  &lt;li&gt;PRESENTERS: Jesus Labarta, Xavier Martorell, Christoph Niethammer and Costas Bekas.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ABSTRACT&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Due to its asynchronous nature and look-ahead capabilities, MPI/OmpSs is a promising programming model approach for future exascale systems, with the potential to exploit unprecedented amounts of parallelism, while coping with memory latency, network latency and load imbalance. Many large-scale applications are already seeing very positive results from their ports to MPI/OmpSs (see EU projects Montblanc, TEXT). We will first cover the basic concepts of the programming model. OmpSs can be seen as an extension of the OpenMP model. Unlike OpenMP, however, task dependencies are determined at runtime thanks to the directionality of data arguments. The OmpSs runtime supports asynchronous execution of tasks on heterogeneous systems such as SMPs, GPUs and clusters thereof. The integration of OmpSs with MPI facilitates the migration of current MPI applications and improves, automatically, the performance of these applications by overlapping computation with communication between tasks on remote nodes. The tutorial will also cover the constellation of development and performance tools available for the MPI/OmpSs programming model: the methodology to determine OmpSs tasks, the Ayudame/Temanejo debugging toolset, and the Paraver performance analysis tools. Experiences on the parallelization of real applications using MPI/OmpSs will also be presented. The tutorial will also include a demo.&lt;/p&gt;
</description>
				<pubDate>Tue, 02 Oct 2012 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2012/10/02/ompss-tutorial-at-supercomputing-2012.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2012/10/02/ompss-tutorial-at-supercomputing-2012.html</guid>
			</item>
		
			<item>
				<title>NANOS++ for Clusters</title>
				<description>&lt;p&gt;A few weeks ago we started the development of NANOS++ for Clusters. As one would expect from the name, the main goal is to support the execution of parallel applications on cluster environments using the programming models currently available in NANOS++. The starting design has been driven by the following ideas:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Low cohesion with the rest of the runtime&lt;/li&gt;
  &lt;li&gt;Minimal impact on applications code&lt;/li&gt;
  &lt;li&gt;Independent of the underlying network technology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far the initial design has been quite successful in meeting these three objectives. The cluster support has been added as a plugin which can be enabled at application run time. It has been developed on MareNostrum, a cluster of PowerPC nodes, but it should work on other platforms as the code is architecture-independent. Network support is provided by &lt;a href=&quot;http://gasnet.cs.berkeley.edu&quot;&gt;GASNet&lt;/a&gt;, a low-level networking layer designed for building runtime libraries on top of it, with support for several network technologies.&lt;/p&gt;

&lt;p&gt;We have succeeded in executing applications coded using the OmpSs pragmas. A dense matrix multiply has been run on up to 64 nodes; however, the current design limits the scalability of the system to 4 nodes.&lt;/p&gt;

&lt;p&gt;The next stages of the development will be focused on allowing the system to scale while using a higher number of nodes, enabling instrumentation features and adding more applications.&lt;/p&gt;
</description>
				<pubDate>Thu, 09 Sep 2010 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2010/09/09/nanos-for-clusters.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2010/09/09/nanos-for-clusters.html</guid>
			</item>
		
			<item>
				<title>Instrumenting Nanos++ (3rd part)</title>
				<description>&lt;p&gt;We conclude in this article the series of posts on instrumenting Nanos++. In the &lt;a href=&quot;/2010/06/30/instrumenting-nanos-1st-part.html&quot;&gt;first&lt;/a&gt; article we discussed the different components that take part in the instrumentation process, the different types of events which can be generated by the runtime, and how the instrumentation output can be adapted to different formats (using plugins). In the &lt;a href=&quot;/2010/07/23/instrumenting-nanos-2nd-part.html&quot;&gt;second&lt;/a&gt; one we covered internal implementation details and how programmers can use instrumentation services to generate events. In this one we will talk about instrumentation modules, which help programmers when instrumenting the code, and we will show some practical examples using these modules.&lt;/p&gt;

&lt;h4 id=&quot;instrumentation-modules&quot;&gt;Instrumentation Modules&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;Instrumentation modules&lt;/em&gt; help programmers in the instrumentation process by automatically taking care of some of the steps required for correct instrumentation. So far their main utility is to handle multiple exits in a given piece of code. As a module is a C++ object, we can use the constructor to open an instrumentation burst, leaving the responsibility of closing it to the corresponding destructor. The simplest instrumentation module is InstrumentState:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;InstrumentState&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;nl&quot;&gt;private:&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;Instrumentation&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_inst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt;                &lt;span class=&quot;n&quot;&gt;_closed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
   &lt;span class=&quot;nl&quot;&gt;public:&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;InstrumentState&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nanos_event_state_value_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_inst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getInstrumentor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_closed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;_inst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;raiseOpenStateEvent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InstrumentState&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_closed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;close&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;close&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_closed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_inst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;raiseCloseStateEvent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Creating a new InstrumentState object opens a &lt;em&gt;State&lt;/em&gt; event (the value is specified in the object constructor). Once the object goes out of the scope where it is declared, the destructor closes the event (if the programmer has not closed it before). As most instrumentation phases affect a whole function, the programmer just has to create an object of an Instrumentation module at the beginning of the function.&lt;/p&gt;
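&lt;p&gt;The RAII idea behind these modules can also be illustrated outside C++. The following Python sketch (not part of Nanos++; all names are hypothetical) models InstrumentState as a context manager: the event opens on entry and is guaranteed to close on every exit path, including early returns and exceptions.&lt;/p&gt;

```python
# Illustrative sketch (not the Nanos++ API): the RAII idea behind
# InstrumentState, modelled with a Python context manager.
class StateGuard:
    """Opens a state event on entry and guarantees it is closed on exit."""
    def __init__(self, trace, state):
        self.trace = trace
        self.state = state
        self.closed = False

    def close(self):
        # Idempotent: a second close (e.g. from __exit__) is a no-op,
        # mirroring the _closed flag in InstrumentState.
        if not self.closed:
            self.closed = True
            self.trace.append(("close", self.state))

    def __enter__(self):
        self.trace.append(("open", self.state))
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit, early return, or exception: the event is
        # always closed, just as the C++ destructor does.
        self.close()
        return False

trace = []
with StateGuard(trace, "SYNCHRONIZATION"):
    pass  # instrumented region; it may return early or raise
```

Whatever happens inside the guarded region, the trace ends with a matching close event.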

&lt;h4 id=&quot;example-instrumenting-the-api&quot;&gt;Example: Instrumenting the API&lt;/h4&gt;
&lt;p&gt;API functions generally share a common behaviour. They open a &lt;em&gt;Burst&lt;/em&gt; event with a pair &amp;lt;key,value&amp;gt;. The key is the internal code “api” and the value is a specific identifier of the function we are instrumenting. API functions also open a &lt;em&gt;State&lt;/em&gt; event with a value according to the function’s duty. Both events are closed once the function execution finishes. Here is an example using the nanos_yield() implementation:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;nanos_err_t&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;nanos_yield&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InstrumentStateAndBurst&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;api&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;yield&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SCHEDULING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;Scheduler&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;yield&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NANOS_UNKNOWN_ERR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NANOS_OK&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The yield function wraps its execution between the &amp;lt;“api”,“yield”&amp;gt; &lt;em&gt;Burst&lt;/em&gt; and the SCHEDULING &lt;em&gt;State&lt;/em&gt; events. Although the function may have other exit points (apart from the final return), the &lt;em&gt;InstrumentStateAndBurst&lt;/em&gt; destructor raises the closing events automatically.&lt;/p&gt;

&lt;h4 id=&quot;example-instrumenting-runtime-internal-functions&quot;&gt;Example: Instrumenting Runtime Internal Functions&lt;/h4&gt;
&lt;p&gt;Different Nanos++ functions take different instrumentation approaches. In this section we have chosen a scheduling-related function: &lt;em&gt;Scheduler::waitOnCondition()&lt;/em&gt;. Due to space limitations we have abridged the code, focusing on the instrumentation parts.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scheduler&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;waitOnCondition&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GenericSyncCond&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InstrumentState&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SYNCHRONIZATION&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

   &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nspins&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getSchedulerConf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getNumSpins&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
   &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;spins&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nspins&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

   &lt;span class=&quot;n&quot;&gt;WD&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;myThread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getCurrentWD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;

   &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;check&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;BaseThread&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;kr&quot;&gt;thread&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getMyThreadSafe&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;spins&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;spins&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
         &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;check&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;addWaiter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

            &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InstrumentState&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inst1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SCHEDULING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;WD&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_schedulePolicy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;atBlock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;thread&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inst1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;close&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;next&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InstrumentState&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inst2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RUNTIME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;switchTo&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;next&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unlock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InstrumentState&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inst3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;YIELD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
               &lt;span class=&quot;kr&quot;&gt;thread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;yield&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
         &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unlock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
         &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;spins&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nspins&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this function the instrumentation changes the thread state in several parts of the code. First, the whole function body is surrounded by a SYNCHRONIZATION state (&lt;em&gt;inst&lt;/em&gt;). An opening state event is raised at the very beginning of the function and the corresponding close event is thrown once the execution flow leaves the function scope. During the function execution the thread state may change to SCHEDULING when calling &lt;em&gt;_schedulePolicy.atBlock()&lt;/em&gt;, to RUNTIME when we are context switching &lt;em&gt;WorkDescriptors&lt;/em&gt;, and to YIELD when we are forcing a thread yield. In this case the SCHEDULING state change is the only one we have to close explicitly before leaving its scope. Note that if a C++ exception is raised by any of the lower layers, the states open at that point are closed automatically. Thus, using the Instrumentation modules improves the general exception safety of the code.&lt;/p&gt;
</description>
				<pubDate>Fri, 13 Aug 2010 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2010/08/13/instrumenting-nanos-3rd-part.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2010/08/13/instrumenting-nanos-3rd-part.html</guid>
			</item>
		
			<item>
				<title>Instrumenting Nanos++ (2nd part)</title>
				<description>&lt;p&gt;We continue in this article the Nanos++ instrumentation overview started &lt;a href=&quot;/2010/06/30/instrumenting-nanos-1st-part.html&quot;&gt;previously&lt;/a&gt;. In the previous article we discussed the different components that take part in the instrumentation process, the different types of events the runtime can generate, and how the instrumentation output can be adapted to different formats (using plugins). In this article we focus on the internal implementation and on how programmers can use instrumentation services to generate events.&lt;/p&gt;

&lt;h4 id=&quot;instrumentation-class&quot;&gt;Instrumentation class&lt;/h4&gt;
&lt;p&gt;The main component of the instrumentation is the &lt;em&gt;Instrumentation&lt;/em&gt; class. This class offers several services which can be grouped in:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Create event services:&lt;/strong&gt; these services focus on creating specific event objects. They are usually not called by external agents but are used by the raise event services (explained below).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Raise event services:&lt;/strong&gt; these services effectively produce an event (or list of events) which will be visible to the user. They usually call one or several create event service(s) and finally produce the actual output by calling the plugin’s addEventList() service.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Context switch services:&lt;/strong&gt; they are used to back up/restore the instrumentation history of the current &lt;em&gt;WorkDescriptor&lt;/em&gt; (see InstrumentationContext class).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;em&gt;Instrumentation&lt;/em&gt; class also offers two more services to enable/disable state instrumentation. Once the user calls &lt;em&gt;disableStateInstrumentation()&lt;/em&gt;, the runtime produces no more state events until the user enables them again by calling &lt;em&gt;enableStateInstrumentation()&lt;/em&gt;. Although no state events are produced during this interval, the Instrumentation class keeps track of all potential state changes by creating a special event object: the substate event.&lt;/p&gt;
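&lt;p&gt;The buffering behaviour described above can be sketched as follows. This is an illustrative Python model, not the Nanos++ implementation; the method and field names are hypothetical.&lt;/p&gt;

```python
# Illustrative sketch (not the Nanos++ implementation): while state
# instrumentation is disabled, state changes are recorded as "substate"
# events instead of being emitted, so no transition is lost.
class StateEmitter:
    def __init__(self):
        self.enabled = True
        self.emitted = []     # events visible to the user
        self.substates = []   # deferred state changes

    def disable_state_instrumentation(self):
        self.enabled = False

    def enable_state_instrumentation(self):
        self.enabled = True

    def change_state(self, state):
        if self.enabled:
            self.emitted.append(("state", state))
        else:
            self.substates.append(("substate", state))

em = StateEmitter()
em.change_state("RUNNING")
em.disable_state_instrumentation()
em.change_state("SCHEDULING")   # kept as a substate, not emitted
em.enable_state_instrumentation()
em.change_state("RUNTIME")
```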

&lt;h4 id=&quot;instrumentationcontext-class&quot;&gt;InstrumentationContext class&lt;/h4&gt;
&lt;p&gt;In order to reproduce the history of events across WorkDescriptor context switches (and taking into account that we are producing a thread-centered trace), the Instrumentation class needs a per-WorkDescriptor repository for this kind of information. The &lt;em&gt;InstrumentationContext&lt;/em&gt; is responsible for keeping the history of state transitions, the still-open bursts, and the delayed event list. &lt;em&gt;InstrumentationContext&lt;/em&gt; is implemented through two different classes: &lt;em&gt;InstrumentationContext&lt;/em&gt; (which defines the behavior of this component) and &lt;em&gt;InstrumentationContextData&lt;/em&gt; (which actually keeps the information and is embedded in the WorkDescriptor class).&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;InstrumentationContext&lt;/em&gt; behaviour is defined by the plugin itself and has several implementations according to the State and Burst generation scheme. These two elements can behave differently on a context switch: in one case we only want to regenerate the last event of each type (this is the usual implementation), while in other cases we want to regenerate the complete sequence of events of the same type in the order they occurred (this is the stacked behavior). Currently there are four &lt;em&gt;InstrumentationContext&lt;/em&gt; implementations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;InstrumentationContext&lt;/li&gt;
  &lt;li&gt;InstrumentationContextStackedStates&lt;/li&gt;
  &lt;li&gt;InstrumentationContextStackedBursts&lt;/li&gt;
  &lt;li&gt;InstrumentationContextStackedStatesAndBursts&lt;/li&gt;
&lt;/ul&gt;
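&lt;p&gt;The difference between the plain and the stacked behaviour can be sketched as follows (an illustrative Python model with hypothetical names, not the Nanos++ API):&lt;/p&gt;

```python
# Illustrative sketch (not the Nanos++ API): on a context switch, the
# plain context variant replays only the innermost open event, while a
# "stacked" variant replays the whole stack of open events in order.
def replay_on_switch(open_stack, stacked):
    if not open_stack:
        return []
    if stacked:
        return list(open_stack)   # replay the full sequence
    return [open_stack[-1]]       # replay only the last open event

stack = ["RUNNING", "SYNCHRONIZATION", "SCHEDULING"]
plain = replay_on_switch(stack, stacked=False)
full = replay_on_switch(stack, stacked=True)
```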

&lt;p&gt;The plugin itself is responsible for defining the InstrumentationContext behaviour by defining an object of this class and initializing the field _instrumentationContext with a reference to it. Example:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;InstrumentationExample&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Instrumentation&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;nl&quot;&gt;private:&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;InstrumentationContextStackedBursts&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_icLocal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
   &lt;span class=&quot;nl&quot;&gt;public:&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;InstrumentationExample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Instrumentation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_icLocal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;_instrumentationContext&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_icLocal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;instrumentation-examples&quot;&gt;Instrumentation examples&lt;/h4&gt;
&lt;p&gt;In this section we focus on the runtime instrumentation code. We discuss two different examples: a critical piece of runtime code and a work descriptor’s context switch.&lt;/p&gt;

&lt;p&gt;Some chunks of runtime code are bracketed by instrumentation events in order to measure their duration. An example is a cache allocation. This function is bracketed by a state event and a burst event. The State event changes the current thread’s state to CACHE and the Burst event records the memory allocation size of the specific call. Here is the example:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nanos_event_key_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Instrumentor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getInstrumentorDictionary&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getEventKey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;cache-malloc&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Instrumentor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;raiseOpenStateAndBurst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CACHE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nanos_event_value_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Instrumentor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;raiseCloseStateAndBurst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A WorkDescriptor’s context switch uses two instrumentation services: wdLeaveCPU() and wdEnterCPU(). wdLeaveCPU() is called from the leaving task’s execution context and wdEnterCPU() is called once we are executing the new task.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;   &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getInstrumentor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wdLeaveCPU&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oldWD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;myThread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;switchHelperDependent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oldWD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;newWD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

   &lt;span class=&quot;n&quot;&gt;myThread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setCurrentWD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;newWD&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;NANOS_INSTRUMENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getInstrumentor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wdEnterCPU&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;newWD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the next article we will conclude this series about the instrumentation module of Nanos++ by giving an overview of the external runtime instrumentation API and showing some mechanisms that make the programmer’s job easier.&lt;/p&gt;
</description>
				<pubDate>Fri, 23 Jul 2010 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2010/07/23/instrumenting-nanos-2nd-part.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2010/07/23/instrumenting-nanos-2nd-part.html</guid>
			</item>
		
			<item>
				<title>Instrumenting Nanos++ (1st part)</title>
				<description>&lt;p&gt;Continuing with the series of Nanos++ articles, we want to briefly describe the instrumentation mechanism. In this post we give an overview of the main Instrumentation components and concepts. In a future post we will show how to use them to instrument the runtime.&lt;/p&gt;

&lt;p&gt;The main goal of instrumentation is to obtain information about the program execution. In other words, we want to know “&lt;em&gt;What&lt;/em&gt; happens in this &lt;em&gt;WorkDescriptor&lt;/em&gt; running on this &lt;em&gt;Thread&lt;/em&gt;?”. These are the three main components involved in the instrumentation process: the What (which we also call the &lt;em&gt;Event&lt;/em&gt;), the &lt;em&gt;WorkDescriptor&lt;/em&gt;, and the &lt;em&gt;Thread&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Events&lt;/strong&gt; are something that happen at a given time or at a given interval of time.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;WorkDescriptors&lt;/strong&gt; are the runtime basic unit of work. They offer a context to execute a piece of code.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Threads&lt;/strong&gt; are logical (or virtual) processors that execute WorkDescriptors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instrumentation is driven through &lt;em&gt;Key&lt;/em&gt;/&lt;em&gt;Value&lt;/em&gt; pairs in which the Key identifies the semantics of the associated Value (e.g., &lt;em&gt;WorkDescriptor ID&lt;/em&gt; as a Key and a numerical identifier as the associated Value). Keys and Values can be registered in a global dictionary (InstrumentationDictionary) which serves as their repository.&lt;/p&gt;
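&lt;p&gt;As a rough illustration of the dictionary idea (a Python sketch with hypothetical names, not the Nanos++ API), registering the same key twice should yield the same identifier:&lt;/p&gt;

```python
# Illustrative sketch (not the Nanos++ API): a global dictionary that
# registers instrumentation keys, acting as a repository of key ids.
class InstrumentationDictionary:
    def __init__(self):
        self.keys = {}        # key name to numeric key id
        self.next_key = 1

    def register_key(self, name):
        # Registering the same name twice returns the same id.
        if name not in self.keys:
            self.keys[name] = self.next_key
            self.next_key += 1
        return self.keys[name]

d = InstrumentationDictionary()
wd_id_key = d.register_key("wd-id")
```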

&lt;p&gt;Nanos++ defines four different types of events:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Point:&lt;/strong&gt; Punctual events. They have a list of KV pairs.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Bursts:&lt;/strong&gt; Interval events. They have a single KV pair which identify the type of burst that we are creating. The runtime automatically manages a stack of burst of with the same key.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;State:&lt;/strong&gt; Thread state events. They have no KV pair, they have just a numerical code which identifies a runtime state.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Point-to-Point&lt;/strong&gt; (PtP): Two connected punctual events. They have a domain and identifier in order to match origin and destination and also, they have a list of KV pairs.&lt;/li&gt;
&lt;/ul&gt;
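
&lt;p&gt;As a rough illustration, the four event kinds and the KV pairs they carry can be modeled as follows. This is a hypothetical Python sketch for exposition only, not the actual C++ API; all names are made up.&lt;/p&gt;

```python
# Hypothetical, simplified model of the Nanos++ event kinds described
# above; names are illustrative, not the real C++ API.

# A global Key/Value dictionary acting as the event repository.
instrumentation_dictionary = {"wd-id": "WorkDescriptor ID"}

def point_event(kv_pairs):
    """A punctual event: carries a list of Key/Value pairs."""
    return {"kind": "point", "kv": list(kv_pairs)}

def burst_event(key, value):
    """An interval event: a single KV pair identifying the burst type."""
    return {"kind": "burst", "kv": [(key, value)]}

def state_event(state_code):
    """A thread-state event: no KV pairs, only a numerical state code."""
    return {"kind": "state", "code": state_code}

def ptp_event(domain, event_id, kv_pairs):
    """A point-to-point event: domain and id match origin and destination."""
    return {"kind": "ptp", "domain": domain, "id": event_id,
            "kv": list(kv_pairs)}

e = ptp_event("task-wait", 42, [("wd-id", 7)])
```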

&lt;p&gt;The core of the instrumentation behavior is specified through the Instrumentation class. This class implements several types of methods: methods to create events, methods to raise events, WorkDescriptor context switch methods and, finally, specific Instrumentation methods, which are defined in each derived class (plugin). The specific Instrumentation methods are (ideally) the only ones that have to be implemented in each derived Instrumentation class. They are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;initialize():&lt;/strong&gt; this method is executed at runtime startup and can be used to create buffers, auxiliary structures, initialize values (e.g. time stamp), etc.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;finalize():&lt;/strong&gt; this method is executed at runtime shutdown and can be used to dump remaining data into a file or standard output, post-process trace information, delete buffers and auxiliary structures, etc.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;addEventList():&lt;/strong&gt; this method is executed each time the runtime raises an event. It receives a list of events (EventList) and the specific instrumentation class has to deal with each event in this list in order to generate (or not) a valid output.&lt;/li&gt;
&lt;/ul&gt;
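
&lt;p&gt;A minimal plugin following this three-method contract might look like the sketch below. This is a hypothetical Python model, assuming a runtime that calls the three hooks at startup, on each raised event list, and at shutdown; the real Nanos++ plugins are C++ classes.&lt;/p&gt;

```python
# Hypothetical sketch of an Instrumentation plugin implementing the
# initialize / addEventList / finalize contract described above.
import time

class TextTracePlugin:
    """Collects raised events and dumps a plain-text trace at shutdown."""

    def initialize(self):
        # Runtime startup: create the buffer and record a base time stamp.
        self.start = time.time()
        self.buffer = []

    def add_event_list(self, events):
        # Called each time the runtime raises events: record each one
        # together with a time stamp relative to startup.
        for event in events:
            self.buffer.append((time.time() - self.start, event))

    def finalize(self):
        # Runtime shutdown: dump the remaining data and free the buffer.
        lines = ["%.6f %s" % (t, e) for (t, e) in self.buffer]
        self.buffer = []
        return "\n".join(lines)
```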

&lt;p&gt;Plugin developers can also override other base methods in order to get specific behavior when the plugin is invoked.&lt;/p&gt;
</description>
				<pubDate>Wed, 30 Jun 2010 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2010/06/30/instrumenting-nanos-1st-part.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2010/06/30/instrumenting-nanos-1st-part.html</guid>
			</item>
		
			<item>
				<title>Into Nanos++</title>
				<description>&lt;p&gt;For the last several months we have been working on &lt;a href=&quot;http://nanos.ac.upc.edu/projects/nanox&quot;&gt;NANOS++&lt;/a&gt;, the replacement for our old OpenMP runtime, nanos4. The design objectives driving the development are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Extensibility&lt;/li&gt;
  &lt;li&gt;Heterogeneity support&lt;/li&gt;
  &lt;li&gt;Multiple programming model support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our aim has been to enable easy development of different parts of the runtime, so researchers have a platform that allows them to try different mechanisms. So far, several parts of the runtime are quite extensible: the scheduling policy, the throttling policy, the barrier implementations, the slicer implementations, the instrumentation layer and the architectural level. This extensibility does not come for free. The runtime overheads are slightly increased, but they should be low enough for results to be meaningful, except for extremely fine-grained applications.&lt;/p&gt;

&lt;p&gt;The execution model of the runtime is asynchronous task parallelism. Tasks can be spawned and then synchronized among themselves based on point-to-point dependencies. This execution model allows us to implement several programming models on top. Currently we support the &lt;a href=&quot;http://www.bsc.es/smpsuperscalar&quot;&gt;StarSs&lt;/a&gt; model, partially the &lt;a href=&quot;http://www.openmp.org&quot;&gt;OpenMP&lt;/a&gt; model, and the &lt;a href=&quot;http://chapel.cray.com&quot;&gt;Chapel&lt;/a&gt; language model. This model also simplifies support for heterogeneity, as individual tasks are easy to offload to accelerators.&lt;/p&gt;
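
&lt;p&gt;To give a feel for how point-to-point dependencies arise from task spawns, here is a tiny illustrative Python model. The names and structure are made up for this post, assuming dependencies derive from each input&amp;#8217;s last writer; this is not the Nanos++ API.&lt;/p&gt;

```python
# Illustrative model of asynchronous tasks synchronized through
# point-to-point dependencies on the data they read and write.

class TaskGraph:
    def __init__(self):
        self.last_writer = {}   # data name mapped to the task that wrote it
        self.edges = []         # (producer, consumer) dependency pairs

    def spawn(self, name, inputs=(), outputs=()):
        # A task depends point-to-point on the last writer of each input.
        for data in inputs:
            if data in self.last_writer:
                self.edges.append((self.last_writer[data], name))
        for data in outputs:
            self.last_writer[data] = name
        return name

g = TaskGraph()
g.spawn("t1", outputs=["x"])
g.spawn("t2", inputs=["x"], outputs=["y"])
g.spawn("t3", inputs=["x", "y"])
# t2 depends on t1; t3 depends on both t1 and t2.
```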

&lt;p&gt;One of our current focuses is bringing our OpenMP support up to the level we had with the previous runtime (note that you also need our &lt;a href=&quot;http://nanos.ac.upc.edu/projects/mcxx&quot;&gt;Mercurium&lt;/a&gt; compiler for this). Another is the management of data transfers for heterogeneous platforms such as GPUs, or for other environments where explicit data management makes sense (such as NUMA architectures). The current git version already has some minimal support for GPUs using CUDA, where the runtime performs all the data transfers.&lt;/p&gt;

&lt;p&gt;Of course, the runtime also involves a lot of internal development to support features such as instrumentation, debugging, synchronization mechanisms, etc.&lt;/p&gt;

&lt;p&gt;We’ll probably write more posts on specific aspects of the runtime in the future.&lt;/p&gt;
</description>
				<pubDate>Tue, 25 May 2010 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2010/05/25/into-nanos.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2010/05/25/into-nanos.html</guid>
			</item>
		
			<item>
				<title>Where is OpenMP going?</title>
				<description>&lt;p&gt;Christian Terboven, from Aachen University, wrote a nice &lt;a href=&quot;http://terboven.wordpress.com/2009/10/04/how-openmp-is-moving-towards-version-3-1-4-0/&quot;&gt;summary&lt;/a&gt; of recent discussions in the OpenMP language committee which I recommend for people interested in the development of OpenMP.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://nanos.ac.upc.edu/git?p=mcxx.git;a=summary&quot;&gt;git version&lt;/a&gt; of our Mercurium compiler already implements a prototype of the future user-defined reductions (UDRs) for OpenMP. I’ll try to write a bit more about them in the near future so people can give them a try :-)&lt;/p&gt;
</description>
				<pubDate>Wed, 07 Oct 2009 00:00:00 +0000</pubDate>
				<link>https://pm.bsc.es/2009/10/07/where-is-openmp-going.html</link>
				<guid isPermaLink="true">https://pm.bsc.es/2009/10/07/where-is-openmp-going.html</guid>
			</item>
		
	</channel>
</rss>
