The OpenMP Parallel Programming - MAH_041216_58507_2_501983

Research into models and languages for parallel and distributed programming

(ANY LANGUAGE EXCEPT JAVA, C AND C++)

NOTE: This assignment involves the in-depth evaluation of a programming language in terms of history, language features, paradigms supported, application areas, and future development.

Section 1: INTRODUCTION

1.1 What was the goal of the project? Why did you choose this topic?

1.2 How does the report achieve this goal? You can give a brief overview of the sections that comprise the report.

Section 2: HISTORY

2.1 Introduction – What is the goal of this section?

2.2 Creation – Who created the language? Why was the language created? What application domains is the language good for? Why?

2.3 Lifetime

What has happened to the language since it was created? Rate the popularity of the language, then, now, and in the future. What is your evidence for this rating?

Section 3: BASIC FEATURES

3.1 Introduction – What is the goal of this section?

3.2 Paradigm

3.3 Data Types

3.4 Control Flow

3.5 Subprograms

3.6 Object-orientation

3.7 Other.

Section 4: TUTORIAL

4.1 Introduction – What is the goal of this section?

4.2 How do you create a basic hello world program? Include a description of how you use the programming environment to edit, build, and run your program.

4.3 – 4.X Two or three major features, one per section.

4.X Resources – What books/manuals/web sites should I go to for further information about the language? Make sure you cite the material here and include an entry in the bibliography.

Section 5: EVALUATION

5.1 Introduction – What is the goal of this section? Sebesta has a list of language evaluation criteria in chapter one. The criteria are grouped into three broad categories: readability, writability, and reliability. Evaluate the language with respect to the following characteristics:

5.2 Simplicity/orthogonality

5.3 Control Structures

5.4 Data types

5.5 Syntax Design

5.6 Support for abstraction

5.7 Expressivity

5.8 Type checking

5.9 Conclusion – what is your overall feeling about the language?

REFERENCES Use references to strengthen your report. Don’t make a bold statement without backup! Since many of you will need to use the web as a major source of information, pay particular attention to how web references are documented. Also avoid referencing non-authoritative websites such as Wikipedia. Seek out information with some weight, such as software company sites, standards bodies, language designers, or other official documentation for the programming language.

Use this format for citing a reference in the main text of the report:

… The language evaluation characteristics were taken from [Sebesta, 2005]….

This citation would link the reader to the following entry in the references section of the report: Sebesta, R. (2005). Concepts of Programming Languages, Seventh Edition. Boston, MA: Addison-Wesley.

The OpenMP Parallel Programming

Name of the Student

Name of the University

Table of Contents

Section 1: Introduction
1.1 Selection of Programming Language (OpenMP)
1.2 Brief Overview of the Report
Section 2: History
2.1 Introduction
2.2 Creation
2.3 Lifetime
Section 3: Basic Features
3.1 Introduction
3.2 Paradigm
3.3 Data Types
3.4 Control Flow
3.5 Subprograms
3.6 Object Orientation
3.7 Others (Permitting Multiparadigm Parallelism with C++)
Section 4: Tutorial
4.1 Introduction
4.2 Process of Creating a Hello World Program with OpenMP
4.3 Major Feature
Section 5: Evaluation
5.1 Introduction
5.2 Simplicity
5.3 Control Structure
5.4 Data Types
5.5 Syntax Design
5.6 Support of Abstraction, Encapsulation and Polymorphism
5.7 Expressivity
5.8 Type Checking
5.9 Conclusion
References

Section 1: Introduction

1.1 Selection of Programming Language (OpenMP)

Within this study, one parallel programming model, OpenMP, is described. Parallel processing is a form of computation that permits numerous instructions in a program to run at the same time, in parallel (McCool, Robison and Reinders 2012). Parallel programming can be carried out on a single computer equipped with several processors, on multiple computers connected by a network, or on a combination of the two. OpenMP has been chosen for this study because it is needed more than ever to deliver high-performance computing in this era of parallel hardware (Joppich et al. 2015).

1.2 Brief Overview of the Report

To describe the different aspects of OpenMP, the report has been structured in a meaningful and understandable manner. First, the history of the language, covering its creation and lifetime, is presented. Then the basic features, such as the supported paradigms, data types, subprograms and more, are introduced. A tutorial is provided to make the report more concrete. Finally, an evaluation of the programming model is given.

Section 2: History

2.1 Introduction

This section gives an overview of the OpenMP parallel programming model, from its creation through its lifetime.

2.2 Creation

The OpenMP ARB (Architecture Review Board) published its first API specification, OpenMP for Fortran 1.0, in October 1997. In October of the following year, the organization released the C/C++ standard. The year 2000 saw version 2.0 of the Fortran specification, with version 2.0 of the C/C++ specification being released in 2002. Version 2.5 is a combined C/C++/Fortran specification that was released in 2005 (Schwarz et al. 2016).

2.3 Lifetime

Up to version 2.0, OpenMP primarily specified ways to parallelize highly regular loops, as they occur in matrix-oriented numerical programming, where the number of iterations of the loop is known at entry time. This was recognized as a limitation, and various task-parallel extensions were added to implementations (Diaz, Munoz-Caro and Nino 2012). In 2005, an effort to standardize task parallelism was formed, which published a proposal in 2007, taking inspiration from the task-parallelism features in Cilk, X10 and Chapel.

Version 3.0 was released in May 2008. Included among the new features in 3.0 is the concept of tasks and the task construct, significantly extending the scope of OpenMP beyond the parallel loop constructs that made up most of OpenMP 2.0 (Schwarz et al. 2016).

In the mid-1990s, vendors of shared-memory machines provided similar, directive-based extensions to Fortran programming:

  1. The customer would augment a serial Fortran program with directives indicating the loops that needed to be parallelized.
  2. The compiler was responsible for automatically parallelizing such loops across the SMP processors.

The implementations were all functionally quite similar, but the approaches were diverging. The first attempt at a standard was the draft for ANSI X3H5 in 1994.

Software vendors:

  1. Absoft Corporation
  2. Edinburgh Portable Compilers (Diaz, Munoz-Caro and Nino 2012)
  3. GENIAS Software GmbH
  4. Myrias Computer Technologies, Inc.
  5. The Portland Group, Inc. (PGI)

Application developers:

  1. ADINA R&D, Inc.
  2. ANSYS, Inc.
  3. Dash Associates
  4. Fluent, Inc. (Cesar et al. 2015)
  5. ILOG CPLEX Division
  6. Livermore Software Technology Corporation (LSTC)
  7. MECALOG SARL
  8. Oxford Molecular Group PLC
  9. The Numerical Algorithms Group Ltd. (NAG)

OpenMP's future:

  1. Device construct improvements:
     • more control and flexibility in specifying data movement between the host and devices
     • asynchronous, data-flow execution support with the addition of nowait and depend (Ankerholz 2016)
     • multiple devices
     • “deep copy” for pointer-based structures/objects
  2. Loop parallelism improvements:
     • extended ordered clause to support doacross (e.g. wavefront) parallelism for loop nests (Forum.openmp.org 2016)
     • new taskloop construct for asynchronous loop parallelism with control over task grain size
     • array reductions for C and C++
  3. Under consideration:
     • memory affinity (Ankerholz 2016)
     • task priorities (likely)
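As a hedged illustration of one of these features (this example is my own and is not taken from the cited sources), the taskloop construct already present in OpenMP 4.5 creates asynchronous tasks from loop iterations, with the grainsize clause controlling the task grain size. It requires an OpenMP 4.5 compiler, e.g. a recent gcc with -fopenmp:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    double a[1000];

    #pragma omp parallel
    #pragma omp single
    {
        /* Each generated task handles roughly 100 consecutive iterations. */
        #pragma omp taskloop grainsize(100)
        for (int i = 0; i < 1000; i++)
            a[i] = 0.5 * i;
    }

    printf("a[999] = %f\n", a[999]);
    return 0;
}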

Section 3: Basic Features

3.1 Introduction

This section describes the basic characteristics of OpenMP, such as its paradigms, data types, control flow and more.

3.2 Paradigm:

Three kinds of paradigm are supported by OpenMP: multi-threaded parallelism, hybrid MPI/OpenMP parallelism and pure MPI parallelism.

Multi-threaded parallelism: The conventional two-array method uses nested do loops to perform index reshuffles. The hole-following algorithm can likewise be easily parallelized using a multi-threaded approach in a shared-memory multiprocessor setting, both for speeding up data reshuffles and for reorganizing a database on an SMP server (Duy et al. 2012). As noted by Ding, the hole-following cycles are non-overlapping. If we assign a thread to each hole-following cycle, the cycles can proceed independently and simultaneously. The cycle-generation code runs first, in an initialization phase before the actual data reshuffle, to determine the number of independent hole-following cycles along with each cycle's length and starting location. This cycle information can be stored in a table, one entry per cycle with a cycle length and a starting-location offset. The starting offset uniquely determines the cycle, and the cycle length determines the workload.
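A minimal sketch of this idea is shown below; the cycle_t structure and the process_cycle routine are hypothetical stand-ins of my own (not taken from the cited work), but they illustrate how non-overlapping cycles can be distributed over OpenMP threads:

#include <stdio.h>
#include <omp.h>

/* Hypothetical description of one independent hole-following cycle:
   a unique starting offset and the number of elements it moves. */
typedef struct {
    long start;
    long length;
} cycle_t;

/* Placeholder for the real cycle-processing routine. */
static void process_cycle(double *data, cycle_t c)
{
    for (long k = 0; k < c.length; k++)
        data[c.start + k] += 1.0;   /* stand-in for the actual reshuffle */
}

int main(void)
{
    double data[100] = {0};
    cycle_t cycles[] = { {0, 50}, {50, 50} };   /* non-overlapping cycles */
    int ncycles = 2;

    /* One thread per cycle; dynamic scheduling balances unequal cycle lengths. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < ncycles; i++)
        process_cycle(data, cycles[i]);

    printf("data[0] = %f\n", data[0]);
    return 0;
}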

Pure MPI parallelism: This involves local array index reshuffles and global data exchanges (Grosset et al. 2015). The purpose is to reshuffle a 3D array across processors so that the data points along a particular dimension become completely local to a processor. Data access along this dimension then maps to the fastest-varying storage index, just as in a standard array transpose.

Hybrid MPI/OpenMP parallelism: A variation of hybrid parallelism is to create several MPI tasks (Unix processes) on an SMP node and use multi-threaded OpenMP within each such MPI task (Losada et al. 2016).
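A minimal hybrid sketch (my own illustration, assuming an MPI installation with thread support) shows the typical structure: one MPI process per node with OpenMP threads inside it.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided, rank;

    /* One MPI task per SMP node; OpenMP threads are created inside each task. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("MPI rank %d, OpenMP thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

It would typically be built with an MPI compiler wrapper plus the OpenMP flag, for example mpicc -fopenmp.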

3.3 Data Types:

  1. omp_lock_t: A type that holds the status of a lock, i.e. whether the lock is available or whether a thread owns it.
  2. omp_nest_lock_t: A type that holds the following pieces of information about a lock: whether the lock is available, the identity of the thread that owns it, and a nesting count (Docs.oracle.com 2016).
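A minimal usage sketch of the simple lock type (my own example; it uses only lock routines defined by the OpenMP API):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_lock_t lock;
    int counter = 0;

    omp_init_lock(&lock);           /* the lock starts out available */

    #pragma omp parallel
    {
        omp_set_lock(&lock);        /* the calling thread now owns the lock */
        counter++;                  /* protected update of shared data      */
        omp_unset_lock(&lock);      /* release so other threads can proceed */
    }

    omp_destroy_lock(&lock);
    printf("counter = %d\n", counter);
    return 0;
}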

3.4 Control Flow

The control flow of parallel programs is considerably more complex than that of sequential programs, since the individual threads have independent control flow. Typically, the individual nodes of a control-flow representation correspond to the source-code sections of whole OpenMP constructs such as critical sections, parallel regions, functions, or user-defined regions.

3.5 Sub Programs:

  1. Subprograms can be called from a parallel region.
  2. The static extent is the code lexically enclosed by a directive pair.
  3. The dynamic extent includes the static extent as well as the statements in the call tree (Computing.llnl.gov 2016).
  4. The called subprogram can contain OpenMP directives to control the parallel environment.
  5. Directives that are in the dynamic extent but not in the static extent are called orphan directives.

!$OMP PARALLEL
call whoami
!$OMP END PARALLEL

subroutine whoami
external omp_get_thread_num
integer iam, omp_get_thread_num
iam = omp_get_thread_num()
!$OMP CRITICAL
print *, 'Hello from ', iam
!$OMP END CRITICAL
return
end

(In this example, the code lexically enclosed by the PARALLEL/END PARALLEL pair is the static extent, the body of whoami reached through the call is part of the dynamic extent, and the CRITICAL directives inside whoami are orphan directives.)

3.6 Object Orientation:

In programming languages, a polymorphic object is an entity, for example a variable or a procedure, that can hold or operate on values of differing types during the program's execution. Since a polymorphic object can operate on a variety of values and types, it can also be reused in a variety of programs, sometimes with little or no change by the programmer. "Write once, run many", otherwise known as code reusability, is an important characteristic of the programming paradigm known as Object-Oriented Programming (OOP) (Wolf et al. 2013). OOP describes an approach to programming where a program is viewed as a collection of interacting, yet largely independent, software components. These software components are known as objects in OOP, and they are typically implemented in a programming language as entities that encapsulate both data and procedures.

OpenMP is considered the de facto industry standard for programming shared-memory multiprocessors. OpenMP is particularly appropriate for expressing loop-based parallelism in multithreaded C, Fortran and C++ programs. The existing OpenMP standard has recognized limitations concerning the arrangement and scheduling of individual levels of parallelism (Pinho and Carvalho 2014). Factory differs from OpenMP in that it provides a generic object-oriented programming environment in which several kinds of parallelism can be expressed explicitly and in a composable way, while providing the runtime support needed to schedule all of these kinds of parallelism effectively (Wolf et al. 2013).

Each distinct type of Factory work unit exposes a different API to the developer, with the aim of improving programmability. However, in order to differentiate internally between the different kinds of work units and provide the required behaviour in each case, Factory work units are organized in an inheritance hierarchy. This hierarchical structure facilitates the addition of new kinds of work, or the modification of existing kinds, without interfering with unrelated kinds (Locans, Adelmann and Suter 2015).

3.7 Others (Permitting Multiparadigm Parallelism with C++):

Inheritance permits the expression of different kinds of parallelism, with different properties, through a common interface. Factory exploits the C++ template mechanism in order to adapt its functionality to the requirements of the distinct kinds of parallel work. Therefore, Factory allows programmers to easily express different kinds of parallel work, with different properties, through a single interface (Computing.llnl.gov 2016). At the same time, the parallel work can be executed efficiently, transparently using suitable algorithms and mechanisms to manage the parallelism.

Section 4: Tutorial

4.1 Introduction

This section gives an example of OpenMP programming, a program that prints "Hello World", to provide the basic layout.

4.2 Process of Creating a Hello World Program with OpenMP:

Edit, build and run in OpenMP: Using your favorite text editor (vi/vim, emacs, nedit, gedit, nano…) open a new file and call it whatever you'd like.

  1. Create a simple OpenMP program that does the following:
    • Generates a parallel region
    • Has every thread in the parallel region obtain its thread id
    • Has every thread print "Hello World" along with its unique thread id (Computing.llnl.gov 2016)
    • Has the master thread only obtain, and then print, the total number of threads

If you need help, see the provided omp_hello.c or omp_hello.f file. The complete C version is as follows:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
    int nthreads, tid;

    /* Fork a team of threads giving them their own copies of variables */
    #pragma omp parallel private(nthreads, tid)
    {
        /* Obtain thread number */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only master thread does this */
        if (tid == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }  /* All threads join master thread and disband */
}

  2. Using your choice of compiler, compile your hello world OpenMP program. This may take several attempts if there are any code errors. For example:

C: icc -openmp omp_hello.c -o hello
pgcc -mp omp_hello.c -o hello
gcc -fopenmp omp_hello.c -o hello

Fortran: ifort -openmp omp_hello.f -o hello
pgf90 -mp omp_hello.f -o hello
gfortran -fopenmp omp_hello.f -o hello

  3. When you get a clean compile, proceed.
  4. Run your hello executable and notice its output.
    • Is it what you expected? As a comparison, you can compile and run the provided omp_hello.c or omp_hello.f example program.
    • How many threads were created? By default, the Intel and GNU compilers will create one thread per core. The PGI compiler will create just one thread in total (Docs.oracle.com 2016).
  5. Notes:
    • For the rest of this exercise, you can use your preferred compiler command unless indicated otherwise.
    • Compilers will differ in which warnings they issue, but all of them can be ignored for this exercise. Errors are a different matter, of course.
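At run time, the number of threads can be controlled through the standard OMP_NUM_THREADS environment variable; for example, in a bash-style shell (the value 4 is only an illustration):

export OMP_NUM_THREADS=4
./hello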

4.3 Major Feature

C / C++ directives format: The general rules are as follows.

  1. Directives are case sensitive.
  2. Directives follow the conventions of the C/C++ standards for compiler directives.
  3. Only one directive name may be specified per directive.
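For example, a directive that follows these rules (an illustrative fragment of my own, not a prescribed form) looks like this:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    double beta = 0.0, pi = 3.14159;

    /* One directive name (parallel); its clauses follow on the same logical line. */
    #pragma omp parallel default(shared) private(beta)
    {
        beta = pi * omp_get_thread_num();   /* each thread has its own beta */
        printf("thread %d: beta = %f\n", omp_get_thread_num(), beta);
    }
    return 0;
}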

Shared memory programming: Shared-memory systems ordinarily provide both static and dynamic process creation. That is, processes can be created at the start of program execution by a command to the operating system, or they can be created during the execution of the program. The best-known dynamic process-creation function is fork (Hoefler et al. 2013). A typical implementation will permit a process to start another, or child, process with a fork. Three mechanisms typically manage coordination among processes in shared-memory programs. The first is that the starting, or parent, process can wait for the termination of the child process by calling join. The second keeps processes from improperly accessing shared resources. The third provides a means of synchronizing the processes.

The shared-memory model is similar to the data-parallel model in that it has a single address space. It is similar to the message-passing model in that it is multithreaded and asynchronous. However, data reside in a single, shared address space and therefore do not need to be explicitly allocated to processes (Hoefler et al. 2012). Workload can be either explicitly or implicitly partitioned. Communication is done implicitly through shared reads and writes of variables. Synchronization, however, is explicit.
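A small sketch of these properties in OpenMP terms (my own example): the variable sum is communicated implicitly because every thread reads and writes the same shared location, while the synchronization that makes this safe is explicit, here an atomic directive.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int sum = 0;              /* lives in the single shared address space */

    #pragma omp parallel
    {
        int my_part = omp_get_thread_num() + 1;   /* implicitly partitioned work */

        /* Communication is implicit: every thread updates the shared variable.
           Synchronization is explicit: the atomic directive orders the updates. */
        #pragma omp atomic
        sum += my_part;
    }

    printf("sum = %d\n", sum);
    return 0;
}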

Section 5: Evaluation

5.1 Introduction

This section evaluates OpenMP with respect to writability, readability and reliability.

5.2 Simplicity:

The translator converts a C program using OpenMP parallel constructs into a message-passing program. This is accomplished by first transforming the OpenMP program into an SPMD form; the resulting code represents a node program for each thread that operates on partitioned data (Liu et al. 2014). The SPMD-style representation has the following properties: (i) the work of parallel regions is evenly divided among the processes; (ii) serial regions are redundantly executed by every participating process; and (iii) the virtual address space of shared data is replicated on all processes. Shared data is not physically replicated; only the data actually accessed by a process is physically allocated on that process. Next, the compiler performs array data-flow analyses, which collect the written and read shared-array references of the node program as symbolic expressions. The compiler inserts function calls to pass these expressions to the runtime system (Park et al. 2014). Finally, at each synchronization point, the runtime system uses the expressions from the compiler to compute the inter-thread overlap between written and read data and determines the appropriate MPI communication messages.

The translation system guarantees that shared data is coherent at OpenMP synchronization constructs only, as defined by OpenMP semantics. A data race that is not properly synchronized in the input OpenMP program may lead to unspecified results in the translated code. The translation system is implemented using the Cetus compiler infrastructure (Kang, Ha and Jun 2014), together with a runtime library. To support interprocedural analysis, subroutine inline expansion is used.
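A hedged, hand-written sketch of the SPMD idea (this is not the Cetus-based translator's output, just an illustration of the properties listed above): each MPI process redundantly executes the serial parts, works only on its own block of the parallel loop, and communicates explicitly at the synchronization point.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, nprocs, n = 1000;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Serial regions are executed redundantly by every process. */
    double local_sum = 0.0, global_sum = 0.0;

    /* The iterations of the original OpenMP parallel loop are divided
       evenly among the processes (block partitioning). */
    int chunk = (n + nprocs - 1) / nprocs;
    int lo = rank * chunk;
    int hi = (lo + chunk < n) ? lo + chunk : n;
    for (int i = lo; i < hi; i++)
        local_sum += i * 0.5;

    /* At the synchronization point, explicit MPI communication replaces
       the shared-memory accesses of the OpenMP original. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}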

5.3 Control Structure:

OpenMP strives for a minimal set of control structures. Experience has shown that only a few control structures are truly necessary for writing most parallel applications. For instance, in the Silicon Graphics DOACROSS model, the only control structure is the DOACROSS directive, yet this is arguably the most widely used shared-memory programming model for scientific computing (Saad 2016). Most of the control structures provided by X3H5 can be trivially programmed in OpenMP with no performance penalty. OpenMP includes control structures only in those cases where a compiler can provide both functionality and performance beyond what a user could reasonably program.

The examples above used only three control structures: PARALLEL, DO and SINGLE. Clearly the compiler adds functionality in the PARALLEL and DO directives. For SINGLE, the compiler adds performance by allowing the first thread that reaches the SINGLE directive to execute the code (Kandalla et al. 2016). This is nontrivial for a user to program.
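The C equivalents of these three constructs (parallel, for and single) are shown in the following small sketch of my own:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, n = 8;
    double a[8];

    #pragma omp parallel shared(a, n) private(i)
    {
        /* Work-sharing loop: iterations are divided among the team (DO / for). */
        #pragma omp for
        for (i = 0; i < n; i++)
            a[i] = 2.0 * i;

        /* SINGLE: only the first thread to arrive executes this block;
           the others wait at the implied barrier at its end. */
        #pragma omp single
        printf("a[0] = %f, a[%d] = %f\n", a[0], n - 1, a[n - 1]);
    }
    return 0;
}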

5.4 Data Types:

The data types available mainly depend on the base language, such as Fortran or C/C++, that OpenMP is combined with. Since the familiar basic data types of the base language are used, the data types make the language quite writable.

5.5 Syntax Design:

The compiler directives are called pragmas, with syntax #pragma, where the # appears in column 1 and the remainder of the directive is aligned with the rest of the code. Pragmas are only permitted to be one line long; so if one happens to require more than one line, the line can be continued by placing a backslash (\) at the end of the partial lines (Olivier 2013).
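A brief sketch (my own) of a directive continued over several lines with backslashes:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, n = 100;
    double a[100], b[100], c[100];
    for (i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

    /* A directive too long for one line is continued with a backslash. */
    #pragma omp parallel for default(none) \
            shared(a, b, c, n) private(i)  \
            schedule(static)
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];

    printf("c[99] = %f\n", c[99]);
    return 0;
}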

5.6 Support of Abstraction, Encapsulation and Polymorphism:

Abstraction: Abstraction for parallelism is important so that we can avoid coding in the world of POSIX threads (pthreads) and Windows threads. Our first thought should go to OpenMP, an abstraction available in many compilers, on most systems, for C++ and Fortran. The OpenMP specification, at version 2.5, cleaned up the document so that these extensions are documented for C++ and Fortran in one common specification (Scogland et al. 2015).

The heart of OpenMP is a set of compiler directives that extend C++ and Fortran to take advantage of shared-memory parallelism. You add these directives as pragmas in C++, or as special comments in Fortran. This means that compilers without support for OpenMP will simply ignore the directives, and your program will compile and run as if you had done nothing. This backward compatibility is extremely valuable (Protze et al. 2015). When a compiler does support OpenMP, it will use the directives as hints from the programmer on when and how to exploit parallelism in the code it generates.

Polymorphism: The interfaces that OpenMP uses do not provide the ability to share attributes. At compile time, clauses supporting variable privatization, reductions and, in particular, data mapping for device constructs are supported by the language. As the language supports those features only statically, static polymorphism is not supported by OpenMP (Scogland et al. 2015).

Encapsulation: OpenMP extensions now provide the capability to run code on both the host and a device in a “work sharing” manner within a single program. The execution model starts on a host processor. Regions of code encapsulated by OpenMP target directives are launched for execution on a device, while optionally permitting the host to execute in parallel with the device (Pop and Cohen 2013). The host controls all allocation of device memory, transfer of data, queueing of target executions, and handling of their completion. Through this encapsulation the language allows the developer to avoid unintended data dependencies and to improve the quality of the data handling.
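A minimal sketch (my own) of such a target region; the map clauses make explicit which data the host must transfer to and from the device, and without an attached device the region simply falls back to running on the host:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 1000;
    double x[1000], y[1000];
    for (int i = 0; i < n; i++) { x[i] = i; y[i] = 0.0; }

    /* The target region is encapsulated for execution on a device;
       the map clauses describe the data movement the host manages. */
    #pragma omp target map(to: x[0:n]) map(from: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = 2.0 * x[i];

    printf("y[999] = %f\n", y[999]);
    return 0;
}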

5.7 Expressivity:

The current OpenMP specification provides the ability to create independent tasks but does not have a provision for task-to-task synchronization, which limits its expressivity for parallelizing some common computations (LaGrone 2013). Tasks may become the fundamental execution unit of future OpenMP runtime implementations, so the ability to use them in a flexible manner will require increased expressivity and flexibility before adoption by users becomes widespread. Enhancements to the API will enable flexibility and productivity in existing and future implementations.
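As a hedged illustration of what the specification does offer (my own example; left_work and right_work are hypothetical helpers), independent tasks can be created and then joined with taskwait, while finer-grained task-to-task dependences require the later depend clause:

#include <stdio.h>
#include <omp.h>

int left_work(void)  { return 1; }
int right_work(void) { return 2; }

int main(void)
{
    int a = 0, b = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Two independent tasks... */
        #pragma omp task shared(a)
        a = left_work();

        #pragma omp task shared(b)
        b = right_work();

        /* ...and the coarse synchronization available: wait for all child tasks. */
        #pragma omp taskwait

        printf("a + b = %d\n", a + b);
    }
    return 0;
}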

5.8 Type Checking:

Static type checking is type checking performed at compile time. This is the only kind of type checking that C++ does. Dynamic type checking is type checking done at run time (Pop and Cohen 2013). This is usually found in dynamically typed, interpreted languages, but is less common in compiled languages.

5.9 Conclusion:

This study has examined the parallel efficiency of a 3-D OSEM algorithm implemented with a pure MPI and a hybrid MPI-OpenMP approach. The initial motivation was to improve the efficiency beyond that of a strict message-passing methodology. The hybrid MPI-OpenMP approach took advantage of the load-balancing mechanism of OpenMP and the inherently lower latency of shared-memory threads across processors within a node, and showed a 7% to 17% improvement in terms of parallel efficiency on 4 to 64 processors when compared with the pure MPI approach. This hybrid approach is especially important as the trends towards multicore processors and larger SMPs continue to prove more cost-effective in cluster computing. While this work has focused on a particular application (a 3-D OSEM reconstruction code), the hybrid approach is expected to become increasingly valuable for accelerating applications with similar characteristics.

References:

Ankerholz, A. (2016). What’s Ahead for OpenMP? [online] ADMIN Magazine. Available at: http://www.admin-magazine.com/HPC/Articles/What-s-Ahead-for-OpenMP [Accessed 28 Nov. 2016].

Cesar, E., Cortés, A., Espinosa, A., Margalef, T., Moure, J.C., Sikora, A. and Suppi, R., 2015. Teaching Parallel Programming in Interdisciplinary Studies. In European Conference on Parallel Processing (pp. 66-77). Springer International Publishing.

Computing.llnl.gov. (2016). OpenMP Exercise. [online] Available at: https://computing.llnl.gov/tutorials/openMP/exercise.html [Accessed 28 Nov. 2016].

Computing.llnl.gov. (2016). OpenMP. [online] Available at: https://computing.llnl.gov/tutorials/openMP/ [Accessed 28 Nov. 2016].

Diaz, J., Munoz-Caro, C. and Nino, A., 2012. A survey of parallel programming models and tools in the multi and many-core era. IEEE Transactions on Parallel and Distributed Systems, 23(8), pp.1369-1386.

Docs.oracle.com. (2016). 2.3 OpenMP Environment Variables (Sun Studio 12: OpenMP API User’s Guide). [online] Available at: https://docs.oracle.com/cd/E19205-01/819-5270/aewcb/index.html [Accessed 28 Nov. 2016].

Docs.oracle.com. (2016). Chapter 2 Compiling and Running OpenMP Programs (Sun Studio 12 Update 1: OpenMP API User’s Guide). [online] Available at: https://docs.oracle.com/cd/E19205-01/820-7883/aewbx/index.html [Accessed 28 Nov. 2016].

Duy, T.V.T., Yamazaki, K., Ikegami, K. and Oyanagi, S., 2012. Hybrid MPI-OpenMP Paradigm on SMP clusters: MPEG-2 Encoder and n-body Simulation. arXiv preprint arXiv:1211.2292.

Forum.openmp.org. (2016). OpenMP Forum: On the future of global threadprivate variables in OpenMP. [online] Available at: http://forum.openmp.org/forum/viewtopic.php?f=7&t=1001 [Accessed 28 Nov. 2016].

Grosset, A.P., Prasad, M., Christensen, C., Knoll, A. and Hansen, C.D., 2015. TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism. In EGPGV (pp. 67-76).

Hoefler, T., Dinan, J., Buntinas, D., Balaji, P., Barrett, B., Brightwell, R., Gropp, W., Kale, V. and Thakur, R., 2013. MPI+MPI: a new hybrid approach to parallel programming with MPI plus shared memory. Computing, 95(12), pp.1121-1136.

Hoefler, T., Dinan, J., Buntinas, D., Balaji, P., Barrett, B.W., Brightwell, R., Gropp, W., Kale, V. and Thakur, R., 2012. Leveraging MPI’s one-sided communication interface for shared-memory programming. In European MPI Users’ Group Meeting (pp. 132-141). Springer Berlin Heidelberg.

Joppich, M., Schmidl, D., Bolger, A.M., Kuhlen, T. and Usadel, B., 2015. PAGANtec: OpenMP parallel error correction for next-generation sequencing data. In International Workshop on OpenMP (pp. 3-17). Springer International Publishing.

Kandalla, K., Mendygral, P., Radcliffe, N., Cernohous, B., Knaak, D., McMahon, K. and Pagel, M., 2016. Optimizing Cray MPI and SHMEM Software Stacks for Cray-XC Supercomputers based on Intel KNL Processors.

Kang, M.S., Ha, O.K. and Jun, Y.K., 2014. Visualization tool for debugging data races in structured fork-join parallel programs. International Journal of Software Engineering and Its Applications, 8(4), pp.157-168.

LaGrone, J., 2013. Enhancing the Expressivity of OpenMP API through Task-to-Task Synchronization (Doctoral dissertation, University of Houston).

Liu, J., He, M., Zhang, K., Wang, B. and Qiu, Q., 2014. Parallelization of the Multilevel Fast Multipole Algorithm by Combined Use of OpenMP and VALU Hardware Acceleration. IEEE Transactions on Antennas and Propagation, 62(7), pp.3884-3889.

Locans, U., Adelmann, A. and Suter, A., 2015. Dynamic Kernel Scheduler (DKS) – Accelerating the Object Oriented Particle Accelerator Library (OPAL).

Losada, N., Martin, M.J., Rodriguez, G. and González, P., 2016. Portable Application-level Checkpointing for Hybrid MPI-OpenMP Applications. Procedia Computer Science, 80, pp.19-29.

McCool, M.D., Robison, A.D. and Reinders, J., 2012. Structured parallel programming: patterns for efficient computation. Elsevier.

Olivier, S.L., 2013. Design issues in the semantics and scheduling of asynchronous tasks. Technical report, Sandia National Laboratories (SNLNM), Albuquerque, NM (United States).

Park, M.C., Ha, O.K., Ha, S.W. and Jun, Y.K., 2014. Real-time 3D simulation for the trawl fishing gear based on parallel processing of sonar sensor data. International Journal of Distributed Sensor Networks, 2014.

Pinho, E.G. and de Carvalho, F.H., 2014. An object-oriented parallel programming language for distributed-memory parallel computing platforms. Science of Computer Programming, 80, pp.65-90.

Pop, A. and Cohen, A., 2013. OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs. ACM Transactions on Architecture and Code Optimization (TACO), 9(4), p.53.

Protze, J., Laguna, I., Ahn, D.H., Del Signore, J., Burton, A., Schulz, M. and Müller, M.S., 2015. Lessons Learned from Implementing OMPD: A Debugging Interface for OpenMP. In International Workshop on OpenMP (pp. 89-101). Springer International Publishing.

Saad, M.M., 2016. Extracting Parallelism from Legacy Sequential Code Using Transactional Memory (Doctoral dissertation, Virginia Polytechnic Institute and State University).

Scogland, T.R., Keasler, J., Gyllenhaal, J., Hornung, R., de Supinski, B.R. and Finkel, H., 2015. Supporting indirect data mapping in OpenMP. In International Workshop on OpenMP (pp. 260-272). Springer International Publishing.

Wolf, C., Dotzler, G., Veldema, R. and Philippsen, M., 2013. Object support for OpenMP-style programming of GPU clusters in Java. In Advanced Information Networking and Applications Workshops (WAINA), 2013 27th International Conference on (pp. 1405-1410). IEEE.