R-Stream®

Learn more about our advanced compiler technology »

Uniform Random Sampling in Polyhedra



Publication Source: 10th International Workshop on Polyhedral Compilation Techniques; IMPACT 2020, Bologna, Italy

We propose a method for generating uniform samples among a domain of integer points defined by a polyhedron in a multidimensional space. The method extends to domains defined by parametric polyhedra, in which a subset of the variables are symbolic. We motivate this work by a list of applications for the method in computer science. The proposed method relies on polyhedral ranking functions, as well as a recent inversion method for them, named trahrhe expressions.

Google Scholar    Article

Systems and Methods for Minimizing Communications



Publication Source: Patent US20160196086A1

A system for allocation of one or more data structures used in a program across a number of processing units takes into account a memory access pattern of the data structure, and the amount of total memory available for duplication across the several processing units. Using these parameters duplication factors are determined for the one or more data structures such that the cost of remote communication is minimized when the data structures are duplicated according to the respective duplication factors while allowing parallel execution of the program.
Google Scholar    Article

Automatic Parallelization to Asynchronous Task- Based Runtimes Through a Generic Runtime Layer



Publication Source: IEEE High Performance Extreme Computing Conference (HPEC) 2019, Waltham, MA

With the end of Moore’s law, asynchronous taskbased parallelism has seen growing support as a parallel programming paradigm, with the runtime system offering such advantages as dynamic load balancing, locality, and scalability. However, there has been a proliferation of such programming systems in recent years, each of which presents different performance tradeoffs and runtime semantics. Developing applications on top of these systems thus requires not only application expertise but also deep familiarity with the runtime, exacerbating the perennial problems of programmability and portability. This work makes three main contributions to this growing landscape. First, we extend a polyhedral optimizing compiler with techniques to extract task-based parallelism and data management for a broad class of asynchronous task-based runtimes. Second, we introduce a generic runtime layer for asynchronous task-based systems with representations of data and tasks that are sparse and tiled by default, which serves as an abstract target for the compiler backend. Finally, we implement this generic layer using OpenMP and Legion, demonstrating the flexibility and viability of the generic layer and delivering an end-to-end path for automatic parallelization to asynchronous task-based runtimes. Using a wide range of applications from deep learning to scientific kernels, we obtain geometric mean speedups of 23.0 (OpenMP) and 9.5 (Legion) using 64 threads.
Google Scholar    Article

Polyhedral Tensor Schedulers



Publication Source: 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland

Compiler optimizations based on the polyhedral model are able to automatically parallelize and optimize loopbased code. We acknowledge that while polyhedral techniques can represent a broad set of program transformations, important classes of programs could be parallelized just as well using less general but more tractable techniques. We apply this general idea to the polyhedral scheduling phase, which is one of the typical performance bottlenecks of polyhedral compilation. We focus on a class of programs in which enough parallelism is already exposed in the source program, and which includes Deep Learning layers and combinations thereof, as well as multilinear algebra kernels. We call these programs ”tensor codes”, and consequently call ”tensor schedulers” the tractable polyhedral scheduling techniques presented here. The general idea is that we can significantly speed up polyhedral scheduling by restricting the set of transformations considered. As an extra benefit, having a small search space allows us to introduce non-linear cost models, which fills a gap in polyhedral cost models.


Systems and Methods for Automatic Data Management for an Asynchronous Task-Based Runtime



Publication Source: Patent US10241811B2

A compilation system can define, at compile time, the data blocks to be managed by an Even Driven Task (EDT) based runtime/platform, and can also guide the runtime/platform on when to create and/or destroy the data blocks, so as to improve the performance of the runtime/platform. The compilation system can also guide, at compile time, how different tasks may access the data blocks they need in a manner that can improve performance of the tasks.
Google Scholar    Article

1 2 3 9