High Performance Compilers

Uniform Random Sampling in Polyhedra

Publication Source: 10th International Workshop on Polyhedral Compilation Techniques; IMPACT 2020, Bologna, Italy

We propose a method for generating uniform samples among a domain of integer points defined by a polyhedron in a multidimensional space. The method extends to domains defined by parametric polyhedra, in which a subset of the variables are symbolic. We motivate this work by a list of applications for the method in computer science. The proposed method relies on polyhedral ranking functions, as well as a recent inversion method for them, named trahrhe expressions.

Google Scholar    Article

Automatic Parallelization to Asynchronous Task- Based Runtimes Through a Generic Runtime Layer

Publication Source: IEEE High Performance Extreme Computing Conference (HPEC) 2019, Waltham, MA

With the end of Moore’s law, asynchronous taskbased parallelism has seen growing support as a parallel programming paradigm, with the runtime system offering such advantages as dynamic load balancing, locality, and scalability. However, there has been a proliferation of such programming systems in recent years, each of which presents different performance tradeoffs and runtime semantics. Developing applications on top of these systems thus requires not only application expertise but also deep familiarity with the runtime, exacerbating the perennial problems of programmability and portability. This work makes three main contributions to this growing landscape. First, we extend a polyhedral optimizing compiler with techniques to extract task-based parallelism and data management for a broad class of asynchronous task-based runtimes. Second, we introduce a generic runtime layer for asynchronous task-based systems with representations of data and tasks that are sparse and tiled by default, which serves as an abstract target for the compiler backend. Finally, we implement this generic layer using OpenMP and Legion, demonstrating the flexibility and viability of the generic layer and delivering an end-to-end path for automatic parallelization to asynchronous task-based runtimes. Using a wide range of applications from deep learning to scientific kernels, we obtain geometric mean speedups of 23.0 (OpenMP) and 9.5 (Legion) using 64 threads.
Google Scholar    Article

Systems and Methods for Solving Unrestricted Incremental Constraint Problems

Publication Source: Patent US10402747B2

We present the architecture of a high-performance constraint solver R-Solve that extends the gains made in SAT performance over the past fifteen years on static decision problems to problems that require on-the-fly adaptation, solution space exploration and optimization. R-Solve facilitates collaborative parallel solving and provides an efficient system for unrestricted incremental solving via Smart Repair. R-Solve can address problems in dynamic planning and constrained optimization involving complex logical and arithmetic constraints.
Google Scholar    Article

Polyhedral Tensor Schedulers

Publication Source: 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland

Compiler optimizations based on the polyhedral model are able to automatically parallelize and optimize loopbased code. We acknowledge that while polyhedral techniques can represent a broad set of program transformations, important classes of programs could be parallelized just as well using less general but more tractable techniques. We apply this general idea to the polyhedral scheduling phase, which is one of the typical performance bottlenecks of polyhedral compilation. We focus on a class of programs in which enough parallelism is already exposed in the source program, and which includes Deep Learning layers and combinations thereof, as well as multilinear algebra kernels. We call these programs ”tensor codes”, and consequently call ”tensor schedulers” the tractable polyhedral scheduling techniques presented here. The general idea is that we can significantly speed up polyhedral scheduling by restricting the set of transformations considered. As an extra benefit, having a small search space allows us to introduce non-linear cost models, which fills a gap in polyhedral cost models.

Systems and Methods for Automatic Data Management for an Asynchronous Task-Based Runtime

Publication Source: Patent US10241811B2

A compilation system can define, at compile time, the data blocks to be managed by an Even Driven Task (EDT) based runtime/platform, and can also guide the runtime/platform on when to create and/or destroy the data blocks, so as to improve the performance of the runtime/platform. The compilation system can also guide, at compile time, how different tasks may access the data blocks they need in a manner that can improve performance of the tasks.
Google Scholar    Article

1 2 3 11