High Performance Compilers



Polyhedral Tensor Schedulers



Publication Source: 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland

Compiler optimizations based on the polyhedral model are able to automatically parallelize and optimize loop-based code. We acknowledge that while polyhedral techniques can represent a broad set of program transformations, important classes of programs could be parallelized just as well using less general but more tractable techniques. We apply this general idea to the polyhedral scheduling phase, which is one of the typical performance bottlenecks of polyhedral compilation. We focus on a class of programs in which enough parallelism is already exposed in the source program, and which includes Deep Learning layers and combinations thereof, as well as multilinear algebra kernels. We call these programs "tensor codes", and consequently call the tractable polyhedral scheduling techniques presented here "tensor schedulers". The general idea is that we can significantly speed up polyhedral scheduling by restricting the set of transformations considered. As an extra benefit, the small search space allows us to introduce non-linear cost models, filling a gap in polyhedral cost models.
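As an illustration (ours, not an example from the paper), a dense fully connected layer is the kind of "tensor code" meant here: its outer loops carry no dependences, so a restricted tensor scheduler only has to choose a loop order and mark parallel loops instead of solving a general polyhedral scheduling problem. The C sketch below is a minimal rendering under that assumption; the function name and the OpenMP annotation are ours.

/* Minimal sketch of a "tensor code": a dense fully connected layer,
 * C[i][j] = sum_k A[i][k] * B[k][j]. The i and j loops carry no
 * dependences, so parallelism is already exposed in the source; a
 * restricted "tensor scheduler" only needs to pick a loop permutation
 * and mark parallel loops, rather than solve a general ILP-based
 * polyhedral scheduling problem. */
void fc_layer(int M, int N, int K,
              const float A[M][K], const float B[K][N], float C[M][N])
{
    #pragma omp parallel for              /* i: parallel, no dependences   */
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {     /* j: parallel as well           */
            float acc = 0.0f;
            for (int k = 0; k < K; k++)   /* k: reduction, kept sequential */
                acc += A[i][k] * B[k][j];
            C[i][j] = acc;
        }
}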


Systems and Methods for Footprint Based Scheduling



Publication Source: Patent US10095494B2

A system can generate and impose constraints on a compiler/scheduler so as to specifically minimize the footprints of one or more program variables. The constraints can be based on the scopes of the variables and/or on the dependence distances between the statements whose operations use those variables.
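To illustrate the idea (a hedged sketch of ours, not text from the patent), consider a temporary array whose live size depends on the dependence distance between the statements that produce and consume it; the function names below are hypothetical.

/* The footprint of the temporary t depends on how far apart the
 * producer and consumer statements are scheduled. With separate loops,
 * all N elements of t are live at once (footprint = N elements). */
void separate(int N, const float a[N], float out[N])
{
    float t[N];
    for (int i = 0; i < N; i++) t[i] = 2.0f * a[i];     /* producer */
    for (int i = 0; i < N; i++) out[i] = t[i] + 1.0f;   /* consumer */
}

/* A schedule constrained to keep the dependence distance at zero
 * (fusion) lets t contract to a scalar: footprint = 1 element. */
void fused(int N, const float a[N], float out[N])
{
    for (int i = 0; i < N; i++) {
        float t = 2.0f * a[i];
        out[i] = t + 1.0f;
    }
}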

Systems and Methods for Efficient Determination of Task Dependences After Loop Tiling



Publication Source: Patent US9613163B2

A compilation system can compile a program to be executed using an event-driven tasks (EDT) system that requires knowledge of the dependences between program statement instances, and it can generate those dependences efficiently when a tiling transformation is applied. To this end, the system may use the pre-tiling dependences and derive the post-tiling dependences via an analysis of the tiling to be applied.
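As a hedged illustration of ours (not the patent's own example), take a loop whose only dependence has distance 1: after tiling by a factor T, the only inter-tile dependence an EDT runtime needs is "tile t waits for tile t-1", which follows directly from the pre-tiling distance and the tile size, without re-running dependence analysis on the tiled code. The names below are hypothetical.

#define T 64   /* tile size */

/* Original loop: a[i] = a[i-1] + b[i]  ->  dependence distance 1.
 * Iteration i lands in tile i / T, so tile t depends only on tile t-1. */
void run_tile(int t, int N, float *a, const float *b)
{
    int lo = t * T, hi = (t + 1) * T;
    for (int i = lo; i < hi && i < N; i++)
        if (i > 0) a[i] = a[i - 1] + b[i];
}

void run_tiled(int N, float *a, const float *b)
{
    int ntiles = (N + T - 1) / T;
    for (int t = 0; t < ntiles; t++) {
        /* An EDT runtime would release tile t once tile t-1 completes;
         * here the derived dependence is honored by sequential order. */
        run_tile(t, N, a, b);
    }
}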

Methods and Apparatus for Data Transfer Optimization



Publication Source: Patent US9858053B2

Methods, apparatus, and a computer software product for optimizing data transfer between two memories include determining accesses to master data stored in one memory and/or to local data stored in another memory such that the size of the total data transferred, the number of transfers required, or both can be minimized. The master and/or local accesses are based, at least in part, on the respective structures of the master and local data.
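As a hedged illustration of ours (not from the patent text), transferring only the accessed sub-block of a master array, one contiguous row of the block at a time, bounds both the total data moved and the number of transfers; the names and row-major layout below are assumptions of the sketch.

#include <string.h>

/* Copy the accessed rows x cols sub-block of master into local memory.
 * Issuing one contiguous transfer per row of the block uses `rows`
 * transfers instead of rows*cols element-sized ones, and the total data
 * moved is limited to the block that is actually accessed. */
void copy_in_block(int M, int N, const float master[M][N],
                   int r0, int c0, int rows, int cols,
                   float local[rows][cols])
{
    for (int r = 0; r < rows; r++)
        memcpy(local[r], &master[r0 + r][c0], cols * sizeof(float));
}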

Methods and Apparatus for Automatic Communication Optimizations in a Compiler Based on a Polyhedral Representation



Publication Source: Patent US9830133B1

Methods, apparatus, and a computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. The first custom computing apparatus contains a memory, a storage medium, and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one local memory unit that allows for data reuse opportunities. The first custom computing apparatus optimizes the code for reduced-communication execution on the second computing apparatus.
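A hedged sketch of ours of the data-reuse idea (not the patented method itself): copying a reused operand into local memory once and serving all later accesses from that copy reduces communication with the master memory. The names below are hypothetical.

#include <string.h>

/* The row b is read by every iteration of i. Copying it into local
 * memory once and reusing that copy replaces M*N master-memory reads
 * of b with a single copy-in of N elements. */
void scale_rows(int M, int N, const float b[N], float a[M][N])
{
    float b_local[N];                       /* local-memory copy      */
    memcpy(b_local, b, N * sizeof(float));  /* single copy-in, reused */

    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            a[i][j] *= b_local[j];          /* served from local copy */
}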
