R-StreamLearn more about our advanced compiler technology »
August 29, 2017
November 13, 2016
Hardware scaling and low-power considerations associated with the quest for exascale and extreme scale computing are driving system designers to consider new runtime and execution models such as the event-driven-task (EDT) models that enable more concurrency and reduce the amount of synchronization. Further, for performance, productivity, and code sustainability reasons, there is an increasing demand for autoparallelizing compiler technologies to automatically produce code for EDT-based runtimes. However achieving scalable performance in extreme-scale systems with auto-generated codes is a non-trivial challenge. Some of the key requirements that are important for achieving good scalable performance across many EDT-based systems are: (1) scalable dynamic creation of task-dependence graph and spawning of tasks, (2) scalable creation and management of data and communications, and (3) dynamic scheduling of tasks and movement of data for scalable asynchronous execution. In this paper, we develop capabilities within R-Stream - an automatic source-to-source optimization compiler - for automatic generation and optimization of code and data management targeted towards Open Community Runtime (OCR) - an exascale-ready asynchronous task-based runtime. We demonstrate the effectiveness of our techniques through performance improvements on various benchmarks and proxy application kernels that are relevant to the extreme-scale computing community.
September 1, 2016
In the last decade, the scope of software optimizations expanded to encompass energy consumption on top of the classical runtime minimization objective. In that context, several optimizations have been developed to improve the software energy efficiency. However, these optimizations commonly rely on long profiling steps and are often implemented as unstable runtime systems, which limits their applicability.
We propose in this paper a new energy model and two associated energy optimizations that can be performed at compilation time, without having to profile the compiled programs. The model predicts the energy consumption of programs at compilation time using the precise software representation available in the polyhedral model. The energy model is then used at the core of two compiler optimizations to generate more efficient programs. The model and the two optimizations have been implemented in the R-Stream compiler.
August 16, 2016
Computers across the board, from embedded to future exascale computers, are consistently designed with deeper memory hierarchies. While this opens up exciting opportunities for improving software performance and energy efficiency, it also makes it increasingly difficult to efficiently exploit the hardware. Advanced compilation techniques are a possible solution to this difficult problem and, among them, the polyhedral compilation technology provides a pathway for performing advanced auto-matic parallelization and code transformations. However, the polyhedral model is also known for its poor scalability with respect to the number of dimensions in the polyhedra that are used for representing programs. Although current compilers can cope with such limitation when targeting shallow hierarchies, polyhedral optimizations often become intractable as soon as deeper hardware hierarchies are considered.
We address this problem by introducing two new operators for polyhedral compilers: focalisation and defocalisation. When applied in the compilation flow, the new operators reduce the dimensionality of polyhedra, which drastically simplifies the mathematical problems solved during the compilation. We prove that the presented operators preserve the original program semantics, allowing them to be safely used in compilers. We implemented the operators in a production compiler, which dras-tically improved its performance and scalability when targeting deep hierarchies.
August 16, 2016
Writing high performance software requires the programmer to take advantage of multi-core processing. This can be done through tools like OpenMP; which allow the programmer to mark parallel loops. Identifying parallelizable loops, however, is a non-trivial task. Furthermore, transformations can be applied to a loop nest to expose parallelism. Polyhedral compilation has become an increasingly popular technique for exposing parallelism in computationally intensive loop nests. These techniques can simultaneously optimize for a number of performance parameters (i.e. parallelism, locality, etc). This is typically done using a cost model designed to give good performance in the general case. For some problems, the compiler may miss optimization opportunities or even produce a transformation that leads to worse performance. In these cases, the user has little recourse; since there are few options for the user to affect the transformation decisions. In this paper we present PUMA-V, a visualization interface that helps the user understand and affect the transformations made by an optimizing compiler based on the polyhedral model. This tool visualizes performance heuristics and runtime performance statistics to help the user identify missed optimization opportunities. Changes to the transformed code can be made by directly manipulating the visualizations. We show an example where performance is greatly improved over the polyhedral model alone by using our tool.