R-Stream

Advanced Polyhedral Compiler

Powering the next generation of hardware.

R-Stream is a source-to-source software tool for high-performance computing that lets algorithm and application developers focus on their algorithms and applications rather than on the complexities of programming modern hardware. Developers express a program once in high-level code, then use R-Stream to automatically produce optimized, parallelized source code.

Porting is achieved by changing the target's machine model and compiler flags, so the same input source code can be regenerated for several target machines and parallel APIs, removing the need to write a separate version for each target. Productivity follows because the programmer writes simple source code while R-Stream manages the complexities of the architecture for performance.

Learn more about R-Stream

Core Capabilities

R-Stream can automatically generate optimized code in a broad set of forms that includes standard OpenMP, Asynchronous Multi-Threaded runtime APIs, CUDA, and OpenCL. The output programs are deeply optimized using structural idioms (loop tiling, etc.) that achieve good target-specific performance.
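As a flavor of the kind of structural idiom involved, the sketch below shows a matrix multiply whose loops have been tiled and annotated with a standard OpenMP pragma. This is an illustrative example written for this page, not actual R-Stream output; the function name, tile size, and problem size are all assumptions.

```c
#include <stdio.h>

#define N 128   /* problem size (illustrative) */
#define T 32    /* tile size (illustrative; N is a multiple of T) */

/* Tiled matrix multiply, C = A * B. The i/j iteration space is
 * blocked into T x T tiles; the two tile loops are parallelized
 * with OpenMP, and the intra-tile loops improve cache locality. */
void matmul_tiled(double A[N][N], double B[N][N], double C[N][N]) {
    #pragma omp parallel for collapse(2)
    for (int it = 0; it < N; it += T)            /* tile loop over rows */
        for (int jt = 0; jt < N; jt += T)        /* tile loop over cols */
            for (int i = it; i < it + T; i++)    /* intra-tile loops    */
                for (int j = jt; j < jt + T; j++) {
                    double s = 0.0;
                    for (int k = 0; k < N; k++)
                        s += A[i][k] * B[k][j];
                    C[i][j] = s;
                }
}
```

The point of a polyhedral tool is that this kind of restructuring, and the choice of tile sizes and loop order, is derived automatically rather than written by hand.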

R-Stream supports exascale targets through DMA generation, spatial organization/architectures, deep memory hierarchies, and energy-proportional scheduling. It emits detailed code that controls these hardware features to deliver performance and save energy.

A special input adapter to R-Stream can translate TensorFlow graphs into “chunks” of C source; R-Stream then generates optimized custom operators that can be substituted into the TensorFlow graph for high performance.

The same input source code can be targeted at multiple types of processors; substituting a new target processor's machine model file regenerates the code across a diversity of targets.

From the same input source used to generate standard parallel programming APIs, R-Stream also has initial support for generating optimized code for Legion and Kokkos.

Dataflow execution models provide even more concurrency in parallel programs. R-Stream can generate dataflow programs, expressed within abstractions for different kinds of processors (CPU, GPU).

R-Stream offers the greatest range of parallelizing transformations of any polyhedral compiler, in an integrated, well-engineered form: a power tool at your desk.

Use of advanced ILP solvers, such as Gurobi, and proprietary polyhedral algorithms allows optimization with greater scope and on larger programs than other tools.

For parallelizing to accelerators, R-Stream accepts hierarchical machine descriptions and can generate both host code and accelerator code, as well as the control code from the host to the accelerator.

R-Stream's output code preserves variable names and can be linked with other parts of the application that are not optimized with R-Stream.

R-Stream is a source-to-source compiler. Input is sequential C code, and the output is parallelized source code.

This enables use with a variety of languages, and with flows based on Clang and other LLVM front ends.

Meet Some of our Experts

Benoit Meister

Fellow & Managing Engineer
Bio

Muthu Baskaran

Fellow & Managing Engineer
Bio

Ryan Senanayake

Engineer
Bio

Adithya Dattatri

Engineer
Bio

Explore our Technology

Reservoir Labs provides clients with services and solutions powered by our knowledge of the mathematics underpinning polyhedral optimization. Our team has a record of innovating polyhedral model algorithms that solve the challenges of generating optimized mappings of complex applications to complex computing targets. We have engineered these algorithms into solutions delivered to customers.

The polyhedral model of programs is geometric. The model represents the iteration spaces of programs as high-dimensional polyhedra, and dependences as relations within cross products of these polyhedra. This representation of programs, first articulated by Paul Feautrier in the early 1990s, improves on previous parallel program representations through its precision, and also because key optimizations for high-performance and efficient execution on novel processors can be framed analytically. With an analytical framing, the search for an optimal mapping happens via efficient mathematical optimization libraries.
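As a concrete illustration of the geometric view (our example, not drawn from R-Stream), consider a triangular loop nest. Its iteration space is the set of integer points satisfying a small system of affine inequalities, and questions like "how many iterations run?" become questions about counting points in a polyhedron.

```c
#include <stdio.h>

#define N 8  /* loop bound (illustrative) */

/* The iteration space of this triangular loop nest is the polyhedron
 *   D = { (i, j) : 0 <= i < N, 0 <= j <= i }.
 * Every executed (i, j) pair satisfies these affine inequalities, and
 * every integer point of D is executed exactly once, so the trip count
 * is the number of integer points in D, i.e. N*(N+1)/2. */
int count_iterations(void) {
    int count = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j <= i; j++)
            count++;    /* one integer point of D per iteration */
    return count;
}
```

Because the domain is described by affine constraints rather than by running the loop, a polyhedral compiler can reason about dependences, reorder or tile the points, and bound resource usage analytically.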


Reservoir’s work on polyhedral optimization began in the early 2000s in work for DARPA’s Polymorphous Computing Architecture (PCA) program, which sought to map complex signal processing algorithms to computational accelerators. Reservoir’s initial work focused on developing a practical and integrated source-to-source mapping solution. We developed both general algorithms to complete the pipeline, as well as specific solutions addressing the needs of the efficient processors being innovated in PCA.


Reservoir’s work continues today in developing new optimizations for mapping deep learning and exascale applications, and broadening the applicability of the polyhedral model. The original optimizations developed for PCA are now relevant for modern accelerators such as GPUs and deep learning hardware. Recent optimizations specialize to particular application domains (e.g., neural networks) and provide features needed for new processors, such as generation of power controls and dataflow execution models. We have also developed optimizations for improving scalability.


Reservoir’s polyhedral algorithms, patents, and code are available for license. They can be delivered through technology-enabled services projects that integrate them with your compiler, or as a solution for your application and architecture based on our R-Stream platform.

Polyhedral Innovations

Reservoir has developed algorithms that improve the scalability of polyhedral scheduling. This includes engineering within classical affine scheduling to reduce the number of constraints and to break the problem into smaller chunks solved independently. Reservoir has also developed approximations to reduce the dimensionality of problems. These scalability improvements help apply the polyhedral model to modern challenges such as computing on tensors and deep hierarchical architecture targets.

Reservoir’s polyhedral optimizations are designed for performance; they include parallelization optimizations that jointly optimize parallelism, locality, contiguity, vectorization and data layout (JPLCVD). They also include unique tiling optimizations for imperfect loop nests using analytic counting techniques. These optimizations can target hierarchical and hybrid machines, performing tailored optimizations at each level of hierarchy.

Reservoir has experience engineering complete solutions for the polyhedral model. This includes engineering the polyhedral model within a classical compiler feeding source code regeneration (“unparsing”). We have implemented efficient intermediate representations and special testing procedures for polyhedral optimizations. Our solutions use and ship with reliable 3rd party mathematical optimization solvers.

Reservoir has developed optimization algorithms that improve the power efficiency of compiled code, beyond improved performance. These include forms of energy-proportional scheduling at the microarchitectural level, and automatic generation of voltage and clock controls.

Reservoir has extended the polyhedral model to perform optimizations beyond the original domain of affine loop nests. We have special support for optimizing machine learning/Neural Network codes and sparse tensor codes.

Reservoir has developed polyhedral optimizations targeted at special processors such as GPU, wide-SIMD processors, spatial arrays, and systolic arrays.


Get in touch with one of our experts today

Recent Publications

Efficient and scalable computations with sparse tensors

In a system for storing in memory a tensor that includes at least three modes, elements of the tensor are stored in a mode-based order for improving locality of references when the elements are accessed during an operation on the tensor. To facilitate efficient data reuse in a tensor transform …

Read More »

Static Versioning in the Polyhedral Model

We present an approach to enhancing the optimization process in a polyhedral compiler by introducing compile-time versioning, i.e., the production of several versions of optimized code under varying assumptions on its run-time parameters. We illustrate this process by enabling versioning in the polyhedral processor placement pass. We propose an efficient …

Read More »