R-Stream®


Methods and Apparatus for Data Transfer Optimization



Publication Source: Patent US9858053B2

Methods, apparatus, and a computer software product for optimizing data transfer between two memories include determining access to master data stored in one memory and/or to local data stored in another memory such that the size of the total data transferred, the number of data transfers required to transfer the total data, or both can be minimized. The master and/or local accesses are based, at least in part, on the respective structures of the master and local data.
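The trade-off between transfer count and transfer size can be illustrated with a minimal sketch. This is not the patented method: it assumes a row-major master array and a rectangular tile, and the helper names `contiguous_transfers` and `transfer_cost` are hypothetical, introduced only for this example.

```python
def contiguous_transfers(width, row0, nrows, col0, ncols):
    """List the (offset, length) runs needed to copy a rectangular tile
    out of a row-major master array that is `width` elements wide.
    Runs that touch end-to-start in memory are merged into one transfer."""
    runs = []
    for r in range(row0, row0 + nrows):
        start = r * width + col0
        if runs and runs[-1][0] + runs[-1][1] == start:
            # This row continues exactly where the previous run ended.
            runs[-1] = (runs[-1][0], runs[-1][1] + ncols)
        else:
            runs.append((start, ncols))
    return runs


def transfer_cost(runs):
    """Return (number of transfers, total elements moved)."""
    return len(runs), sum(length for _, length in runs)


# Exact tile: 3 rows x 4 columns of an 8-wide array.
exact = contiguous_transfers(8, 2, 3, 2, 4)
# Same rows, but fetching the full width so the rows merge.
padded = contiguous_transfers(8, 2, 3, 0, 8)
print(transfer_cost(exact), transfer_cost(padded))
```

Under these assumptions, copying exactly the tile takes 3 transfers of 12 elements in total, while fetching the full rows takes 1 transfer of 24 elements. Which choice is better depends on the per-transfer overhead relative to the per-element cost, which is the kind of decision the abstract describes.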

Methods and Apparatus for Automatic Communication Optimizations in a Compiler Based on a Polyhedral Representation



Publication Source: Patent US9830133B1

Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one local memory unit that allows for data reuse opportunities. The first custom computing apparatus optimizes the code for reduced communication execution on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

Polyhedral Optimization of TensorFlow Computation Graphs



Publication Source: The 6th Workshop on Extreme-scale Programming Tools (ESPT-2017) at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17)

We present R-Stream·TF, a polyhedral optimization tool for neural network computations. R-Stream·TF transforms computations performed in a neural network graph into C programs suited to the polyhedral representation and uses R-Stream, a polyhedral compiler, to parallelize and optimize the computations performed in the graph. R-Stream·TF can exploit the optimizations available with R-Stream to generate a highly optimized version of the computation graph, specifically mapped to the targeted architecture. During our experiments, R-Stream·TF was able to automatically reach performance levels close to the hand-optimized implementations, demonstrating its utility in porting neural network computations to parallel architectures.
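As a rough illustration of the kind of loop transformation a polyhedral compiler can apply, the sketch below tiles a matrix multiplication, a common building block of neural network layers. It is written in Python for brevity and is not R-Stream·TF output; the function names and the `tile` parameter are hypothetical, and a real compiler would choose tile sizes from a model of the target architecture.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation with all three loops tiled, so each block of
    a, b, and c is reused while it is still in fast memory."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for pp in range(0, k, tile):
                # Intra-tile loops; min() handles partial tiles at the edges.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        for p in range(pp, min(pp + tile, k)):
                            c[i][j] += a[i][p] * b[p][j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
assert matmul_tiled(a, b) == matmul_naive(a, b)
```

Tiling changes only the order in which iterations run, not the set of iterations, so both versions produce the same result. The outer tile loops are also natural candidates for parallel execution.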


Automatic Code Generation and Data Management for an Asynchronous Task-based Runtime



Publication Source: 5th Workshop on Extreme-scale Programming Tools (ESPT-2016) at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16)

Hardware scaling and low-power considerations associated with the quest for exascale and extreme-scale computing are driving system designers to consider new runtime and execution models, such as event-driven-task (EDT) models, that enable more concurrency and reduce the amount of synchronization. Further, for performance, productivity, and code sustainability reasons, there is an increasing demand for autoparallelizing compiler technologies to automatically produce code for EDT-based runtimes. However, achieving scalable performance in extreme-scale systems with auto-generated code is a non-trivial challenge. Key requirements for achieving good scalable performance across many EDT-based systems are: (1) scalable dynamic creation of the task-dependence graph and spawning of tasks, (2) scalable creation and management of data and communications, and (3) dynamic scheduling of tasks and movement of data for scalable asynchronous execution. In this paper, we develop capabilities within R-Stream, an automatic source-to-source optimization compiler, for automatic generation and optimization of code and data management targeting the Open Community Runtime (OCR), an exascale-ready asynchronous task-based runtime. We demonstrate the effectiveness of our techniques through performance improvements on various benchmarks and proxy application kernels that are relevant to the extreme-scale computing community.
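The event-driven-task idea can be sketched in a few lines: a task becomes runnable only when all of the tasks it depends on have finished. The example below is a sequential toy model, not the OCR API and not code generated by R-Stream; `run_event_driven` and its arguments are hypothetical names chosen for illustration.

```python
from collections import deque


def run_event_driven(tasks, deps):
    """Run tasks in an order that respects their dependences.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> list of prerequisite task names
    Returns the order in which the tasks were executed.
    """
    # Count of unfinished prerequisites per task.
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    # Reverse edges: which tasks are waiting on each task.
    successors = {t: [] for t in tasks}
    for t, prerequisites in deps.items():
        for p in prerequisites:
            successors[p].append(t)

    ready = deque(t for t in tasks if remaining[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()
        order.append(t)
        # Completing t acts as the "event" that may release its successors.
        for s in successors[t]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)

    if len(order) != len(tasks):
        raise ValueError("dependence cycle: some tasks never became ready")
    return order


log = []
tasks = {name: (lambda n=name: log.append(n)) for name in "abcd"}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(run_event_driven(tasks, deps))
```

In this toy graph, `b` and `c` both become ready as soon as `a` completes, so a real runtime could execute them concurrently. There is no global barrier: each task waits only on its own prerequisites, which is how this model reduces synchronization.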

Methods and Apparatus for Joint Scheduling and Layout Optimization to Enable Multi-level Vectorization



Publication Source: Patent US9489180B1

Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one vector execution unit that allows for parallel execution of tasks on constant-strided memory locations. The first custom computing apparatus optimizes the code for parallelism, locality of operations, constant-strided memory accesses and vectorized execution on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
