Trading off Latency for Memory Bandwidth in Spatial Architectures using the Polyhedral Model

We present a fast algorithm for transforming loop nests represented in the polyhedral model, which have been placed onto a spatial grid of processors. The goal of the transformation is to avoid broadcasts, which can put extra pressure on remote memory bandwidth, and turn them mainly into neighbor-to-neighbor reuse. The transformation preserves existing parallelism and locality properties of the original program as much as possible, and in particular does not reduce its amount of parallelism.

 

For information on Reservoir’s technology related to this paper, visit R-Stream.