@TECHREPORT{IMM2011-06041,
  author      = "P. Larsen and R. Ladelsky and J. Lidman and S. A. McKee and S. Karlsson and A. Zaks",
  title       = "Automatic Loop Parallelization via Compiler Guided Refactoring",
  year        = "2011",
  series      = "IMM-Technical Report-2011-12",
  institution = "Technical University of Denmark, {DTU} Informatics, {E-}mail: reception@imm.dtu.dk",
  address     = "Asmussens Alle, Building 305, {DK-}2800 Kgs. Lyngby, Denmark",
  url         = "http://www.imm.dtu.dk/English.aspx",
  abstract    = "For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler's ability to generate loop-parallel code. We use this compilation system to modify two sequential benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core {IBM} POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should be combined with target-specific optimizations. Furthermore, comparing the first benchmark to hand-parallelized, hand-optimized pthreads and OpenMP versions, we find that code generated using our approach typically outperforms the pthreads code (within 93-339\%). It also performs competitively against the OpenMP code (within 75-111\%). The second benchmark outperforms hand-parallelized and optimized OpenMP code (within 109-242\%)."
}
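
The abstract describes refactoring application source so that the compiler's loop auto-parallelizer can succeed where it previously gave up. The following is a minimal illustrative C sketch in that spirit, not code from the report: it assumes GCC's real -ftree-parallelize-loops flag, and the function names and the use of C99 restrict to remove a possible-aliasing obstruction are assumptions chosen for illustration; the report's actual diagnostics and transformations may differ.

/* Hypothetical before/after pair illustrating compiler-guided refactoring.
 * Not taken from the report; names and the restrict-based fix are
 * illustrative assumptions. */

#include <stddef.h>

/* Before: dst and src may alias, so the compiler must assume a
 * loop-carried dependence and keep the loop sequential. */
void scale_before(double *dst, const double *src, size_t n, double k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}

/* After: restrict asserts the arrays do not overlap, so every
 * iteration is provably independent and a build such as
 *   gcc -O2 -std=c99 -ftree-parallelize-loops=4 ...
 * can distribute the iterations across threads. */
void scale_after(double *restrict dst, const double *restrict src,
                 size_t n, double k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}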