Abstract
The recent development of compressed sensing (CS) theory has led to several algorithms for signal acquisition and reconstruction. Implementing these algorithms is computationally demanding and makes effective use of shared resources difficult. We describe a native, parallelized realization of the compressed sensing problem on a commercially available multicore architecture. A fast and efficient reconstruction algorithm, Smoothed L0 (SL0), is parallelized and adapted to exploit the multicore implementation of the sampling basis, the Walsh-Hadamard transform (WHT). Characterizing miss-rate patterns with a cache profiler yields useful insights into data cache locality. We develop regression-based performance models for the algorithms that correlate response time with application parameters, memory utilization, and other overheads. The matrix generation algorithm is highly parallelizable, reaching speedups of up to 5.6 on an 8-core Intel Xeon, while the recovery algorithm reaches speedups of up to 4.5. Our models also expose the synchronization bottlenecks and cache limitations of such threaded multicore implementations.
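The abstract names the two building blocks without spelling them out. The following minimal Python/NumPy sketch illustrates how they fit together. It is serial, not the authors' parallel multicore implementation. It assumes a fast Walsh-Hadamard transform in natural (Sylvester) ordering, a measurement matrix made of randomly selected rows of the orthonormalized Hadamard matrix, and a generic SL0 loop: a gradient step on a Gaussian-smoothed L0 surrogate, followed by projection onto the feasible set, while the smoothing width sigma is gradually decreased. The function names (`fwht`, `forward`, `adjoint`, `sl0`), the row-subsampling scheme, and the parameter values (`sigma_min=1e-4`, `sigma_decrease=0.5`, `mu=1.0`, `inner_iters=3`) are hypothetical, illustrative choices and are not taken from the paper.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform, natural (Sylvester) ordering.

    Computes H @ x in O(n log n) operations; len(x) must be a power of two.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # View the vector as blocks of size 2h and apply the butterfly
        # (a, b) -> (a + b, a - b) between the two halves of every block.
        blocks = x.reshape(-1, 2, h)
        a, b = blocks[:, 0, :], blocks[:, 1, :]
        x = np.stack((a + b, a - b), axis=1).reshape(n)
        h *= 2
    return x


def forward(x, rows):
    """Measurement y = A x, where A = H[rows, :] / sqrt(n) (assumed sampling scheme)."""
    return fwht(x)[rows] / np.sqrt(x.shape[0])


def adjoint(y, rows, n):
    """A^T y: zero-fill the unsampled rows, then apply H (H is symmetric)."""
    z = np.zeros(n)
    z[rows] = y
    return fwht(z) / np.sqrt(n)


def sl0(y, rows, n, sigma_min=1e-4, sigma_decrease=0.5, mu=1.0, inner_iters=3):
    """Smoothed-L0 recovery sketch (hypothetical parameter defaults).

    The rows of A are orthonormal (A A^T = I), so the pseudo-inverse of A is
    simply A^T and every projection costs one forward and one inverse WHT.
    """
    x = adjoint(y, rows, n)  # minimum-L2-norm solution of A x = y
    sigma = 2.0 * np.max(np.abs(x))
    while sigma > sigma_min:
        for _ in range(inner_iters):
            # Step on the Gaussian-smoothed L0 surrogate: shrinks entries
            # that are small relative to sigma, leaves large entries alone.
            delta = x * np.exp(-x**2 / (2.0 * sigma**2))
            x = x - mu * delta
            # Project back onto the affine feasible set {x : A x = y}.
            x = x - adjoint(forward(x, rows) - y, rows, n)
        sigma *= sigma_decrease
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 256, 128, 5  # signal length, measurements, nonzeros
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    rows = np.sort(rng.choice(n, m, replace=False))
    y = forward(x_true, rows)
    x_hat = sl0(y, rows, n)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

In this setup the WHT pairs naturally with SL0. Because the sampled rows of the orthonormalized Hadamard matrix satisfy A A^T = I, each projection step needs no matrix inversion, only two fast transforms. Those transforms are the part whose parallel implementation the abstract refers to.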