Complementing user-level coarse-grain parallelism with implicit speculative parallelism

Organization: Edinburgh Research Archive, United Kingdom

Details

Title	:	Complementing user-level coarse-grain parallelism with implicit speculative parallelism
Researcher	:	Ioannou, Nikolas
Keywords	:	many-core architecture, thread-level speculation, thread-level parallelism, power management
Organization	:	Edinburgh Research Archive, United Kingdom
Contributors	:	Cintra, Marcelo; O'Boyle, Michael; Engineering and Physical Sciences Research Council (EPSRC)
Year of publication	:	2555 BE (2012 CE)
Reference	:	http://hdl.handle.net/1842/7900
Source	:	-
Expertise	:	-
Relation	:
Nikolas Ioannou and Marcelo Cintra. Complementing explicit coarse-grain parallelism with implicit speculative parallelism. In Intl. Symp. on Microarchitecture (MICRO), pages 284–295, December 2011.
Nikolas Ioannou, Michael Kauschke, Matthias Gries, and Marcelo Cintra. Phase-based application-driven power management on the single-chip cloud computer. In Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT), pages 131–142, October 2011.
Nikolas Ioannou, Jeremy Singer, Salman Khan, Polychronis Xekalakis, Paris Yiapanis, Adam Pocock, Gavin Brown, Mikel Lujan, Ian Watson, and Marcelo Cintra. Toward a more accurate understanding of the limits of the TLS execution paradigm. In Intl. Symp. on Workload Characterization (IISWC), pages 1–12, December 2010.
Salman Khan, Nikolas Ioannou, Polychronis Xekalakis, and Marcelo Cintra. Increasing the energy efficiency of TLS systems using intermediate checkpointing. In Intl. Conf. on High Performance Computing (HiPC), pages 1–11, December 2011.
Polychronis Xekalakis, Nikolas Ioannou, and Marcelo Cintra. Combining thread level speculation, helper threads and runahead execution. In Intl. Conf. on Supercomputing (ICS), pages 410–420, June 2009.
Scope of content	:	-
Abstract/Description	:	Multi-core and many-core systems are the norm in contemporary processor technology and are expected to remain so for the foreseeable future. Parallel programming is, thus, here to stay, and programmers have to embrace it if they are to exploit such systems for their applications. Programs using parallel programming primitives like PThreads or OpenMP often exploit coarse-grain parallelism, because it offers a good trade-off between programming effort and performance gain. Some parallel applications show limited or no scaling beyond a certain number of cores. Given the abundant number of cores expected in future many-cores, several cores would remain idle in such cases while execution performance stagnates. This thesis proposes using cores that do not contribute to performance improvement for running implicit fine-grain speculative threads. In particular, we present a many-core architecture and protocols that allow applications with coarse-grain explicit parallelism to further exploit implicit speculative parallelism within each thread. We show that complementing parallel programs with implicit speculative mechanisms offers significant performance improvements for a large and diverse set of parallel benchmarks. Implicit speculative parallelism frees the programmer from the additional effort of explicitly partitioning the work into finer and properly synchronized tasks. Our results show that, for a many-core comprising 128 cores supporting implicit speculative parallelism in clusters of 2 or 4 cores, performance improves on top of the highest scalability point by 44% on average for the 4-core cluster and by 31% on average for the 2-core cluster. We also show that this approach often leads to better performance and energy efficiency compared to existing alternatives such as Core Fusion and Turbo Boost.
Moreover, we present a dynamic mechanism to choose the number of explicit and implicit threads, which performs within 6% of the static oracle selection of threads. To improve energy efficiency, processors support Dynamic Voltage and Frequency Scaling (DVFS), which enables changing their performance and power consumption on the fly. We evaluate the amenability of the proposed explicit plus implicit threads scheme to traditional power management techniques for multithreaded applications and identify room for improvement. We thus augment prior schemes and introduce a novel multithreaded power management scheme that accounts for implicit threads and aims to minimize the Energy-Delay² product (ED²). Our scheme comprises two components: a "local" component that tries to adapt to the different program phases on a per-explicit-thread basis, taking into account implicit thread behavior, and a "global" component that augments the local components with information regarding inter-thread synchronization. Experimental results show an 8% reduction in ED² compared to having no power management, with an average reduction in power of 15% that comes at a minimal loss of performance of less than 3% on average.
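Note on the ED² metric: the short Python sketch below shows how an Energy-Delay² figure is commonly computed. It is not taken from the thesis. The function names and the input ratios are hypothetical, and it assumes the usual relation energy = average power × execution time. The example plugs in a 15% power reduction and a 3% longer run time purely as illustrative numbers. The abstract's figures are per-benchmark averages, so the result need not match its reported 8%.

```python
def ed2(energy: float, delay: float) -> float:
    """Energy-Delay^2 product: energy multiplied by the square of the run time."""
    return energy * delay ** 2


def relative_ed2(power_ratio: float, delay_ratio: float) -> float:
    """ED^2 of a managed run relative to an unmanaged baseline.

    Assumes energy = average power * delay, so ED^2 = power * delay^3 and
    the ratio is power_ratio * delay_ratio**3. Both arguments are
    managed/baseline ratios (hypothetical names, not from the thesis).
    """
    return power_ratio * delay_ratio ** 3


# Illustrative inputs only: 15% lower average power, 3% longer execution time.
ratio = relative_ed2(power_ratio=0.85, delay_ratio=1.03)
print(f"relative ED^2: {ratio:.3f}")
```

Because delay enters cubed once energy is expanded as power × delay, ED² weights performance loss much more heavily than power savings, which is why a scheme targeting it can afford only small slowdowns.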
Bibliography	:
APA: Ioannou, Nikolas. (2555). Complementing user-level coarse-grain parallelism with implicit speculative parallelism. Bangkok: Edinburgh Research Archive, United Kingdom.
Chicago: Ioannou, Nikolas. 2555. "Complementing user-level coarse-grain parallelism with implicit speculative parallelism". Bangkok: Edinburgh Research Archive, United Kingdom.
MLA: Ioannou, Nikolas. "Complementing user-level coarse-grain parallelism with implicit speculative parallelism." Bangkok: Edinburgh Research Archive, United Kingdom, 2555. Print.
Vancouver: Ioannou, Nikolas. Complementing user-level coarse-grain parallelism with implicit speculative parallelism. Bangkok: Edinburgh Research Archive, United Kingdom; 2555.