LDR | | 00000nmm u2200205 4500 |
001 | | 000000330023 |
005 | | 20241017163220 |
008 | | 181129s2017 ||| | | | eng d |
020 | |
▼a 9780438195042 |
035 | |
▼a (MiAaPQ)AAI10682961 |
035 | |
▼a (MiAaPQ)neucis:10112 |
040 | |
▼a MiAaPQ
▼c MiAaPQ
▼d 248032 |
049 | 1 |
▼f DP |
082 | 0 |
▼a 004 |
100 | 1 |
▼a Cao, Jiajun. |
245 | 10 |
▼a Transparent Checkpointing over RDMA-based Networks. |
260 | |
▼a [S.l.] :
▼b Northeastern University,
▼c 2017 |
260 | 1 |
▼a Ann Arbor :
▼b ProQuest Dissertations & Theses,
▼c 2017 |
300 | |
▼a 147 p. |
500 | |
▼a Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B. |
500 | |
▼a Adviser: Gene Cooperman. |
502 | 1 |
▼a Thesis (Ph.D.)--Northeastern University, 2017. |
520 | |
▼a Fault tolerance for large-scale applications has long been an area of active research, as the size of the computation keeps growing. One of the components of a fault-tolerance strategy is checkpointing. However, no explicit checkpoint-restart so |
520 | |
▼a In this dissertation, we present the first transparent, system-initiated checkpoint-restart solution that directly supports RDMA networks. This new approach does not depend on a specific parallel programming model, and does not require any modif |
520 | |
▼a Conceptually, this dissertation can be divided into three parts. First, we introduce a new, generic model for RDMA networks, by extracting the key components for checkpointing an RDMA network. These components are the essential states that need |
520 | |
▼a Second, we demonstrate the performance of the proposed approach. Moving from a medium-sized academic computer cluster to a petascale supercomputer, we show what issues are exposed as the application scales up, and how these issues are addressed. |
520 | |
▼a Third, we show how to retrofit transparent checkpointing into the Cloud, as RDMA networks are also becoming more popular in the Cloud. A Checkpointing as a Service approach is presented, which employs checkpointing to provide fault tolerance as |
590 | |
▼a School code: 0160. |
650 | 4 |
▼a Computer science. |
690 | |
▼a 0984 |
710 | 20 |
▼a Northeastern University.
▼b Computer Science. |
773 | 0 |
▼t Dissertation Abstracts International
▼g 79-12B(E). |
773 | |
▼t Dissertation Abstracts International |
790 | |
▼a 0160 |
791 | |
▼a Ph.D. |
792 | |
▼a 2017 |
793 | |
▼a English |
856 | 40 |
▼u http://www.riss.kr/pdu/ddodLink.do?id=T14996729
▼n KERIS |
980 | |
▼a 201812
▼f 2019 |
990 | |
▼a Administrator
▼b Administrator |