| LDR | | 00000nmm u2200205 4500 |
| 001 | | 000000330023 |
| 005 | | 20241017163220 |
| 008 | | 181129s2017 ||| | | | eng d |
| 020 | |
▼a 9780438195042 |
| 035 | |
▼a (MiAaPQ)AAI10682961 |
| 035 | |
▼a (MiAaPQ)neucis:10112 |
| 040 | |
▼a MiAaPQ
▼c MiAaPQ
▼d 248032 |
| 049 | 1 |
▼f DP |
| 082 | 0 |
▼a 004 |
| 100 | 1 |
▼a Cao, Jiajun. |
| 245 | 10 |
▼a Transparent Checkpointing over RDMA-based Networks. |
| 260 | |
▼a [S.l.] :
▼b Northeastern University,
▼c 2017 |
| 260 | 1 |
▼a Ann Arbor :
▼b ProQuest Dissertations & Theses,
▼c 2017 |
| 300 | |
▼a 147 p. |
| 500 | |
▼a Source: Dissertation Abstracts International, Volume: 79-12(E), Section: B. |
| 500 | |
▼a Adviser: Gene Cooperman. |
| 502 | 1 |
▼a Thesis (Ph.D.)--Northeastern University, 2017. |
| 520 | |
▼a Fault tolerance for large-scale applications has long been an area of active research, as the size of the computation keeps growing. One of the components of a fault-tolerance strategy is checkpointing. However, no explicit checkpoint-restart so |
| 520 | |
▼a In this dissertation, we present the first transparent, system-initiated checkpoint-restart solution that directly supports RDMA networks. This new approach does not depend on a specific parallel programming model, and does not require any modif |
| 520 | |
▼a Conceptually, this dissertation can be divided into three parts. First, we introduce a new, generic model for RDMA networks, by extracting the key components for checkpointing an RDMA network. These components are the essential states that need |
| 520 | |
▼a Second, we demonstrate the performance of the proposed approach. Moving from a medium-sized academic computer cluster to a petascale supercomputer, we show what issues are exposed as the application scales up, and how these issues are addressed. |
| 520 | |
▼a Third, we show how to retrofit transparent checkpointing into the Cloud, as RDMA networks are also becoming more popular in the Cloud. A Checkpointing as a Service approach is presented, which employs checkpointing to provide fault tolerance as |
| 590 | |
▼a School code: 0160. |
| 650 | 4 |
▼a Computer science. |
| 690 | |
▼a 0984 |
| 710 | 20 |
▼a Northeastern University.
▼b Computer Science. |
| 773 | 0 |
▼t Dissertation Abstracts International
▼g 79-12B(E). |
| 773 | |
▼t Dissertation Abstracts International |
| 790 | |
▼a 0160 |
| 791 | |
▼a Ph.D. |
| 792 | |
▼a 2017 |
| 793 | |
▼a English |
| 856 | 40 |
▼u http://www.riss.kr/pdu/ddodLink.do?id=T14996729
▼n KERIS |
| 980 | |
▼a 201812
▼f 2019 |
| 990 | |
▼a Administrator
▼b Administrator |