Kaya University Bunsung Library

Hands-on big data analytics with PySpark : analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs /

Detailed information
Material type	E-Book
Personal author	Lai, Rudy, author. Potaczek, Bartłomiej, author.
Title/Statement of responsibility	Hands-on big data analytics with PySpark : analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs / Rudy Lai, Bartłomiej Potaczek.
Publication	Birmingham, UK : Packt Publishing, 2019.
Physical description	1 online resource : illustrations
Local holdings note	Added to collection customer.56279.3
ISBN	1838648836 9781838648831
Contents note	Cover; Title Page; Copyright and Credits; About Packt; Contributors; Table of Contents; Preface; Chapter 1: Pyspark and Setting up Your Development Environment; An overview of PySpark; Spark SQL; Setting up Spark on Windows and PySpark; Core concepts in Spark and PySpark; SparkContext; Spark shell; SparkConf; Summary; Chapter 2: Getting Your Big Data into the Spark Environment Using RDDs; Loading data on to Spark RDDs; The UCI machine learning repository; Getting the data from the repository to Spark; Getting data into Spark; Parallelization with Spark RDDs; What is parallelization?; Basics of RDD operation; Summary; Chapter 3: Big Data Cleaning and Wrangling with Spark Notebooks; Using Spark Notebooks for quick iteration of ideas; Sampling/filtering RDDs to pick out relevant data points; Splitting datasets and creating some new combinations; Summary; Chapter 4: Aggregating and Summarizing Data into Useful Reports; Calculating averages with map and reduce; Faster average computations with aggregate; Pivot tabling with key-value paired data points; Summary; Chapter 5: Powerful Exploratory Data Analysis with MLlib; Computing summary statistics with MLlib; Using Pearson and Spearman correlations to discover correlations; The Pearson correlation; The Spearman correlation; Computing Pearson and Spearman correlations; Testing our hypotheses on large datasets; Summary; Chapter 6: Putting Structure on Your Big Data with SparkSQL; Manipulating DataFrames with Spark SQL schemas; Using Spark DSL to build queries; Summary; Chapter 7: Transformations and Actions; Using Spark transformations to defer computations to a later time; Avoiding transformations; Using the reduce and reduceByKey methods to calculate the results; Performing actions that trigger computations; Reusing the same rdd for different actions; Summary; Chapter 8: Immutable Design; Delving into the Spark RDD's parent/child chain; Extending an RDD; Chaining a new RDD with the parent; Testing our custom RDD; Using RDD in an immutable way; Using DataFrame operations to transform; Immutability in the highly concurrent environment; Using the Dataset API in an immutable way; Summary; Chapter 9: Avoiding Shuffle and Reducing Operational Expenses; Detecting a shuffle in a process; Testing operations that cause a shuffle in Apache Spark; Changing the design of jobs with wide dependencies; Using keyBy() operations to reduce shuffle; Using a custom partitioner to reduce shuffle; Summary; Chapter 10: Saving Data in the Correct Format; Saving data in plain text format; Leveraging JSON as a data format; Tabular formats -- CSV; Using Avro with Spark; Columnar formats -- Parquet; Summary; Chapter 11: Working with the Spark Key/Value API; Available actions on key/value pairs; Using aggregateByKey instead of groupBy(); Actions on key/value pairs; Available partitioners on key/value data; Implementing a custom partitioner; Summary
Summary	In this book, you'll learn to implement practical, proven techniques for improving programming and administration in Apache Spark. The techniques are demonstrated through practical examples and best practices. You will also learn how to use Spark and its Python API to build performant analytics on large-scale data.
Subject headings	SPARK (Computer program language); Application software -- Development; Big data; Electronic data processing; Python (Computer program language)
Language	English
Other format entry	Print version: Lai, Rudy. Hands-On Big Data Analytics with Pyspark : Analyze Large Datasets and Discover Techniques for Testing, Immunizing, and Parallelizing Spark Jobs. Birmingham : Packt Publishing Ltd, ©2019. 9781838644130
Direct access link	http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=2094759

Holdings
No.	Registration no.	Call number	Location	Status	Due date	Reservation	Service	Media information
1	WE00017062	004.2	Kaya University / E-book server (computer server)	Available

Gimhae Campus | 621-748 | 208 Samgye-ro, Gimhae-si, Gyeongsangnam-do | TEL:055-330-1033 | FAX:055-330-1032
Copyright 2012 by kaya university Bunsung library All rights reserved.