Data analytics: a small data approach

By:
Contributor(s):
Material type: Text
Publication details: CRC Press, Boca Raton, 2021
Description: xiv, 257 p.
ISBN:
  • 9780367609504
Subject(s):
DDC classification:
  • 001.42 HUA
Summary: Data Analytics: A Small Data Approach is suitable for an introductory data analytics course to help students understand some main statistical learning models. It has many small datasets to guide students to work out pencil-and-paper solutions of the models and then compare them with results obtained from established R packages. Also, as data science practice is a process that should be told as a story, the book includes many course materials on exploratory data analysis, residual analysis, and flowcharts for developing and validating models and data pipelines. The main models covered in this book include linear regression, logistic regression, tree models and random forests, ensemble learning, sparse learning, principal component analysis, kernel methods including the support vector machine and kernel regression, and deep learning. Each chapter introduces two or three techniques. For each technique, the book highlights the intuition and rationale first, then shows how mathematics is used to articulate the intuition and formulate the learning problem. R is used to implement the techniques on both simulated and real-world datasets.
List(s) this item appears in: IT & Decision Sciences | Non Fiction
Holdings
Item type: Book
Current library: Indian Institute of Management LRC, General Stacks
Collection: IT & Decisions Sciences
Call number: 001.42 HUA
Copy number: 1
Status: Available
Barcode: 004210

Table of Contents
1. INTRODUCTION

Who will benefit from this book

Overview of a Data Analytics Pipeline

Topics in a Nutshell

2. ABSTRACTION

Regression & tree models

Overview

Regression Models

Tree Models

Remarks

Exercises

3. RECOGNITION

Logistic regression & ranking

Overview

Logistic Regression Model

A Ranking Problem by Pairwise Comparison

Statistical Process Control using Decision Tree

Remarks

Exercises

4. RESONANCE

Bootstrap & random forests

Overview

How Bootstrap Works

Random Forests

Remarks

Exercises

5. LEARNING (I)

Cross validation & OOB

Overview

Cross-Validation

Out-of-bag error in Random Forest

Remarks

Exercises

6. DIAGNOSIS

Residuals & heterogeneity

Overview

Diagnosis in Regression

Diagnosis in Random Forests

Clustering

Remarks

Exercises

7. LEARNING (II)

SVM & ensemble learning

Overview

Support Vector Machine

Ensemble Learning

Remarks

Exercises

8. SCALABILITY

LASSO & PCA

Overview

LASSO

Principal Component Analysis

Remarks

Exercises

9. PRAGMATISM

Experience & experimental

Overview

Kernel Regression Model

Conditional Variance Regression Model

Remarks

Exercises

10. SYNTHESIS

Architecture & pipeline

Overview

Deep Learning

inTrees

Remarks

Exercises

CONCLUSION

APPENDIX: A BRIEF REVIEW OF BACKGROUND KNOWLEDGE

The normal distribution

Matrix operations

Optimization

©2019-2020 Learning Resource Centre, Indian Institute of Management Bodhgaya
