TY - BOOK
AU - Huang, Shuai
AU - Deng, Houtao
TI - Data analytics: a small data approach
SN - 9780367609504
U1 - 001.42
PY - 2021///
CY - Boca Raton
PB - CRC Press
KW - R (Computer program language)
KW - Python (Computer program language)
KW - Quantitative research
N1 - Table of Contents: 1. INTRODUCTION: Who will benefit from this book; Overview of a Data Analytics Pipeline; Topics in a Nutshell. 2. ABSTRACTION: Regression & tree models: Overview; Regression Models; Tree Models; Remarks; Exercises. 3. RECOGNITION: Logistic regression & ranking: Overview; Logistic Regression Model; A Ranking Problem by Pairwise Comparison; Statistical Process Control using Decision Tree; Remarks; Exercise. 4. RESONANCE: Bootstrap & random forests: Overview; How Bootstrap Works; Random Forests; Remarks; Exercises. 5. LEARNING (I): Cross validation & OOB: Overview; Cross-Validation; Out-of-bag error in Random Forest; Remarks; Exercises. 6. DIAGNOSIS: Residuals & heterogeneity: Overview; Diagnosis in Regression; Diagnosis in Random Forests; Clustering; Remarks; Exercises. 7. LEARNING (II): SVM & ensemble Learning: Overview; Support Vector Machine; Ensemble Learning; Remarks; Exercises. 8. SCALABILITY: LASSO & PCA: Overview; LASSO; Principal Component Analysis; Remarks; Exercises. 9. PRAGMATISM: Experience & experimental: Overview; Kernel Regression Model; Conditional Variance Regression Model; Remarks; Exercises. 10. SYNTHESIS: Architecture & pipeline: Overview; Deep Learning; inTrees; Remarks; Exercises. CONCLUSION. APPENDIX: A BRIEF REVIEW OF BACKGROUND KNOWLEDGE: The normal distribution; Matrix operations; Optimization.
N2 - Data Analytics: A Small Data Approach is suitable for an introductory data analytics course and helps students understand the main statistical learning models. It uses many small datasets to guide students in working out pencil-and-paper solutions of the models and then comparing them with results obtained from established R packages.
Also, because data science practice is a process that should be told as a story, the book includes many course materials on exploratory data analysis and residual analysis, along with flowcharts for developing and validating models and data pipelines. The main models covered include linear regression, logistic regression, tree models and random forests, ensemble learning, sparse learning, principal component analysis, kernel methods (including the support vector machine and kernel regression), and deep learning. Each chapter introduces two or three techniques. For each technique, the book first highlights the intuition and rationale, then shows how mathematics is used to articulate the intuition and formulate the learning problem. R is used to implement the techniques on both simulated and real-world datasets.
ER -