Data smart: using data science to transform information into insight (Record no. 328)

MARC details
000 -LEADER
fixed length control field 09108nam a22002057a 4500
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20190904105646.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 190904b ||||| |||| 00| 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9788126546145
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 006.31
Item number FOR
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Foreman, John W.
245 ## - TITLE STATEMENT
Title Data smart: using data science to transform information into insight
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc. New Delhi
Name of publisher, distributor, etc. Wiley India Pvt. Ltd.
Date of publication, distribution, etc. 2018
300 ## - PHYSICAL DESCRIPTION
Extent xx, 409 p.
365 ## - TRADE PRICE
Price type code INR
Price amount 799.00
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc. note TABLE OF CONTENTS
Introduction xiii
1 Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask 1
Some Sample Data 2
Moving Quickly with the Control Button 2
Copying Formulas and Data Quickly 4
Formatting Cells 5
Paste Special Values 7
Inserting Charts 8
Locating the Find and Replace Menus 9
Formulas for Locating and Pulling Values 10
Using VLOOKUP to Merge Data 12
Filtering and Sorting 13
Using PivotTables 16
Using Array Formulas 19
Solving Stuff with Solver 20
OpenSolver: I Wish We Didn’t Need This, but We Do 26
Wrapping Up 27
2 Cluster Analysis Part I: Using K-Means to Segment Your Customer Base 29
Girls Dance with Girls, Boys Scratch Their Elbows 30
Getting Real: K-Means Clustering Subscribers in E-mail Marketing 35
Joey Bag O’ Donuts Wholesale Wine Emporium 36
The Initial Dataset 36
Determining What to Measure 38
Start with Four Clusters 41
Euclidean Distance: Measuring Distances as the Crow Flies 41
Distances and Cluster Assignments for Everybody! 44
Solving for the Cluster Centers 46
Making Sense of the Results 49
Getting the Top Deals by Cluster 50
The Silhouette: A Good Way to Let Different K Values Duke It Out 53
How about Five Clusters? 60
Solving for Five Clusters 60
Getting the Top Deals for All Five Clusters 61
Computing the Silhouette for 5-Means Clustering 64
K-Medians Clustering and Asymmetric Distance Measurements 66
Using K-Medians Clustering 66
Getting a More Appropriate Distance Metric 67
Putting It All in Excel 69
The Top Deals for the 5-Medians Clusters 70
Wrapping Up 75
3 Naïve Bayes and the Incredible Lightness of Being an Idiot 77
When You Name a Product Mandrill, You’re Going to Get Some Signal and Some Noise 77
The World’s Fastest Intro to Probability Theory 79
Totaling Conditional Probabilities 80
Joint Probability, the Chain Rule, and Independence 80
What Happens in a Dependent Situation? 81
Bayes Rule 82
Using Bayes Rule to Create an AI Model 83
High-Level Class Probabilities Are Often Assumed to Be Equal 84
A Couple More Odds and Ends 85
Let’s Get This Excel Party Started 87
Removing Extraneous Punctuation 87
Splitting on Spaces 88
Counting Tokens and Calculating Probabilities 92
And We Have a Model! Let’s Use It 94
Wrapping Up 98
4 Optimization Modeling: Because That “Fresh Squeezed” Orange Juice Ain’t Gonna Blend Itself 101
Why Should Data Scientists Know Optimization? 102
Starting with a Simple Trade-Off 103
Representing the Problem as a Polytope 103
Solving by Sliding the Level Set 105
The Simplex Method: Rooting around the Corners 106
Working in Excel 108
There’s a Monster at the End of This Chapter 117
Fresh from the Grove to Your Glass with a Pit Stop Through a Blending Model 118
You Use a Blending Model 119
Let’s Start with Some Specs 119
Coming Back to Consistency 121
Putting the Data into Excel 121
Setting Up the Problem in Solver 124
Lowering Your Standards 126
Dead Squirrel Removal: The Minimax Formulation 131
If-Then and the “Big M” Constraint 133
Multiplying Variables: Cranking Up the Volume to 11 137
Modeling Risk 144
Normally Distributed Data 145
Wrapping Up 154
5 Cluster Analysis Part II: Network Graphs and Community Detection 155
What Is a Network Graph? 156
Visualizing a Simple Graph 157
Brief Introduction to Gephi 159
Gephi Installation and File Preparation 160
Laying Out the Graph 162
Node Degree 165
Pretty Printing 166
Touching the Graph Data 168
Building a Graph from the Wholesale Wine Data 170
Creating a Cosine Similarity Matrix 172
Producing an r-Neighborhood Graph 174
How Much Is an Edge Worth? Points and Penalties in Graph Modularity 178
What’s a Point and What’s a Penalty? 179
Setting Up the Score Sheet 183
Let’s Get Clustering! 185
Split Number 1 185
Split 2: Electric Boogaloo 190
And…Split 3: Split with a Vengeance 192
Encoding and Analyzing the Communities 193
There and Back Again: A Gephi Tale 197
Wrapping Up 202
6 The Granddaddy of Supervised Artificial Intelligence—Regression 205
Wait, What? You’re Pregnant? 205
Don’t Kid Yourself 206
Predicting Pregnant Customers at RetailMart Using Linear Regression 207
The Feature Set 207
Assembling the Training Data 209
Creating Dummy Variables 210
Let’s Bake Our Own Linear Regression 213
Linear Regression Statistics: R-Squared, F Tests, t Tests 221
Making Predictions on Some New Data and Measuring Performance 230
Predicting Pregnant Customers at RetailMart Using Logistic Regression 239
First You Need a Link Function 240
Hooking Up the Logistic Function and Reoptimizing 241
Baking an Actual Logistic Regression 244
Model Selection—Comparing the Performance of the Linear and Logistic Regressions 245
For More Information 248
Wrapping Up 249
7 Ensemble Models: A Whole Lot of Bad Pizza 251
Using the Data from Chapter 6 252
Bagging: Randomize, Train, Repeat 254
Decision Stump Is an Unsexy Term for a Stupid Predictor 254
Doesn’t Seem So Stupid to Me! 255
You Need More Power! 257
Let’s Train It 258
Evaluating the Bagged Model 267
Boosting: If You Get It Wrong, Just Boost and Try Again 272
Training the Model—Every Feature Gets a Shot 272
Evaluating the Boosted Model 280
Wrapping Up 283
8 Forecasting: Breathe Easy; You Can’t Win 285
The Sword Trade Is Hopping 286
Getting Acquainted with Time Series Data 286
Starting Slow with Simple Exponential Smoothing 288
Setting Up the Simple Exponential Smoothing Forecast 290
You Might Have a Trend 296
Holt’s Trend-Corrected Exponential Smoothing 299
Setting Up Holt’s Trend-Corrected Smoothing in a Spreadsheet 300
So Are You Done? Looking at Autocorrelations 306
Multiplicative Holt-Winters Exponential Smoothing 313
Setting the Initial Values for Level, Trend, and Seasonality 315
Getting Rolling on the Forecast 319
And Optimize! 324
Please Tell Me We’re Done Now!!! 326
Putting a Prediction Interval around the Forecast 327
Creating a Fan Chart for Effect 331
Wrapping Up 333
9 Outlier Detection: Just Because They’re Odd Doesn’t Mean They’re Unimportant 335
Outliers Are (Bad?) People, Too 335
The Fascinating Case of Hadlum v. Hadlum 336
Tukey Fences 337
Applying Tukey Fences in a Spreadsheet 338
The Limitations of This Simple Approach 340
Terrible at Nothing, Bad at Everything 341
Preparing Data for Graphing 342
Creating a Graph 345
Getting the k Nearest Neighbors 347
Graph Outlier Detection Method 1: Just Use the Indegree 348
Graph Outlier Detection Method 2: Getting Nuanced with k-Distance 351
Graph Outlier Detection Method 3: Local Outlier Factors Are Where It’s At 353
Wrapping Up 358
10 Moving from Spreadsheets into R 361
Getting Up and Running with R 362
Some Simple Hand-Jamming 363
Reading Data into R 370
Doing Some Actual Data Science 372
Spherical K-Means on Wine Data in Just a Few Lines 372
Building AI Models on the Pregnancy Data 378
Forecasting in R 385
Looking at Outlier Detection 389
Wrapping Up 394
Conclusion 395
Where Am I? What Just Happened? 395
Before You Go-Go 395
Get to Know the Problem 396
We Need More Translators 397
Beware the Three-Headed Geek-Monster: Tools, Performance, and Mathematical Perfection 397
You Are Not the Most Important Function of Your Organization 400
Get Creative and Keep in Touch! 400
Index 401
520 ## - SUMMARY, ETC.
Summary, etc. DESCRIPTION
The book provides nine tutorials on optimization, machine learning, data mining, and forecasting, all within the confines of a spreadsheet. Each tutorial uses a real-world problem, and the author guides the reader through the questions a reader might ask about how to craft a solution with the appropriate data science technique. The nine spreadsheets are hosted for download so that the reader can work through the problems alongside the book.
Important topics covered by the book:
Linear and integer programming
K-nearest neighbors graphs and clustering
Logistic regression
Demand forecasting with seasonal adjustments
Price sensitivity, revenue optimization, and price-sensitive forecasting
Naïve Bayes classification
Outlier detection using graphs and Local Outlier Factors
Multi-criteria decision analysis
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Data mining
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Web usage mining
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Book
Holdings
Source of classification or shelving scheme: Dewey Decimal Classification
Collection code: IT & Decision Sciences
Home library: Indian Institute of Management LRC
Current library: Indian Institute of Management LRC
Shelving location: General Stacks
Date acquired: 05/01/2019
Source of acquisition: Gratis Book
Full call number: 006.31 FOR
Accession Number: 000024
Date last seen: 09/04/2019
Copy number: 1
Cost, replacement price: 799.00
Price effective from: 09/04/2019
Koha item type: Book
