Supervised machine learning for text analysis in R (Record no. 5738)
000 -LEADER | |
---|---|
fixed length control field | 04891nam a22001937a 4500 |
005 - DATE AND TIME OF LATEST TRANSACTION | |
control field | 20240207115630.0 |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION | |
fixed length control field | 240207b |||||||| |||| 00| 0 eng d |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
International Standard Book Number | 9780367554194 |
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER | |
Classification number | 006.35 |
Item number | HVI |
100 ## - MAIN ENTRY--PERSONAL NAME | |
Personal name | Hvitfeldt, Emil |
245 ## - TITLE STATEMENT | |
Title | Supervised machine learning for text analysis in R |
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT) | |
Name of publisher, distributor, etc. | CRC Press |
Place of publication, distribution, etc. | Boca Raton |
Date of publication, distribution, etc. | 2022 |
300 ## - PHYSICAL DESCRIPTION | |
Extent | xix, 381 p. |
365 ## - TRADE PRICE | |
Price type code | GBP |
Price amount | 48.99 |
520 ## - SUMMARY, ETC. | |
Summary, etc. | Contents:
Part I. Natural Language Features
1. Language and modeling: Linguistics for text analysis; A glimpse into one area: morphology; Different languages; Other ways text can vary; Summary
2. Tokenization: What is a token?; Types of tokens; Character tokens; Word tokens; Tokenizing by n-grams; Lines, sentence, and paragraph tokens; Where does tokenization break down?; Building your own tokenizer; Tokenize to characters, only keeping letters; Allow for hyphenated words; Wrapping it in a function; Tokenization for non-Latin alphabets; Tokenization benchmark; Summary
3. Stop words: Using premade stop word lists; Stop word removal in R; Creating your own stop words list; All stop word lists are context-specific; What happens when you remove stop words; Stop words in languages other than English; Summary
4. Stemming: How to stem text in R; Should you use stemming at all?; Understand a stemming algorithm; Handling punctuation when stemming; Compare some stemming options; Lemmatization and stemming; Stemming and stop words; Summary
5. Word embeddings: Motivating embeddings for sparse, high-dimensional data; Understand word embeddings by finding them yourself; Exploring CFPB word embeddings; Use pre-trained word embeddings; Fairness and word embeddings; Using word embeddings in the real world; Summary
Part II. Machine Learning Methods
Regression: A first regression model; Building our first regression model; Evaluation; Compare to the null model; Compare to a random forest model; Case study: removing stop words; Case study: varying n-grams; Case study: lemmatization; Case study: feature hashing; Text normalization; What evaluation metrics are appropriate?; The full game: regression; Preprocess the data; Specify the model; Tune the model; Evaluate the modeling; Summary
Classification: A first classification model; Building our first classification model; Evaluation; Compare to the null model; Compare to a lasso classification model; Tuning lasso hyperparameters; Case study: sparse encoding; Two class or multiclass?; Case study: including non-text data; Case study: data censoring; Case study: custom features; Detect credit cards; Calculate percentage censoring; Detect monetary amounts; What evaluation metrics are appropriate?; The full game: classification; Feature selection; Specify the model; Evaluate the modeling; Summary
Part III. Deep Learning Methods
Dense neural networks: Kickstarter data; A first deep learning model; Preprocessing for deep learning; One-hot sequence embedding of text; Simple flattened dense network; Evaluation; Using bag-of-words features; Using pre-trained word embeddings; Cross-validation for deep learning models; Compare and evaluate DNN models; Limitations of deep learning; Summary
Long short-term memory (LSTM) networks: A first LSTM model; Building an LSTM; Evaluation; Compare to a recurrent neural network; Case study: bidirectional LSTM; Case study: stacking LSTM layers; Case study: padding; Case study: training a regression model; Case study: vocabulary size; The full game: LSTM; Preprocess the data; Specify the model; Summary
Convolutional neural networks: What are CNNs?; Kernel; Kernel size; A first CNN model; Case study: adding more layers; Case study: byte pair encoding; Case study: explainability with LIME; Case study: hyperparameter search; The full game: CNN; Preprocess the data; Specify the model; Summary
Part IV. Conclusion
Text models in the real world
Appendices
A. Regular expressions: Literal characters; Meta characters; Full stop, the wildcard; Character classes; Shorthand character classes; Quantifiers; Anchors; Additional resources
B. Data: Hans Christian Andersen fairy tales; Opinions of the Supreme Court of the United States; Consumer Financial Protection Bureau (CFPB) complaints; Kickstarter campaign blurbs
C. Baseline linear classifier: Read in the data; Split into test/train and create resampling folds; Recipe for data preprocessing; Lasso regularized classification model; A model workflow; Tune the workflow |
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Computational linguistics -- Statistical methods |
700 ## - ADDED ENTRY--PERSONAL NAME | |
Personal name | Silge, Julia |
942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
Koha item type | Book |
Source of classification or shelving scheme | Dewey Decimal Classification |
Holdings | |
---|---|
Source of classification or shelving scheme | Dewey Decimal Classification |
Collection code | IT & Decisions Sciences |
Bill No | 2023-24/1525 |
Bill Date | 26-12-2023 |
Home library | Indian Institute of Management LRC |
Current library | Indian Institute of Management LRC |
Shelving location | General Stacks |
Date acquired | 02/07/2024 |
Source of acquisition | Indica Publishers & Distributors Pvt. Ltd. |
Cost, normal purchase price | 3435.91 |
Full call number | 006.35 HVI |
Accession Number | 005569 |
Date last seen | 02/07/2024 |
Copy number | 1 |
Cost, replacement price | 5286.02 |
Price effective from | 02/07/2024 |
Koha item type | Book |