Machine learning for factor investing: R version

By: Coqueret, Guillaume
Contributor(s): Guida, Tony
Material type: Text
Publication details: Boca Raton: CRC Press, 2021
Description: xix, 321 p.
ISBN: 9780367545864
Subject(s): Machine learning | R (Computer program language) | Investments--Data processing
DDC classification: 332.64
Summary: Machine learning (ML) is progressively reshaping the fields of quantitative finance and algorithmic trading. ML tools are increasingly adopted by hedge funds and asset managers, notably for alpha signal generation and stock selection. The technicality of the subject can make it hard for non-specialists to join the bandwagon, as the jargon and coding requirements may seem out of reach. Machine Learning for Factor Investing: R Version bridges this gap. It provides a comprehensive tour of modern ML-based investment strategies that rely on firm characteristics. The book covers a wide array of subjects which range from economic rationales to rigorous portfolio back-testing and encompass both data processing and model interpretability. Common supervised learning algorithms such as tree models and neural networks are explained in the context of style investing and the reader can also dig into more complex techniques like autoencoder asset returns, Bayesian additive trees, and causal models. All topics are illustrated with self-contained R code samples and snippets that are applied to a large public dataset that contains over 90 predictors. The material, along with the content of the book, is available online so that readers can reproduce and enhance the examples at their convenience. If you have even a basic knowledge of quantitative finance, this combination of theoretical concepts and practical illustrations will help you learn quickly and deepen your financial and technical expertise.

List(s) this item appears in: Finance & Accounting | Non Fiction


Holdings
Item type	Current library	Collection	Call number	Copy number	Status	Date due	Barcode
Book	Indian Institute of Management LRC General Stacks	Finance & Accounting	332.64 COQ	1	Available		004218

Table of Contents
I Introduction

1. Preface
What this book is not about
The targeted audience
How this book is structured
Companion website
Why R?
Coding instructions
Acknowledgements
Future developments

2. Notations and data
Notations
Dataset

3. Introduction
Context
Portfolio construction: the workflow
Machine learning is no magic wand

4. Factor investing and asset pricing anomalies
Introduction
Detecting anomalies
Simple portfolio sorts
Factors
Predictive regressions, sorts, and p-value issues
Fama-MacBeth regressions
Factor competition
Advanced techniques
Factors or characteristics?
Hot topics: momentum, timing and ESG
Factor momentum
Factor timing
The green factors
The link with machine learning
A short list of recent references
Explicit connections with asset pricing models
Coding exercises

5. Data preprocessing
Know your data
Missing data
Outlier detection
Feature engineering
Feature selection
Scaling the predictors
Labelling
Simple labels
Categorical labels
The triple barrier method
Filtering the sample
Return horizons
Handling persistence
Extensions
Transforming features
Macro-economic variables
Active learning
Additional code and results
Impact of rescaling: graphical representation
Impact of rescaling: toy example
Coding exercises

II Common supervised algorithms

6. Penalized regressions and sparse hedging for minimum variance portfolios
Penalized regressions
Simple regressions
Forms of penalizations
Illustrations
Sparse hedging for minimum variance portfolios
Presentation and derivations
Example
Predictive regressions
Literature review and principle
Code and results
Coding exercise

7. Tree-based methods
Simple trees
Principle
Further details on classification
Pruning criteria
Code and interpretation
Random forests
Principle
Code and results
Boosted trees: AdaBoost
Methodology
Illustration
Boosted trees: extreme gradient boosting
Managing loss
Penalization
Aggregation
Tree structure
Extensions
Code and results
Instance weighting
Discussion
Coding exercises

8. Neural networks
The original perceptron
Multilayer perceptron (MLP)
Introduction and notations
Universal approximation
Learning via back-propagation
Further details on classification
How deep should we go? And other practical issues
Architectural choices
Frequency of weight updates and learning duration
Penalizations and dropout
Code samples and comments for vanilla MLP
Regression example
Classification example
Custom losses
Recurrent networks
Presentation
Code and results
Other common architectures
Generative adversarial networks
Auto-encoders
A word on convolutional networks
Advanced architectures
Coding exercise

9. Support vector machines
SVM for classification
SVM for regression
Practice
Coding exercises

10. Bayesian methods
The Bayesian framework
Bayesian sampling
Gibbs sampling
Metropolis-Hastings sampling
Bayesian linear regression
Naive Bayes classifier
Bayesian additive trees
General formulation
Priors
Sampling and predictions
Code

III From predictions to portfolios

11. Validating and tuning
Learning metrics
Regression analysis
Classification analysis
Validation
The variance-bias tradeoff: theory
The variance-bias tradeoff: illustration
The risk of overfitting: principle
The risk of overfitting: some solutions
The search for good hyperparameters
Methods
Example: grid search
Example: Bayesian optimization
Short discussion on validation in backtests

12. Ensemble models
Linear ensembles
Principles
Example
Stacked ensembles
Two-stage training
Code and results
Extensions
Exogenous variables
Shrinking inter-model correlations
Exercise

13. Portfolio backtesting
Setting the protocol
Turning signals into portfolio weights
Performance metrics
Discussion
Pure performance and risk indicators
Factor-based evaluation
Risk-adjusted measures
Transaction costs and turnover
Common errors and issues
Forward-looking data
Backtest overfitting
Simple safeguards
Implication of non-stationarity: forecasting is hard
General comments
The no free lunch theorem
Example
Coding exercises

IV Further important topics

14. Interpretability
Global interpretations
Simple models as surrogates
Variable importance (tree-based)
Variable importance (agnostic)
Partial dependence plot
Local interpretations
LIME
Shapley values
Breakdown

15. Two key concepts: causality and non-stationarity
Causality
Granger causality
Causal additive models
Structural time-series models
Dealing with changing environments
Non-stationarity: yet another illustration
Online learning
Homogeneous transfer learning

16. Unsupervised learning
The problem with correlated predictors
Principal component analysis and autoencoders
A bit of algebra
PCA
Autoencoders
Application
Clustering via k-means
Nearest neighbors
Coding exercise

17. Reinforcement learning
Theoretical layout
General framework
Q-learning
SARSA
The curse of dimensionality
Policy gradient
Principle
Extensions
Simple examples
Q-learning with simulations
Q-learning with market data
Concluding remarks
Exercises

