Machine Learning Guide: A Collection of Resources
jopen
9 years ago
Miscellaneous
- A curated list of awesome Machine Learning frameworks, libraries and software
- A curated list of awesome data visualization libraries and resources.
- An awesome Data Science repository to learn and apply for real world problems
- The Open Source Data Science Masters
- Machine Learning FAQs on Cross Validated
- List of Machine Learning University Courses
- Machine Learning algorithms that you should always have a strong understanding of
- Difference between Linearly Independent, Orthogonal, and Uncorrelated Variables
- List of Machine Learning Concepts
- Slides on Several Machine Learning Topics
- MIT Machine Learning Lecture Slides
- Comparison of Supervised Learning Algorithms
- Learning Data Science Fundamentals
- Machine Learning mistakes to avoid
- Statistical Machine Learning Course
- TheAnalyticsEdge edX Notes and Codes
Interview Resources
- How can a computer science graduate student prepare himself for data scientist interviews?
- How do I learn Machine Learning?
- FAQs about Data Science Interviews
- What are the key skills of a data scientist?
Artificial Intelligence
- Awesome Artificial Intelligence (GitHub Repo)
- edX course | Klein & Abbeel
- Udacity Course | Norvig & Thrun
- TED talks on AI
Genetic Algorithms
- Genetic Algorithms Wikipedia Page
- Simple Implementation of Genetic Algorithms in Python (Part 1), Part 2
- Genetic Algorithms vs Artificial Neural Networks
- Genetic Algorithms Explained in Plain English
- Genetic Programming
- Genetic Programming in Python (GitHub)
- Genetic Algorithms vs Genetic Programming (Quora), StackOverflow
Statistics
- Stat Trek Website - A dedicated website to teach yourselves Statistics
- Learn Statistics Using Python - Learn Statistics using an application-centric programming approach
- Statistics for Hackers | Slides | @jakevdp - Slides by Jake VanderPlas
- Online Statistics Book - An Interactive Multimedia Course for Studying Statistics
- What is a Sampling Distribution?
- Tutorials
- What is an Unbiased Estimator?
- Goodness of Fit Explained
- What are QQ Plots?
- Edwin Chen's Blog - A blog about Math, stats, ML, crowdsourcing, data science
- The Data School Blog - Data science for beginners!
- ML Wave - A blog for Learning Machine Learning
- Andrej Karpathy - A blog about Deep Learning and Data Science in general
- Colah's Blog - Awesome Neural Networks Blog
- Alex Minnaar's Blog - A blog about Machine Learning and Software Engineering
- Statistically Significant - Andrew Landgraf's Data Science Blog
- Simply Statistics - A blog by three biostatistics professors
- Yanir Seroussi's Blog - A blog about Data Science and beyond
- fastML - Machine learning made easy
- Trevor Stephens Blog - Trevor Stephens Personal Page
- no free hunch | kaggle - The Kaggle Blog about all things Data Science
- A Quantitative Journey | outlace - learning quantitative applications
- r4stats - analyze the world of data science, and to help people learn to use R
- Variance Explained - David Robinson's Blog
- AI Junkie - a blog about Artificial Intelligence
- Most Viewed Machine Learning writers
- Data Science Topic on Quora
- William Chen's Answers
- Michael Hochster's Answers
- Ricardo Vladimiro's Answers
- Storytelling with Statistics
- Data Science FAQs on Quora
- Machine Learning FAQs on Quora
- How to almost win Kaggle Competitions
- Convolution Neural Networks for EEG detection
- Facebook Recruiting III Explained
- Predicting CTR with Online ML
- Does Balancing Classes Improve Classifier Performance?
- What is Deviance?
- When to choose which machine learning classifier?
- What are the advantages of different classification algorithms?
- ROC and AUC Explained
- An introduction to ROC analysis
- Simple guide to confusion matrix terminology
- General
- Assumptions of Linear Regression, Stack Exchange
- Linear Regression Comprehensive Resource
- Applying and Interpreting Linear Regression
- What does having constant variance in a linear regression model mean?
- Difference between linear regression on y with x and x with y
- Is linear regression valid when the dependent variable is not normally distributed?
- Multicollinearity and VIF
- Dummy Variable Trap | Multicollinearity
- Dealing with multicollinearity using VIFs
- Interpreting plot.lm() in R
- How to interpret a QQ plot?
- Interpreting Residuals vs Fitted Plot
- How should outliers be dealt with?
- Regularization and Variable Selection via the Elastic Net
- Logistic Regression Wiki
- Geometric Intuition of Logistic Regression
- Obtaining predicted categories (choosing threshold)
- Residuals in logistic regression
- Difference between logit and probit models, Logistic Regression Wiki, Probit Model Wiki
- Pseudo R2 for Logistic Regression, How to calculate, Other Details
- Training with Full dataset after CV?
- Which CV method is best?
- Variance Estimates in k-fold CV
- Is CV a substitute for Validation Set?
- Choice of k in k-fold CV
- CV for ensemble learning
- k-fold CV in R
- Good Resources
- Overfitting and Cross Validation
- Preventing Overfitting the Cross Validation Data | Andrew Ng
- Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation
- CV for detecting and preventing Overfitting
- How does CV overcome the Overfitting Problem
- Bootstrapping
- Why Bootstrapping Works?
- Good Animation
- Example of Bootstrapping
- Understanding Bootstrapping for Validation and Model Selection
- Cross Validation vs Bootstrap to estimate prediction error, Cross-validation vs .632 bootstrapping to evaluate classification performance
- A curated list of awesome Deep Learning tutorials, projects and communities
- Lots of Deep Learning Resources
- Interesting Deep Learning and NLP Projects (Stanford), Website
- Core Concepts of Deep Learning
- Understanding Natural Language with Deep Neural Networks Using Torch
- Stanford Deep Learning Tutorial
- Deep Learning FAQs on Quora
- Google+ Deep Learning Page
- Recent Reddit AMAs related to Deep Learning, Another AMA
- Where to Learn Deep Learning?
- Deep Learning nvidia concepts
- Introduction to Deep Learning Using Python (GitHub), Good Introduction Slides
- Video Lectures Oxford 2015, Video Lectures Summer School Montreal
- Deep Learning Software List
- Hacker's guide to Neural Nets
- Top arxiv Deep Learning Papers explained
- Geoff Hinton YouTube Videos on Deep Learning
- Awesome Deep Learning Reading List
- Deep Learning Comprehensive Website, Software
- deeplearning Tutorials
- AWESOME! Deep Learning Tutorial
- Deep Learning Basics
- Stanford Tutorials
- Train, Validation & Test in Artificial Neural Networks
- Artificial Neural Networks Tutorials
- Neural Networks FAQs on Stack Overflow
- Neural Machine Translation
- Introduction to Neural Machine Translation with GPUs (part 1), Part 2, Part 3
- Deep Speech: Accurate Speech Recognition with GPU-Accelerated Deep Learning
- Deep Learning Frameworks
- Torch vs. Theano
- dl4j vs. torch7 vs. theano
- Torch ML Tutorial, Code
- Intro to Torch
- Learning Torch GitHub Repo
- Awesome-Torch (Repository on GitHub)
- Machine Learning using Torch Oxford Univ, Code
- Torch Internals Overview
- Torch Cheatsheet
- Understanding Natural Language with Deep Neural Networks Using Torch
- Caffe
- Deep Learning for Computer Vision with Caffe and cuDNN
- TensorFlow
- Website
- TensorFlow Examples for Beginners
- Learning TensorFlow GitHub Repo
- Benchmark TensorFlow GitHub
- Feed Forward Networks
- Implementing a Neural Network from scratch, Code
- Speeding up your Neural Network with Theano and the gpu, Code
- Basic ANN Theory
- Role of Bias in Neural Networks
- Choosing number of hidden layers and nodes, 2, 3
- Backpropagation Explained
- ANN implemented in C++ | AI Junkie
- Simple Implementation
- NN for Beginners
- Regression and Classification with NNs (Slides)
- Another Intro
- Recurrent and LSTM Networks
- awesome-rnn: list of resources (GitHub Repo)
- Recurrent Neural Net Tutorial Part 1, Part 2, Part 3, Code
- NLP RNN Representations
- The Unreasonable effectiveness of RNNs, Torch Code, Python Code
- Intro to RNN, LSTM
- An application of RNN
- Optimizing RNN Performance
- Simple RNN
- Auto-Generating Clickbait with RNN
- Sequence Learning using RNN (Slides)
- Machine Translation using RNN (Paper)
- Music generation using RNNs (Keras)
- Using RNN to create on-the-fly dialogue (Keras)
- Long Short Term Memory (LSTM)
- Understanding LSTM Networks
- LSTM explained
- Beginner’s Guide to LSTM
- Implementing LSTM from scratch, Python/Theano code
- Torch Code for character-level language models using LSTM
- LSTM for Kaggle EEG Detection competition (Torch Code)
- LSTM for Sentiment Analysis in Theano
- Deep Learning for Visual Q&A | LSTM | CNN, Code
- Computer Responds to email using LSTM | Google
- LSTM dramatically improves Google Voice Search, Another Article
- Understanding Natural Language with LSTM Using Torch
- Torch code for Visual Question Answering using a CNN+LSTM model
- Gated Recurrent Units (GRU)
- LSTM vs GRU
- Recursive Neural Network (not Recurrent)
- Restricted Boltzmann Machine
- Beginner's Guide about RBMs
- Another Good Tutorial
- Introduction to RBMs
- Hinton's Guide to Training RBMs
- RBMs in R
- Deep Belief Networks Tutorial
- word2vec, DBN, RNTN for Sentiment Analysis
- Autoencoders: Unsupervised (applies BackProp after setting target = input)
- Andrew Ng Sparse Autoencoders pdf
- Deep Autoencoders Tutorial
- Denoising Autoencoders, Theano Code
- Stacked Denoising Autoencoders
- Convolution Networks
- Awesome Deep Vision: List of Resources (GitHub)
- Intro to CNNs
- Understanding CNN for NLP
- Stanford Notes, Codes, GitHub
- JavaScript Library (Browser Based) for CNNs
- Using CNNs to detect facial keypoints
- Deep learning to classify business photos at Yelp
- Interview with Yann LeCun | Kaggle
- Visualising and Understanding CNNs
Natural Language Processing
- A curated list of speech and natural language processing resources
- Understanding Natural Language with Deep Neural Networks Using Torch
- tf-idf explained
- Interesting Deep Learning NLP Projects Stanford, Website
- NLP from Scratch | Google Paper
- Graph Based Semi Supervised Learning for NLP
- Bag of Words
- Topic Modeling
- LDA, LSA, Probabilistic LSA
- Awesome LDA Explanation!, Another good explanation
- The LDA Buffet - Intuitive Explanation
- Difference between LSI and LDA
- Original LDA Paper
- alpha and beta in LDA
- Intuitive explanation of the Dirichlet distribution
- Topic modeling made just simple enough
- Online LDA, Online LDA with Spark
- LDA in Scala, Part 2
- Segmentation of Twitter Timelines via Topic Modeling
- Topic Modeling of Twitter Followers
- word2vec
- Google word2vec
- Bag of Words Model Wiki
- A closer look at Skip Gram Modeling
- Skip Gram Model Tutorial, CBoW Model
- Word Vectors Kaggle Tutorial Python, Part 2
- Making sense of word2vec
- word2vec explained on deeplearning4j
- Quora word2vec
- Other Quora Resources, 2, 3
- word2vec, DBN, RNTN for Sentiment Analysis
- Text Clustering
- How string clustering works
- Levenshtein distance for measuring the difference between two sequences
- Text clustering with Levenshtein distances
- Text Classification
- Classification Text with Bag of Words
- Kaggle Tutorial Bag of Words and Word vectors, Part 2, Part 3
- What would Shakespeare say (NLP Tutorial)
- A closer look at Skip Gram Modeling
- Highest Voted Questions about SVMs on Cross Validated
- Help me Understand SVMs!
- SVM in Layman's terms
- How does SVM Work | Comparisons
- A tutorial on SVMs
- Practical Guide to SVC, Slides
- Introductory Overview of SVMs
- Comparisons
- Optimization Algorithms in Support Vector Machines
- Variable Importance from SVM
- Software
- LIBSVM
- Intro to SVM in R
- Kernels
- What are Kernels in ML and SVM?
- Intuition Behind Gaussian Kernel in SVMs?
- Probabilities post SVM
- Platt's Probabilistic Outputs for SVM
- Platt Calibration Wiki
- Why use Platt's Scaling
- Classifier Classification with Platt's Scaling
- Wikipedia Page - Lots of Good Info
- FAQs about Decision Trees
- Brief Tour of Trees and Forests
- Tree Based Models in R
- How Decision Trees work?
- Weak side of Decision Trees
- Thorough Explanation and different algorithms
- What is entropy and information gain in the context of building decision trees?
- Slides Related to Decision Trees
- How do decision tree learning algorithms deal with missing values?
- Using Surrogates to Improve Datasets with Missing Values
- Good Article
- Are decision trees almost always binary trees?
- Pruning Decision Trees, Grafting of Decision Trees
- What is Deviance in context of Decision Trees?
- Comparison of Different Algorithms
- CART
- Recursive Partitioning Wikipedia
- CART Explained
- How to measure/rank “variable importance” when using CART?
- Pruning a Tree in R
- Does rpart use multivariate splits by default?
- FAQs about Recursive Partitioning
- CTREE
- party package in R
- Show volume in each node using ctree in R
- How to extract tree structure from ctree function?
- CHAID
- Wikipedia Article on CHAID
- Basic Introduction to CHAID
- Good Tutorial on CHAID
- MARS
- Wikipedia Article on MARS
- Probabilistic Decision Trees
- Bayesian Learning in Probabilistic Decision Trees
- Probabilistic Trees Research Paper
- Awesome Random Forest (GitHub)
- How to tune RF parameters in practice?
- Measures of variable importance in random forests
- Compare R-squared from two different Random Forest models
- OOB Estimate Explained | RF vs LDA
- Evaluating Random Forests for Survival Analysis Using Prediction Error Curve
- Why doesn't Random Forest handle missing values in predictors?
- How to build random forests in R with missing (NA) values?
- FAQs about Random Forest, More FAQs
- Obtaining knowledge from a random forest
- Some Questions for R implementation, 2, 3
- Boosting for Better Predictions
- Boosting Wikipedia Page
- Introduction to Boosted Trees | Tianqi Chen
- Gradient Boosting Machine
- xgboost
- xgboost tuning kaggle
- xgboost vs gbm
- xgboost survey
- AdaBoost
- AdaBoost Wiki, Python Code
- AdaBoost Sparse Input Support
- adaBag R package
- Tutorial
- Wikipedia Article on Ensemble Learning
- Kaggle Ensembling Guide
- The Power of Simple Ensembles
- Ensemble Learning Intro
- Ensemble Learning Paper
- Ensembling models with R, Ensembling Regression Models in R, Intro to Ensembles in R
- Ensembling Models with caret
- Bagging vs Boosting vs Stacking
- Good Resources | Kaggle Africa Soil Property Prediction
- Boosting vs Bagging
- Resources for learning how to implement ensemble methods
- How are classifications merged in an ensemble classifier?
- Stacking, Blending and Stacked Generalization
- Stacked Generalization (Stacking)
- Stacked Generalization: when does it work?
- Stacked Generalization Paper
- Wikipedia article on VC Dimension
- Intuitive Explanation of VC Dimension
- Video explaining VC Dimension
- Introduction to VC Dimension
- FAQs about VC Dimension
- Do ensemble techniques increase VC-dimension?
- Bayesian Methods for Hackers (using pyMC)
- Should all Machine Learning be Bayesian?
- Tutorial on Bayesian Optimisation for Machine Learning
- Bayesian Reasoning and Deep Learning, Slides
- Bayesian Statistics Made Simple
- Kalman & Bayesian Filters in Python
- Markov Chain Wikipedia Page
- Wikipedia article on Semi Supervised Learning
- Tutorial on Semi Supervised Learning
- Graph Based Semi Supervised Learning for NLP
- Taxonomy
- Video Tutorial Weka
- Unsupervised, Supervised and Semi Supervised learning
- Research Papers 1, 2, 3
- Mean Variance Portfolio Optimization with R and Quadratic Programming
- Algorithms for Sparse Optimization and Machine Learning
- Optimization Algorithms in Machine Learning, Video Lecture
- Optimization Algorithms for Data Analysis
- Video Lectures on Optimization
- Optimization Algorithms in Support Vector Machines
- The Interplay of Optimization and Machine Learning Research
- For a collection of Data Science Tutorials using R, please refer to this list.
Ensembles
Stacking Models
Vapnik–Chervonenkis Dimension
Bayesian Machine Learning
Semi Supervised Learning
Optimization
Other Tutorials
Random Forest / Bagging
Boosting
Reinforcement Learning
Decision Trees
Computer Vision
Support Vector Machine
Deep Learning
Logistic Regression
Model Validation using Resampling
Useful Blogs
Resources on Quora
Kaggle Competitions WriteUp
Cheat Sheets
Classification
Linear Regression