The 7 Most Common Machine Learning Tasks and Related Methods

jopen, 10 years ago

This article presents some of the most common machine learning tasks that one may come across while trying to solve a machine learning problem. Under each task, a set of machine learning methods that could be used to address it is also listed. Please feel free to comment or suggest additions if I have missed one or more important points. Also, apologies for any typos.

The key machine learning tasks covered in this article are:

- Regression
- Classification
- Clustering
- Multivariate querying
- Density estimation
- Dimension reduction
- Testing and matching

The following are the 7 machine learning tasks that one is most likely to come across while solving an advanced analytics problem:

Regression: Regression tasks mainly deal with the estimation of numerical values (continuous variables). Examples include estimating housing prices, product prices, stock prices, etc. Some of the following ML methods could be used for solving regression problems:
- Kernel regression (Higher accuracy)
- Gaussian process regression (Higher accuracy)
- Regression trees
- Linear regression
- Support vector regression
- LASSO
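
As a minimal sketch of the simplest method in the list above, linear regression, the following fits a straight line by ordinary least squares using only the Python standard library. The function name `fit_line` and the toy housing data are hypothetical and chosen purely for illustration; a real problem would usually have many features and noisy targets.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: floor area (square metres) vs. price (thousands).
areas = [50.0, 60.0, 80.0, 100.0]
prices = [150.0, 180.0, 240.0, 300.0]

slope, intercept = fit_line(areas, prices)
predicted = slope * 70.0 + intercept  # estimated price for a 70 square metre home
print(slope, intercept, predicted)
```

The toy prices are exactly three times the area, so the fitted slope comes out at about 3 and the intercept at about 0. With real data the line would only approximate the points.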
Classification: Classification tasks are about predicting the category of a data point (discrete variables). One of the most common examples is predicting whether an email is spam or ham. Common use cases can be found in healthcare, such as determining whether or not a person is suffering from a particular disease. Classification also has applications in finance, such as determining whether or not a transaction is fraudulent. ML methods such as the following could be applied to solve classification tasks:
- Kernel discriminant analysis (Higher accuracy)
- K-Nearest Neighbors (Higher accuracy)
- Artificial neural networks (ANN) (Higher accuracy)
- Support vector machine (SVM) (Higher accuracy)
- Random forests (Higher accuracy)
- Decision trees
- Boosted trees
- Logistic regression
- Naive Bayes
- Deep learning
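
To make the spam-or-ham example concrete, here is a rough sketch of one method from the list above, K-Nearest Neighbors, written with the standard library only. The helper `knn_predict`, the two features (number of links, number of exclamation marks) and the data points are all assumptions made for illustration, not taken from a real data set.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features per email: (number of links, number of exclamation marks).
emails = [(0, 1), (1, 0), (1, 1), (8, 9), (9, 8), (9, 9)]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

first_guess = knn_predict(emails, labels, (2, 1))
second_guess = knn_predict(emails, labels, (8, 8))
print(first_guess, second_guess)
```

The first query sits next to the three "ham" points and the second next to the three "spam" points, so the votes go to those labels respectively. The choice of `k=3` is arbitrary here; in practice it is usually tuned on held-out data.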
Clustering: Clustering tasks are all about finding natural groupings of data and a label associated with each of these groupings (clusters). Common examples include customer segmentation and identifying product features for a product roadmap. Some of the common ML methods are the following:
- Mean-shift (Higher accuracy)
- Hierarchical clustering
- K-means
- Topic models
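
The idea of finding natural groupings can be sketched with K-means from the list above. This is a simplified version under stated assumptions: the initial centroids are passed in by hand (real implementations usually pick them randomly or with a seeding scheme), the loop runs for a fixed number of iterations, and the customer data is hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain K-means (Lloyd's algorithm) starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda j: math.dist(p, centroids[j]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster
            else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    labels = [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]
    return centroids, labels


# Hypothetical customers: (visits per month, average basket size).
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(customers, centroids=[(1, 1), (8, 8)])
print(centroids, labels)
```

On this toy data the two groups are well separated, so the first three customers end up in one cluster and the last three in the other. K-means only returns cluster indices; attaching a meaningful name to each segment is still a manual step.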
Multivariate querying: Multivariate querying is about querying for, or finding, similar objects. Some of the following ML methods could be used for such problems:
- Nearest neighbors
- Range search
- Farthest neighbors
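
The three query types listed above can be illustrated with a brute-force sketch that scans every point. The function names and sample points are hypothetical. Practical systems typically use an index structure such as a k-d tree instead of a linear scan, which this sketch does not attempt.

```python
import math


def nearest_neighbor(points, query):
    """Return the point closest to `query` (Euclidean distance)."""
    return min(points, key=lambda p: math.dist(p, query))


def farthest_neighbor(points, query):
    """Return the point farthest from `query`."""
    return max(points, key=lambda p: math.dist(p, query))


def range_search(points, query, radius):
    """Return every point within `radius` of `query`, in input order."""
    return [p for p in points if math.dist(p, query) <= radius]


# Hypothetical objects described by two numeric features each.
objects = [(0, 0), (1, 1), (3, 4), (10, 10)]
query = (0.5, 0)

closest = nearest_neighbor(objects, query)
farthest = farthest_neighbor(objects, query)
nearby = range_search(objects, query, radius=2.0)
print(closest, farthest, nearby)
```

Each call costs one pass over the data, which is fine for small sets but grows linearly with the number of objects.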
Density estimation: Density estimation problems are about finding the likelihood or frequency of objects. In probability and statistics, density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function. Some of the following ML methods could be used for solving density estimation tasks:
- Kernel density estimation (Higher accuracy)
- Mixture of Gaussians
- Density estimation tree
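
As an illustration of estimating a density function from observed data, the following is a minimal one-dimensional kernel density estimate with a Gaussian kernel. The bandwidth value and the sample data are assumptions picked for the example; choosing a good bandwidth is a problem in its own right and is not covered here.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each sample contributes one Gaussian bump centred on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical observations: most values near 1, a few near 5.
observations = [1.0, 1.2, 0.8, 5.0, 5.1]
density = gaussian_kde(observations, bandwidth=0.5)

print(density(1.0), density(3.0), density(5.0))
```

The estimate is highest around 1, where three of the five samples lie, lower around 5, and close to zero near 3, where there are no observations.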
Dimension reduction: As per the Wikipedia page on dimension reduction, dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction. The following are some of the ML methods that could be used for dimension reduction:
- Manifold learning/KPCA (Higher accuracy)
- Principal component analysis
- Independent component analysis
- Gaussian graphical models
- Non-negative matrix factorization
- Compressed sensing
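
Feature extraction can be sketched with principal component analysis from the list above. This simplified version finds only the first principal component, using power iteration on the sample covariance matrix, and projects the data onto it. Power iteration is a choice made here to stay within the standard library (numerical libraries normally use an eigendecomposition or SVD), and the toy data is hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (unit direction, 1-D projections) for the leading component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration; assumes the start vector is not orthogonal to the
    # leading eigenvector and that the data has non-zero variance.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected


# Hypothetical 2-D data lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, projected = first_principal_component(data)
print(direction, projected)
```

Because the points lie on a single line, one coordinate along that line describes them completely, so the two original variables are reduced to one without losing information. Real data would lose some variance in the projection.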
Testing and matching: Testing and matching tasks relate to comparing data sets. The following are some of the methods that could be used for such problems:
- Minimum spanning tree
- Bipartite cross-matching
- N-point correlation
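
One way a minimum spanning tree can be used to compare two data sets is to build the tree over the pooled points and count how many edges join points from different sets, which is the idea behind the Friedman-Rafsky two-sample test. Whether this is the usage the list above has in mind is an assumption. The sketch below uses Prim's algorithm, computes only the raw edge count (not a p-value), and the function names and data are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns index pairs."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known link into the tree
    best_from = [-1] * n        # tree vertex providing that link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        u = min(
            (i for i in range(n) if not in_tree[i]),
            key=lambda i: best_dist[i],
        )
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_sample_edges(sample_a, sample_b):
    """Count MST edges of the pooled data that link the two samples."""
    pooled = list(sample_a) + list(sample_b)
    is_a = [True] * len(sample_a) + [False] * len(sample_b)
    return sum(1 for u, v in mst_edges(pooled) if is_a[u] != is_a[v])


# Hypothetical samples: the first pair is far apart, the second interleaved.
separated = cross_sample_edges(
    [(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]
)
mixed = cross_sample_edges([(0, 0), (2, 0), (4, 0)], [(1, 0), (3, 0), (5, 0)])
print(separated, mixed)
```

Few cross-sample edges suggest the two data sets occupy different regions, while many suggest they are drawn from similar distributions. Turning the count into a formal test requires a reference distribution, for example from permuting the sample labels.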
