lightning: a Python library for large-scale linear classification, regression and ranking
jopen
10 years ago
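lightning is a Python library for large-scale linear classification, regression and ranking. It supports SDCA, Prox-SDCA, SGD, AdaGrad, SAG, SVRG, FISTA and SpaRSA. Its highlights: it follows the same API conventions as scikit-learn, natively supports both dense and sparse data representations, and implements its computationally demanding modules in Cython.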
Highlights:
- follows the scikit-learn API conventions
- supports natively both dense and sparse data representations
- computationally demanding parts implemented in Cython
Solvers supported:
- primal coordinate descent
- dual coordinate descent (SDCA, Prox-SDCA)
- SGD, AdaGrad, SAG, SVRG
- FISTA, SpaRSA
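
To give a feel for what one of the listed solvers does, here is a minimal pure-Python sketch of FISTA applied to the lasso objective 0.5 * ||Xw - y||^2 + alpha * ||w||_1. It is only an illustration of the technique, not lightning's Cython implementation; the names fista_lasso and soft_threshold, the dense list-of-rows input and the fixed step size are all assumptions made for this example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fista_lasso(X, y, alpha, step, n_iter=200):
    """Minimize 0.5 * ||Xw - y||^2 + alpha * ||w||_1 with FISTA.

    X is a list of rows and y a list of targets (hypothetical input
    format). step should be at most 1 / L, where L is the largest
    eigenvalue of X^T X.
    """
    n_features = len(X[0])
    w = [0.0] * n_features   # current iterate
    z = list(w)              # extrapolated point
    t = 1.0                  # momentum parameter
    for _ in range(n_iter):
        # Gradient of the smooth part at z: X^T (Xz - y).
        residual = [sum(xi * zi for xi, zi in zip(row, z)) - yi
                    for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, residual))
                for j in range(n_features)]
        # Proximal gradient step: gradient step, then soft-thresholding.
        w_new = [soft_threshold(zj - step * gj, step * alpha)
                 for zj, gj in zip(z, grad)]
        # Nesterov-style extrapolation using the previous iterate.
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = [wn + (t - 1.0) / t_new * (wn - wo)
             for wn, wo in zip(w_new, w)]
        w, t = w_new, t_new
    return w


if __name__ == "__main__":
    # Tiny separable problem; here L = 4, so step = 0.25.
    print(fista_lasso([[2.0, 0.0], [0.0, 1.0]], [4.0, 0.5],
                      alpha=1.0, step=0.25))
```

On this toy problem the second coefficient is driven exactly to zero by the soft-thresholding step, which is the same mechanism that produces sparse models under an L1-type penalty.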
Example:

    from sklearn.datasets import fetch_20newsgroups_vectorized
    from lightning.classification import CDClassifier

    # Load News20 dataset from scikit-learn.
    bunch = fetch_20newsgroups_vectorized(subset="all")
    X = bunch.data
    y = bunch.target

    # Set classifier options.
    clf = CDClassifier(penalty="l1/l2",
                       loss="squared_hinge",
                       multiclass=True,
                       max_iter=20,
                       alpha=1e-4,
                       C=1.0 / X.shape[0],
                       tol=1e-3)

    # Train the model.
    clf.fit(X, y)

    # Accuracy
    print(clf.score(X, y))

    # Percentage of selected features
    print(clf.n_nonzero(percentage=True))