lightning: a Python library for large-scale linear classification, regression and ranking

jopen, 10 years ago

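A Python library for large-scale linear classification, regression and ranking. It supports SDCA, Prox-SDCA, SGD, AdaGrad, SAG, SVRG, FISTA and SpaRSA. Its highlights: it follows the same API conventions as scikit-learn, natively supports both dense and sparse data representations, and implements its computationally intensive modules in Cython.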

Highlights:

  • follows the scikit-learn API conventions
  • natively supports both dense and sparse data representations
  • computationally demanding parts implemented in Cython
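Following the scikit-learn API conventions means hyperparameters are passed to the constructor, `fit(X, y)` learns attributes whose names end in an underscore (such as `coef_`) and returns the estimator itself, and `predict`/`score` evaluate the fitted model. The sketch below illustrates those conventions with a hypothetical `TinyRidgeRegressor` (not part of lightning), trained by plain gradient descent on an L2-regularized squared loss. Because it touches `X` only through `X @ v` and `X.T @ v`, the same code accepts a dense NumPy array or a `scipy.sparse` matrix without densifying it, which is one simple way to support both representations. It is an illustration under these assumptions, not lightning's code.

```python
import numpy as np


class TinyRidgeRegressor:
    """Hypothetical estimator following scikit-learn conventions (not part of lightning)."""

    def __init__(self, alpha=1.0, max_iter=500, step=0.1):
        # Hyperparameters are stored unchanged in __init__, as scikit-learn expects.
        self.alpha = alpha
        self.max_iter = max_iter
        self.step = step

    def fit(self, X, y):
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        for _ in range(self.max_iter):
            # Only "X @ v" and "X.T @ v" are used, so X may be a dense ndarray
            # or a scipy.sparse matrix; no dense copy of X is ever made.
            residual = X @ w - y
            grad = X.T @ residual / n_samples + self.alpha * w
            w -= self.step * grad
        self.coef_ = w  # learned attributes end with an underscore
        return self     # fit returns self so calls can be chained

    def predict(self, X):
        return X @ self.coef_

    def score(self, X, y):
        # Coefficient of determination R^2, the default score for regressors.
        residual = y - self.predict(X)
        total = y - y.mean()
        return 1.0 - (residual @ residual) / (total @ total)
```

For example, `TinyRidgeRegressor(alpha=0.1).fit(X, y).coef_` should match the closed-form ridge solution once gradient descent has converged, and passing `scipy.sparse.csr_matrix(X)` instead of `X` should give the same coefficients up to rounding.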

Solvers supported:

  • primal coordinate descent
  • dual coordinate descent (SDCA, Prox-SDCA)
  • SGD, AdaGrad, SAG, SVRG
  • FISTA, SpaRSA
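Dual coordinate descent, listed above, optimizes the dual problem one training sample at a time. As a minimal sketch (not lightning's Cython implementation), the hypothetical function `sdca_ridge` below applies the standard SDCA closed-form update for the squared loss with an L2 penalty of strength `lam`, keeping the primal vector w = Xᵀα / (lam · n) in sync with the dual variables α. It assumes a dense NumPy array `X` and runs a fixed number of epochs instead of a duality-gap stopping rule.

```python
import numpy as np


def sdca_ridge(X, y, lam=0.1, n_epochs=200, seed=0):
    """Stochastic dual coordinate ascent (hypothetical helper) for
    min_w (1/n) * sum_i 0.5 * (x_i @ w - y_i)**2 + (lam / 2) * ||w||^2.
    """
    n_samples, n_features = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n_samples)             # one dual variable per sample
    w = np.zeros(n_features)                # invariant: w = X.T @ alpha / (lam * n)
    sq_norms = np.einsum("ij,ij->i", X, X)  # ||x_i||^2 for every sample
    scale = lam * n_samples
    for _ in range(n_epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.permutation(n_samples):
            # Exact maximization of the dual along coordinate i (squared loss).
            delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + sq_norms[i] / scale)
            alpha[i] += delta
            w += delta * X[i] / scale       # restore the primal-dual invariant
    return w
```

At convergence `w` satisfies (XᵀX/n + lam·I) w = Xᵀy/n, i.e. it is the ridge-regression solution, which gives an easy correctness check for the sketch.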
Example: training a multiclass classifier on the News20 dataset:

```python
from sklearn.datasets import fetch_20newsgroups_vectorized
from lightning.classification import CDClassifier

# Load News20 dataset from scikit-learn.
bunch = fetch_20newsgroups_vectorized(subset="all")
X = bunch.data
y = bunch.target

# Set classifier options.
clf = CDClassifier(penalty="l1/l2",
                   loss="squared_hinge",
                   multiclass=True,
                   max_iter=20,
                   alpha=1e-4,
                   C=1.0 / X.shape[0],
                   tol=1e-3)

# Train the model.
clf.fit(X, y)

# Accuracy (measured on the training data)
print(clf.score(X, y))

# Percentage of selected features
print(clf.n_nonzero(percentage=True))
```
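In the example, `penalty="l1/l2"` selects a group-sparse (l1/l2 mixed-norm) penalty, which is why the script can report a percentage of selected features. Assuming, as that feature-selection comment suggests, that each group collects one feature's coefficients across all classes (a column of a coefficient matrix of shape (n_classes, n_features)), the proximal operator of the penalty scales each column by max(0, 1 - threshold / ‖column‖₂), so columns with a small norm become exactly zero for every class. The hypothetical helpers `group_soft_threshold` and `percent_nonzero_features` below sketch that block soft-thresholding step and a rough analogue of the reported percentage; they are illustrations, not lightning's internals.

```python
import numpy as np


def group_soft_threshold(W, threshold):
    """Proximal operator of threshold * sum_j ||W[:, j]||_2 (hypothetical helper).

    W has shape (n_classes, n_features); column j holds feature j's
    coefficients for every class, so a whole column is zeroed at once.
    """
    norms = np.linalg.norm(W, axis=0)
    # threshold / ||W[:, j]||, treated as infinite for all-zero columns.
    ratio = np.divide(threshold, norms,
                      out=np.full_like(norms, np.inf), where=norms > 0)
    shrink = np.maximum(0.0, 1.0 - ratio)
    return W * shrink  # broadcasting scales each column by its own factor


def percent_nonzero_features(W):
    """Percentage of features with at least one non-zero coefficient."""
    return 100.0 * np.mean(np.any(W != 0, axis=0))
```

For instance, with threshold 1 a column with norm 5 is scaled by 0.8, while a column whose norm is below 1 is set to zero, which removes that feature for every class at once.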

Project homepage: http://www.open-open.com/lib/view/home/1421574088171