Python, NLP and Machine Learning resources collected by Tryolabs

jopen, 9 years ago

A curated list of books, libraries, apps and papers we love at Tryolabs. We work with blazing startups and help them build complex projects using Python, NLP & Machine Learning.

Overview

We create amazing Internet & Mobile products for blazing startups. We combine the Python ecosystem with machine learning and natural language processing technologies to build heavy backend apps with artificial intelligence components. We follow agile methodologies to develop MVPs and full products the lean way.

Training

The training period at Tryolabs lasts at least two weeks. Its goal is for new team members to get up to speed with the tools the company uses. This repo contains a list of tutorials and documentation useful for becoming familiar with the Django/Python ecosystem, as well as some ML and NLP techniques.

During the training period, we recommend spending at least an hour a day pairing with a mentor who has experience on the team, to get to know the work process and the tools. The goal is for the mentor to explain the tasks they are working on to the person in training.

Development Tools

Python

virtualenv and virtualenvwrapper

A very useful development tool that lets us create isolated Python environments for every project, isolating the set of libraries used in the project from the system.
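virtualenv is a third-party tool, and since Python 3.3 the standard library has shipped a similar `venv` module. The following minimal sketch assumes `venv` is an acceptable stand-in for virtualenv. It creates an isolated environment in a throwaway directory and checks whether the current interpreter is already running inside one. The helpers `create_isolated_env` and `in_virtualenv` are hypothetical names used only for illustration.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a venv/virtualenv."""
    # venv (and virtualenv >= 20) point sys.prefix at the environment while
    # sys.base_prefix keeps pointing at the base installation.
    # Older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base or hasattr(sys, "real_prefix")


def create_isolated_env(env_dir: Path) -> Path:
    """Create an environment with its own site-packages; return its pyvenv.cfg path."""
    # with_pip=False keeps this sketch fast and offline; real projects
    # would normally pass with_pip=True so packages can be installed.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    print("inside an environment:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_isolated_env(Path(tmp) / "myproject-env")
        # The config records the base interpreter and that system packages are excluded
        print(cfg.read_text())
```

In day-to-day work, virtualenvwrapper adds convenience commands on top of this idea, such as keeping all environments in one place and switching between them.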

iOS

cocoapods

Package manager for iOS projects. Handles the setup and updating of Xcode projects to speed up the integration of new components.

nomad

CLI for iOS projects. Provides various tools to perform common tasks from the command line (e.g. generating, signing and distributing an .ipa over the air).

Other

Vagrant

Vagrant is a tool for creating isolated, reproducible development environments using virtual machines. It is usually used with VirtualBox, but also supports VMware and other virtualization systems.

Docker

Docker is a tool for creating and managing software containers.

Metamon

Metamon is a tool to automatically set up an isolated execution environment for Django applications.

Source Control

Just use git. Good resources are the Pro Git book by Scott Chacon and GitHub's help site.

Editors and IDEs

Standards and Conventions

PEP 8 is the definitive reference for Python coding style. The pep8 package (since renamed pycodestyle) can be used to scan code and find the parts that don't conform to PEP 8.

With Emacs, the emacs-pep8 package can be used to run the pep8.py script.
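The sketch below gives a feel for what such a checker reports. It is a deliberately tiny, hypothetical example that tests only two PEP 8 rules: lines of at most 79 characters, and no trailing whitespace. It is not the pep8/pycodestyle implementation, which covers far more rules. The function name `check_source` is an assumption. The message codes follow pycodestyle's naming (E501, W291, W293) for familiarity.

```python
from typing import List, Tuple

MAX_LINE_LENGTH = 79  # PEP 8's limit for lines of code


def check_source(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for two PEP 8 rules."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(
                (lineno, f"E501 line too long ({len(line)} > {MAX_LINE_LENGTH} characters)")
            )
        if line != line.rstrip():
            # Whitespace-only lines get a separate code, as in pycodestyle
            if not line.strip():
                problems.append((lineno, "W293 whitespace on blank line"))
            else:
                problems.append((lineno, "W291 trailing whitespace"))
    return problems


if __name__ == "__main__":
    sample = "x = 1   \n" + "y = '" + "a" * 80 + "'\n" + "    \n"
    for lineno, message in check_source(sample):
        print(f"line {lineno}: {message}")
```

For the sample above, this reports trailing whitespace on line 1, an 86-character line on line 2, and a whitespace-only line on line 3.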

Deploying

We use Ansible for all our deployment and server orchestration tasks.

Databases

Relational

Just use Postgres. It's not just a database, it's a complete "relational database framework" that provides full-text search, GIS and extensive documentation of every knob and lever.

NoSQL

Are you sure Postgres can't do what you want?

Document

Key-Value

Graph

Libraries

Machine Learning

Web

Books

This list of books represents, in our opinion, a good balance between theory and practice. We don't expect everyone to read all of these; rather, each person should pick a few books from this common list.

Machine Learning

Information Retrieval

Computer Vision

Scala

Software Architecture

Programming Language

Papers

Information Retrieval

General

Web Design

Aggregators

Icons

Libraries and Resources

Tech Stack

First things first: machines are meant to be identical. Ansible provisions your local Vagrant box the same way it provisions a server. This way the production environment matches the development one, which helps us avoid hard-to-find bugs and be fairly certain that if something works in dev, it will work in prod.
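Provisioning every machine with the same Ansible setup is what keeps them identical. One quick way to spot-check the Python side of that promise is to compare installed package versions. The following minimal sketch assumes each machine can run a small script inside the application's virtualenv. `installed_packages` snapshots name/version pairs using the standard library's `importlib.metadata`. `diff_environments`, a hypothetical helper, lists every package whose version differs between two snapshots.

```python
import json
import re
from importlib.metadata import distributions
from typing import Dict, List, Optional, Tuple


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 does ('Django_Extensions' -> 'django-extensions')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_packages() -> Dict[str, str]:
    """Snapshot {normalized name: version} for the interpreter running this script."""
    snapshot: Dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip installs with broken or missing metadata
            snapshot[normalize(name)] = dist.version
    return snapshot


def diff_environments(
    dev: Dict[str, str], prod: Dict[str, str]
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """List (package, dev_version, prod_version) for each mismatch; None = not installed."""
    return [
        (name, dev.get(name), prod.get(name))
        for name in sorted(dev.keys() | prod.keys())
        if dev.get(name) != prod.get(name)
    ]


if __name__ == "__main__":
    # Run on each machine, e.g. `python snapshot.py > dev.json`, then load
    # both JSON files and pass them to diff_environments().
    print(json.dumps(installed_packages(), indent=2, sort_keys=True))
```

An empty diff does not prove the machines are identical, because system packages and configuration are not covered. A non-empty diff, however, is an early sign that dev and prod have drifted apart.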

Specifically, machines look like this:

  • The application runs inside a virtualenv, even if it's the only application on the server. This makes it easy to add other applications should the need arise. For instance, you might want to run an IPython Notebook server with a notebook that provides analytics and charts of the data in your database, without contaminating the app's environment with IPython's dependencies.

  • Nginx is used as a reverse proxy, sending requests from the Internet to the Django server and responses the other way around. Nginx can take care of load balancing, caching, HTTP acceleration and some degree of security.

  • Supervisor is used to keep the actual application server running, as well as running other scripts or processes. Every process is logged to disk for debugging.

  • Postgres is the database, of course.
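Supervisor is configured declaratively rather than programmed, but the core idea behind it fits in a few lines. What follows is a toy, hypothetical `supervise` function; it is not Supervisor's API or configuration format. It starts a command, appends the command's stdout and stderr to a log file on disk, and restarts the command whenever it exits. Restarts stop after a fixed number of attempts, with a short back-off between them. The `max_restarts` and `backoff` parameters are assumptions made for the sketch.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import List


def supervise(command: List[str], log_path: Path, max_restarts: int = 3, backoff: float = 1.0) -> int:
    """Run `command`, restart it each time it exits, and log all output to `log_path`.

    Returns how many times the process was started (1 + number of restarts).
    """
    starts = 0
    with open(log_path, "ab") as log:
        while True:
            starts += 1
            # The child's stdout and stderr are appended directly to the log file
            proc = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
            log.write(f"[supervise] exited with code {proc.returncode}\n".encode())
            log.flush()  # flush before the next child writes to the same file
            if starts > max_restarts:
                return starts
            time.sleep(backoff)  # avoid a tight crash loop


if __name__ == "__main__":
    # Stand-in for an app server: prints a line, then exits with an error code
    demo = [sys.executable, "-c", "print('worker running'); raise SystemExit(1)"]
    log_file = Path(tempfile.mkdtemp()) / "worker.log"
    print("started", supervise(demo, log_file, max_restarts=2, backoff=0.1), "times")
    print(log_file.read_text())
```

A real process manager also handles signals, clean shutdowns, log rotation and running many programs at once. Those concerns are the reason to rely on Supervisor rather than a hand-rolled loop.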

Our tech stack looks roughly like this on most projects:

[Figure: Stack]

This is, of course, an approximation. Some projects use NoSQL databases in addition to relational ones, others add components such as message queues, and some use specific tools like Varnish instead of Nginx for HTTP acceleration.

https://github.com/tryolabs/awesome-tryo