PHP Scraping Library: Goutte
jopen
11 years ago
Goutte is a PHP library for scraping data from websites. It provides an elegant API that makes it easy to select specific elements on remote pages.
Require the Goutte phar file to use Goutte in a script:
require_once '/path/to/goutte.phar';
Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\Client):
use Goutte\Client;

$client = new Client();
Make requests with the request() method:
$crawler = $client->request('GET', 'http://www.symfony-project.org/');
The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).
Click on links:
$link = $crawler->selectLink('Plugins')->link();
$crawler = $client->click($link);
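Here is a closer look at the link-clicking step. `selectLink('Plugins')` selects `<a>` elements by their text, `link()` wraps the first match, and `click()` then requests the URL in its `href`, resolved against the current page. The sketch below reproduces only the lookup step. It uses PHP's built-in DOM extension on a hard-coded HTML string instead of Goutte. The helper `findLinkHref()` and the sample markup are hypothetical, and the helper does not resolve relative URLs.

```php
<?php
// Stdlib-only illustration of selecting a link by its text (no Goutte required).
// findLinkHref() is a hypothetical helper; unlike Goutte's link()/click(),
// it returns the raw href and does not resolve it against the page URL.
function findLinkHref(string $html, string $text): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match <a> elements whose whitespace-normalized text contains $text.
    // $text is placed inside a single-quoted XPath literal, so it must not contain "'".
    $nodes = $xpath->query(sprintf("//a[contains(normalize-space(.), '%s')]", $text));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $link = $nodes->item(0);
    return $link instanceof DOMElement ? $link->getAttribute('href') : null;
}

$html = '<html><body><a href="/docs">Docs</a> <a href="/plugins/"> Plugins </a></body></html>';
echo findLinkHref($html, 'Plugins'), "\n"; // prints "/plugins/"
```

In a real scraper, Goutte takes care of the URL resolution and the follow-up request. The sketch only shows how "find a link by its text" maps onto an XPath query.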
Submit forms:
$form = $crawler->selectButton('sign in')->form();
$crawler = $client->submit($form, array('signin[username]' => 'fabien', 'signin[password]' => 'xxxxxx'));
Extract data:
// Stop if the response page shows a login error list.
$nodes = $crawler->filter('.error_list');
if ($nodes->count()) {
    die(sprintf("Authentication error: %s\n", $nodes->text()));
}
printf("Nb tasks: %d\n", $crawler->filter('#nb_tasks')->text());
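The extraction step can also be approximated without Goutte. `filter()` accepts CSS selectors. The sketch below translates the two selectors used above (`.error_list` and `#nb_tasks`) into XPath by hand and queries a static HTML string with PHP's built-in `DOMXPath`. The helper names `textOf()` and `extractLoginResult()` and the sample markup are hypothetical. This illustrates the idea and is not how Goutte or the DomCrawler component is implemented.

```php
<?php
// Stdlib-only approximation of the extraction step (no Goutte required).
// textOf() and extractLoginResult() are hypothetical helpers for illustration.

// Return the trimmed text of the first node matching $query, or null if none match.
function textOf(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

function extractLoginResult(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // CSS ".error_list" matches any element whose class list contains "error_list".
    $error = textOf($xpath, "//*[contains(concat(' ', normalize-space(@class), ' '), ' error_list ')]");
    // CSS "#nb_tasks" matches the element whose id is "nb_tasks".
    $nbTasks = textOf($xpath, "//*[@id='nb_tasks']");

    return [
        'error'    => $error,
        'nb_tasks' => $nbTasks === null ? null : (int) $nbTasks,
    ];
}

$result = extractLoginResult('<html><body><p>Tasks: <span id="nb_tasks">7</span></p></body></html>');
if ($result['error'] !== null) {
    die(sprintf("Authentication error: %s\n", $result['error']));
}
printf("Nb tasks: %d\n", $result['nb_tasks']); // prints "Nb tasks: 7"
```

Writing the XPath by hand shows what a CSS selector such as `.error_list` actually has to match, including elements that carry several classes.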