Top 50 Open Source Web Crawlers for Data Mining
b573
10 years ago
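Web crawlers serve many purposes, but in essence a web crawler is a program that collects data from the Internet so that it can be mined. Most search engines use crawlers to keep their data up to date and to discover new content on the web. This article lists the top 50 open source web crawlers available online for data mining.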
Project | Language | Platform |
Heritrix | Java | Linux |
Nutch | Java | Cross-platform |
Scrapy | Python | Cross-platform |
DataparkSearch | C++ | Cross-platform |
GNU Wget | C | Linux |
GRUB | C#, C, Python, Perl | Cross-platform |
ht://Dig | C++ | Unix |
HTTrack | C/C++ | Cross-platform |
ICDL Crawler | C++ | Cross-platform |
mnoGoSearch | C | Windows |
Norconex HTTP Collector | Java | Cross-platform |
Open Search Server | C/C++, Java, PHP | Cross-platform |
PHP-Crawler | PHP | Cross-platform |
YaCy | Java | Cross-platform |
WebSPHINX | Java | Cross-platform |
WebLech | Java | Cross-platform |
Arale | Java | Cross-platform |
JSpider | Java | Cross-platform |
HyperSpider | Java | Cross-platform |
Arachnid | Java | Cross-platform |
Spindle | Java | Cross-platform |
Spider | Java | Cross-platform |
LARM | Java | Cross-platform |
Metis | Java | Cross-platform |
SimpleSpider | Java | Cross-platform |
Grunk | Java | Cross-platform |
CAPEK | Java | Cross-platform |
Aperture | Java | Cross-platform |
Smart and Simple Web Crawler | Java | Cross-platform |
Web Harvest | Java | Cross-platform |
Aspseek | C++ | Linux |
Bixo | Java | Cross-platform |
crawler4j | Java | Cross-platform |
Ebot | Erlang | Linux |
Hounder | Java | Cross-platform |
Hyper Estraier | C/C++ | Cross-platform |
OpenWebSpider | C#, PHP | Cross-platform |
Pavuk | C | Linux |
Sphider | PHP | Cross-platform |
Xapian | C++ | Cross-platform |
Arachnode.net | C# | Windows |
Crawwwler | C++ | Java |
Distributed Web Crawler | C, Java, Python | Cross-platform |
iCrawler | Java | Cross-platform |
pycreep | Java | Cross-platform |
Opese | C++ | Linux |
Andjing | Java | |
Ccrawler | C# | Windows |
WebEater | Java | Cross-platform |
JoBo | Java | Cross-platform |
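
To make concrete what the crawlers listed above do at their core, here is a minimal sketch of the basic crawl loop: fetch a page, extract its links, and queue the ones not yet seen. It is not taken from any of the listed projects. The names `crawl`, `LinkExtractor`, and `FAKE_SITE` are hypothetical, and the `fetch` function is supplied by the caller, so the example runs against an in-memory site instead of the real network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`; returns {url: html} for fetched pages.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that maps a URL to HTML text, or None when the page is unavailable.
    """
    queue = deque([seed])
    seen = {seed}          # URLs already queued, to avoid revisiting
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Hypothetical in-memory "site" so the sketch runs without network access.
FAKE_SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b#top">B</a> <a href="/">home</a>',
    "http://example.test/b": "<p>no links</p>",
}

if __name__ == "__main__":
    result = crawl("http://example.test/", FAKE_SITE.get)
    print(sorted(result))
```

A real crawler would replace `FAKE_SITE.get` with an HTTP fetch and would typically also honor robots.txt, limit its request rate, and restrict which hosts it follows. The projects in the table differ mainly in how they handle those concerns and in how they scale the queue.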