PHP-spider
PHP web crawler
Detailed introduction
An extensible PHP web spider. Example code:
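// Import the spider and the XPath-based URI discoverer (the discoverer is not used in this excerpt).
use VDB\Spider\Spider;
use VDB\Spider\Discoverer\XPathExpressionDiscoverer;

// Create a spider seeded with the start URI.
$spider = new Spider('http://www.oschina.net');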
Features:
- supports two traversal algorithms: breadth-first and depth-first
- supports depth limiting and queue size limiting
- supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
- comes with a useful set of URI filters, such as Domain limiting
- supports custom URI filters, both prefetch (URI) and postfetch (Resource content)
- supports custom request handling logic
- comes with a useful set of persistence handlers (memory and file; Redis soon to follow)
- supports custom persistence handlers
- collects statistics about the crawl for reporting
- dispatches useful events, allowing developers to add even more custom behavior
- supports a politeness policy
- will soon come with many default discoverers: RSS, Atom, RDF, etc.
- will soon support multiple queueing mechanisms (file, memcache, redis)
- will eventually support distributed spidering with a central queue
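
To make the first two features concrete, here is a minimal sketch in plain PHP of breadth-first versus depth-first traversal with a depth limit and a queue size limit. It does not use the PHP-spider API: the `traverse` function, its parameters, and the `$links` graph are hypothetical names invented for illustration, and the in-memory array stands in for pages that a real crawler would fetch.

```php
<?php
// Hypothetical in-memory link graph standing in for fetched pages.
$links = [
    'a' => ['b', 'c'],
    'b' => ['d'],
    'c' => ['e'],
    'd' => [],
    'e' => [],
];

/**
 * Traverse $links starting at $seed and return URIs in visit order.
 *
 * $mode         'bfs' takes the oldest queued URI first, anything else the newest (depth-first).
 * $maxDepth     links found on a page at this depth are not followed.
 * $maxQueueSize cap on the total number of URIs ever queued, seed included (0 = unlimited).
 */
function traverse(array $links, string $seed, string $mode, int $maxDepth, int $maxQueueSize = 0): array
{
    $frontier = [[$seed, 0]];   // pending [uri, depth] pairs
    $seen = [$seed => true];    // every URI queued so far
    $visited = [];

    while ($frontier) {
        // A queue (shift) gives breadth-first order, a stack (pop) depth-first.
        [$uri, $depth] = $mode === 'bfs' ? array_shift($frontier) : array_pop($frontier);
        $visited[] = $uri;

        if ($depth >= $maxDepth) {
            continue; // depth limit reached: do not follow links from here
        }
        foreach ($links[$uri] ?? [] as $next) {
            if (isset($seen[$next])) {
                continue; // already queued
            }
            if ($maxQueueSize > 0 && count($seen) >= $maxQueueSize) {
                break; // queue size limit reached
            }
            $seen[$next] = true;
            $frontier[] = [$next, $depth + 1];
        }
    }
    return $visited;
}

echo implode(' ', traverse($links, 'a', 'bfs', 2)), "\n"; // a b c d e
echo implode(' ', traverse($links, 'a', 'dfs', 2)), "\n"; // a c e b d
echo implode(' ', traverse($links, 'a', 'bfs', 1)), "\n"; // a b c
```

The only difference between the two algorithms in this sketch is which end of the frontier the next URI is taken from. The depth and queue limits are checked before a URI is queued, which keeps the frontier bounded.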