public handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
Parameters:
| Parameter | Type | Description |
|---|---|---|
| $PageInfo | PHPCrawlerDocumentInfo | A PHPCrawlerDocumentInfo object containing all information about the currently received document. See the PHPCrawlerDocumentInfo class reference for details. |

Returns:
| Type | Description |
|---|---|
| int | The crawling process stops immediately if this method returns any negative value. |
This method is called every time the crawler finds and receives a document on its way.
The crawler passes all information about the currently received page or file to this method
in a PHPCrawlerDocumentInfo object.
See the PHPCrawlerDocumentInfo documentation for a list of all properties describing the received document.
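The calling contract can be illustrated without PHPCrawl installed. The sketch below is a hypothetical stand-in, not PHPCrawl's actual crawl loop: `simulateCrawl()` and the closure `$handler` are illustrative names. The loop hands each URL to the handler in turn, the way the crawler calls handleDocumentInfo() once per received document, and stops as soon as the handler returns a negative value.

```php
<?php
// Hypothetical stand-in for the crawler's main loop; NOT part of PHPCrawl.
// $urls plays the role of the documents the crawler receives,
// $handler plays the role of handleDocumentInfo().
function simulateCrawl(array $urls, callable $handler): array
{
    $processed = [];
    foreach ($urls as $url) {
        $processed[] = $url;
        // The handler is called once per received document.
        $result = $handler($url);
        // Any negative return value aborts the crawl immediately.
        if (is_int($result) && $result < 0) {
            break;
        }
    }
    return $processed;
}

// Example handler: request a stop after three documents by returning -1.
$count = 0;
$handler = function (string $url) use (&$count): int {
    $count++;
    return $count >= 3 ? -1 : 0;
};

$processed = simulateCrawl(
    ["http://example.com/a", "http://example.com/b", "http://example.com/c", "http://example.com/d"],
    $handler
);
// $processed holds the first three URLs; the fourth is never handled.
```

In a real PHPCrawler subclass the same effect comes from returning a negative value such as `-1` from handleDocumentInfo().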
Example:

class MyCrawler extends PHPCrawler
{
  public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
  {
    // Print the URL of the document
    echo "URL: ".$PageInfo->url."<br />";
    // Print the HTTP status code
    echo "HTTP-statuscode: ".$PageInfo->http_status_code."<br />";
    // Print the number of links found in this document
    echo "Links found: ".count($PageInfo->links_found_url_descriptors)."<br />";
  }
}
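A handler like the one in the example is awkward to test without a live crawl. One option, sketched here as an assumption rather than a PHPCrawl convention, is to move the per-document logic into a plain function that only reads the `url`, `http_status_code` and `links_found_url_descriptors` properties, so it can be exercised with a stand-in object. `describeDocument()` is a hypothetical name.

```php
<?php
// Hypothetical helper, NOT part of PHPCrawl. It only reads the three
// properties used in the example, so any object that has them works:
// a PHPCrawlerDocumentInfo during a crawl, or a stdClass in a test.
function describeDocument($info): string
{
    // Guard against a missing/non-array link list; count them otherwise.
    $links = is_array($info->links_found_url_descriptors)
        ? count($info->links_found_url_descriptors)
        : 0;

    return "URL: " . $info->url . "<br />"
         . "HTTP-statuscode: " . $info->http_status_code . "<br />"
         . "Links found: " . $links . "<br />";
}

// Inside MyCrawler::handleDocumentInfo() this would become:
//   echo describeDocument($PageInfo);

// Stand-in document info for a quick check without crawling.
$fake = new stdClass();
$fake->url = "http://example.com/";
$fake->http_status_code = 200;
$fake->links_found_url_descriptors = ["link1", "link2"];

$summary = describeDocument($fake);
```

Keeping the function free of crawler state means the handler itself stays a one-line delegation, and the output format can be checked in isolation.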