Documentation for method: PHPCrawler::handleDocumentInfo()

Override this method to get access to all information about a page or file the crawler found and received.

Signature:

public handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)

Parameters:

$PageInfo PHPCrawlerDocumentInfo A PHPCrawlerDocumentInfo-object containing all information about the currently received document.
Please see the reference of the PHPCrawlerDocumentInfo-class for detailed information.

Returns:

int

The crawling-process will stop immediately if you let this method return any negative value.
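
As a minimal sketch of the abort behaviour described above, the following override stops the crawl after a fixed number of documents. The class name LimitedCrawler and the properties $max_documents and $documents_seen are hypothetical and not part of the library. The stub PHPCrawler class at the top is only a stand-in so the snippet runs without the library installed; with the library loaded, the real class is used instead.

```php
<?php
// Stand-in so this sketch runs without the library installed.
// With the library loaded, the real PHPCrawler class is used instead.
if (!class_exists('PHPCrawler')) {
    class PHPCrawler
    {
        public function handleDocumentInfo($PageInfo) {}
    }
}

class LimitedCrawler extends PHPCrawler
{
    // Hypothetical settings for this example, not part of the library
    public $max_documents = 3;
    public $documents_seen = 0;

    public function handleDocumentInfo($PageInfo)
    {
        // Count every document passed to this method
        $this->documents_seen++;

        // Returning any negative value stops the crawling-process
        if ($this->documents_seen >= $this->max_documents) {
            return -1;
        }

        // No explicit return here: the result is not negative,
        // so the crawler continues
    }
}
```

The override omits the parameter type, as the example on this page does. Whether that is accepted alongside the typed signature shown above depends on the PHP version in use.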

Description:

Every time the crawler finds and receives a document on its way, this method is called.
The crawler passes all information about the currently received page or file to this method
in a PHPCrawlerDocumentInfo-object.

Please see the PHPCrawlerDocumentInfo documentation for a list of all properties describing the
received document.
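
Because the method is called once per received document, an override can also collect information across calls. The following is a rough sketch that tallies HTTP status codes, assuming the crawl runs in a single process so that the object's properties persist between calls. The class name StatusTallyCrawler and the property $status_counts are hypothetical. The stub PHPCrawler class is again only a stand-in for running the snippet without the library.

```php
<?php
// Stand-in so this sketch runs without the library installed.
// With the library loaded, the real PHPCrawler class is used instead.
if (!class_exists('PHPCrawler')) {
    class PHPCrawler
    {
        public function handleDocumentInfo($PageInfo) {}
    }
}

class StatusTallyCrawler extends PHPCrawler
{
    // Hypothetical property: http-status-code => number of documents
    public $status_counts = array();

    public function handleDocumentInfo($PageInfo)
    {
        $code = $PageInfo->http_status_code;

        // Start a counter the first time a status code is seen
        if (!isset($this->status_counts[$code])) {
            $this->status_counts[$code] = 0;
        }
        $this->status_counts[$code]++;
    }
}
```

After the crawl has finished, $status_counts can be read from the crawler object to see how many documents were received per status code.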

Example:

class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo($PageInfo)
  {
    // Print the URL of the document
    echo "URL: ".$PageInfo->url."<br />";

    // Print the http-status-code
    echo "HTTP-statuscode: ".$PageInfo->http_status_code."<br />";

    // Print the number of found links in this document
    echo "Links found: ".count($PageInfo->links_found_url_descriptors)."<br />";

    // ..
  }
}
