Documentation for method: PHPCrawler::handleDocumentInfo()

Override this method to get access to all information about a page or file the crawler found and received.

Signature:

public handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)

Parameters:

$PageInfo PHPCrawlerDocumentInfo A PHPCrawlerDocumentInfo-object containing all information about the currently received document.
Please see the reference of the PHPCrawlerDocumentInfo-class for detailed information.

Returns:

int  The crawling process will stop immediately if this method returns any negative value.
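
The return value can be used to end a crawl early. The following is a minimal sketch, not an official example: the class name LimitedCrawler, its $maxDocuments cap, and the $handled counter are hypothetical and exist only for illustration. The two stand-in classes at the top are likewise assumptions that let the sketch run without the library; they are skipped when the real classes are already loaded.

```php
<?php
// Stand-ins (assumption): only defined if the real phpcrawl classes are
// not loaded, so this sketch can be run on its own.
if (!class_exists('PHPCrawlerDocumentInfo')) {
    class PHPCrawlerDocumentInfo
    {
        public $url;
        public $http_status_code;
    }
}
if (!class_exists('PHPCrawler')) {
    class PHPCrawler
    {
        public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
        {
        }
    }
}

class LimitedCrawler extends PHPCrawler
{
    // Hypothetical cap on handled documents, not a library setting
    private $maxDocuments = 3;

    // Number of documents passed to handleDocumentInfo() so far
    private $handled = 0;

    public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
    {
        $this->handled++;
        echo "Received: " . $PageInfo->url . "\n";

        // Any negative return value asks the crawler to stop
        if ($this->handled >= $this->maxDocuments) {
            return -1;
        }

        // No negative value is returned here, so crawling continues
    }
}
```

With the cap of 3 assumed above, the first two calls return nothing and the third returns -1, at which point the crawler would stop.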

Description:

Every time the crawler finds and receives a document on its way, this method is called.
The crawler passes all information about the currently received page or file to this method
in a PHPCrawlerDocumentInfo object.

Please see the PHPCrawlerDocumentInfo documentation for a list of all properties describing the
received document.

Example:

class MyCrawler extends PHPCrawler
{
  public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
  {
    // Print the URL of the document
    echo "URL: ".$PageInfo->url."<br />";

    // Print the http-status-code
    echo "HTTP-statuscode: ".$PageInfo->http_status_code."<br />";

    // Print the number of found links in this document
    echo "Links found: ".count($PageInfo->links_found_url_descriptors)."<br />";
   
    // ..
  }
}