public handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
$PageInfo | PHPCrawlerDocumentInfo | A PHPCrawlerDocumentInfo object containing all information about the currently received document. See the PHPCrawlerDocumentInfo class reference for details. |
int | The crawling process stops immediately if this method returns any negative value. |
Every time the crawler finds and receives a document, this method is called.
The crawler passes all information about the received page or file to this method
in a PHPCrawlerDocumentInfo object.
See the PHPCrawlerDocumentInfo documentation for a list of all properties describing the
received document.
Example:
class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo($PageInfo)
  {
    // Print the URL of the document
    echo "URL: ".$PageInfo->url."<br />";
    // Print the HTTP status code
    echo "HTTP-statuscode: ".$PageInfo->http_status_code."<br />";
    // Print the number of links found in this document
    echo "Links found: ".count($PageInfo->links_found_url_descriptors)."<br />";
    // ..
  }
}
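
The negative return value described above can be used to stop a crawl early. The following is a minimal sketch, not part of the library: the class name LimitedCrawler and the properties $maxDocuments and $handled are hypothetical, and the empty PHPCrawler stand-in exists only so the snippet runs without PHPCrawl (it is skipped when the real class is loaded).

```php
<?php
// Stand-in so this sketch runs on its own; skipped when PHPCrawl is loaded.
if (!class_exists('PHPCrawler')) {
    class PHPCrawler {}
}

class LimitedCrawler extends PHPCrawler
{
    // Hypothetical settings for this sketch, not PHPCrawl options
    public $maxDocuments = 3;
    public $handled = 0;

    function handleDocumentInfo($PageInfo)
    {
        $this->handled++;
        echo "URL: " . $PageInfo->url . "\n";

        // A negative return value stops the crawling process
        if ($this->handled >= $this->maxDocuments) {
            return -1;
        }

        // Non-negative: the crawler continues with the next document
        return 0;
    }
}
```

With $maxDocuments left at 3, the first two calls return 0 and the third returns -1, which would end the crawl after the third received document.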