
Author: - Version: -
Package: phpcrawl Category: -

Class: PHPCrawlerDocumentInfo

Public Properties
URL-related information
file		The name of the requested page or file, e.g. "page.html".
host		The host-part of the URL of the requested page or file, e.g. "www.foo.com".
path		The path in the URL of the requested page or file, e.g. "/page/".
port		The port the request was sent to, e.g. 80.
protocol		The protocol-part of the URL of the requested page or file, e.g. "http://".
query		The query-part of the URL of the requested page or file, e.g. "?x=y".
url		The complete, fully qualified URL of the page or file, e.g. "http://www.foo.com/bar/page.html?x=y".
url_link_depth		The link depth of the URL relative to the entry URL of the crawling process.
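
The URL-related properties split the requested URL into its parts. Going by the examples above, concatenating protocol, host, path, file and query should reproduce url. The following minimal sketch assumes exactly that and adds ":port" only when the port is not the protocol's default. rebuildUrl() is a hypothetical helper and not part of phpcrawl. The stand-in object only mimics a PHPCrawlerDocumentInfo.

```php
<?php
// Hypothetical helper (not part of phpcrawl): reassembles a URL from the
// URL-related properties of a PHPCrawlerDocumentInfo-like object.
function rebuildUrl(object $doc): string
{
    // Assumed default ports; a port is only written out when it differs.
    $defaultPorts = ["http://" => 80, "https://" => 443];
    $default = $defaultPorts[$doc->protocol] ?? null;

    $port = "";
    if ($doc->port !== null && (int) $doc->port !== $default) {
        $port = ":" . $doc->port;
    }

    // protocol . host [. ":" . port] . path . file . query
    return $doc->protocol . $doc->host . $port . $doc->path . $doc->file . $doc->query;
}

// Stand-in object built from the example values in this reference.
$doc = (object) [
    "protocol" => "http://",
    "host"     => "www.foo.com",
    "port"     => 80,
    "path"     => "/bar/",
    "file"     => "page.html",
    "query"    => "?x=y",
];

echo rebuildUrl($doc), "\n"; // http://www.foo.com/bar/page.html?x=y
```
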
Content-related information
bytes_received		The number of bytes of the document's content that the crawler received.
content		The content of the requested document (HTML source code or file content).
content_tmp_file		The temporary file into which the content was received.
content_type		The content type of the page or file, e.g. "text/html" or "image/gif".
cookies		Cookies sent by the server.
header		The complete HTTP header the web server sent in response for this page or file.
header_bytes_received		The number of bytes of the document's header that the crawler received.
http_status_code		The HTTP status code the web server returned for the request, e.g. 200 (OK) or 404 (Not Found).
meta_attributes		All meta-tag attributes found in the source of the document.
received		Flag indicating whether content was received from the page or file.
received_completely		Flag indicating whether content was completely received from the page or file.
received_to_file		Will be true if the content was received into a temporary file.
received_to_memory		Will be true if the content was received into local memory.
responseHeader		The complete HTTP header the web server sent in response for this page or file, as a PHPCrawlerResponseHeader object.
source		Same as "content", the content of the requested document.
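
The content can come from one of two places. If received_to_memory is set, it is available in content (and source). If received_to_file is set, it has to be read from content_tmp_file. The sketch below picks the right one. readDocumentContent() is hypothetical. It also assumes the temporary file still exists when the function runs. Whether and when phpcrawl deletes that file is not covered by this reference, so check your version.

```php
<?php
// Hypothetical helper (not part of phpcrawl): returns the document content,
// whether it was received into memory or into a temporary file.
function readDocumentContent(object $doc): ?string
{
    if ($doc->received_to_memory) {
        // Content is held in memory; "source" holds the same data.
        return $doc->content;
    }
    if ($doc->received_to_file
        && is_string($doc->content_tmp_file)
        && is_readable($doc->content_tmp_file)) {
        // Content was written to content_tmp_file instead of memory.
        $data = file_get_contents($doc->content_tmp_file);
        return $data === false ? null : $data;
    }
    // No usable content (see the "received" flag).
    return null;
}

// Stand-in object whose content was "received" into a temporary file.
$tmp = tempnam(sys_get_temp_dir(), "doc");
file_put_contents($tmp, "<p>Hello</p>");
$doc = (object) [
    "received_to_memory" => false,
    "received_to_file"   => true,
    "content"            => "",
    "content_tmp_file"   => $tmp,
];

echo readDocumentContent($doc), "\n"; // <p>Hello</p>
unlink($tmp);
```
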
Information about found links
links_found		A numeric array containing information about all links that were found in the source of the page.
links_found_url_descriptors		A numeric array containing a PHPCrawlerURLDescriptor object for every link that was found in the page.
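
links_found_url_descriptors is usually the more convenient way to process the links of a page. This sketch collects the link targets and drops duplicates. collectLinkUrls() is hypothetical. It assumes each PHPCrawlerURLDescriptor exposes the absolute link URL in a public url_rebuild property, so verify this against the PHPCrawlerURLDescriptor reference before relying on it.

```php
<?php
// Hypothetical helper (not part of phpcrawl): collects the unique target URLs
// of all links found in a document.
// Assumption: each descriptor has a public url_rebuild property.
function collectLinkUrls(object $doc): array
{
    $urls = [];
    foreach ($doc->links_found_url_descriptors as $descriptor) {
        $urls[] = $descriptor->url_rebuild;
    }
    // Drop duplicates and reindex so the result stays a numeric array.
    return array_values(array_unique($urls));
}

// Stand-in document with three links, two of them pointing to the same URL.
$doc = (object) [
    "links_found_url_descriptors" => [
        (object) ["url_rebuild" => "http://www.foo.com/a.html"],
        (object) ["url_rebuild" => "http://www.foo.com/b.html"],
        (object) ["url_rebuild" => "http://www.foo.com/a.html"],
    ],
];

print_r(collectLinkUrls($doc));
```
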
Referer information
referer_url		The complete URL of the page that contained the link to this document.
refering_link_raw		The raw link exactly as it was found in the content of the referring URL (e.g. "../foo.html").
refering_linkcode		The HTML source code that contained the link to the current document.
refering_linktext		The text of the link that pointed to this document.
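
Combined with http_status_code, the referer properties are enough to locate broken links. In this sketch, describeBrokenLink() is hypothetical. It treats only status 404 as "broken". It also assumes referer_url is null when a document has no referring page, as with the entry URL.

```php
<?php
// Hypothetical helper (not part of phpcrawl): reports a document that
// answered with HTTP 404, together with the page that linked to it.
function describeBrokenLink(object $doc): ?string
{
    if ((int) $doc->http_status_code !== 404) {
        return null;
    }
    // refering_link_raw is the link as written in the referring page.
    // Assumption: referer_url is null when there is no referring page.
    return sprintf(
        '%s returned 404 (linked from %s as "%s")',
        $doc->url,
        $doc->referer_url ?? "(entry URL)",
        $doc->refering_link_raw ?? ""
    );
}

// Stand-in document: a relative link on /bar/page.html that points to a missing file.
$doc = (object) [
    "url"               => "http://www.foo.com/foo.html",
    "http_status_code"  => 404,
    "referer_url"       => "http://www.foo.com/bar/page.html",
    "refering_link_raw" => "../foo.html",
];

echo describeBrokenLink($doc), "\n";
```
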
Error-handling
error_code		The code of the error that occurred while requesting/receiving the document, if any (see the PHPCrawlerRequestErrors::ERROR_... constants).
error_occured		Indicates whether an error occurred while requesting/receiving the document.
error_string		A human-readable description of the error that occurred while requesting/receiving the document, if any.
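
error_occured describes problems while requesting or receiving the document. A document that was received can presumably still carry an HTTP error status, which is reported through http_status_code instead. The sketch below assumes that split and does not depend on any particular ERROR_... constant. describeOutcome() is hypothetical, and the error code used in the test is made up.

```php
<?php
// Hypothetical helper (not part of phpcrawl): summarizes the outcome of a request.
// Assumption: request/receive failures set error_occured, error_code and
// error_string, while HTTP statuses (200, 404, ...) appear in http_status_code.
function describeOutcome(object $doc): string
{
    if ($doc->error_occured) {
        return sprintf("request error %d: %s", $doc->error_code, $doc->error_string);
    }
    return sprintf("HTTP %d", $doc->http_status_code);
}

// Stand-in document that was received without a request error.
$doc = (object) [
    "error_occured"    => false,
    "error_code"       => null,
    "error_string"     => null,
    "http_status_code" => 200,
];

echo describeOutcome($doc), "\n"; // HTTP 200
```
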
Benchmarks
data_transfer_rate		The approximate data transfer rate for this document.
data_transfer_time		The approximate time it took to receive the document's data.
server_connect_time		The time it took to connect to the server.
server_response_time		The server's response time.
unbuffered_bytes_read		The number of unbuffered bytes received.
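
The benchmark values are handy for logging slow pages. This sketch prints them together with a rough total. formatBenchmarks() is hypothetical. It assumes all times are given in seconds and data_transfer_rate in bytes per second. This reference does not state the units, so check them for your phpcrawl version.

```php
<?php
// Hypothetical helper (not part of phpcrawl): formats a document's benchmarks.
// Assumptions: times are in seconds, data_transfer_rate is in bytes per second.
function formatBenchmarks(object $doc): string
{
    // Rough total: connect + response + transfer.
    $total = $doc->server_connect_time + $doc->server_response_time + $doc->data_transfer_time;
    return sprintf(
        "connect %.3f s, response %.3f s, transfer %.3f s (total %.3f s), %.1f KB/s",
        $doc->server_connect_time,
        $doc->server_response_time,
        $doc->data_transfer_time,
        $total,
        $doc->data_transfer_rate / 1024
    );
}

// Stand-in document with made-up timing values.
$doc = (object) [
    "server_connect_time"  => 0.1,
    "server_response_time" => 0.2,
    "data_transfer_time"   => 0.5,
    "data_transfer_rate"   => 2048,
];

echo formatBenchmarks($doc), "\n";
```
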
Deprecated
received_completly		Alias for received_completely; the name was misspelled in previous versions of phpcrawl. (deprecated!)
Other
header_send		The complete HTTP request header the crawler sent to the server (debugging info).
traffic_limit_reached		Indicates whether the traffic limit set by the user was reached after downloading this document.
