Method: PHPCrawler::obeyRobotsTxt()

Defines whether the crawler should parse and obey robots.txt-files.

public obeyRobotsTxt($mode, $robots_txt_uri = null)


$mode bool Set to TRUE if you want the crawler to obey robots.txt-files.
$robots_txt_uri string Optional. The URL or path of the robots.txt-file to obey, given as URI (like ""
or "file://../a_robots_file.txt").
If not set (or set to null), the crawler uses the default robots.txt-location of the root-URL ("").




If this is set to TRUE, the crawler looks for a robots.txt-file for the root-URL of the crawling-process at the default location
and, if present, parses it and obeys all contained directives applying to the
useragent-identification of the crawler ("PHPCrawl" by default, or the string set by calling setUserAgentString()).

The default value is FALSE (for compatibility reasons).

Please note that directives found in a robots.txt-file have a higher priority than other settings made by the user.
If, e.g., addFollowMatch("#http://foo\.com/path/file\.html#") was set, but a directive in the robots.txt-file of the host says "Disallow: /path/", the URL will be ignored by the crawler anyway.
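The precedence described above can be illustrated with a minimal usage sketch. Only obeyRobotsTxt(), setUserAgentString() and addFollowMatch() are confirmed by this document; the subclassing pattern, the setURL()/go() calls, the handleDocumentInfo() override-hook and the include-path are assumptions based on typical PHPCrawl usage and may differ in your version of the library.

```php
<?php
// Minimal sketch; assumes the PHPCrawl library is available at this path.
require_once("libs/PHPCrawler.class.php");

class MyCrawler extends PHPCrawler
{
    // Override-hook called for every document the crawler receives
    // (hook name is an assumption; older releases may use a different one).
    function handleDocumentInfo($DocInfo)
    {
        echo $DocInfo->url . "\n";
    }
}

$crawler = new MyCrawler();
$crawler->setURL("http://foo.com/");

// Identification that gets matched against the User-agent lines in robots.txt.
$crawler->setUserAgentString("PHPCrawl");

// Parse and obey the robots.txt-file found at the default location of the root-URL.
$crawler->obeyRobotsTxt(true);

// Even though this rule matches the URL, a "Disallow: /path/" directive in the
// robots.txt of foo.com has higher priority, so the URL gets ignored anyway.
$crawler->addFollowMatch("#http://foo\.com/path/file\.html#");

$crawler->go();
```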