public excludeLinkSearchDocumentSections($document_sections)
$document_sections | int | Bitwise combination of the PHPCrawlerLinkSearchDocumentSections-constants. |
No information |
By default, phpcrawl searches for links in the entire content of every document it receives during the crawling process.
This sometimes produces nonexistent "phantom URLs", for example when the crawler mistakes a piece of JavaScript code
for a link, or when it finds a link inside an HTML comment that points to a page that no longer exists.
With this method, users can define which predefined sections of HTML documents are ignored when the crawler
searches for links.
See PHPCrawlerLinkSearchDocumentSections-constants for all predefined sections.
Example 1:
// Let the crawler ignore script sections and HTML comment sections when finding links
$crawler->excludeLinkSearchDocumentSections(PHPCrawlerLinkSearchDocumentSections::SCRIPT_SECTIONS |
                                            PHPCrawlerLinkSearchDocumentSections::HTML_COMMENT_SECTIONS);
Example 2:
// Let the crawler ignore all special sections except HTML comments
$crawler->excludeLinkSearchDocumentSections(PHPCrawlerLinkSearchDocumentSections::ALL_SPECIAL_SECTIONS ^
                                            PHPCrawlerLinkSearchDocumentSections::HTML_COMMENT_SECTIONS);
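
The two examples rely on ordinary bitmask arithmetic, which the following minimal sketch illustrates in isolation. The class DemoSections, its numeric values, the OTHER_SECTIONS constant and the helper isExcluded() are hypothetical stand-ins introduced only for this illustration; the actual values of the PHPCrawlerLinkSearchDocumentSections-constants may differ, and the sketch only assumes that each section occupies its own bit.

```php
<?php
// Hypothetical stand-in for PHPCrawlerLinkSearchDocumentSections.
// The numeric values are assumptions chosen so that each section has its own bit.
final class DemoSections
{
    const SCRIPT_SECTIONS       = 1; // binary 001
    const HTML_COMMENT_SECTIONS = 2; // binary 010
    const OTHER_SECTIONS        = 4; // binary 100 (hypothetical third section)
    const ALL_SPECIAL_SECTIONS  = 7; // binary 111, all of the above combined
}

// Hypothetical helper: true when every bit of $section is set in $mask.
function isExcluded(int $mask, int $section): bool
{
    return ($mask & $section) === $section;
}

// Same shape as Example 1: combine two sections with bitwise OR.
$example1 = DemoSections::SCRIPT_SECTIONS | DemoSections::HTML_COMMENT_SECTIONS;

// Same shape as Example 2: XOR clears a bit that is already set in the left operand.
$example2 = DemoSections::ALL_SPECIAL_SECTIONS ^ DemoSections::HTML_COMMENT_SECTIONS;

// Alternative: AND with the complement clears the bit whether or not it was set.
$example2Alt = DemoSections::ALL_SPECIAL_SECTIONS & ~DemoSections::HTML_COMMENT_SECTIONS;

var_dump($example1);    // int(3)
var_dump($example2);    // int(5)
var_dump($example2Alt); // int(5)
var_dump(isExcluded($example2, DemoSections::HTML_COMMENT_SECTIONS)); // bool(false)
var_dump(isExcluded($example2, DemoSections::SCRIPT_SECTIONS));       // bool(true)
```

The XOR form used in Example 2 removes a section only if that section is already part of the left-hand mask; applied to a mask that lacks the bit, XOR would add it instead. The `& ~` form shown as an alternative clears the bit in either case, which may be the safer choice when the left-hand mask is built dynamically.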