Description:
When using this method instead of the go()-method to start the crawler, phpcrawl will use the given
number of processes simultaneously for spidering the target-url.
Using multiple processes will speed up the crawling-process dramatically in most cases.
There are some requirements though to successfully run the crawler in multi-process mode:
- The multi-process mode only works on unix-based systems (linux)
- Scripts using the crawler have to be run from the commandline (cli)
- The PCNTL-extension for php (process control) has to be installed and activated.
- The SEMAPHORE-extension for php has to be installed and activated.
- The POSIX-extension for php has to be installed and activated.
- The PDO-extension together with the SQLite-driver (PDO_SQLITE) has to be installed and activated.
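Whether the environment meets these requirements can be checked from a CLI-script as sketched below. This is just a sketch; it assumes the standard extension-names pcntl, sysvsem (semaphore), posix and pdo_sqlite.
<?php
// Sketch: verify the multi-process requirements before starting the crawler.
if (php_sapi_name() != "cli")
  die("Please run this script from the commandline (cli).\n");

foreach (array("pcntl", "sysvsem", "posix", "pdo_sqlite") as $ext)
{
  if (!extension_loaded($ext))
    die("Required PHP-extension '".$ext."' is not installed/activated.\n");
}
?>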
PHPCrawl supports two different modes of multiprocessing:
- PHPCrawlerMultiProcessModes::MPMODE_PARENT_EXECUTES_USERCODE
The crawler uses multiple processes simultaneously for spidering the target URL, but the usercode provided to
the overridable function handleDocumentInfo() always gets executed on the same main-process. This
means that the usercode never gets executed simultaneously, so you don't have to care about
concurrent file/database/handle-accesses or similar things.
On the other hand the usercode may slow down the crawling-procedure, because every child-process has to
wait until the usercode has been executed on the main-process. This is the recommended multiprocess-mode!
- PHPCrawlerMultiProcessModes::MPMODE_CHILDS_EXECUTES_USERCODE
The crawler uses multiple processes simultaneously for spidering the target URL, and every child-process executes
the usercode provided to the overridable function handleDocumentInfo() directly from its own process. This
means that the usercode gets executed simultaneously by the different child-processes, so you should
take proper care of concurrent file/database/handle-accesses (if used).
When using this mode and you use any handles like database-connections or filestreams in your extended
crawler-class, you should open them within the overridden method initChildProcess() instead of opening
them from the constructor. For more details see the documentation of the initChildProcess()-method and the sketch below.
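Sketch for the MPMODE_CHILDS_EXECUTES_USERCODE-mode: every child-process opens its own database-handle inside initChildProcess(). The class-name MyCrawler, the SQLite-file results.db3 and the table "pages" are just assumptions made for this example.
<?php
class MyCrawler extends PHPCrawler
{
  protected $db;

  // Gets called within every child-process before it starts crawling,
  // so every child gets its own database-handle (don't open it in the constructor).
  function initChildProcess()
  {
    $this->db = new PDO("sqlite:results.db3"); // example file, table "pages" assumed to exist
  }

  function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
  {
    // Runs directly within the child-processes in this mode, so only
    // per-process handles like $this->db should be used here.
    $stmt = $this->db->prepare("INSERT INTO pages (url, http_status) VALUES (?, ?)");
    $stmt->execute(array($DocInfo->url, $DocInfo->http_status_code));
  }
}

$crawler = new MyCrawler();
$crawler->setURL("http://www.example.com");
$crawler->goMultiProcessed(5, PHPCrawlerMultiProcessModes::MPMODE_CHILDS_EXECUTES_USERCODE);
?>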
Example for starting the crawler with 5 processes using the recommended MPMODE_PARENT_EXECUTES_USERCODE-mode:
$crawler->goMultiProcessed(5, PHPCrawlerMultiProcessModes::MPMODE_PARENT_EXECUTES_USERCODE);
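A more complete sketch using this recommended mode (the class-name MyCrawler and the example-URL are assumptions):
<?php
class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
  {
    // In MPMODE_PARENT_EXECUTES_USERCODE this code always runs on the main-process,
    // so writing to a single output-stream (or file/database) here is safe.
    echo $DocInfo->url." (HTTP ".$DocInfo->http_status_code.")\n";
  }
}

$crawler = new MyCrawler();
$crawler->setURL("http://www.example.com");
$crawler->goMultiProcessed(5, PHPCrawlerMultiProcessModes::MPMODE_PARENT_EXECUTES_USERCODE);
?>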
Please note that increasing the number of processes to high values doesn't automatically mean that the crawling-process
will get faster! Using 3 to 5 processes is a good value to start with.