How can I crawl a website directory?
- From the Dashboard or Projects page, click the Create New Robot button.
- Select Crawler and enter the required information.
Under the Settings tab, make any desired configuration changes.
- If you wish to use input data to provide the crawler with URLs to visit, activate the Dynamic URL? checkbox.
- To follow the rules of any robots.txt files on the site (recommended), activate the Respect robots.txt checkbox.
Under the Output tab, create any output fields your project requires. These fields will store the output data generated by the crawler.
Under the Page Processors tab, configure any page processors required by your project. See "What should I know about page processors?" for details.
When all necessary page processors are configured, click the blue Save button in the top-right corner of the page to save the crawler.
On the Projects page, select the crawler and click the Create Run button near the top-right corner of the page.
Enter a name for the run.
Select the new run and click Open in the slide-in panel.
Under the Configuration tab, change settings as needed.
Under the Integrations tab, configure any required integrations.
Under the Executions tab, launch the execution when ready, or view information about existing executions.
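
To illustrate what respecting robots.txt means in practice, the following is a minimal Python sketch using the standard library's urllib.robotparser. It is not how the crawler is implemented internally; the robots.txt content, the example.com URLs, and the allowed_urls helper are all hypothetical and exist only for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, robots_txt, user_agent="ExampleCrawler"):
    """Return only the URLs that the given robots.txt rules permit.

    `user_agent` is a made-up name; a real crawler sends its own.
    """
    parser = RobotFileParser()
    # parse() accepts the robots.txt file as an iterable of lines,
    # so no network request is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical input URLs, similar in spirit to URLs supplied as input data.
candidates = [
    "https://example.com/directory/page1.html",
    "https://example.com/private/report.html",
]

for url in allowed_urls(candidates, ROBOTS_TXT):
    print(url)
```

In this sketch, any URL under /private/ is filtered out before it would be visited, which is roughly the behavior to expect from a crawler that follows a site's robots.txt rules.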