What should I know about site navigation?

dexi.io robots can navigate a website in virtually any way a human can. This article covers common website navigation elements and how to handle each one in the scraper editor.

Links and buttons

To instruct the scraper to follow a link or click a button:

  1. Click the link or button element to select it.
  2. In the Element Panel, click Click Element.

Navigation menus

To instruct the scraper to follow a single navigation menu item, follow the directions found above, under Links and buttons.

To instruct the scraper to iterate through all menu items:

  1. Select all of the menu items you want the scraper to iterate through.
  2. In the Element Panel, click Loop Through Elements.
  3. Click the Step Forward button twice to move the timeline position to just after the Loop Through Elements step.
  4. Click the first menu item to select it.
  5. In the Element Panel, click Click Element.
  6. Follow this step with any further navigation or extraction steps to be performed after clicking each menu item.

The robot will perform all configured steps for each menu item before continuing with steps following the Loop Through Elements loop.
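
The execution order described above can be sketched in plain Python. This is only a conceptual model, not dexi.io code: the function `run_loop`, the menu item names, and the "extract title" step are hypothetical placeholders chosen for illustration.

```python
def run_loop(menu_items, steps_per_item, steps_after_loop):
    """Return the order in which steps would run (hypothetical model)."""
    log = []
    for item in menu_items:
        # Every step nested inside the loop runs for the current item
        # before the loop moves on to the next item.
        for step in steps_per_item:
            log.append(f"{step}: {item}")
    # Steps placed after the loop run once, after the last item is done.
    log.extend(steps_after_loop)
    return log


order = run_loop(
    menu_items=["Home", "Products"],                    # placeholder menu items
    steps_per_item=["Click Element", "extract title"],  # "extract title" is a placeholder step
    steps_after_loop=["steps after the loop"],
)
for line in order:
    print(line)
```

Both per-item steps run for "Home" before anything runs for "Products", and the step placed after the loop runs last.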

Authenticating with username and password

Before you begin, make sure you have input fields and testing values configured for the username and password under the editor's Inputs tab. For details, see "What should I know about input and output?"

To log in to a site protected by a username/password security model:

  1. Click the username field to select it.
  2. In the Element Panel, click Input.
  3. In the Step Panel, choose the input field from which to retrieve the username.
  4. Make any other required configuration changes.
  5. Click the password field to select it.
  6. In the Element Panel, click Input.
  7. In the Step Panel, choose the input field from which to retrieve the password.
  8. Make any other required configuration changes.
  9. Click the "submit credentials" button to select it.
  10. In the Element Panel, click Click Element.
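
The login steps refer to input fields by name and do not hard-code the credentials. That relationship can be sketched in plain Python. This is a conceptual model only, not dexi.io code or its data format: the names `inputs`, `steps`, and `resolve`, along with the testing values, are assumptions made for illustration.

```python
# Placeholder testing values, standing in for fields on the Inputs tab.
inputs = {"username": "test-user", "password": "test-pass"}

# Each step is (action, target element, name of the input field to read).
# Steps that read no input field use None.
steps = [
    ("Input", "username field", "username"),
    ("Input", "password field", "password"),
    ("Click Element", "submit credentials button", None),
]


def resolve(steps, inputs):
    """Replace each input field name with its current value (hypothetical model)."""
    resolved = []
    for action, target, input_name in steps:
        value = inputs[input_name] if input_name is not None else None
        resolved.append((action, target, value))
    return resolved


plan = resolve(steps, inputs)
for action, target, value in plan:
    print(action, "->", target, "with", value)
```

Because the steps only name the input fields, changing a value in `inputs` changes what is typed without touching the steps themselves.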

Captcha

To address a Captcha or similar anti-robot feature:

  1. Click the Captcha image element to select it.
  2. In the Element Panel, click Resolve Captcha. If selecting a Captcha element doesn't reveal the Resolve Captcha option, temporarily choose another option, such as an Extract option. You can then edit the step to select Resolve Captcha from the Step Type menu and enter the required configuration settings.
  3. Click the response input field to select it.
  4. In the Element Panel, click Input and complete the configuration.
  5. Click the "submit response" button to select it.
  6. In the Element Panel, click Click Element.

How the extractor navigates

Whenever an extractor robot navigates to a new URL, it opens the page in a new "tab". Once the iteration for that page is done, the robot automatically jumps back: it closes the newly opened tabs and returns to the existing tab. This most often happens in a loop through a list of links, where each click on a link opens a new tab and performs the actions you've described. When the robot moves on to the next link in the list, it simply closes the new tabs and returns to the existing tab containing the list, which causes no additional loading or waiting time.
We also refer to this as "Page state".
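
The tab behavior can be modeled as a simple stack. The sketch below is a conceptual illustration in plain Python, not the extractor's actual implementation: the class `TabStack`, its methods, and the example.com URLs are all hypothetical.

```python
class TabStack:
    """Toy model of tab handling: navigating opens a tab, finishing closes it."""

    def __init__(self, start_url):
        self.tabs = [start_url]   # open tabs, oldest first
        self.loads = [start_url]  # every URL that was actually loaded

    def open(self, url):
        # Navigating to a new URL opens a new tab and leaves the
        # existing tab untouched.
        self.tabs.append(url)
        self.loads.append(url)

    def close_to(self, depth):
        # Close every tab opened since `depth`. The earlier tab is
        # still open, so nothing is reloaded.
        del self.tabs[depth:]

    @property
    def current(self):
        return self.tabs[-1]


browser = TabStack("https://example.com/list")
links = ["https://example.com/a", "https://example.com/b"]

for link in links:
    depth = len(browser.tabs)  # remember where the list tab sits
    browser.open(link)         # clicking the link opens a new tab
    # ... extraction steps for this page would run here ...
    browser.close_to(depth)    # iteration done: close the new tab(s)

print(browser.current)
print(browser.loads)
```

In this model the list page appears only once in `loads`, which mirrors the point that returning to the list costs no extra loading time.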
