Extractors: Page Navigation

How Extractors navigate from page to page

Written by Morten Franck
Updated over a week ago

This article explains how Extractor robots handle page navigation.

Extractors: Lists & Loops shows an example of this, in which a robot navigates back and forth between a product "listing" page and each product "details" page.

This article briefly explains how such page navigation is handled in general.

To learn the basics of how to build Extractor robots, please see Extractors.

Page State

When a step in an Extractor robot navigates to a new URL, e.g. via a "Go to URL" step or a "Click element" step that clicks a link, it does so in a way that corresponds to a human opening the page in a new browser tab (or browser window).

In the example of looping over product "details" pages on a "listing" page, every time the robot clicks a "details" link, the details page is opened in a new "tab". After every iteration of the loop, that "tab" is closed again, leaving the "listing" page as it was.

Keeping track of what page should be navigated to next is called "page state" and is automatically handled by Extractors.
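
The "new tab per navigation" behaviour can be pictured as a stack of open pages. The sketch below is only a conceptual model, not the actual Extractor implementation; the PageState class and the example.com URLs are made up for illustration.

```javascript
// Conceptual model only: NOT how Extractors are implemented internally.
class PageState {
  constructor(startUrl) {
    // Stack of open "tabs"; the last entry is the page the robot is on.
    this.tabs = [startUrl];
  }

  // A navigating step (e.g. "Go to URL" or a link click) opens a new "tab".
  open(url) {
    this.tabs.push(url);
  }

  // When a loop iteration ends, the "tab" it opened is closed again.
  close() {
    if (this.tabs.length > 1) {
      this.tabs.pop();
    }
  }

  get current() {
    return this.tabs[this.tabs.length - 1];
  }
}

// Hypothetical listing page with three "details" links.
const state = new PageState("https://example.com/products");
const visited = [];

for (const id of [1, 2, 3]) {
  state.open(`https://example.com/products/${id}`); // click a "details" link
  visited.push(state.current);                      // extract from the details page
  state.close();                                    // iteration ends, "tab" closes
}
```

After the loop, state.current is the listing page again, which is why the next iteration can always find the next "details" link.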

Examples

Almost any interaction with a web page can cause a page navigation (as defined by the developer of the page).

Some typical causes of a page navigation are listed below, with the step type to use given in parentheses.

  • Visiting a new URL (Go to URL)

  • Clicking a link (Click element)

  • Submitting a form (Click element)

  • Navigating a paginated page (Page iteration)

  • Explicitly changing the URL via JavaScript (Execute JavaScript, e.g. location.href=<url>)
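
As an illustration of the last item, the snippet below shows what the code in an "Execute JavaScript" step might look like. It is a minimal sketch that assumes the website paginates with a numeric "page" query parameter; that parameter and the nextPageUrl helper are hypothetical and will differ from site to site.

```javascript
// Builds the URL of the next page. Assumes (hypothetically) that the site
// uses a numeric "page" query parameter and that a missing value means page 1.
function nextPageUrl(currentUrl) {
  const url = new URL(currentUrl);
  const page = parseInt(url.searchParams.get("page"), 10) || 1;
  url.searchParams.set("page", String(page + 1));
  return url.toString();
}

// In the page context, assigning to location.href triggers the navigation.
// The guard only keeps the snippet harmless outside a browser.
if (typeof location !== "undefined") {
  location.href = nextPageUrl(location.href);
}
```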

Single Page Applications

Some websites are built as "single page apps", typically using frameworks such as React or AngularJS. From the browser's perspective, loading new content on such a website, e.g. opening a product details page from the listing page, is not considered a page navigation.

This is because the loading of the new content is handled via JavaScript code: the browser never loads a new page (and the URL often remains the same), but the logic in the code "swaps out" the list "page" for the details "page".

It is not possible to give a general answer on how to handle single page applications. In the "product listing-details" example, however, the final step of each iteration should reset the page state so that the robot is back on the "listing" page before the next iteration starts. This final step could, for example, click a "Back" button on the page or be a "History -> Back" step, which corresponds to clicking "Back" in your browser.
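
To see why the reset matters, consider the following conceptual model of a single page app. It is a sketch for illustration only: the SinglePageApp class, its methods and the example.com URL are made up and do not correspond to any Extractor or framework API.

```javascript
// Conceptual model of a single page app; all names are hypothetical.
class SinglePageApp {
  constructor(url, productIds) {
    this.url = url;               // stays the same in this model: no page navigation
    this.productIds = productIds;
    this.view = "listing";        // which "page" the JavaScript currently shows
  }

  // Clicking a "details" link swaps the view in place.
  clickDetails(id) {
    if (this.view !== "listing") {
      throw new Error("details links only exist on the listing view");
    }
    this.view = `details:${id}`;
  }

  // An in-page "Back" button, or a "History -> Back" step.
  clickBack() {
    this.view = "listing";
  }
}

// Loops over all products; resetState controls whether each iteration
// ends with a step that goes back to the listing view.
function extractAll(app, resetState) {
  const extracted = [];
  for (const id of app.productIds) {
    app.clickDetails(id);
    extracted.push(app.view);
    if (resetState) {
      app.clickBack(); // final step of the iteration: reset the page state
    }
  }
  return extracted;
}

const withReset = extractAll(
  new SinglePageApp("https://example.com/shop", [1, 2, 3]),
  true
);
```

With the reset, all three details views are extracted. Calling extractAll with resetState set to false throws on the second iteration, because the robot is still on the first details view and the next "details" link is not there to click.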

The same resetting of state should be done when using branches in Extractors that interact with single page applications. Read more about this in the "Page state" section of What should I know about the extractor editor?.

Tip! A few websites that implement "normal" page navigation (links opened in new tabs) require the Extractor robot to treat the website as if it were a single page app, due to some quirk in the website code. This behaviour can be achieved by enabling the "Force single page?" option on the Settings tab.
