How to build an Extractor Robot

Extractor robots are, as the name implies, designed for extracting data from web pages. They can also automate almost any other interaction you perform with a page in your browser: clicking buttons, entering values, selecting and hovering over elements, and much more.

If you don’t already have an account, see Getting Started for a step-by-step guide to creating one and to creating an empty (or demo) Extractor robot.

The rest of this article assumes that you have followed these steps and are now looking at the Extractor robot editor. Its exact appearance depends on the page you are extracting data from.

The editor window is divided into a top-half preview area and a bottom-half toolbox area (similar to what you might be used to in e.g. the Firefox or Chrome developer tools).

The preview area displays the page as your browser would. When you hover the mouse over an individual element on the page, an overlay is shown. Clicking the element opens a right-hand menu with the most frequently used actions for elements of that type.

The Toolbox Area

The toolbox area provides tools for building and working with the Extractor robot:

Steps: the steps, or actions, of the robot.
Elements: the HTML structure of the page (the DOM).
Network: the network requests made when loading the page.
Inputs: the input fields, if any, used by the robot.
Outputs: the output fields extracted by the robot.
Results: the extracted values (based on the test input values).
Settings: various settings of the robot, e.g. its name.
Versions: a list of the most recent versions of the robot and the ability to restore them.
Console: any console output from the page.

Generally, for each element you wish to interact with, repeat the steps below:

In the preview area, point to and click the element, e.g. an image.
In the right-hand context menu, select the action you wish to perform with the element, e.g. extract the value of the element.

Each time a step is added to the robot, it appears in the Steps tab.

It is also possible to add a step before or after an existing step by clicking the green “+” symbols shown when hovering over a step.

The “+” symbol below a step adds the new step as a branch. See What should I know about the extractor editor? for information about branches.

The Steps tab is also where you step backward and forward through the robot, or play all steps.

Read on for more details.

Step Types

If the action you wish to perform is not available in the right-hand context menu, click “Add step for element”. This will bring up the “step type chooser” dialog, where all available step types are presented, including examples of how to use them.

Input & Output

An Extractor takes input values when it needs to search for something, fill in a form, enter a date or do something else requiring external input.

On the Inputs tab in the editor you define the fields that make up the input as well as the test values that should be used while building the robot.

Input fields can be added individually or via a data type.

Outputs work exactly the same way as inputs - except of course that they represent what comes out of the robot.

Executing the Robot: Getting Results

When you build your Extractor robot in the editor, you provide test values that let you see an example of what the web page looks like.

For example, on a web page that allows you to search for a product, you provide an example product name and build the robot around what that page looks like. The underlying assumption is that the other product pages are structured the same way, i.e. they present the information with the same types of elements.

Once the Extractor works as intended in the editor, you get actual results by creating what is called a configuration (called a run in some parts of the platform). Most importantly, you can add input values to a configuration, e.g. multiple search values for the example above.

Read about how to create a configuration and execute it in Getting Started, section "Getting Results".

Read on below to learn how to execute a configuration with multiple input values.

Inputs & Results

To add inputs to a configuration, click the “Inputs” tab:

Input rows can be added to a configuration/run in three ways:

Manually adding them in the UI by clicking “Click here to add one now”.
Importing a CSV file.
Via the API.
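For the CSV option, the file can be prepared with any spreadsheet or script. The following is a minimal Python sketch of one way to generate such a file. It assumes that the first row of the CSV names the input fields as defined on the Inputs tab; that layout, the field name "search" and the file name inputs.csv are illustrative assumptions, so check the import dialog for the exact format it expects.

```python
import csv
import io


def build_inputs_csv(field_names, rows):
    """Return CSV text: a header row, then one line per input row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical input field "search" with three search values.
# The csv module quotes values containing commas or quotes automatically.
csv_text = build_inputs_csv(
    ["search"],
    [
        {"search": "laptop"},
        {"search": "wireless mouse"},
        {"search": 'monitor, 27"'},
    ],
)

# Write the text to a file that can then be imported on the Inputs tab.
with open("inputs.csv", "w", newline="", encoding="utf-8") as handle:
    handle.write(csv_text)
```

Using the csv module rather than joining strings by hand keeps values that contain commas or quotation marks intact, so each value still ends up as a single input row.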

When you have added the inputs, you can start the execution the usual way: click “Execute now”. Then, on the “Executions” tab, click “View”.

The “Results” tab of the execution will now hold one result per input:

To view the result rows of an input, click anywhere on the row. This opens a new tab named after the input value.

An execution of a configuration without inputs, or with just one input, has just one result and the result row is shown directly on the “Results” tab.

If the execution fails, see How can I debug a failed execution?

More on Extractors

To learn more about branches and how Extractors navigate pages, see What should I know about the extractor editor? and What should I know about site navigation?

For info about how to handle lists, see Extractors: Lists & Loops.

For other specific topics, please use the search function on support.dexi.io.