Dexi is the ultimate platform for web scraping, browser automation and a whole lot more! In this article we briefly explain how you can get started using the different parts of the platform.
To extract data from a website you create an Extractor robot.
First, you must identify the starting page you wish to extract data from. Your imagination is the limit!
For example, when extracting product information, the page is often either a “list” page, which lists multiple items and serves as a starting page, or a specific “details” page, which presents the details of a single item.
Once you have identified the (starting) page, follow the steps below to create an Extractor robot:
Step 1: browse to app.dexi.io and sign up (or sign in, if you already have an account).
Step 2: click the activation link in the email you receive.
Step 3: in the dialog that appears, click the item that most closely matches what you want to use Dexi for.
Step 4: some options will take you through a tour showing you how to use Dexi. If you didn't choose one of these options, click the “Projects” link in the left-hand menu, click the green “New...” button and then click “Create new robot”.
In the dialog that appears, ensure “Extractor” is selected, paste the URL into the “Url” text box and provide a name for the robot. Then click “Create new robot”.
The other robot types, “Crawler”, “Pipes” and “AutoBot”, are explained below this section.
If you would instead like us to build the robot for you, click “Build my robot” or visit our Robot Building page.
Tip! If you would like to see an example of an existing robot and perhaps do some experiments, click the “Create an example robot” button instead.
Step 5: the requested URL will now load in an editor that allows you to point and click on the elements you want the robot to interact with, e.g. to extract data from them.
Extractor robots provide a powerful way to interact with web pages. It is not only possible to extract data: you can do pretty much anything you can do in your browser, e.g. click buttons, select elements in lists and much more.
For details on how to build Extractor robots, search our knowledge base for a specific topic. We continuously add information to it.
For background information on a couple of web technologies (e.g. HTML) that you might need to build some Extractor robots, please see Web Data Extraction Basics.
Once your Extractor robot (or any other robot type) is working as intended in the editor, you must execute it to get actual results. Robots can be executed with different configurations (called runs in parts of the platform), most importantly with multiple input values. This effectively executes the robot multiple times, say, with different search values or dates.
To get the results of your robot, follow the steps below:
Step 1: on the Projects page, select the robot and click the “New run” button.
Step 2: click the “Open” button to open the configuration / run.
Step 3: change any settings you wish to change, e.g. set a schedule, add integrations or, if the robot takes inputs, add or import inputs to the configuration.
Step 4: execute the robot, or rather the configuration, by clicking the “Execute now” button (the latest saved version of the robot is executed). On the “Executions” tab, click “View” to view the execution. Depending on system load it can take from a few seconds to a few minutes for the execution to start.
Step 5: on the “Results” tab of the execution, results appear as they are extracted. When the execution completes, results can be downloaded in various formats (csv/xls/json/...), sent to and stored in a number of different places (e.g. Google Drive, Google Sheets, Amazon S3 or your own custom webhooks), or retrieved via the API.
The actual task of executing the robot is performed by what is called a worker. The number of workers on your account determines your capacity, i.e. how much work can be done concurrently.
Robot configurations with multiple inputs can be set to use multiple workers for faster execution. This is controlled by the Concurrent executions setting on the “Configuration” tab.
For example, if your subscription includes three workers you are able to concurrently execute e.g.:
- Three robot configurations with one or no inputs.
- One robot configuration with multiple inputs and Concurrent executions set to 2, and one configuration with one or no inputs.
That is, any combination whose worker usage adds up to three.
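The capacity rule can be sketched as a small calculation. This is only an illustration of the arithmetic, not platform code: the RunConfig class and both function names are made up for this example, and it assumes that a configuration with one or no inputs always occupies exactly one worker.

```python
from dataclasses import dataclass


@dataclass
class RunConfig:
    """Hypothetical stand-in for a robot configuration (run)."""
    name: str
    input_count: int = 0
    concurrent_executions: int = 1


def workers_needed(config: RunConfig) -> int:
    """Workers one configuration occupies while it executes."""
    # Assumption: only configurations with multiple inputs can use
    # more than one worker (the Concurrent executions setting).
    if config.input_count > 1:
        return config.concurrent_executions
    return 1


def fits_capacity(configs, available_workers: int) -> bool:
    """True if all configurations can execute at the same time."""
    return sum(workers_needed(c) for c in configs) <= available_workers


# The second example from the list above, on a three-worker subscription.
mixed = [
    RunConfig("search", input_count=50, concurrent_executions=2),
    RunConfig("single"),
]
print(fits_capacity(mixed, 3))  # True: 2 + 1 workers
```

Adding one more configuration to the list would bring the total to four workers, so it would no longer fit a three-worker subscription.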
Visit our price plans to see how many workers are included in each price plan.
Where do I go next?
We understand that there can be a bit of a learning curve in learning to use the platform efficiently, e.g. there are a number of new concepts to learn. The glossary provides a concise description of all concepts used in the dexi.io universe.
To explore more features of the platform, read on and follow the links below:
Pipes robots make it possible to automate a process of data processing and transformation (ETL), performing arbitrarily complex business logic. For example, a Pipes robot could execute an Extractor robot, iterate its results, call an external web service for each result, do some custom formatting of the web service result and save the “enriched” results in an external SQL database or a Dexi data set (see below).
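To give a feel for the kind of flow described above, here is a rough sketch in plain Python. Pipes robots are built in the platform, not written as Python code, so everything here is a hypothetical stand-in: the function names, the sample rows and the “web service” (which just returns a fixed currency) are invented for illustration.

```python
def run_extractor():
    """Stand-in for executing an Extractor robot; returns made-up sample rows."""
    return [
        {"name": "Widget", "price": "9.5"},
        {"name": "Gadget", "price": "20"},
    ]


def call_web_service(row):
    """Stand-in for an external web service call; always answers with EUR."""
    return {"currency": "EUR"}


def format_result(row, service_result):
    """Custom formatting step: merge the row with the web service result."""
    price = float(row["price"])
    return {
        "name": row["name"],
        "price": f"{price:.2f} {service_result['currency']}",
    }


def run_pipeline(store):
    """Extract, enrich each result and save it to the given store (a list here)."""
    for row in run_extractor():
        enriched = format_result(row, call_web_service(row))
        store.append(enriched)
    return store


print(run_pipeline([]))
# [{'name': 'Widget', 'price': '9.50 EUR'}, {'name': 'Gadget', 'price': '20.00 EUR'}]
```

In a real Pipes robot the store would be an external SQL database or a data set instead of an in-memory list.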
Crawler robots allow you to quickly collect a large number of URLs and other basic information from a website, e.g. identify product pages on a website and save the URL and page title for each page. For example, a Pipes robot could execute a Crawler that gathers product pages on a website and sends each URL to an Extractor that extracts the required information.
AutoBot robots allow you to normalise/standardise (the fields of) results extracted from a number of different websites, e.g. extract and save product id, name and description from three different web shops.
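The idea of normalising fields can be illustrated with a small mapping. This sketch does not show how AutoBot robots work internally. The shop names, field names and the normalise function are all assumptions made up for the example.

```python
# Hypothetical per-shop mapping from each shop's own field names
# to one common set of field names.
FIELD_MAPPINGS = {
    "shop_a": {"sku": "product_id", "title": "name", "desc": "description"},
    "shop_b": {"id": "product_id", "product": "name", "details": "description"},
}


def normalise(shop, row):
    """Rename the fields of one extracted row to the common field names."""
    mapping = FIELD_MAPPINGS[shop]
    return {common: row[original] for original, common in mapping.items()}


row_a = {"sku": "A-1", "title": "Blue mug", "desc": "A blue mug"}
row_b = {"id": "77", "product": "Red mug", "details": "A red mug"}

print(normalise("shop_a", row_a))
# {'product_id': 'A-1', 'name': 'Blue mug', 'description': 'A blue mug'}
print(normalise("shop_b", row_b))
# {'product_id': '77', 'name': 'Red mug', 'description': 'A red mug'}
```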
Data sets make it possible to work with large amounts of data (even images and files!), similar to a NoSQL collection or SQL table. Advanced deduplication and record linkage can be performed using e.g. fuzzy matching.
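If fuzzy matching is new to you, the sketch below shows the general idea using Python's standard difflib module. It is not the matching algorithm used by data sets. The deduplicate function, the 0.9 threshold and the sample records are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between two strings, from 0.0 to 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(records, key, threshold=0.9):
    """Keep a record only if no already kept record is a close match on `key`."""
    kept = []
    for record in records:
        is_duplicate = any(
            similarity(record[key], other[key]) >= threshold for other in kept
        )
        if not is_duplicate:
            kept.append(record)
    return kept


records = [
    {"name": "Samsung Galaxy S9"},
    {"name": "Samsung Gallaxy S9"},  # misspelled duplicate of the first record
    {"name": "Apple iPhone 8"},
]
unique = deduplicate(records, "name")
print([r["name"] for r in unique])
# ['Samsung Galaxy S9', 'Apple iPhone 8']
```

This simple version compares every record with every kept record, so it only suits small lists. It is meant to show the concept, not to handle large amounts of data.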
Dictionaries map keys to values and can be used e.g. to correct misspellings like “Galaxy” vs “Gallaxy”.
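As a minimal sketch of the key-to-value idea, the misspelling example could look like this in Python. The CORRECTIONS mapping and the correct_words function are made up for this example and do not reflect how dictionaries are implemented in the platform.

```python
# Hypothetical dictionary: misspelled key -> corrected value.
CORRECTIONS = {"Gallaxy": "Galaxy"}


def correct_words(text, corrections):
    """Replace each whitespace-separated word found in the dictionary."""
    return " ".join(corrections.get(word, word) for word in text.split())


print(correct_words("Samsung Gallaxy S9", CORRECTIONS))
# Samsung Galaxy S9
```

Words that are not in the dictionary are left unchanged.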
Addons add functionality to the platform in various ways. For example, integration addons allow you to send data to third-party services, e.g. Amazon S3, Box or Google Sheets. Other examples include CAPTCHA-solving services, Google Maps (geocoding) and machine learning/text analysis services. More addons are continuously implemented.
Triggers perform actions when events occur. For example, when an execution of a robot completes, results could be added to a data set. More events and actions are continuously implemented.
The API allows you to programmatically talk to dexi.io, e.g. get the results of an execution or start an execution of a robot configuration.
“Just get me the data, please”
If you are not technically inclined, or perhaps you don’t have the time to learn a new platform, we offer to build the robot for you. Simply tell us which information you want from which web page(s) and we will build the necessary robot(s).
To request a robot build, please see our Robot Building page.
You can also log in to the platform and click the “Build my robot” button in the bottom left corner.
If you need any other help, please write to us at firstname.lastname@example.org.
Thank you for reading and enjoy dexi.io!