What should I know about the extractor editor?
The extractor editor is used to create, update, and test scraping robots. It consists of four main parts.
The preview area
The preview area is where you can see the pages that you are scraping. To interact with an element you simply click it and choose what you want to do from our context-sensitive suggestions menu.
The suggestion box
If the action you want to perform is not available, choose Create step for element. This creates a simple step that asserts that the element exists; you can then edit this step and change it to whatever you want.
The timeline area contains the complete robot configuration. Each step the robot will take is represented in the timeline under the Steps tab. Additional tabs let you define inputs and outputs, handle network requests, list the robot's results, and more.
The menu allows step editing and manipulation.
What are steps?
Steps are the actions a scraper robot performs upon visiting a Web page, and simulate real-world human interaction with the page. For instance, when visiting a Web site, you would enter the URL in your browser's address bar (1). When the first page loads, you notice an interesting product, so you click the product image (2) to navigate to the product details page to see the price (3). The equivalent robot would consist of the following steps.
- Go to URL
- Click element
- Extract value
Many step types are available, and we are continuously extending our already capable collection. If you ever find something you feel is missing, we would love to hear your ideas.
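The three-step example above can be pictured as an ordered list of steps. The following is only an illustrative sketch in Python: the Step class, its fields, the URL, and the selectors are all assumptions made for this example and are not the editor's actual configuration format.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str         # step type as named in the editor, e.g. "Go to URL"
    target: str = ""  # hypothetical field: a URL or an element selector


# The robot equivalent to the browsing session described above.
robot = [
    Step("Go to URL", "https://example.com/"),  # (1) enter the URL
    Step("Click element", "img.product"),       # (2) click the product image
    Step("Extract value", "span.price"),        # (3) read the price
]


def describe(steps):
    """Return one numbered, human-readable line per step."""
    return [f"{i}. {s.kind}: {s.target}" for i, s in enumerate(steps, start=1)]


for line in describe(robot):
    print(line)
```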
How do I group steps?
You can group steps in the scraper editor to help keep track of which steps operate on the same page or belong to a similar logical group. Grouping does not have any functional impact at the moment. You can group steps by either dragging a selection box over a set of steps or holding the Shift key while selecting multiple steps. You can only group steps that are fully connected.
How is the robot executed?
The robot execution flow is made up of interconnected steps, and the system will execute these steps from beginning to end, top to bottom. See the illustration below for a visual representation of how steps and branches are executed.
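As a rough mental model, you can think of the flow as a tree that is walked depth-first: a step runs, then each of its branches runs in full, from the top branch to the bottom one. The sketch below assumes that ordering; the Step class and the execute function are hypothetical names used only for illustration, not part of the product.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    # One entry means the flow simply continues; several entries are branches.
    next_steps: list = field(default_factory=list)


def execute(step, log):
    """Run a step, then every branch below it, top to bottom (assumed order)."""
    log.append(step.name)
    for following in step.next_steps:
        execute(following, log)
    return log


flow = Step("Go to URL", [
    Step("Click element", [
        Step("Extract title", [Step("Extract price")]),  # top branch
        Step("Extract rating"),                          # bottom branch
    ]),
])

order = execute(flow, [])
print(order)
```

In this model the top branch finishes completely (Extract title, then Extract price) before the bottom branch starts.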
First successful Branch
In addition to the normal branching shown above, you can change a step so that it runs only the first successful branch and skips any remaining branches.
See the illustration below for a visual representation of how first-branch flows are executed.
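One way to picture first-successful-branch behavior is a loop that tries each branch in order and stops at the first one that finishes without failing. This is a minimal sketch under that assumption; StepFailed, run_first_successful, and the layout functions are hypothetical, and what counts as a failed branch in the real editor is not specified here.

```python
class StepFailed(Exception):
    """Raised by a hypothetical step that cannot complete."""


def run_first_successful(branches):
    """Try branches top to bottom; return the first one that succeeds."""
    for name, branch in branches:
        try:
            result = branch()
        except StepFailed:
            continue  # this branch failed, so try the next one
        return name, result  # success: remaining branches are skipped
    return None, None  # no branch succeeded


calls = []


def layout_a():
    calls.append("layout_a")
    raise StepFailed("price element not found")


def layout_b():
    calls.append("layout_b")
    return "19.99"


def layout_c():
    calls.append("layout_c")
    return "unused"


winner = run_first_successful(
    [("layout_a", layout_a), ("layout_b", layout_b), ("layout_c", layout_c)]
)
print(winner, calls)
```

Here the third branch is never called, because the second one already succeeded.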
The last thing that affects how a robot is executed is repeating steps. These steps repeat while some condition is satisfied, and execute all subsequent steps during each iteration. Examples of repeating steps are Loop Through Elements, Iterate Pages, and Do - While. In the editor, you can skip to the previous or next iteration of a repeating step by clicking the up and down arrows on the step.
See the illustration below for a visual representation of how flows containing repeatable steps are executed.
Whenever robot execution jumps back to a previous step (as it does with branches and repeatable steps), the robot automatically returns to the page it was on when it first reached that step. This means that you do not have to navigate back before your branch ends; the robot will automatically do that for you.
An exception to that rule is when the entire site is a single-page app, in which case any pages visited after a branched or repeatable step will have modified the original page. In these cases you will have to tell the robot how to reset back to the previous state at the end of each branch.
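A repeating step can be thought of as a loop that wraps everything after it. The sketch below models a Loop Through Elements style step under that assumption: for each element, all following steps run once. The function name and the use of plain Python callables as steps are assumptions for illustration only.

```python
def loop_through_elements(elements, following_steps):
    """Run every following step once per element and collect the results."""
    results = []
    for element in elements:          # one iteration per matched element
        value = element
        for step in following_steps:  # all steps after the repeating step
            value = step(value)
        results.append(value)
    return results


# Two hypothetical "elements" and two follow-up steps applied to each.
names = loop_through_elements(["  apple ", " pear"], [str.strip, str.upper])
print(names)
```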
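The automatic return to the earlier page can be pictured as taking a snapshot at the branch point and restoring it after each branch. This is only a conceptual sketch: the dictionary used as a page, the run_branches function, and the URLs are assumptions, and the sketch does not model the single-page app case, where you have to add the reset steps yourself.

```python
import copy


def run_branches(page, branches):
    """Run each branch from the same starting page (assumed behavior)."""
    starting_urls = []
    for branch in branches:
        snapshot = copy.deepcopy(page)  # remember the page at the branch point
        starting_urls.append(page["url"])
        branch(page)                    # the branch may navigate elsewhere
        page.clear()
        page.update(snapshot)           # jump back to the remembered page
    return starting_urls


def open_details(page):
    page["url"] = "https://example.com/product/1"


def open_reviews(page):
    page["url"] = "https://example.com/product/1/reviews"


page = {"url": "https://example.com/list"}
starts = run_branches(page, [open_details, open_reviews])
print(starts, page["url"])
```

Both branches start from the list page, even though the first branch navigated away from it.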
What is a snippet?
Snippets are groups of steps that you can reuse in multiple robots or multiple times in the same robot. If you change a snippet in one robot, all robots using that snippet will receive the same change once you save the robot.
To create a snippet, choose a group of steps by dragging a box over them or holding down Shift while selecting multiple steps. Then select Create snippet from the context menu and give the snippet a name.
To use your snippet elsewhere, select any step and choose Add snippet before, Add snippet after, or Add snippet branch.
All steps in a snippet are required to be fully connected and you can't remove the first or last steps in a snippet. For this reason, it's often a good idea to add a Do nothing step at either end of a snippet to make it easier to add steps to the beginning or end.
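The sharing behavior of snippets resembles two robots holding a reference to the same object: change the snippet once and both robots see the change. The sketch below illustrates that idea and the Do nothing padding tip. The Snippet class, the expand function, and the robot contents are hypothetical, and the sketch does not model the save step that triggers the update in the real editor.

```python
class Snippet:
    """A named, reusable group of steps (hypothetical model)."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps


def expand(robot):
    """Return the robot's steps with every snippet replaced by its contents."""
    flat = []
    for item in robot:
        if isinstance(item, Snippet):
            flat.extend(item.steps)
        else:
            flat.append(item)
    return flat


# Do nothing steps at both ends leave room to add steps at the edges later.
open_details = Snippet("Open details", ["Do nothing", "Click element", "Do nothing"])

robot_a = ["Go to URL", open_details, "Extract value"]
robot_b = ["Go to URL", open_details, "Iterate Pages"]

# Add a step right after the leading Do nothing; both robots pick it up.
open_details.steps.insert(1, "Extract value")

print(expand(robot_a))
print(expand(robot_b))
```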