How to build an extractor

The extractor is our primary robot type and can extract data from any Web site viewable in a Web browser. Extractors are very specific in what they extract and are designed to fail if the site does not produce the expected data. Use an extractor when you have complex navigation requirements, need pinpoint accuracy when targeting data for extraction, need to fill out forms or log in to a site, or simply need JavaScript support.
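To illustrate what "designed to fail" means in practice, here is a minimal Python sketch of the general idea. It is not how the extractor is implemented: the function `extract_price`, the `ExtractionError` class, and the sample HTML are all hypothetical and exist only for illustration. The point is that a specific target either yields the expected data or raises an error, rather than silently returning nothing.

```python
import re


class ExtractionError(Exception):
    """Raised when the page does not contain the expected data (hypothetical)."""


def extract_price(html: str) -> str:
    # Target one specific element instead of guessing at the page layout.
    match = re.search(r'<span class="price">([^<]+)</span>', html)
    if match is None:
        # Fail loudly so a changed or broken page is noticed immediately.
        raise ExtractionError("expected a price element, found none")
    return match.group(1).strip()


# The expected element is present, so the value is returned.
print(extract_price('<div><span class="price"> $19.99 </span></div>'))

# The expected element is missing, so the extraction fails.
try:
    extract_price("<div>Sold out</div>")
except ExtractionError as err:
    print("extraction failed:", err)
```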

To build a new extractor:

  1. On the Projects page, click the green New button in the top-right corner of the page.
  2. Click New Robot.
  3. Select Extractor.
  4. Enter the URL at which the robot will begin its work.
  5. Enter a descriptive name for the extractor.
  6. Click Create New Robot.

Upon entering the extractor editor, take a moment to familiarize yourself with the environment. The top of the page displays the Web site your robot will scrape, while the bottom of the page features the editor interface, including the timeline. The timeline visualizes the steps your robot will take as it navigates, interacts with, and extracts information from the Web site. Notice that the robot's first and final steps are already in place: Go to URL and Save Current Output.
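As a rough mental model only, the timeline can be thought of as an ordered list of steps that starts with Go to URL and ends with Save Current Output. The Python sketch below is an assumption made for illustration and is not the product's API: the `Step` and `Timeline` classes, the `add_step` method, the example URL, and the "Click" and "Extract" step names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str          # step name, e.g. "Go to URL"
    detail: str = ""   # free-form description of what the step targets


@dataclass
class Timeline:
    start_url: str
    steps: list = field(default_factory=list)

    def __post_init__(self):
        # A new extractor begins with its first and final steps in place.
        self.steps = [
            Step("Go to URL", self.start_url),
            Step("Save Current Output"),
        ]

    def add_step(self, step: Step) -> None:
        # Assumption for this sketch: new steps go just before the final step.
        self.steps.insert(len(self.steps) - 1, step)


timeline = Timeline("https://example.com/products")
timeline.add_step(Step("Click", "Next page"))        # hypothetical step type
timeline.add_step(Step("Extract", "product name"))   # hypothetical step type

print([step.kind for step in timeline.steps])
# → ['Go to URL', 'Click', 'Extract', 'Save Current Output']
```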

At this point, you can add, edit, and remove steps until the extractor produces the results you want. See What step types are available? to learn about the many actions your robot can take.

  • After completing a first draft of your robot, test it in the robot editor using the playback controls.
  • View the results of this test by clicking the Results tab above the timeline.
  • View log messages generated by the extractor by clicking the Log tab.
  • Use the information gathered from the Results and Log tabs to troubleshoot any unexpected results or errors. See How can I troubleshoot a robot? and How can I debug a failed execution? for guidance.

When you are satisfied with your robot, click the Save button, then click the Close button to return to the Projects page. For next steps, see How can I create a run?
