What should I know about elements, paths, and scopes?
Written by Henrik Hofmeister
Updated over a week ago

What is an element?

A web page element is any of the building blocks that make up a web page. An element can be a snippet of text, an image, a link, a video, and so on. If it's on the page, it's an element.

What is an element path?

An element path is a CSS3 selector. Because standard CSS selectors cannot select parent elements or match on text, we have added a few pseudo-selectors that make it easy to match any element you might want.

Dexi-specific selectors:

:visible
Selects only visible elements. Visibility is determined by whether an element has both width and height greater than 0, like jQuery.

:eq($index)
Selects a specific element in a group of matched elements, like jQuery.

:first / :last
Selects the first/last element in a group of matched elements.

:self
Selects the element itself. Useful when matching current scope in a loop/iteration.

:prev / :next
Selects the previous or next element for every matched element.

:parent
Selects the parent of every matched element.

:closest($selector)
Selects the closest parent element that matches $selector.

:text_contains("$text")
Selects elements containing $text. Matches only content, not HTML tags or attributes.

:text_matches("$regex")
Selects elements matching the regular expression $regex. Matches only content, not HTML tags or attributes.

:text_is("$text")
Selects elements matching $text exactly. Matches only content, not HTML tags or attributes.

:text_is($in['$text'])
Selects elements exactly matching an input value. Matches only content, not HTML tags or attributes.

:text_start("$text")
Selects elements beginning with $text. Matches only content, not HTML tags or attributes.

:text_end("$text")
Selects elements ending with $text. Matches only content, not HTML tags or attributes.

:text_bestmatch("$text")
Selects elements most closely matching $text. Matches only content, not HTML tags or attributes.
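
The text selectors above can be sketched as plain string checks on an element's text content. This is only an illustration in Python, not Dexi's implementation: the function names mirror the pseudo-selectors, but whitespace trimming, case sensitivity, partial regex matching, and the similarity measure used for the best match are all assumptions.

```python
import re
from difflib import SequenceMatcher


def text_contains(content: str, text: str) -> bool:
    # True when the text appears anywhere in the element content.
    return text in content


def text_matches(content: str, pattern: str) -> bool:
    # Assumption: the regular expression may match any part of the content.
    return re.search(pattern, content) is not None


def text_is(content: str, text: str) -> bool:
    # Assumption: surrounding whitespace is ignored before comparing.
    return content.strip() == text


def text_start(content: str, text: str) -> bool:
    return content.strip().startswith(text)


def text_end(content: str, text: str) -> bool:
    return content.strip().endswith(text)


def text_bestmatch(candidates: list[str], text: str) -> str:
    # Assumption: "most closely matching" is modelled here with difflib's
    # similarity ratio, compared case-insensitively.
    def score(candidate: str) -> float:
        return SequenceMatcher(None, candidate.lower(), text.lower()).ratio()

    return max(candidates, key=score)


# Hypothetical element texts, as they might appear on a shop page.
labels = ["Add to cart", "Add to wishlist", "Checkout"]
cart_like = [label for label in labels if text_start(label, "Add")]
closest = text_bestmatch(labels, "Check out")
```

In every case only the text content is compared, which is why a tag name or attribute value never satisfies these selectors.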

Using inputs and outputs

You can include your input and output variables in your dexi.io CSS3 selectors:

$in['input field'] / $out['output field']
Includes the input or output variable without any added quotes.

Example:

@in['input field'] / @out['output field']
Includes the input or output variable with added quotes.

Example:
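
The difference between the two forms can be sketched as a small substitution routine. This is a Python illustration under stated assumptions, not Dexi's code: the helper name expand_selector, the sample field names, the use of double quotes for the quoted form, and the escaping of embedded quotes are all hypothetical.

```python
import re

# Matches $in['name'], $out['name'], @in['name'] and @out['name'].
VARIABLE = re.compile(r"([$@])(in|out)\['([^']*)'\]")


def expand_selector(selector: str, inputs: dict, outputs: dict) -> str:
    def replace(match: re.Match) -> str:
        sigil, source, name = match.groups()
        value = str((inputs if source == "in" else outputs)[name])
        if sigil == "@":
            # Quoted form. Assumption: double quotes, with embedded
            # double quotes escaped by a backslash.
            return '"' + value.replace('"', '\\"') + '"'
        # Unquoted form: the value is inserted as-is.
        return value

    return VARIABLE.sub(replace, selector)


# Hypothetical input fields.
inputs = {"category": "books", "title": "Dune"}
expanded = expand_selector(
    "ul.$in['category'] li:text_is(@in['title'])", inputs, {}
)
print(expanded)  # ul.books li:text_is("Dune")
```

The unquoted form suits places where the value becomes part of the selector itself, such as a class name. The quoted form suits arguments that expect a string, such as the text selectors.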

Further Reading

W3Schools introduction to CSS selectors:
http://www.w3schools.com/cssref/css_selectors.asp 

What is DOM scope?

When iterating through elements, the robot uses something we call scope, or DOM (Document Object Model) scope. Each iteration of a loop creates a scope for the current element in the iteration list, and any subsequent path selectors are resolved relative to this scope. See the illustration below for a visual representation of how DOM scope works.

Scopes are automatically cleared when changing to a new page, but you can manually clear a scope using the Clear Scope step. Reference elements outside the DOM scope using pseudo-selectors like :parent or :closest(.something) that travel up the DOM tree.
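
The idea of a per-iteration scope can be sketched with Python's standard xml.etree.ElementTree module. This is an analogy rather than how the robot is built: the markup and class names are made up, and the parent lookup table stands in for what :parent does.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment with two repeated items.
PAGE = """
<div class="results">
  <div class="product"><h2>Lamp</h2><span class="price">12</span></div>
  <div class="product"><h2>Desk</h2><span class="price">80</span></div>
</div>
"""

root = ET.fromstring(PAGE)

# ElementTree has no parent pointers, so build a child -> parent map
# to imitate travelling up the tree.
parent_of = {child: parent for parent in root.iter() for child in parent}

rows = []
for scope in root.findall("div[@class='product']"):
    # Inside the loop, paths are resolved relative to the current item,
    # so "h2" finds this product's heading only.
    name = scope.find("h2").text
    price = scope.find("span[@class='price']").text
    # Stepping outside the scope requires going up explicitly.
    container = parent_of[scope].get("class")
    rows.append((name, price, container))

print(rows)  # [('Lamp', '12', 'results'), ('Desk', '80', 'results')]
```

Because each lookup starts at the current item, the same short paths work for every iteration without mentioning the item's position on the page.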

Custom DOM Scope

You may also define your own DOM scopes, and you can have several scopes within scopes. This can be useful if you have some snippet that expects a certain path, and that path can be found in multiple places. Then you can set a container for the data you want to extract and use the same snippet in multiple places.
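
Reusing one snippet under several containers can be sketched in the same way. Again this is a Python analogy with hypothetical markup and names; extract_address plays the role of the snippet, and the element passed to it plays the role of the custom scope.

```python
import xml.etree.ElementTree as ET

# Hypothetical page where the same structure appears in two places.
PAGE = """
<body>
  <section id="billing">
    <p class="street">1 Main St</p><p class="city">Oslo</p>
  </section>
  <section id="shipping">
    <p class="street">9 Dock Rd</p><p class="city">Bergen</p>
  </section>
</body>
"""


def extract_address(scope: ET.Element) -> dict:
    # The snippet only knows paths relative to whatever scope it is given.
    return {
        "street": scope.find("p[@class='street']").text,
        "city": scope.find("p[@class='city']").text,
    }


root = ET.fromstring(PAGE)
addresses = {}
for section_id in ("billing", "shipping"):
    # Setting the container is the equivalent of defining a custom scope.
    scope = root.find(f"section[@id='{section_id}']")
    addresses[section_id] = extract_address(scope)
```

The snippet never refers to billing or shipping, so it keeps working wherever a container with the expected inner structure is found.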
