Dictionary matching compares an incoming data set with an existing dictionary. It has many uses, including competitive price comparison. This document shows how to use a dexi.io Pipes robot to monitor competitor pricing with this technique.
To get started, first create the data types, data sets, and dictionaries your Pipes will require. Begin at the dexi.io Getting Started page. Follow each step to create a new instance of the demo robot named demoExtractor and a run named demoRun. (For further information on these topics, see What should I know about robots, runs, and executions?) You don’t have to execute the extractor now; the first Pipe you create will do that later.
1. Create three new data types
Note that when a column is marked as a key, an incoming row whose key value already exists in the data set updates that row in place rather than being added as a new row.
Create a new data type named demoDataType using the extractor’s output fields.
Create a new data type named demoDictionaryDataType. Add a field named Product and mark it as a key. Add a field named Price - DKK.
Create a new data type named demoDictionaryResult. Add a field named Product and mark it as a key. Add a field named Price - DKK. Add a field named Price - Ours.
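The effect of marking Product as a key can be sketched in a few lines of Python. This is only a conceptual model, not dexi.io code: the save_row helper, the in-memory dict standing in for a data set, and the prices are all made up for illustration.

```python
from typing import Dict

def save_row(data_set: Dict[str, dict], row: dict, key_field: str = "Product") -> None:
    """Hypothetical helper: store a row under its key value.

    A row whose key value is already present overwrites the stored row
    (update in place); otherwise a new row is added.
    """
    data_set[row[key_field]] = dict(row)

# An in-memory stand-in for a data set keyed on Product (illustrative values only).
data_set: Dict[str, dict] = {}
save_row(data_set, {"Product": "Printed Chiffon Dress", "Price - DKK": 120.0})
save_row(data_set, {"Product": "Printed Chiffon Dress", "Price - DKK": 110.0})  # same key: updates
save_row(data_set, {"Product": "Blouse", "Price - DKK": 200.0})                 # new key: adds

print(len(data_set))  # two rows remain, not three
```

Under this model, running the same Pipe repeatedly keeps one row per product with its latest price instead of accumulating duplicates.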
2. Create a new dictionary
Create a new dictionary named demoDictionary. From the Key Data Type drop-down menu, select Text. (You may also select Regular Expression here to allow use of a regular expression in the Key Values field.) In the Key Values field, enter Printed Chiffon Dress. Select the Verified checkbox. Keep the Verified status in mind: it determines which entries the Dictionary Lookup node will match against when it is configured to use only verified entries, as described later.
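The difference between a Text key and a Regular Expression key can be illustrated with a short Python sketch. The key_matches function is hypothetical, and the exact matching rules dexi.io applies (case sensitivity, whole-string versus partial match) are assumptions here, not documented behavior.

```python
import re

def key_matches(key_value: str, candidate: str, key_type: str = "text") -> bool:
    """Hypothetical matcher for a single dictionary key value.

    Assumptions: text keys match by exact, case-sensitive equality;
    regular-expression keys must match the whole candidate string.
    """
    if key_type == "regex":
        return re.fullmatch(key_value, candidate) is not None
    return key_value == candidate

# Text key: only the exact product name matches.
exact = key_matches("Printed Chiffon Dress", "Printed Chiffon Dress")
wrong_case = key_matches("Printed Chiffon Dress", "printed chiffon dress")

# Regular-expression key: one entry can cover several spellings of a name.
variant = key_matches(r"Printed \w+ Dress", "Printed Summer Dress", key_type="regex")

print(exact, wrong_case, variant)
```

A regular-expression key is convenient when competitors label the same product slightly differently, at the cost of possibly matching products you did not intend.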
3. Create two new data sets
Create a new data set named demoExtractedDataSet with data type demoDataType. This data set will store the rows gathered by demoExtractor.
Create a new data set named demoDictionaryResult with data type demoDictionaryResult. This data set will store the final results of this exercise.
Next, begin designing the Pipes required.
1. Create a new Pipe
Create a new Pipes robot named demoPipes.
Add an Execute Robot node, found under the dexi.io category. From the Max Concurrency drop-down menu, choose an appropriate number of workers to devote to this Pipe. From the Robot drop-down menu, select demoExtractor.
Add a To Type node, found under the Transforms category. From the Data Type drop-down menu, select demoDataType.
Add a From Type node, found under the Transforms category. From the Data Type drop-down menu, select demoDataType.
Add a Save Row to Data Set node, found under the dexi.io category. From the Data Set drop-down menu, select demoExtractedDataSet.
After saving your work, create a new run named demoPipe1 for the Pipe, then execute it to populate demoExtractedDataSet with data gathered by demoExtractor from our demo site.
2. Create a second Pipe
Create a second Pipe named demoDictionaryMatching to perform the dictionary matching needed to compare our competitor’s prices with our own.
Add a For Each Row in Data Set node, found under the dexi.io category. From the Data Set drop-down menu, select demoExtractedDataSet, the data set populated by the first Pipe.
Add an As Fields node, found under the Transforms category.
Connect the two nodes.
Add a Dictionary Lookup node, found under the dexi.io category. From the Dictionary drop-down menu, select demoDictionary. From the Insert Unknown Entries radio group, select True. This setting matters most when you are just beginning to accumulate data: when it is set to False, rows that do not match an existing entry will not be added to the data set. From the Only Use Verified Entries radio group, select False. This is the other key setting in dictionary matching. After accumulating an initial data set, you can verify the entries, then set this option to True to compare only against verified entries.
Connect the As Fields node's right-side Product tab to the Dictionary Lookup node's left-side Value tab.
Add a From Type node, found under the Transforms category. From the Data Type drop-down menu, select demoDictionaryResult.
Connect the As Fields node's right-side Price - DKK tab to the From Type node's left-side Price - DKK tab.
At this point, you could optionally add a Filter Rows node, found under the Collections category, although this exercise does not use one, so skip this step for now. To set it up, open the node's Edit pop-up, select Product from the Filter By drop-down menu, and select Empty from the Product Is drop-down menu. Then connect the From Type node's right-side Value tab to the Filter Rows node's left-side Input tab. This will remove any empty rows from the resulting data set. (Note that the node offers two right-side tabs: Failed and Matched. If you wish to capture rows without a value for Product, you can create an additional data set specifically for this purpose to receive the Failed output.)
Add a Save Row to Data Set node, found under the dexi.io category. From the Data Set drop-down menu, select demoDictionaryResult.
Connect the From Type node's right-side Value tab to the Save Row to Data Set node's left-side Row tab.
After saving your work, create a new run for demoDictionaryMatching named demoDictionaryMatchingRun and execute it. The results will be available under the execution's Results tab.
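To make the two Dictionary Lookup settings above more concrete, here is a minimal Python sketch of how a lookup with Insert Unknown Entries and Only Use Verified Entries might behave. It is a conceptual model under stated assumptions, not dexi.io's implementation: the Entry and Dictionary classes, the lookup method, the Blouse product, and the prices are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Entry:
    key_value: str
    verified: bool

@dataclass
class Dictionary:
    # Hypothetical in-memory dictionary, indexed by key value.
    entries: Dict[str, Entry] = field(default_factory=dict)

    def lookup(self, value: str, insert_unknown: bool, only_verified: bool) -> Optional[str]:
        """Return the matched key value, or None when the row should be dropped."""
        entry = self.entries.get(value)
        # A known entry matches unless we require verification and it has none.
        if entry is not None and (entry.verified or not only_verified):
            return entry.key_value
        # Assumption: an unknown value is stored as a new, unverified entry
        # and the row is let through.
        if entry is None and insert_unknown:
            self.entries[value] = Entry(value, verified=False)
            return value
        return None

demo = Dictionary({"Printed Chiffon Dress": Entry("Printed Chiffon Dress", verified=True)})
rows = [  # illustrative extracted rows
    {"Product": "Printed Chiffon Dress", "Price - DKK": 120.0},
    {"Product": "Blouse", "Price - DKK": 200.0},
]

result: Dict[str, dict] = {}
for row in rows:
    product = demo.lookup(row["Product"], insert_unknown=True, only_verified=False)
    if product is not None:
        result[product] = {"Product": product, "Price - DKK": row["Price - DKK"]}

# Later, a stricter run ignores the still-unverified Blouse entry.
strict = demo.lookup("Blouse", insert_unknown=False, only_verified=True)
print(sorted(result), strict)
```

In this model the permissive settings let the dictionary grow during early runs, and switching to verified-only matching afterwards limits comparisons to entries someone has reviewed.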