Parse Node (new)
From Moving Pictures
As of SVN: 0.8.0.565
The parse node searches an input string for information, using either a regular expression or an XPath query. The following table lists the attributes associated with this node.
| Attribute | Description |
|---|---|
| name | Required. Defines the return variable of this node. |
| input | Required. Defines the input variable or string to run the regular expression or XPath query against. |
| regex | Required when parsing with a regular expression. Defines the regular expression to use. |
| xpath | Required when parsing with XPath. Defines the XPath query to use. The value of the input attribute should be in XML format. |
Usage
Regular Expressions
<parse name="variable" input="${some_string}" regex="${expression}"/>
Regular Expression Variable Output
A regular expression will often turn up more than one result, especially when you are searching for movies. That is why the parse node returns an array of results. Let's say that the regular expression in the example returns four items per result: Title, Year, IMDb ID, and details page, in that order. To get the year of the first result you would use:
${variable[0][1]}
The parse node returns a two-dimensional array where the first dimension is the result index and the second dimension is the index of the item returned by the regex.
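The two-dimensional result can be pictured with a rough Python analogy. This is only a sketch, not the Moving Pictures engine: it assumes that each capture group becomes one item in the second dimension, and the pipe-delimited input, the pattern, and the sample values are all made up for illustration.

```python
import re

# Hypothetical input: one search hit per line, fields separated by "|".
some_string = (
    "Movie One|2008|tt0000001|/title/tt0000001/\n"
    "Movie Two|2007|tt0000002|/title/tt0000002/\n"
)

# Four capture groups, in the order: title, year, IMDb ID, details page.
expression = re.compile(r"^(.+?)\|(\d{4})\|(tt\d+)\|(\S+)$", re.MULTILINE)

# findall returns one tuple per match and one entry per capture group,
# which mirrors the [result index][item index] layout described above.
variable = expression.findall(some_string)

first_year = variable[0][1]     # year of the first result: "2008"
second_title = variable[1][0]   # title of the second result: "Movie Two"
print(first_year, second_title)
```

Here `variable[0][1]` plays the role of `${variable[0][1]}` in a scraper script: the first index picks the result, the second picks the item within that result.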
XPath Query
<parse name="variable" input="${xml_string}" xpath="${query}"/>
XPath Variable Output
An XPath query can select more than one node, which is why the parse node returns an array of results here as well.
When the xpath attribute is used, the parse node always returns a one-dimensional array. The length of the array depends on the type of query used. A variable is created for each child of the node(s) targeted by the query; only direct children get variables.
The variable output will look like this:
- ${variable[0]} will contain the actual node (OuterXml value)
- ${variable[0].child} contains the value of the child node or the OuterXml value if it contains more children
- ${variable[0].@attribute} contains the value of the attribute
If the XPath query selects multiple nodes, the variables will look like this: ${variable[0]}, ${variable[1]}, ${variable[2]}, etc.
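The naming scheme above can be sketched in Python with the standard library's ElementTree. This is an approximation under stated assumptions, not the scraper engine's actual code: `parse_xpath` is a hypothetical helper, the result is modeled as a plain dictionary keyed by variable name, and ElementTree only accepts relative paths on an element, so `.//movie` stands in for `//movie`.

```python
import xml.etree.ElementTree as ET


def parse_xpath(name, xml_text, query):
    """Hypothetical helper that mimics the variables described above."""
    root = ET.fromstring(xml_text)
    variables = {}
    for index, node in enumerate(root.findall(query)):
        key = f"{name}[{index}]"
        # The node itself, serialized (comparable to the OuterXml value).
        variables[key] = ET.tostring(node, encoding="unicode")
        # One variable per attribute, prefixed with "@".
        for attribute, value in node.attrib.items():
            variables[f"{key}.@{attribute}"] = value
        # One variable per direct child; grandchildren get no variable.
        for child in node:
            if len(child):  # child has children of its own: keep its XML
                variables[f"{key}.{child.tag}"] = ET.tostring(child, encoding="unicode")
            else:
                variables[f"{key}.{child.tag}"] = child.text or ""
    return variables


xml_string = (
    '<root>'
    '<movie id="123456"><title>Movie</title><year>2008</year></movie>'
    '<movie id="654321"><title>Movie</title><year>2007</year></movie>'
    '</root>'
)

variables = parse_xpath("movie", xml_string, ".//movie")
for key, value in variables.items():
    print(f"{key} = {value}")
```

Each selected node contributes one indexed entry plus one entry per attribute and per direct child, so a query that matches two `movie` nodes yields `movie[0]...` and `movie[1]...` variables.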
Examples using Regular Expression
T.B.A.
Examples using XPath
XML used in the input attribute:
<root>
  <movie id="123456">
    <title>Movie</title>
    <year>2008</year>
  </movie>
  <movie id="654321">
    <title>Movie</title>
    <year>2007</year>
  </movie>
</root>
Selecting a single node
<parse name="movie" input="${example}" xpath="//movie[@id=654321]"/>
Scraper variable output:
movie[0] = <movie id="654321"><title>Movie</title><year>2007</year></movie>
movie[0].@id = 654321
movie[0].title = Movie
movie[0].year = 2007
Selecting multiple nodes
<parse name="movie" input="${example}" xpath="//movie"/>
Scraper variable output:
movie[0] = <movie id="123456"><title>Movie</title><year>2008</year></movie>
movie[0].@id = 123456
movie[0].title = Movie
movie[0].year = 2008
movie[1] = <movie id="654321"><title>Movie</title><year>2007</year></movie>
movie[1].@id = 654321
movie[1].title = Movie
movie[1].year = 2007