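XPath is a query language used to navigate through the elements and attributes of an XML document. Web pages are HTML documents, which have the same kind of tree structure of nested elements and attributes, so the same expressions apply to them. Octoparse provides an XPath engine for HTML documents so that we can use XPath to locate data on a web page precisely.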
XPath uses path expressions to select nodes. A node is selected by following a path, which is a sequence of steps through the document tree.
Below is a list of the most useful path expressions:
Expression   Description
nodename     Selects all nodes with the name “nodename”
/            Selects from the root node
//           Selects nodes in the document from the current node that match the selection, no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
*            Matches any element node
@*           Matches any attribute node
node()       Matches any node of any kind
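The expressions above can be tried out with a short sketch. It assumes Python's standard `xml.etree.ElementTree` module, which supports only a subset of XPath (it is not Octoparse's engine), and a small sample document invented for this illustration. ElementTree does not accept absolute paths on an element, so the expressions are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book category="web">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
  <book category="children">
    <title lang="fr">Le Petit Prince</title>
    <price>29.99</price>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# nodename: child elements named "book" of the current node
books = root.findall("book")

# //: matching descendants at any depth (ElementTree needs the leading ".")
titles = root.findall(".//title")

# . : the current node itself
current = root.find(".")

# .. : the parent of each matched node (here, the parents of <price>)
parents = root.findall(".//price/..")

# * : any child element of the first book
children = books[0].findall("*")

# @ : ElementTree cannot return attribute nodes from a path, so the
# attribute is read from each selected element instead.
langs = [t.get("lang") for t in titles]

print([t.text for t in titles])
print(langs)
print([c.tag for c in children])
```

A full XPath 1.0 engine would also accept the absolute forms such as `/bookstore/book` and attribute steps such as `//title/@lang` directly.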
Predicates are used in XPath expressions to find a specific node or a node that contains a specific value. They are always embedded in square brackets. Below is a list of some path expressions with predicates and the corresponding results:
XPath Expression                      Result
/bookstore/book[last()]               Selects the last book element that is a child of the bookstore element
/bookstore/book[position()<3]         Selects the first two book elements that are children of the bookstore element
//title[@lang='en']                   Selects all the title elements that have a "lang" attribute with a value of "en"
/bookstore/book[price>35.00]/title    Selects the title elements of all book elements under the bookstore element whose price element has a value greater than 35.00
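The four predicate expressions can be illustrated with a minimal sketch, again assuming Python's standard `xml.etree.ElementTree` module and a bookstore document invented for this example. ElementTree supports `[last()]` and `[@attr='value']`, but not `position()<3` or numeric comparisons such as `price>35.00`, so those two are emulated in plain Python here rather than evaluated as XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document, invented for this illustration.
XML = """
<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
  <book><title lang="fr">Le Petit Prince</title><price>45.00</price></book>
</bookstore>
"""

bookstore = ET.fromstring(XML)

# /bookstore/book[last()]: supported directly by ElementTree
last_book = bookstore.find("book[last()]")

# //title[@lang='en']: attribute-value predicate, also supported
english_titles = bookstore.findall(".//title[@lang='en']")

# /bookstore/book[position()<3]: not supported by ElementTree,
# so the first two books are taken with a slice instead.
first_two = bookstore.findall("book")[:2]

# /bookstore/book[price>35.00]/title: numeric comparison is not
# supported either, so the filter is written in Python.
expensive_titles = [
    book.find("title")
    for book in bookstore.findall("book")
    if float(book.findtext("price")) > 35.00
]

print(last_book.findtext("title"))
print([t.text for t in english_titles])
print([b.findtext("title") for b in first_two])
print([t.text for t in expensive_titles])
```

With an engine that implements full XPath 1.0, the four expressions from the table could be passed in unchanged instead of being split between path strings and Python code.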
Source: octoparse.com