Saturday, November 12, 2011

Introduction to XPath


XPath

XPath is a language for finding information in an XML document. It uses path expressions to navigate through XML documents. XPath is a W3C Recommendation.


The basic XPath syntax is similar to file system addressing. If the path starts with a slash (/), it represents an absolute path to the required element.
 
/AAA
Select the root element AAA

    <AAA>
      ...
    </AAA>
 
/AAA/CCC
Select all elements CCC which are children of the root element AAA
    
    <CCC/>
    <CCC/>
     
 
/AAA/DDD/BBB
Select all elements BBB which are children of DDD which are children of the root element AAA
      
    <BBB/>
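
To try these absolute paths from code, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It assumes a small hypothetical sample document built only to match the results above. ElementTree implements just a subset of XPath and evaluates paths relative to an element, so `/AAA/CCC` becomes `CCC` evaluated on the root element `AAA`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, consistent with the results listed above.
DOC = (
    "<AAA>"
    "<BBB/><CCC/><BBB/><BBB/>"
    "<DDD><BBB/></DDD>"
    "<CCC/>"
    "</AAA>"
)

root = ET.fromstring(DOC)  # the root element: what /AAA selects

# ElementTree paths are relative to the element they are called on,
# so the leading "/AAA" step is replaced by starting from root.
aaa_ccc = root.findall("CCC")          # /AAA/CCC
aaa_ddd_bbb = root.findall("DDD/BBB")  # /AAA/DDD/BBB

print(root.tag)                  # AAA
print([e.tag for e in aaa_ccc])  # ['CCC', 'CCC']
print(len(aaa_ddd_bbb))          # 1
```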
      

Question:
  1. What XPath selects all the parent elements?
  2. Which elements will be selected by the XPath /parent/child?


If the path starts with //, all elements in the document that fulfill the criteria that follow it are selected.
 
//BBB
Select all elements BBB

    <BBB/>
    <BBB/>
    <BBB/>
    <BBB/>
    <BBB/>
              
 
//DDD/BBB
Select all elements BBB which are children of DDD

    <BBB/>
    <BBB/>
    <BBB/>
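
The same module can mimic `//`. A sketch, again on a hypothetical document chosen to reproduce the counts above: `Element.iter(tag)` walks the element and everything below it, while ElementTree's `.//` searches only below the element it is called on (unlike XPath's `//`, which can also match the root element).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, consistent with the results listed above.
DOC = (
    "<AAA>"
    "<BBB/><CCC/><BBB/>"
    "<DDD><BBB/></DDD>"
    "<CCC><DDD><BBB/><BBB/></DDD></CCC>"
    "</AAA>"
)

root = ET.fromstring(DOC)

# //BBB: iter() visits the root element and all its descendants in document order.
all_bbb = list(root.iter("BBB"))

# //DDD/BBB: ".//" searches below the root element only; that is enough here
# because the root element is AAA, not DDD.
ddd_bbb = root.findall(".//DDD/BBB")

print(len(all_bbb))  # 5
print(len(ddd_bbb))  # 3
```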
     

Question:
1) Which XPath will select all grandchild elements?


The star * selects all elements located by the preceding path.
 
/AAA/CCC/DDD/*
Select all elements enclosed by elements /AAA/CCC/DDD
     
    <BBB/>
    <BBB/>
    <EEE/>
    <FFF/>
     
 
/*/*/*/BBB
Select all elements BBB which have 3 ancestors

    <BBB/>
    <BBB/>
    <BBB/>
    <BBB/>
    <BBB>
      ...
    </BBB>
   

 
//*
Select all elements

    <AAA>
      <XXX>
        <DDD>
          <BBB/>
          <BBB/>
          <EEE/>
          <FFF/>
        </DDD>
      </XXX>
      <CCC>
        <DDD>
          <BBB/>
          <BBB/>
          <EEE/>
          <FFF/>
        </DDD>
      </CCC>
      <CCC>
        <BBB>
          <BBB>
            <BBB/>
          </BBB>
        </BBB>
      </CCC>
    </AAA>
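
This sketch reuses the document listed in full in the //* example. Because ElementTree always starts from the root element, that element already stands in for the first `*` of `/*/*/*/BBB`, leaving `*/*/BBB` to evaluate.

```python
import xml.etree.ElementTree as ET

# The document listed in full in the //* example above.
DOC = (
    "<AAA>"
    "<XXX><DDD><BBB/><BBB/><EEE/><FFF/></DDD></XXX>"
    "<CCC><DDD><BBB/><BBB/><EEE/><FFF/></DDD></CCC>"
    "<CCC><BBB><BBB><BBB/></BBB></BBB></CCC>"
    "</AAA>"
)

root = ET.fromstring(DOC)

ddd_children = root.findall("CCC/DDD/*")  # /AAA/CCC/DDD/*
deep_bbb = root.findall("*/*/BBB")        # /*/*/*/BBB (root already matched the first *)
everything = list(root.iter())            # //* (iter() includes the root element)

print([e.tag for e in ddd_children])  # ['BBB', 'BBB', 'EEE', 'FFF']
print(len(deep_bbb))                  # 5
print(len(everything))                # 17
```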

Question:
  1. Which XPath will select all the occupants?
  2. Which XPath will select both parent and single elements?

An expression in square brackets can further specify an element. A number in the brackets gives the position of the element in the selected set, and the function last() selects the last element in the selection.

 
/AAA/BBB[1]
Select the first BBB child of element AAA
    
    <BBB/>
 
 
/AAA/BBB[last()]
Select the last BBB child of element AAA

    <BBB/>
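
ElementTree also understands numeric positions and `last()` inside brackets. The `n` attributes in this hypothetical sample exist only so the output shows which `BBB` was picked.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: four BBB children, numbered with an "n" attribute
# only so that the output shows which element was picked.
root = ET.fromstring('<AAA><BBB n="1"/><BBB n="2"/><BBB n="3"/><BBB n="4"/></AAA>')

first = root.findall("BBB[1]")      # /AAA/BBB[1]  (positions start at 1, not 0)
last = root.findall("BBB[last()]")  # /AAA/BBB[last()]

print([e.get("n") for e in first])  # ['1']
print([e.get("n") for e in last])   # ['4']
```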
     


Attributes are specified with the @ prefix.
 
//@id
Select all attributes @id

          
    id = "b1"
    id = "b2"
 


 
//BBB[@id]
Select BBB elements which have attribute id

    <BBB id = "b1"/>
    <BBB id = "b2"/>
 
 
//BBB[@name]
Select BBB elements which have attribute name
    
    <BBB name = "bbb"/>
          
 
//BBB[@*]
Select BBB elements which have any attribute
     
    <BBB id = "b1"/>
    <BBB id = "b2"/>
    <BBB name = "bbb"/>
              
 
//BBB[not(@*)]
Select BBB elements without an attribute

    <BBB/>
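
ElementTree supports `[@id]`-style tests but cannot return attribute nodes and has no `@*` or `not()`. A sketch, assuming a hypothetical sample document consistent with the results above, filling those gaps with plain Python over `Element.attrib`:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document consistent with the attribute results above.
root = ET.fromstring(
    '<AAA><BBB id="b1"/><BBB id="b2"/><BBB name="bbb"/><BBB/></AAA>'
)

# //@id: ElementTree cannot return attribute nodes, so collect the values by hand.
ids = [e.get("id") for e in root.iter() if "id" in e.attrib]

# //BBB[@id] and //BBB[@name] are supported directly.
with_id = root.findall(".//BBB[@id]")
with_name = root.findall(".//BBB[@name]")

# //BBB[@*] and //BBB[not(@*)] are not supported, so test the attrib dict instead.
with_any = [e for e in root.iter("BBB") if e.attrib]
without_any = [e for e in root.iter("BBB") if not e.attrib]

print(ids)                              # ['b1', 'b2']
print(len(with_id), len(with_name))     # 2 1
print(len(with_any), len(without_any))  # 3 1
```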
     

Question:
  1. How do you select the second grandchild element?
  2. How do you select apartment (use @ in the XPath)?

Attribute values can be used as selection criteria. The function normalize-space removes leading and trailing whitespace and replaces each sequence of whitespace characters with a single space.
 
//BBB[@id='b1']
Select BBB elements which have attribute id with value b1
   
    <BBB id = "b1"/>
          
 
//BBB[@name='bbb']
Select BBB elements which have attribute name with value 'bbb'
         
    <BBB name = "bbb"/>
     
 
//BBB[normalize-space(@name)='bbb']
Select BBB elements which have attribute name with value bbb; leading and trailing spaces are removed before the comparison

    <BBB name = " bbb "/>
    <BBB name = "bbb"/>
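
Equality tests on attribute values work directly in ElementTree, but `normalize-space()` does not exist there. The `normalize_space` helper below is a hypothetical stand-in that follows the XPath rule (only space, tab, carriage return and line feed count as whitespace); the sample document is likewise assumed.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document consistent with the results above.
root = ET.fromstring('<AAA><BBB id="b1"/><BBB name=" bbb "/><BBB name="bbb"/></AAA>')

by_id = root.findall(".//BBB[@id='b1']")       # //BBB[@id='b1']
by_name = root.findall(".//BBB[@name='bbb']")  # //BBB[@name='bbb']

def normalize_space(s):
    """Mimic XPath normalize-space(): collapse runs of XPath whitespace
    (space, tab, CR, LF) into one space, then trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

# //BBB[normalize-space(@name)='bbb']: no ElementTree syntax for this, so filter by hand.
by_norm = [e for e in root.iter("BBB") if normalize_space(e.get("name", "")) == "bbb"]

print(len(by_id), len(by_name), len(by_norm))  # 1 1 2
```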
     

Question:
1) How do you select the elements whose age is 10?
2) How do you select the element whose age is 10 and whose name contains “1.”?

The function count() counts the number of selected elements.
 
//*[count(BBB)=2]
Select elements which have two children BBB
            
    <DDD>
      ...
    </DDD>
          

 
//*[count(*)=2]
Select elements which have 2 children

    <DDD>
      ...
    </DDD>
    <EEE>
      ...
    </EEE>
     

 
//*[count(*)=3]
Select elements which have 3 children

    <AAA>
      <CCC>
        ...
      </CCC>
      ...
    </AAA>
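
ElementTree has no `count()`, but `len(element)` gives the number of child elements. A sketch with a hypothetical document consistent with the three results above; `having` is an illustrative helper, not a library function.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document consistent with the count() results above.
root = ET.fromstring(
    "<AAA>"
    "<CCC><BBB/><BBB/><BBB/></CCC>"
    "<DDD><BBB/><BBB/></DDD>"
    "<EEE><CCC/><DDD/></EEE>"
    "</AAA>"
)

def having(pred):
    """Tags of all elements (root included, as with //*) for which pred is true."""
    return [e.tag for e in root.iter() if pred(e)]

two_bbb = having(lambda e: len(e.findall("BBB")) == 2)  # //*[count(BBB)=2]
two_children = having(lambda e: len(e) == 2)            # //*[count(*)=2]
three_children = having(lambda e: len(e) == 3)          # //*[count(*)=3]

print(two_bbb)         # ['DDD']
print(two_children)    # ['DDD', 'EEE']
print(three_children)  # ['AAA', 'CCC']
```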

Question:
  1. Select the parent that has 4 children.

The function name() returns the name of the element; the starts-with function returns true if the first argument string starts with the second argument string; and the contains function returns true if the first argument string contains the second argument string.
 
//*[name()='BBB']
Select all elements with name BBB, equivalent to //BBB

    <BBB/>
    <BBB/>
    <BBB/>
    <BBB/>
    <BBB/>
 
 
//*[starts-with(name(),'B')]
Select all elements whose name starts with the letter B

    <BCC>
      <BBB/>
      <BBB/>
      <BBB/>
    </BCC>
    <BBB/>
    <BBB/>
    <BEC>
      ...
    </BEC>
     

 
//*[contains(name(),'C')]
Select all elements whose name contains the letter C

    <BCC>
      ...
    </BCC>
    <BEC>
      <CCC/>
      ...
    </BEC>
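
For elements without a namespace, ElementTree's `Element.tag` holds the same string that `name()` returns, so the three tests map onto ordinary Python string checks. The document is a hypothetical one consistent with the results above. (For namespaced elements `tag` has the form `{uri}local`, which differs from `name()`.)

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document consistent with the results above.
root = ET.fromstring(
    "<AAA>"
    "<BCC><BBB/><BBB/><BBB/></BCC>"
    "<DDB><BBB/><BBB/></DDB>"
    "<BEC><CCC/><DBD/></BEC>"
    "</AAA>"
)

named_bbb = [e.tag for e in root.iter() if e.tag == "BBB"]        # //*[name()='BBB']
starts_b = [e.tag for e in root.iter() if e.tag.startswith("B")]  # //*[starts-with(name(),'B')]
contains_c = [e.tag for e in root.iter() if "C" in e.tag]         # //*[contains(name(),'C')]

print(len(named_bbb))  # 5
print(starts_b)        # ['BCC', 'BBB', 'BBB', 'BBB', 'BBB', 'BBB', 'BEC']
print(contains_c)      # ['BCC', 'BEC', 'CCC']
```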
     



The string-length function returns the number of characters in the string.
 
//*[string-length(name()) = 3]
Select elements with three-letter name

    <AAA>
      ...
      <CCC/>
      ...
    </AAA>
 
//*[string-length(name()) < 3]
Select elements whose name has one or two characters

    <Q/>
    <BB/>
          
 
 
//*[string-length(name()) > 3]
Select elements with name longer than three characters

          
    <SSSS/>
    <DDDDDDDD/>
    <EEEE/>
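
The same idea covers `string-length()`: apply `len()` to the tag. The sample document is assumed, chosen to reproduce the three results above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document consistent with the results above.
root = ET.fromstring("<AAA><Q/><SSSS/><BB/><CCC/><DDDDDDDD/><EEEE/></AAA>")

three = [e.tag for e in root.iter() if len(e.tag) == 3]  # //*[string-length(name()) = 3]
short = [e.tag for e in root.iter() if len(e.tag) < 3]   # //*[string-length(name()) < 3]
longer = [e.tag for e in root.iter() if len(e.tag) > 3]  # //*[string-length(name()) > 3]

print(three)   # ['AAA', 'CCC']
print(short)   # ['Q', 'BB']
print(longer)  # ['SSSS', 'DDDDDDDD', 'EEEE']
```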
     

Question:
  1. Select all the child and grandchild elements (use name() to check).


Several paths can be combined with the | separator.
 
//CCC | //BBB
Select all elements CCC and BBB

    <BBB/>
    <CCC/>
    <CCC/>
 
     
 
/AAA/EEE | //BBB
Select all elements BBB and elements EEE which are children of root element AAA

     
    <BBB/>
    <EEE/>
     
 
/AAA/EEE | //DDD/CCC | /AAA | //BBB
The number of combined paths is not restricted

    <AAA>
      <BBB/>
      ...
        <CCC/>
      ...
      <EEE/>
    </AAA>
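
XPath's `|` returns a node-set: every node appears once, in document order, no matter how the paths overlap. A sketch of that behaviour in Python, on a hypothetical document consistent with the three results above; `union` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document consistent with the union results above.
root = ET.fromstring("<AAA><BBB/><CCC/><DDD><CCC/></DDD><EEE/></AAA>")

def union(*node_lists):
    """Merge node lists like XPath '|': drop duplicates, keep document order."""
    wanted = {id(e) for nodes in node_lists for e in nodes}
    return [e for e in root.iter() if id(e) in wanted]

ccc_or_bbb = union(root.findall(".//CCC"), root.findall(".//BBB"))  # //CCC | //BBB
eee_or_bbb = union(root.findall("EEE"), root.findall(".//BBB"))     # /AAA/EEE | //BBB
four_way = union(root.findall("EEE"), root.findall(".//DDD/CCC"),
                 [root], root.findall(".//BBB"))  # /AAA/EEE | //DDD/CCC | /AAA | //BBB

print([e.tag for e in ccc_or_bbb])  # ['BBB', 'CCC', 'CCC']
print([e.tag for e in eee_or_bbb])  # ['BBB', 'EEE']
print([e.tag for e in four_way])    # ['AAA', 'BBB', 'CCC', 'EEE']
```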

Question:
1. Select all the grandchild and child elements (use | operator).


The descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus, the descendant axis never contains attribute or namespace nodes.
 
/AAA/BBB/descendant::*
Select all descendants of /AAA/BBB

 
    <DDD>
      <CCC>
        <DDD/>
        <EEE/>
      </CCC>
    </DDD>
   
 

 
//CCC/descendant::*
Select all elements which have CCC among their ancestors

    <DDD/>
    <EEE/>
    <DDD>
      <EEE>
        <DDD>
          <FFF/>
        </DDD>
      </EEE>
    </DDD>
          
     


 
//CCC/descendant::DDD
Select elements DDD which have CCC among their ancestors

    <DDD/>
    <DDD>
      ...
        <DDD>
          ...
        </DDD>
      ...
    </DDD>
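
ElementTree has no axis syntax (`descendant::` and the rest), so the remaining axis examples are emulated with small helper functions. The `descendants` and `in_document_order` helpers and the sample document below are hypothetical, chosen to reproduce the results above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the descendant results above.
root = ET.fromstring(
    "<AAA>"
    "<BBB><DDD><CCC><DDD/><EEE/></CCC></DDD></BBB>"
    "<CCC><DDD><EEE><DDD><FFF/></DDD></EEE></DDD></CCC>"
    "</AAA>"
)

def descendants(elem, tag=None):
    """descendant:: axis: children, grandchildren and so on, never elem itself."""
    return [d for d in elem.iter(tag) if d is not elem]

def in_document_order(nodes):
    """Drop duplicates and sort by document order, as an XPath node-set would be."""
    wanted = {id(n) for n in nodes}
    return [e for e in root.iter() if id(e) in wanted]

bbb = root.find("BBB")
print([e.tag for e in descendants(bbb)])  # ['DDD', 'CCC', 'DDD', 'EEE']

all_ccc = list(root.iter("CCC"))
under_ccc = in_document_order(d for c in all_ccc for d in descendants(c))
print([e.tag for e in under_ccc])  # ['DDD', 'EEE', 'DDD', 'EEE', 'DDD', 'FFF']

ddd_under_ccc = in_document_order(d for c in all_ccc for d in descendants(c, "DDD"))
print(len(ddd_under_ccc))  # 3
```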
          
     


Question:
1. Search from apartment to select all the grandchild elements.

The parent axis contains the parent of the context node, if there is one.
 
//DDD/parent::*
Select all parents of DDD elements

    <BBB>
      ...
        <CCC>
          ...
        </CCC>
      ...
    </BBB>
    <CCC>
      ...
        <EEE>
          ...
        </EEE>
      ...
    </CCC>
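
ElementTree elements keep no reference to their parent, so the parent axis needs a child-to-parent dictionary built once from the tree. A sketch on the same hypothetical document as the descendant examples; `parent_of` and `in_document_order` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the parent results above.
root = ET.fromstring(
    "<AAA>"
    "<BBB><DDD><CCC><DDD/><EEE/></CCC></DDD></BBB>"
    "<CCC><DDD><EEE><DDD><FFF/></DDD></EEE></DDD></CCC>"
    "</AAA>"
)

# Child -> parent lookup table, built once for the whole tree.
parent_of = {child: parent for parent in root.iter() for child in parent}

def in_document_order(nodes):
    """Drop duplicates and sort by document order, like an XPath node-set."""
    wanted = {id(n) for n in nodes}
    return [e for e in root.iter() if id(e) in wanted]

# //DDD/parent::*
parents = in_document_order(parent_of[d] for d in root.iter("DDD") if d in parent_of)
print([e.tag for e in parents])  # ['BBB', 'CCC', 'CCC', 'EEE']
```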
     



The ancestor axis contains the ancestors of the context node; the ancestors of the context node consist of the parent of the context node and the parent's parent and so on; thus, the ancestor axis will always include the root node, unless the context node is the root node. (The root node is the document node above AAA; since * matches only elements, ancestor::* stops at the root element AAA.)
 
/AAA/BBB/DDD/CCC/EEE/ancestor::*
Select all elements given in this absolute path except EEE itself

    <AAA>
      <BBB>
        <DDD>
          <CCC>
            ...
          </CCC>
        </DDD>
      </BBB>
      ...
    </AAA>

 
//FFF/ancestor::*
Select the ancestors of the FFF element

    <AAA>
      ...
      <CCC>
        <DDD>
          <EEE>
            <DDD>
              ...
            </DDD>
          </EEE>
        </DDD>
      </CCC>
    </AAA>
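
Following parent links upward gives the ancestor axis. In XPath this is a reverse axis, so `ancestor::*[1]` is the parent, while the selected node-set is still reported in document order. The hypothetical `ancestors` helper below returns nearest-first, and reversing it gives document order; the document is assumed, matching the results above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the ancestor results above.
root = ET.fromstring(
    "<AAA>"
    "<BBB><DDD><CCC><DDD/><EEE/></CCC></DDD></BBB>"
    "<CCC><DDD><EEE><DDD><FFF/></DDD></EEE></DDD></CCC>"
    "</AAA>"
)

# Child -> parent lookup table; ElementTree elements have no parent pointer.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """ancestor::* as a list, nearest first (index 0 is the parent)."""
    result = []
    while elem in parent_of:
        elem = parent_of[elem]
        result.append(elem)
    return result

eee = root.find("BBB/DDD/CCC/EEE")
print([e.tag for e in reversed(ancestors(eee))])  # ['AAA', 'BBB', 'DDD', 'CCC']

fff = root.find(".//FFF")
print([e.tag for e in reversed(ancestors(fff))])  # ['AAA', 'CCC', 'DDD', 'EEE', 'DDD']
```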




The following-sibling axis contains all the following siblings of the context node.
 
/AAA/BBB/following-sibling::*


          

    <XXX>
      ...
    </XXX>
    <CCC>
      ...
    </CCC>
     

 
//CCC/following-sibling::*



    <DDD/>
    <FFF/>
    <FFF>
      ...
    </FFF>
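
Sibling axes read the parent's list of children. A sketch under the same assumptions as before: a hypothetical document consistent with the results above, plus illustrative helpers (`parent_of`, `following_siblings`, `in_document_order`).

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the sibling results above.
root = ET.fromstring(
    "<AAA>"
    "<BBB><CCC/><DDD/></BBB>"
    "<XXX><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></XXX>"
    "<CCC><DDD/></CCC>"
    "</AAA>"
)

parent_of = {child: parent for parent in root.iter() for child in parent}

def following_siblings(elem):
    """following-sibling::* : later children of the same parent."""
    parent = parent_of.get(elem)
    if parent is None:
        return []  # the root element has no siblings
    siblings = list(parent)
    return siblings[siblings.index(elem) + 1:]

def in_document_order(nodes):
    """Drop duplicates and sort by document order, like an XPath node-set."""
    wanted = {id(n) for n in nodes}
    return [e for e in root.iter() if id(e) in wanted]

bbb = root.find("BBB")
print([e.tag for e in following_siblings(bbb)])  # ['XXX', 'CCC']

# //CCC/following-sibling::*
result = in_document_order(s for c in root.iter("CCC") for s in following_siblings(c))
print([e.tag for e in result])  # ['DDD', 'FFF', 'FFF']
```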
               
       



The preceding-sibling axis contains all the preceding siblings of the context node.
 
/AAA/XXX/preceding-sibling::*



     
    <BBB>
      ...
    </BBB>
 

 
//CCC/preceding-sibling::*



     
    <BBB>
      ...
    </BBB>
    <XXX>
      ...
        <EEE/>
        <DDD/>
      ...
    </XXX>
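
The mirror image for preceding siblings. Like `ancestor`, `preceding-sibling` is a reverse axis, so `preceding-sibling::*[1]` is the nearest preceding sibling; the node-set itself is reported in document order. Document and helpers are hypothetical, as in the previous sketches.

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the sibling results above.
root = ET.fromstring(
    "<AAA>"
    "<BBB><CCC/><DDD/></BBB>"
    "<XXX><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></XXX>"
    "<CCC><DDD/></CCC>"
    "</AAA>"
)

parent_of = {child: parent for parent in root.iter() for child in parent}

def preceding_siblings(elem):
    """preceding-sibling::* : earlier children of the same parent, in document order."""
    parent = parent_of.get(elem)
    if parent is None:
        return []  # the root element has no siblings
    siblings = list(parent)
    return siblings[:siblings.index(elem)]

def in_document_order(nodes):
    """Drop duplicates and sort by document order, like an XPath node-set."""
    wanted = {id(n) for n in nodes}
    return [e for e in root.iter() if id(e) in wanted]

xxx = root.find("XXX")
print([e.tag for e in preceding_siblings(xxx)])  # ['BBB']

# //CCC/preceding-sibling::*
result = in_document_order(s for c in root.iter("CCC") for s in preceding_siblings(c))
print([e.tag for e in result])  # ['BBB', 'XXX', 'EEE', 'DDD']
```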
 



The following axis contains all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes.
 
/AAA/XXX/following::*

 
    <CCC>
      <DDD/>
    </CCC>
     

 
//ZZZ/following::*



 
    <FFF>
      <GGG/>
    </FFF>
    <XXX>
      <DDD>
        <EEE/>
        <DDD/>
        <CCC/>
        <FFF/>
        <FFF>
          <GGG/>
        </FFF>
      </DDD>
    </XXX>
    <CCC>
      <DDD/>
    </CCC>
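
The following axis can be computed from a document-order list of all elements: take everything after the context element and drop its own descendants. A sketch on a hypothetical document consistent with the results above; `following` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the following results above.
root = ET.fromstring(
    "<AAA>"
    "<BBB><CCC/><ZZZ><DDD/></ZZZ><FFF><GGG/></FFF></BBB>"
    "<XXX><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></XXX>"
    "<CCC><DDD/></CCC>"
    "</AAA>"
)

def following(elem):
    """following::* : everything after elem in document order, minus its descendants."""
    order = list(root.iter())
    inside = {id(d) for d in elem.iter()}  # elem itself and its descendants
    start = order.index(elem)
    return [e for e in order[start + 1:] if id(e) not in inside]

xxx = root.find("XXX")
print([e.tag for e in following(xxx)])  # ['CCC', 'DDD']

zzz = root.find(".//ZZZ")
print([e.tag for e in following(zzz)])
# ['FFF', 'GGG', 'XXX', 'DDD', 'EEE', 'DDD', 'CCC', 'FFF', 'FFF', 'GGG', 'CCC', 'DDD']
```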
     



The preceding axis contains all nodes in the same document as the context node that are before the context node in document order, excluding any ancestors and excluding attribute nodes and namespace nodes.
 
/AAA/XXX/preceding::*



     
    <BBB>
      <CCC/>
      <ZZZ>
        <DDD/>
      </ZZZ>
    </BBB>
 

 
//GGG/preceding::*



     
    <BBB>
      <CCC/>
      <ZZZ>
        <DDD/>
      </ZZZ>
    </BBB>
    <EEE/>
    <DDD/>
    <CCC/>
    <FFF/>
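
Symmetrically, the preceding axis is everything before the context element in document order, minus its ancestors. Same assumptions as before: a hypothetical document consistent with the results above and illustrative helpers (`parent_of`, `preceding`).

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the preceding results above.
root = ET.fromstring(
    "<AAA>"
    "<BBB><CCC/><ZZZ><DDD/></ZZZ></BBB>"
    "<XXX><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></XXX>"
    "<CCC><DDD/></CCC>"
    "</AAA>"
)

parent_of = {child: parent for parent in root.iter() for child in parent}

def preceding(elem):
    """preceding::* : everything before elem in document order, minus its ancestors."""
    ancestor_ids = set()
    node = elem
    while node in parent_of:
        node = parent_of[node]
        ancestor_ids.add(id(node))
    order = list(root.iter())
    return [e for e in order[:order.index(elem)] if id(e) not in ancestor_ids]

xxx = root.find("XXX")
print([e.tag for e in preceding(xxx)])  # ['BBB', 'CCC', 'ZZZ', 'DDD']

ggg = root.find(".//GGG")
print([e.tag for e in preceding(ggg)])
# ['BBB', 'CCC', 'ZZZ', 'DDD', 'EEE', 'DDD', 'CCC', 'FFF']
```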
                    



The descendant-or-self axis contains the context node and the descendants of the context node.
 
/AAA/XXX/descendant-or-self::*



    <XXX>
      <DDD>
        <EEE/>
        <DDD/>
        <CCC/>
        <FFF/>
        <FFF>
          <GGG/>
        </FFF>
      </DDD>
    </XXX>
          
               
          

     

 
//CCC/descendant-or-self::*


          
    <CCC/>
    <CCC/>
    <CCC>
      <DDD/>
    </CCC>
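
`Element.iter()` already yields the element itself followed by its descendants, which is exactly descendant-or-self. This axis is also what the `//` abbreviation expands to: `//CCC` is short for `/descendant-or-self::node()/child::CCC`. Hypothetical document and helpers as before:

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the results above.
root = ET.fromstring(
    "<AAA>"
    "<BBB><CCC/><ZZZ><DDD/></ZZZ></BBB>"
    "<XXX><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></XXX>"
    "<CCC><DDD/></CCC>"
    "</AAA>"
)

def descendant_or_self(elem):
    """descendant-or-self::* : elem first, then its descendants in document order."""
    return list(elem.iter())

def in_document_order(nodes):
    """Drop duplicates and sort by document order, like an XPath node-set."""
    wanted = {id(n) for n in nodes}
    return [e for e in root.iter() if id(e) in wanted]

xxx = root.find("XXX")
print([e.tag for e in descendant_or_self(xxx)])
# ['XXX', 'DDD', 'EEE', 'DDD', 'CCC', 'FFF', 'FFF', 'GGG']

# //CCC/descendant-or-self::*
result = in_document_order(d for c in root.iter("CCC") for d in descendant_or_self(c))
print([e.tag for e in result])  # ['CCC', 'CCC', 'CCC', 'DDD']
```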
     



The ancestor-or-self axis contains the context node and the ancestors of the context node; thus, the ancestor-or-self axis will always include the root node.
 
/AAA/XXX/DDD/EEE/ancestor-or-self::*



    <AAA>
      ...
      <XXX>
        <DDD>
          <EEE/>
          ...
        </DDD>
      </XXX>
      ...
    </AAA>

 
//GGG/ancestor-or-self::*



    <AAA>
      ...
      <XXX>
        <DDD>
          ...
          <FFF>
            <GGG/>
          </FFF>
        </DDD>
      </XXX>
      ...
    </AAA>
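
ancestor-or-self is the ancestor chain with the context element prepended. The sketch below (hypothetical document consistent with the results above, illustrative helper) builds the chain nearest-first and reverses it into document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the results above.
root = ET.fromstring(
    "<AAA>"
    "<BBB><CCC/><ZZZ><DDD/></ZZZ></BBB>"
    "<XXX><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></XXX>"
    "<CCC><DDD/></CCC>"
    "</AAA>"
)

parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestor_or_self(elem):
    """ancestor-or-self::*, nearest first: elem, its parent, ..., the root element."""
    chain = [elem]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain

eee = root.find("XXX/DDD/EEE")
print([e.tag for e in reversed(ancestor_or_self(eee))])  # ['AAA', 'XXX', 'DDD', 'EEE']

ggg = root.find(".//GGG")
print([e.tag for e in reversed(ancestor_or_self(ggg))])  # ['AAA', 'XXX', 'DDD', 'FFF', 'GGG']
```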



The div operator performs floating-point division; the mod operator returns the remainder from a truncating division. The floor function returns the largest (closest to positive infinity) number that is not greater than the argument and that is an integer. The ceiling function returns the smallest (closest to negative infinity) number that is not less than the argument and that is an integer. The function position() returns the position of the current element within the selection, so position() mod 2 = 0 holds for the even positions.
 
//BBB[position() mod 2 = 0 ]
Select even BBB elements

    <BBB/>
    <BBB/>
    <BBB/>
    <BBB/>
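
A short Python comparison of these operators. One assumption worth flagging: XPath's `mod` truncates like `math.fmod`, whereas Python's `%` floors, so the two disagree for negative operands. The even-position filter runs over a hypothetical document with eight `BBB` children, and positions count from 1, as in XPath; `xpath_mod` is an illustrative helper.

```python
import math
import xml.etree.ElementTree as ET

def xpath_mod(a, b):
    """XPath 'mod': remainder of a truncating division (sign follows a)."""
    return math.fmod(a, b)

print(xpath_mod(5, 2), xpath_mod(-5, 2), -5 % 2)  # 1.0 -1.0 1
print(5 / 2, math.floor(-2.5), math.ceil(-2.5))   # 2.5 -3 -2

# //BBB[position() mod 2 = 0] on a hypothetical document with eight BBB children.
# position() counts among the BBB children of each parent, starting at 1.
root = ET.fromstring("<AAA>" + "".join(f'<BBB n="{i}"/>' for i in range(1, 9)) + "</AAA>")
even = [e.get("n") for pos, e in enumerate(root.findall("BBB"), start=1)
        if xpath_mod(pos, 2) == 0]
print(even)  # ['2', '4', '6', '8']
```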
 


Question:
  1. Find the child elements whose age matches the age of the single element.

    Sample XML
     
    <apartment>
      <parent name = "top1">
        <child name="1" age="30">
          <grandchild name="1.1" age="10"></grandchild>
          <grandchild name="1.2" age="15"></grandchild>
        </child>
        <child name="2" age="35">
          <grandchild name="2.1" age="13"></grandchild>
          <grandchild name="2.2" age="16"></grandchild>
        </child>
        <child name="3" age="38">
          <grandchild name="3.1" age="10"></grandchild>
          <grandchild name="3.2" age="12"></grandchild>
        </child>
      </parent>
      <parent name = "top2">
        <child name="4" age="30">
          <grandchild name="4.1" age="3"></grandchild>
          <grandchild name="4.2" age="4"></grandchild>
        </child>
        <child name="5" age="35">
          <grandchild name="5.1" age="5"></grandchild>
          <grandchild name="5.2" age="6"></grandchild>
        </child>
        <child name="6" age="38">
          <grandchild name="6.1" age="10"></grandchild>
          <grandchild name="6.2" age="8"></grandchild>
        </child>
        <child name="7" age="38">
          <grandchild name="7.1" age="10"></grandchild>
          <grandchild name="7.2" age="9"></grandchild>
          <grandchild name="7.3" age="1"></grandchild>
        </child>
      </parent>
      <single name = "sing1" age='38'></single>  
    </apartment>