Last Updated: February 25, 2016
123aswin123

Using The Nokogiri Gem To Parse Nested XML Data In Ruby

Nokogiri (http://www.nokogiri.org/) is one of the most powerful Ruby gems for parsing XML and HTML.

The site itself does not contain much information on parsing XML. The full API documentation (http://www.rubydoc.info/github/sparklemotion/nokogiri), on the other hand, is a lot to read through just to parse a simple XML tree.

Consider the following XML tree (the same one used on the Nokogiri website: http://www.nokogiri.org/tutorials/searching_a_xml_html_document.html):

[shows.xml]
<root>
  <sitcoms>
    <sitcom>
      <name>Married with Children</name>
      <characters>
        <character>Al Bundy</character>
        <character>Bud Bundy</character>
        <character>Marcy Darcy</character>
      </characters>
    </sitcom>
    <sitcom>
      <name>Perfect Strangers</name>
      <characters>
        <character>Larry Appleton</character>
        <character>Balki Bartokomous</character>
      </characters>
    </sitcom>
  </sitcoms>
  <dramas>
    <drama>
      <name>The A-Team</name>
      <characters>
        <character>John "Hannibal" Smith</character>
        <character>Templeton "Face" Peck</character>
        <character>"B.A." Baracus</character>
        <character>"Howling Mad" Murdock</character>
      </characters>
    </drama>
  </dramas>
</root>

Following the syntax given on the official docs page:

require 'nokogiri'

doc = Nokogiri::XML(File.open("shows.xml"))

# //character matches every <character> element anywhere in the document
doc.xpath('//character').each do |char_element|
  puts char_element.text
end

The output of the above code is:

Al Bundy
Bud Bundy
Marcy Darcy
Larry Appleton
Balki Bartokomous
John "Hannibal" Smith
Templeton "Face" Peck
"B.A." Baracus
"Howling Mad" Murdock

But most likely you, like me, want to display or access each sitcom's name first, then its character names, and repeat this for every sitcom. The following code does exactly that:

require 'nokogiri'

xml_file = File.read("shows.xml")
doc = Nokogiri::XML.parse(xml_file)

# Visit every <sitcom> element in the document
doc.xpath('//sitcom').each do |sitcom_element|
  # Relative path: the <name> child of this sitcom only
  puts "\nShow Name : " + sitcom_element.xpath('name').text

  # Relative path: this sitcom's characters only, numbered from 1
  sitcom_element.xpath('characters/character').each_with_index do |character_element, index|
    puts "    #{index + 1}.Character : " + character_element.text
  end
end

The output of the above code is:

Show Name : Married with Children
    1.Character : Al Bundy
    2.Character : Bud Bundy
    3.Character : Marcy Darcy

Show Name : Perfect Strangers
    1.Character : Larry Appleton
    2.Character : Balki Bartokomous

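1) The key observation here is that

element.xpath("//tag_name")

returns every element named tag_name anywhere in the document, much like a Ctrl+F search. This holds even when element is a nested node, because a path starting with // is evaluated from the document root.

2) When we use

element.xpath("tag_name")

the path is relative to element, so it only matches tag_name children of that element. Chaining such relative paths (as with 'characters/character' above) lets us walk the tree level by level.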
