Last Updated: October 07, 2018 · truthadjustr

How to correctly work with XML on the command line

My favorite CLI weapons, sed, awk and grep, cannot work correctly with XML content: they see lines and text patterns, not nested elements and attributes. Behold a new addition to my arsenal: xmlstarlet. It has become my favorite tool for interrogating XML files on the command line, and you can wrap bash script functions around it to build tools suited to your needs.

To count <Person> elements in record.xml:

xmlstarlet sel -t -v "count(//Person)"  record.xml

This is only the beginning of an exciting journey with XML. With xmlstarlet I can count how many attributes an element has, transform XML with an XSLT stylesheet, and list the elements that have specific attribute values.

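For example, here is a script I named elcount.sh, which gives me a quick idea of the structure of an XML file. For each unique element path, it prints how many times that path occurs and which attribute names it carries: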

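#!/bin/bash
# elcount.sh: summarize the structure of an XML file.
# For every unique element path, print the path, its number of
# occurrences, and its distinct attribute names ("*" if more than 10).
# Usage: elcount.sh file.xml
# Needs bash (not plain sh) for (( )) and $'\t'.

if [ $# -eq 0 ]; then
  echo "usage: $0 file.xml" >&2
  exit 1    # "return" only works inside a function or a sourced script
fi

TMP=$(mktemp)
TMP2=$(mktemp)
TMP3=$(mktemp)
TMP4=$(mktemp)
# Remove the temp files however the script exits.
trap 'rm -f "$TMP" "$TMP2" "$TMP3" "$TMP4"' EXIT
XML=$1

# Unique element paths, one per line.
xmlstarlet el -u "$XML" > "$TMP"
while IFS= read -r x; do
  # How many times this path occurs.
  xmlstarlet sel -t -v "count(//$x)" -n "$XML" >> "$TMP2"
  # Distinct attribute names used on this path.
  xmlstarlet sel -t -m "//$x/@*" -v "name()" -n "$XML" | sort -u > "$TMP4"
  count=$(wc -l < "$TMP4")
  if ((count == 0)); then
    echo >> "$TMP3"
  elif ((count <= 10)); then
    attrs=$(tr '\n' ',' < "$TMP4")
    echo "@:$count:${attrs%?}" >> "$TMP3"    # ${attrs%?} drops the trailing comma
  else
    echo "@:$count:*" >> "$TMP3"
  fi
done < "$TMP"

# One row per path: path, count, attribute summary.
paste "$TMP" "$TMP2" "$TMP3" | column -s $'\t' -t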

elcount.sh lets me interrogate an unfamiliar XML file and get a feel for its general structure. I have many other custom scripts built around xmlstarlet. Note that the script above needs a small one-line tweak if your XML uses a custom schema.