How to correctly work with XML on the command line
My favorite CLI weapons, sed, awk, and grep, cannot work correctly on XML content: they are line-oriented tools and know nothing about XML structure. Behold, I found a new addition: xmlstarlet. It has become my favorite tool for interrogating XML files on the command line. You can wrap it in bash functions to build specific tools that suit your needs.
To count <Person> elements in record.xml:
xmlstarlet sel -t -v "count(//Person)" record.xml
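As a minimal sketch of wrapping xmlstarlet in a bash function, the count command can be turned into a small reusable helper. The function name xcount and the sample data are made up for illustration; only the xmlstarlet sel call comes from the command above.

```shell
#!/bin/bash
# xcount is a hypothetical helper name, not part of xmlstarlet.
# Usage: xcount XPATH FILE
# Prints how many nodes in FILE match the XPath expression.
xcount() {
    xmlstarlet sel -t -v "count($1)" -n "$2"
}

# Build a small sample file so the sketch is self-contained.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<?xml version="1.0"?>
<Records>
  <Person id="1" role="admin"><Name>Ann</Name></Person>
  <Person id="2" role="user"><Name>Bob</Name></Person>
  <Person id="3" role="user"><Name>Cy</Name></Person>
</Records>
EOF

xcount "//Person" "$sample"                 # prints 3
xcount "//Person[@role='user']" "$sample"   # prints 2
```

Because the XPath expression is passed in as an argument, the same helper covers any "how many of these are there" question without retyping the sel options.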
This is just the beginning of an exciting journey working with XML. I can count how many attributes an element has. I can transform XML. I can list elements having specific attribute values.
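The three tasks just mentioned might look roughly like this. The sample file, element names, and attribute values are invented for illustration, and the "transform" step uses the ed subcommand for a simple in-place style edit; full XSLT transformations go through the separate tr subcommand, which is not shown here.

```shell
#!/bin/bash
# Build a small sample file so the sketch is self-contained.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<?xml version="1.0"?>
<Records>
  <Person id="1" role="admin"><Name>Ann</Name></Person>
  <Person id="2" role="user"><Name>Bob</Name></Person>
  <Person id="3" role="user"><Name>Cy</Name></Person>
</Records>
EOF

# 1. Count the attributes on the first <Person> element (id and role).
xmlstarlet sel -t -v "count(//Person[1]/@*)" -n "$sample"

# 2. List the <Name> of every <Person> whose role attribute is "user".
xmlstarlet sel -t -m "//Person[@role='user']" -v "Name" -n "$sample"

# 3. Transform: set every role attribute to "guest" and print the
#    modified document to stdout (the file itself is left untouched).
xmlstarlet ed -u "//Person/@role" -v "guest" "$sample"
```

The -m option opens a match block that runs once per matching node, which is why the second command prints one name per line.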
For example, here is a script I named elcount.sh, which gives me a pretty quick idea of the structure of an XML file:
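#!/bin/bash
# elcount.sh: list each unique element path with its occurrence count
# and the names of the attributes found on it.
# Needs bash: the script uses (( )) arithmetic and $'\t' quoting.
[ $# -eq 0 ] && exit 1
XML=$1
TMP=$(mktemp)    # unique element paths
TMP2=$(mktemp)   # occurrence count per path
TMP3=$(mktemp)   # attribute summary per path
TMP4=$(mktemp)   # scratch: attribute names for the current path
# Remove the temporary files on exit, even if the script is interrupted.
trap 'rm -f "$TMP" "$TMP2" "$TMP3" "$TMP4"' EXIT
xmlstarlet el -u "$XML" > "$TMP"
while read -r x; do
    xmlstarlet sel -t -v "count(//$x)" -n "$XML" >> "$TMP2"
    xmlstarlet sel -t -m "//$x/@*" -v "name()" -n "$XML" | sort -u > "$TMP4"
    count=$(wc -l < "$TMP4")
    if ((count == 0)); then
        # No attributes: keep an empty line so the columns stay aligned.
        echo >> "$TMP3"
    elif ((count <= 10)); then
        # Join the attribute names with commas and drop the trailing comma.
        attrs=$(tr '\n' ',' < "$TMP4")
        echo "@:$count:${attrs%?}" >> "$TMP3"
    else
        # Too many attributes to list: show only the count.
        echo "@:$count:*" >> "$TMP3"
    fi
done < "$TMP"
paste "$TMP" "$TMP2" "$TMP3" | column -s $'\t' -t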
The script elcount.sh lets me interrogate an XML file and gives me an idea of its general structure. I have many other customized scripts built around xmlstarlet. The script above needs a small one-line tweak if your XML has a custom schema.
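The exact tweak is not spelled out above. One situation where a tweak of this kind is commonly needed (this is an assumption, not necessarily the case meant here) is XML that declares a default namespace: an unprefixed XPath such as //Person then matches nothing, and xmlstarlet needs a prefix bound to the namespace with -N. A minimal sketch, with a made-up namespace URI and prefix:

```shell
#!/bin/bash
# Sample document with a default namespace; the URI is invented for illustration.
ns_sample=$(mktemp)
trap 'rm -f "$ns_sample"' EXIT
cat > "$ns_sample" <<'EOF'
<?xml version="1.0"?>
<Records xmlns="http://example.com/records">
  <Person id="1"/>
  <Person id="2"/>
</Records>
EOF

# Without a namespace binding, the unprefixed name matches nothing.
xmlstarlet sel -t -v "count(//Person)" -n "$ns_sample"    # prints 0

# Bind the (arbitrary) prefix r to the namespace URI and use it in the XPath.
xmlstarlet sel -N r="http://example.com/records" \
    -t -v "count(//r:Person)" -n "$ns_sample"             # prints 2
```

The prefix only has to match between the -N option and the XPath expression; it does not need to appear in the document itself.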