Parse XML into an associative array (with CDATA support)
Several days ago I tried to parse an XML file into an associative array. Using simplexml_load_file or simplexml_load_string, combined with json_encode and json_decode, seems to work like a charm.
Take a look at the following function:
/**
 * @param SimpleXMLElement $xml
 * @return mixed
 */
function toAssocArray($xml) {
    $string = json_encode($xml);
    $array = json_decode($string, true);
    return $array;
}
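As a quick illustration of the function above, here is a minimal sketch that runs it on a small document without any CDATA. The `<book>` document and its element names are made up for this example; the function is repeated so the snippet runs on its own.

```php
<?php
// Same function as above, repeated so this snippet is self-contained.
function toAssocArray($xml) {
    $string = json_encode($xml);
    $array = json_decode($string, true);
    return $array;
}

// Hypothetical sample document (not taken from a real file).
$xml = simplexml_load_string('<book><title>PHP Basics</title><year>2012</year></book>');
$assoc = toAssocArray($xml);

// Child elements become array keys; their text becomes string values.
echo $assoc['title'], "\n"; // prints "PHP Basics"
echo $assoc['year'], "\n";  // prints "2012"
```

Note that all values come back as strings, because XML text carries no type information.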
While this seems quite useful, it has one limitation regarding <![CDATA[...]]> sections.
Consider the following XML:
<?xml version="1.0" encoding="utf-8"?>
<sample>
    <sample1>foo<![CDATA[bar]]></sample1>
    <sample2><![CDATA[foobar]]></sample2>
</sample>
If we use toAssocArray, the value of sample1 will be "foobar", as expected. But if we take a look at the value of sample2, we'll notice that it contains an empty array: the actual content "foobar" gets lost.
To get rid of this behaviour, we have to remove the <![CDATA[...]]> wrapper and escape its content (because CDATA usually holds data that could be interpreted as XML/HTML, but shouldn't be).
So we have to run preg_replace_callback on the XML string, along with a custom filter function, before we parse it into a SimpleXMLElement and finally into an associative array.
This has one drawback: if we want to get the XML from a file, we have to use file_get_contents instead of simplexml_load_file, because simplexml_load_file returns a SimpleXMLElement, to which we can't apply our filter.
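To make the filtering step more concrete, here is a minimal sketch of what it does to the raw string before any parsing happens. The `<note>` document is a made-up input, and the pattern uses a non-greedy match with the `s` modifier so that several CDATA sections, or one spanning multiple lines, are each handled separately; treat it as an illustration rather than a complete solution.

```php
<?php
// Hypothetical input: the CDATA section holds markup that must not be parsed as XML.
$raw = '<note><body><![CDATA[ <b>5 < 6</b> ]]></body></note>';

// Replace every CDATA section with its escaped, trimmed content.
// ".*?" stops at the first "]]>"; the "s" modifier lets "." match newlines.
$filtered = preg_replace_callback(
    '/<!\[CDATA\[(.*?)\]\]>/s',
    function ($matches) {
        return trim(htmlspecialchars($matches[1]));
    },
    $raw
);

echo $filtered, "\n";
// prints: <note><body>&lt;b&gt;5 &lt; 6&lt;/b&gt;</body></note>
```

After this step the string no longer contains CDATA sections, so the parser sees ordinary escaped text and decodes it back to the original characters.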
So these are our final functions:
/**
 * @param SimpleXMLElement $xml
 * @return mixed
 */
function toAssocArray($xml) {
    $string = json_encode($xml);
    $array = json_decode($string, true);
    return $array;
}
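/**
 * @param string $xml
 * @return SimpleXMLElement
 */
function betterxml_load_string($xml) {
    // Non-greedy match plus the "s" modifier: each CDATA section is
    // replaced on its own, even if it spans several lines.
    $string = preg_replace_callback('/<!\[CDATA\[(.*?)\]\]>/s', 'cdata_filter', $xml);
    $xml = simplexml_load_string($string);
    return $xml;
}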
/**
 * @param array $matches
 * @return string
 */
function cdata_filter($matches) {
    // $matches[1] is the content between <![CDATA[ and ]]>
    $converted = htmlspecialchars($matches[1]);
    $trimmed = trim($converted);
    return $trimmed;
}
And we use them as follows:
$string = file_get_contents('some/xml/file.xml');
$xml = betterxml_load_string($string);
$assoc = toAssocArray($xml);
echo $assoc['sample1'];
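For a runnable check of the whole pipeline, the following sketch writes the sample document from above to a temporary file and reads it back. The temporary file is only an assumption for the sake of a self-contained script; the functions are condensed versions of the ones listed above, using a non-greedy pattern with the `s` modifier.

```php
<?php
// Condensed versions of the article's functions, so this script runs on its own.
function toAssocArray($xml) {
    return json_decode(json_encode($xml), true);
}

function cdata_filter($matches) {
    return trim(htmlspecialchars($matches[1]));
}

function betterxml_load_string($xml) {
    $string = preg_replace_callback('/<!\[CDATA\[(.*?)\]\]>/s', 'cdata_filter', $xml);
    return simplexml_load_string($string);
}

// Write the sample document to a temporary file (illustrative path only).
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<sample>
    <sample1>foo<![CDATA[bar]]></sample1>
    <sample2><![CDATA[foobar]]></sample2>
</sample>
XML
);

// Read the raw string, filter the CDATA sections, parse, and convert.
$assoc = toAssocArray(betterxml_load_string(file_get_contents($path)));
unlink($path);

echo $assoc['sample1'], "\n"; // prints "foobar"
echo $assoc['sample2'], "\n"; // prints "foobar"
```

Both elements now yield "foobar", because the CDATA sections were turned into plain escaped text before the parser saw them.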