Last Updated: February 25, 2016 · codeFareith

Parse XML into associative array (with CDATA-Support)

Several days ago I tried to parse an XML file into an associative array. Using simplexml_load_file or simplexml_load_string, combined with json_encode and json_decode, it seems to work like a charm.
Take a look at the following function:

/**
 * @param SimpleXMLElement $xml
 * @return mixed
 */
function toAssocArray($xml) {
    $string = json_encode($xml);
    $array = json_decode($string, true);
    return $array;
}

While this is quite useful, it has one limitation regarding <![CDATA[...]]> sections.
Consider the following XML:

<?xml version="1.0" encoding="utf-8"?>
<sample>
    <sample1>foo<![CDATA[bar]]></sample1>
    <sample2><![CDATA[foobar]]></sample2>
</sample>

If we use toAssocArray, the value of sample1 will be "foobar", as expected. But if we take a look at the value of sample2 we'll notice that it contains an empty array - the actual content "foobar" gets lost.
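The behaviour described above can be reproduced with a short script. This is only a sketch: the variable names are made up for illustration, and the XML is embedded as a string so the snippet runs on its own.

```php
<?php
// The sample document from above, embedded as a nowdoc string.
$source = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<sample>
    <sample1>foo<![CDATA[bar]]></sample1>
    <sample2><![CDATA[foobar]]></sample2>
</sample>
XML;

// Same round trip as toAssocArray: SimpleXMLElement -> JSON -> array.
$xml = simplexml_load_string($source);
$array = json_decode(json_encode($xml), true);

var_dump($array['sample1']); // the string "foobar"
var_dump($array['sample2']); // an empty array, the CDATA content is gone
```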

To get rid of this behaviour, we have to remove the <![CDATA[...]]> wrapper and escape its content (because CDATA usually holds data that could be interpreted as XML/HTML, but shouldn't be).
So we run preg_replace_callback on the XML string with a custom filter function before we parse it into a SimpleXMLElement and finally into an associative array.
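To make the filtering step concrete, here is a minimal sketch of what it does to a single element. The input string and variable names are invented for this example, and an anonymous function stands in for the named filter function.

```php
<?php
// Hypothetical input: a CDATA section holding markup that must not be parsed.
$source = '<sample2><![CDATA[ <b>foo & bar</b> ]]></sample2>';

$filtered = preg_replace_callback(
    '/<!\[CDATA\[(.*?)\]\]>/s',
    function ($matches) {
        // $matches[1] is the raw text between "<![CDATA[" and "]]>".
        // Escape it so it becomes a plain text node, then trim whitespace.
        return trim(htmlspecialchars($matches[1]));
    },
    $source
);

echo $filtered, "\n";
// prints: <sample2>&lt;b&gt;foo &amp; bar&lt;/b&gt;</sample2>
```

After this step the element contains ordinary escaped text, which SimpleXML decodes back to the original characters when parsing.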

This has one drawback: if we want to get the XML from a file, we have to use file_get_contents instead of simplexml_load_file, because simplexml_load_file returns a SimpleXMLElement, to which we can't apply our string-based filter.

So these are our final functions:

/**
 * @param SimpleXMLElement $xml
 * @return mixed
 */
function toAssocArray($xml) {
    $string = json_encode($xml);
    $array = json_decode($string, true);
    return $array;
}
/**
 * @param string $xml
 * @return SimpleXMLElement
 */
function betterxml_load_string($xml) {
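    // Non-greedy match with the "s" modifier, so that several CDATA sections
    // (including multi-line ones) are each replaced on their own.
    $string = preg_replace_callback('/<!\[CDATA\[(.*?)\]\]>/s', 'cdata_filter', $xml);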
    $xml = simplexml_load_string($string);
    return $xml;
}
/**
 * @param array $matches
 * @return string
 */
function cdata_filter($matches) {
    $converted = htmlspecialchars($matches[1]);
    $trimmed = trim($converted);
    return $trimmed;
}

And we use it as follows:

$string = file_get_contents('some/xml/file.xml');
$xml = betterxml_load_string($string);
$assoc = toAssocArray($xml);

echo $assoc['sample1'];
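Since simplexml_load_file can't be used with this approach, the read, filter and parse steps could also be bundled into one helper. The function below is a hypothetical addition, not part of the original functions; its name, the inline filter and the temporary file are assumptions made to keep the sketch self-contained.

```php
<?php
/**
 * Hypothetical helper: reads an XML file, unwraps CDATA sections into
 * escaped text and parses the result.
 *
 * @param string $path
 * @return SimpleXMLElement|false false if reading, filtering or parsing fails
 */
function load_xml_file_without_cdata($path) {
    // file_get_contents returns false (and emits a warning) on failure.
    $contents = @file_get_contents($path);
    if ($contents === false) {
        return false;
    }
    $filtered = preg_replace_callback(
        '/<!\[CDATA\[(.*?)\]\]>/s',
        function ($matches) {
            return trim(htmlspecialchars($matches[1]));
        },
        $contents
    );
    // preg_replace_callback returns null if the regex engine reports an error.
    if ($filtered === null) {
        return false;
    }
    return simplexml_load_string($filtered);
}

// Usage: write a small document to a temporary file and load it back.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, '<sample><sample2><![CDATA[foobar]]></sample2></sample>');

$xml = load_xml_file_without_cdata($path);
$array = json_decode(json_encode($xml), true);
unlink($path);

echo $array['sample2'], "\n"; // prints: foobar
```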