Last Updated: January 06, 2019
· skyzyx

Less slow, case-insensitive, XPath lookups in PHP

PHP depends on libxml2 as its underlying XML parser. libxml2 supports XPath 1.0, but not newer versions, so helpers such as lower-case() (added in XPath 2.0) are not available. Because of this, case-insensitive queries (like when you're parsing a non-compliant RSS feed) need to be handled in userland.

Querying data out of an XML structure (with DOMDocument) can be up to 100× (i.e., 10,000%) faster using well-crafted XPath queries than using "regular" PHP (e.g., looping, if conditionals). Most suggestions on the internet say to use XPath's translate() function to convert the entire alphabet, but this can be 8× slower (i.e., 800%). We can make this around 35% less slow, so that it is only 4.5–5× slower (450–500%), if we only convert the letters that are actually in the word.
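For reference, here is a minimal sketch of the two baselines mentioned above: the commonly suggested full-alphabet translate() query, and a plain PHP loop. The sample XML and the variable names are made up for illustration, and no timings are implied by this snippet.

```php
<?php
# Sample document with an upper-case root element (hypothetical input).
$doc = new \DOMDocument();
$doc->loadXML('<RSS version="2.0"><channel><title>Example</title></channel></RSS>');
$xpath = new \DOMXPath($doc);

# Common suggestion: map the entire upper-case alphabet to lower case.
$fullAlphabet = "/*[translate(name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
    . "'abcdefghijklmnopqrstuvwxyz') = 'rss']";
$matchesByXPath = $xpath->query($fullAlphabet);

# "Regular" PHP: loop over the top-level nodes and compare lowercased names.
$matchesByLoop = [];
foreach ($doc->childNodes as $node) {
    if ($node instanceof \DOMElement && \strtolower($node->nodeName) === 'rss') {
        $matchesByLoop[] = $node;
    }
}
```

Both approaches should find the same root element here; the difference the post is concerned with is how long each takes on real documents.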

This performance still isn't great, but it is definitely better. Tested against PHP 7.2.

<?php
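# The element name to match, written in lowercase.
$word           = 'rss';
# count_chars() in mode 3 returns each distinct byte of the string once ('rss' => 'rs').
# It works on bytes, so this approach assumes ASCII element names.
$elementLetters = \count_chars($word, 3);
$lettersLower   = \mb_strtolower($elementLetters);
$lettersUpper   = \mb_strtoupper($elementLetters);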

$query = \sprintf(
    '/*[translate(name(), \'%s\', \'%s\') = \'%s\']',
    $lettersUpper,
    $lettersLower,
    $word
);

# $query is now: /*[translate(name(), 'RS', 'rs') = 'rss']
# $domxpath is assumed to be a DOMXPath instance built from your DOMDocument.
$results = $domxpath->query($query);
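To try the technique end to end, the snippet can be wrapped in a small helper and run against a sample document. This is only a sketch: the function name caseInsensitiveRootQuery() and the sample XML are hypothetical, and it assumes ASCII element names, since count_chars() works on bytes.

```php
<?php
/**
 * Build an XPath 1.0 query that matches a top-level element name
 * case-insensitively, translating only the letters used in the name.
 * (Hypothetical helper, not part of PHP or the original snippet.)
 */
function caseInsensitiveRootQuery(string $name): string
{
    $word    = \strtolower($name);
    $letters = \count_chars($word, 3); # distinct bytes, e.g. 'rss' => 'rs'

    return \sprintf(
        "/*[translate(name(), '%s', '%s') = '%s']",
        \strtoupper($letters),
        $letters,
        $word
    );
}

# Sample feed with a mixed-case root element (hypothetical input).
$doc = new \DOMDocument();
$doc->loadXML('<Rss version="2.0"><channel><title>Example</title></channel></Rss>');
$domxpath = new \DOMXPath($doc);

$query   = caseInsensitiveRootQuery('rss');
$results = $domxpath->query($query);

echo $query, PHP_EOL;           # /*[translate(name(), 'RS', 'rs') = 'rss']
echo $results->length, PHP_EOL; # 1
```

Because the helper lowercases its argument first, calling it with 'RSS' or 'Rss' produces the same query.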