Last Updated: September 09, 2019


Wisdom: Don't use RegEx to parse HTML

This entertaining Coding Horror post explains why: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

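tl;dr
An upset Stack Overflow user: "You can't parse [X]HTML with regex. Because HTML can't be parsed by regex."

More precisely, HTML elements can nest to arbitrary depth, and a true regular expression cannot count how many tags are still open. A pattern that tries to pair an element with its closing tag will therefore fail on some valid input.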
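The short Python sketch below shows how this goes wrong in practice. The patterns and sample snippets are made up for illustration. A naive link pattern misses valid variants of the same markup and still matches a link that is commented out. A non-greedy `<div>` pattern stops at the first closing tag, so it cuts the outer element off mid-content.

```python
import re

# A naive pattern that assumes every link is written exactly as <a href="...">
NAIVE_LINK = re.compile(r'<a href="([^"]*)">')

samples = [
    '<a href="/one">One</a>',                # plain case: matched
    "<a href='/two'>Two</a>",                # single quotes: missed
    '<a class="x" href="/three">Three</a>',  # another attribute first: missed
    '<A HREF="/four">Four</A>',              # uppercase tag (valid HTML): missed
    '<!-- <a href="/five">Five</a> -->',     # commented out: matched anyway
]
link_results = [NAIVE_LINK.findall(s) for s in samples]

# Nesting: a non-greedy pattern stops at the first closing tag it sees,
# so the captured "content" of the outer <div> ends inside the inner one.
NAIVE_DIV = re.compile(r"<div>(.*?)</div>")
div_results = NAIVE_DIV.findall("<div>outer <div>inner</div> tail</div>")

for s, found in zip(samples, link_results):
    print(f"{s!r:45} -> {found}")
print(div_results)
```

Each of these cases can be patched with a more elaborate pattern, but the patches keep accumulating. No finite pattern handles unbounded nesting.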

<h3>Use a real HTML parser:</h3>

<ul>
<li>Ruby: <a href="http://nokogiri.org/">Nokogiri</a></li>
<li>JavaScript: <a href="http://jquery.com/">jQuery</a></li>
<li>PHP: <a href="http://docs.php.net/manual/en/domdocument.loadhtml.php">PHP5 DOMDocument</a></li>
<li>.NET (C#): <a href="http://htmlagilitypack.codeplex.com/">Html Agility Pack</a></li>
<li>VB6: <a href="http://www.codeguru.com/vb/vb_internet/html/article.php/c4815">MSHTML</a> (used by Internet Explorer)</li>
<li>Python: <a href="http://lxml.de/xpathxslt.html">lxml</a></li>
<li>Perl: <a href="http://search.cpan.org/~gaas/HTML-Parser-3.68/Parser.pm">HTML::Parser</a></li>
<li>Java: <a href="http://htmlcleaner.sourceforge.net/">HTML Cleaner</a></li>
</ul>
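For Python the list recommends lxml. The sketch below instead uses the standard library's `html.parser.HTMLParser`, so it runs without extra dependencies. This is a swapped-in choice, not the library listed above. The class `LinkCollector` and the function `extract_links` are hypothetical names for illustration.

The parser handles what trips up a hand-written pattern. Tag and attribute names arrive lowercased, quoting style is already resolved, and comments go to a separate callback instead of being treated as tags.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name and attribute names in lower case,
        # with quotes already stripped from attribute values.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Feed a document through the parser and return the collected hrefs."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


doc = """
<a href="/one">One</a>
<a href='/two'>Two</a>
<a class="x" href="/three">Three</a>
<A HREF="/four">Four</A>
<!-- <a href="/five">Five</a> -->
"""
print(extract_links(doc))
```

The commented-out link is skipped because comments are reported through `handle_comment`, which this sketch does not override. For XPath or CSS-selector queries over the resulting tree, a library such as lxml is more convenient than this event-based API.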

Written by Alexander Brevig
