Last Updated: March 08, 2016

ryanartecona

RegEx to find <img/> tags missing alt attributes

#html

A couple of weeks ago at work, I was helping a colleague write a git hook that runs some rudimentary checks on locally committed changes before they are accepted into the remote. That way, invalid or malformed code has to be fixed before it reaches the central repo.

After a couple of frustrating hours, I came up with a regex that matches only those <img> tags that lack an alt attribute:

<img(\s*(?!alt)([\w\-])+=([\"\'])[^\"\']+\3)*\s*\/?>

Notes on flexibility:

+ the tag and its attributes may span multiple lines
+ supports attributes whose names include hyphens (e.g. data-* attributes)
+ tags can end with /> (self-closing) or just >
+ attribute values can be in single or double quotes
– all of the tag's attributes must have quoted values (so <img src=me.jpg /> won't match)
– none of the tag's attribute values may contain quote characters (so <img src="<?php 'someString' ?>" /> won't match)
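Here is a quick way to see those notes in action. It uses Python, one of the languages covered under usage. The sample HTML is made up for illustration, and the names find_missing_alt and SAMPLE are hypothetical. The pattern itself is copied unchanged.

```python
import re

# The regex from above, unchanged. In a raw string the backslashes reach the
# regex engine as-is, and \' does not terminate the literal.
MISSING_ALT = re.compile(r'<img(\s*(?!alt)([\w\-])+=([\"\'])[^\"\']+\3)*\s*\/?>')


def find_missing_alt(html):
    """Return every <img> tag in `html` that has no alt attribute,
    with runs of whitespace (including newlines) collapsed to one space."""
    return [" ".join(m.group(0).split()) for m in MISSING_ALT.finditer(html)]


# Hypothetical sample covering the cases from the notes above.
SAMPLE = """
<img src="a.jpg" alt="A cat" />
<img src='b.jpg'
     data-id="42">
<img src="c.jpg">
<img src=d.jpg />
"""

if __name__ == "__main__":
    for tag in find_missing_alt(SAMPLE):
        print(tag)
```

This prints the second and third tags. The first has an alt attribute. The last is skipped because src=d.jpg isn't quoted.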

Notes on usage:

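This requires a Perl-compatible regular expression (PCRE) engine, because of the negative lookahead (?!alt). git grep's default matchers don't support it (git grep -P works only if git was built with PCRE support), and GNU grep needs the -P flag. Keep in mind that grep matches line by line, so tags spanning several lines won't be found that way.
When searching source code, consider ack instead of grep; it's written in Perl and uses Perl regular expressions natively.
In Python, prefix the pattern's string literal with r so the backslashes reach the regex engine untouched (e.g. r'<regex>').
In JavaScript, write the pattern as a regex literal between forward slashes and add the g flag to match all occurrences instead of only the first (e.g. /<regex>/g).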
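As a rough sketch of how the regex could back a hook like the one described at the top, here is a small script that checks the files it is given. It rests on a few assumptions. The name check_alt.py is hypothetical. Files are assumed to be UTF-8. The script reads the working-tree copies rather than the exact staged content. A pre-commit hook could call it with something like git diff --cached --name-only -z --diff-filter=ACM -- '*.html' | xargs -0 python3 check_alt.py.

```python
#!/usr/bin/env python3
"""check_alt.py (hypothetical name): print every <img> tag without an alt
attribute in the files named on the command line, and exit with status 1 if
any were found so that a calling git hook rejects the commit."""
import re
import sys

# Same regex as in the post.
MISSING_ALT = re.compile(r'<img(\s*(?!alt)([\w\-])+=([\"\'])[^\"\']+\3)*\s*\/?>')


def offending_tags(text):
    """Return (line_number, tag) pairs, one per <img> lacking alt in `text`."""
    found = []
    for m in MISSING_ALT.finditer(text):
        line = text.count("\n", 0, m.start()) + 1  # 1-based line where the tag starts
        found.append((line, " ".join(m.group(0).split())))
    return found


def check_files(paths):
    """Report offending tags in each file on stderr; return True if any were found."""
    failed = False
    for path in paths:
        # Assumption: source files are UTF-8 encoded.
        with open(path, encoding="utf-8") as f:
            for line, tag in offending_tags(f.read()):
                print(f"{path}:{line}: img without alt: {tag}", file=sys.stderr)
                failed = True
    return failed


if __name__ == "__main__":
    # With no arguments there is nothing to check, so the script exits with status 0.
    if check_files(sys.argv[1:]):
        sys.exit(1)
```

Returning a non-zero exit status is what makes git abort the commit. The report goes to stderr so it shows up in the terminal when the hook fails.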

If you find this useful, incomplete, or interesting in any way, drop me a tweet!

Written by Ryan Artecona

4 Responses

u01jmg3

First of all, thank you. I've changed it to look for links/anchors that are missing a class attribute.

<a(\s*(?!class)([\w-])+=([\"\'])[^\"\']+\3)*\s*\/?>

Could it be extended to look for links missing a particular class, such as 'standard_link'? The problem is complicated by the fact that an element can have several classes. I know regex shouldn't really be used to parse HTML, but I want to use it with grepWin to look through my PHP code and make sure I've added the class to all my links.

ryanartecona

@u01jmg3 Glad you got some use out of it!

I'm sure it's possible to extend it that way, but I'm not sure I've got the chops to do it. One thing I learned from sculpting this regex is that matching anything beyond a well-defined text, date, or numerical format gets hairy pretty quickly. You may want to give it a shot yourself, but you'd probably be better off writing a script that searches those files with the help of a decent HTML parsing library.
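Here is a rough sketch of that parser-based approach. It assumes Python's standard-library html.parser, though any HTML parser would do. The names LinkClassChecker and links_missing_class are hypothetical. Splitting the class attribute on whitespace handles the multiple-classes problem from the question.

```python
from html.parser import HTMLParser


class LinkClassChecker(HTMLParser):
    """Collect <a> start tags whose class list lacks a required class.

    The required class is compared against each whitespace-separated token of
    the class attribute. So class="btn standard_link" passes, but
    class="standard_link_old" does not.
    """

    def __init__(self, required):
        super().__init__()
        self.required = required
        self.missing = []  # (line number, href) of each offending <a>

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names. attrs is a list of
        # (name, value) pairs, and value is None for valueless attributes.
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self.required not in classes:
            line, _offset = self.getpos()  # position of the start of this tag
            self.missing.append((line, attributes.get("href")))


def links_missing_class(html, required="standard_link"):
    """Return (line, href) for every <a> in `html` lacking `required` in its class list."""
    checker = LinkClassChecker(required)
    checker.feed(html)
    checker.close()
    return checker.missing
```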

u01jmg3

@ryanartecona
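I ended up using XPath with PHP's DOMDocument instead.

// $filename: path of the file being checked (set earlier in the script)
$html = file_get_contents($filename);
$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about malformed HTML
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// <a> elements that don't wrap an <img> and whose class list lacks "standard_link"
$entries = $xpath->query("//a[not(img) and not(contains(concat(' ', normalize-space(@class), ' '), ' standard_link '))]");

// Only print the list (and its closing tags) when something was found
if ($entries->length > 0) {
    echo '<ul><li><strong>' . htmlspecialchars($filename) . '</strong><ol>';
    foreach ($entries as $entry) {
        echo '<li>' . htmlspecialchars($dom->saveHTML($entry)) . '</li>';
    }
    echo '</ol></li></ul>';
}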

fcgrx

To remove any img tag that does not contain a src attribute:

$str = preg_replace('#<img\s((?!src=).)*/?>#Umi', '', $str);

This is shorter and, I think, more reliable.
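For comparison, here is a rough Python equivalent of that preg_replace call. It assumes the intended pattern repeats ((?!src=).) with *. Python has no PCRE U (ungreedy) modifier, so the lazy quantifier is written out as *?. As with the PHP version, . doesn't cross newlines, so tags split across lines are left alone. Any "src=" inside the tag, including data-src=, keeps it. The function name strip_img_without_src is hypothetical.

```python
import re

# PCRE "U" makes quantifiers lazy by default, so * becomes *? here.
# PCRE "i" maps to re.IGNORECASE. "m" only affects ^ and $, which the
# pattern doesn't use, so it is dropped.
IMG_WITHOUT_SRC = re.compile(r'<img\s(?:(?!src=).)*?/?>', re.IGNORECASE)


def strip_img_without_src(html):
    """Remove <img> tags that have no "src=" before their first closing >."""
    return IMG_WITHOUT_SRC.sub("", html)
```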
