Last Updated: February 25, 2016 · Drk Strife

Find broken images in your HTML source

One of the biggest issues I've run into when constantly converting files from other formats to HTML is identifying images that are missing from the generated HTML.

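I wrote a simple Ruby script that goes through these HTML files and reports every image whose src points to a file that doesn't exist, resolving each path relative to the directory of the HTML file that references it.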

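```ruby
#!/usr/bin/env ruby
# Report <img> tags whose src points at a file that doesn't exist.
# Each src is resolved relative to the directory of the HTML file using it.
# Usage: pass the directory to scan as the first argument (defaults to ./).
require 'tmpdir'

script_name         = File.basename($0)
directory           = ARGV[0] || './'
# Non-greedy and case-insensitive, so every <img> tag is captured,
# including several on the same line.
img_src_regex       = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']*)["']/i
files               = Dir.glob(File.join(directory, '**', '*.html'))
broken_images       = {}
broken_images_count = 0
errors              = []

puts "Searching in #{directory}..."
files.each do |file|
  current_path        = File.dirname(file)
  broken_images[file] = []

  begin
    puts "reading #{file}..."
    # Read inside begin so unreadable files end up in the error log;
    # File.read also closes the file handle.
    content = File.read(file)
    # With one capture group, scan returns [[src], [src], ...].
    content.scan(img_src_regex).flatten.each do |img_link|
      image_path = File.join(current_path, img_link)
      # File.exists? is deprecated (removed in Ruby 3.2); use File.exist?.
      broken_images[file] << image_path unless File.exist?(image_path)
    end
    broken_images[file].uniq!
    # Count after de-duplicating so the total matches the report.
    broken_images_count += broken_images[file].size
  rescue StandardError => e
    message  = "File: #{file}\n"
    message += "#{e.message}\n"
    message += e.backtrace.join("\n")
    errors << message
  end
end

puts "#{broken_images_count} images missing..."

# One timestamp so the results file and the error log pair up.
timestamp = Time.now.to_i

unless broken_images_count.zero?
  result_path = File.join(Dir.tmpdir, "#{timestamp}_#{script_name}_results.txt")
  File.open(result_path, 'w') do |out|
    broken_images.each do |filename, images|
      next if images.empty?
      out.write("#{filename}\n")
      images.each { |image| out.write(" -#{image}\n") }
    end
  end
  puts "Broken image results in #{result_path}..."
end

unless errors.empty?
  error_path = File.join(Dir.tmpdir, "#{timestamp}_#{script_name}_errors.log")
  File.open(error_path, 'w') do |out|
    errors.each { |error| out.write("#{error}\n=====\n") }
  end
  puts "Error log in #{error_path}..."
end
```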
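One thing to watch with this kind of pattern is greediness. An expression like `/.*\<[Ii][Mm][Gg].*[Ss][Rr][Cc]=["']([^'"]*)/`, with a leading `.*` and another `.*` between `img` and `src`, jumps to the last `<img` on a line, so when a converter emits several images on one line only the last src gets checked. The short comparison below contrasts it with a lazy, case-insensitive pattern; that second pattern is my own choice for illustration, not something taken from the gist.

```ruby
# Greedy pattern: the leading .* skips ahead to the LAST <img on the line.
GREEDY_IMG_REGEX = /.*\<[Ii][Mm][Gg].*[Ss][Rr][Cc]=["']([^'"]*)/
# Lazy alternative (an assumption, not from the original script): stop at the
# first whitespace-preceded src attribute inside each <img ...> tag, so
# attributes such as data-src are not mistaken for src.
LAZY_IMG_REGEX = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']*)["']/i

html = %q(<p><img src="a.png"> and <IMG alt="b" SRC='b.png'></p>)

# scan returns one array per match ([["a.png"], ...]); flatten gives the srcs.
greedy_srcs = html.scan(GREEDY_IMG_REGEX).flatten
lazy_srcs   = html.scan(LAZY_IMG_REGEX).flatten

puts "greedy: #{greedy_srcs.inspect}" # greedy: ["b.png"]
puts "lazy:   #{lazy_srcs.inspect}"   # lazy:   ["a.png", "b.png"]
```

Regexes are still only an approximation of HTML parsing (commented-out tags, for example, are matched too), but for the fairly regular output of a converter they are usually good enough.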

You can find the gist at: https://gist.github.com/4477256
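Because every src is simply joined onto the HTML file's directory, an image referenced by absolute URL (`http://...`, a `data:` URI) is reported as missing, and so is a local file referenced with a cache-busting query string or a `%20` in its name. Below is a minimal sketch of a normalizing step you could run on each src before the `File.exist?` check. The helper name `local_image_path` and its optional `site_root` argument (used to resolve root-relative paths like `/img/logo.png`) are hypothetical, not part of the original script.

```ruby
# Hypothetical helper, not part of the original script: map an <img> src value
# to the file-system path that should exist, or return nil when the image
# cannot be checked on disk (empty src, remote URL, data: URI).
def local_image_path(html_file, src, site_root = nil)
  src = src.strip
  return nil if src.empty?
  # A URL scheme (http:, https:, data:, ...) or a protocol-relative "//host"
  # prefix means the image does not live next to the HTML file.
  return nil if src.match?(%r{\A(?:[a-z][a-z0-9+.\-]*:|//)}i)

  # Drop any query string or fragment ("logo.png?v=2#top" -> "logo.png").
  path = src.sub(/[?#].*\z/m, '')
  return nil if path.empty?
  # Percent-decode byte by byte ("fig%201.png" -> "fig 1.png"), then
  # reinterpret the bytes as UTF-8 so multi-byte names survive.
  path = path.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }
             .force_encoding(Encoding::UTF_8)

  # Root-relative paths resolve against site_root when one is given;
  # everything else resolves against the HTML file's own directory.
  base = path.start_with?('/') && site_root ? site_root : File.dirname(html_file)
  File.join(base, path)
end

p local_image_path('docs/ch1/index.html', 'images/fig%201.png?v=2')
p local_image_path('docs/ch1/index.html', '/img/logo.png', 'docs')
p local_image_path('docs/ch1/index.html', 'https://example.com/a.png')
```

The three calls print `"docs/ch1/images/fig 1.png"`, `"docs/img/logo.png"` and `nil`. Inside the scan loop you would skip any src for which the helper returns nil and check the returned path otherwise.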