Find broken images in your HTML source
One of the biggest issues I run into when constantly converting files from other formats to HTML is spotting images that the HTML references but that are missing on disk.
I wrote a simple script that walks through these HTML files and reports every image that cannot be found relative to the file's own directory.
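#!/usr/bin/env ruby
# Optional first argument: the directory to search (defaults to the current one).
script_name = File.basename($0)
directory = ARGV[0] || './'
# Capture the src value of each <img> tag. Case-insensitive; the lazy [^>]*?
# keeps a match inside one tag, so several images on one line are all found.
regEx = /<img\b[^>]*?\ssrc\s*=\s*["']([^'"]*)/i
files = Dir.glob("#{directory}/**/*.html")
broken_images = {}
broken_images_count = 0
errors = []
puts "Searching in #{directory}..."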
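files.each do |file|
  # Image paths are resolved relative to the HTML file's own directory.
  current_path = File.dirname(file)
  broken_images[file] ||= []
  begin
    puts "reading #{file}..."
    # File.read closes the handle; reading inside begin logs unreadable files.
    content = File.read(file)
    # scan yields an array of captures per match; destructure the single group.
    content.scan(regEx) do |(img_link)|
      image_path = File.join(current_path, img_link)
      # File.exists? was removed in Ruby 3.2; File.exist? works everywhere.
      broken_images[file] << image_path unless File.exist?(image_path)
    end
    broken_images[file].uniq!
    # Count after de-duplicating so the total matches the written report.
    broken_images_count += broken_images[file].size
  rescue StandardError => e
    message = "File: #{file}\n"
    message += "#{e.message}\n"
    message += e.backtrace.join("\n")
    errors << message
  end
end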
puts "#{broken_images_count} images missing..."
unless broken_images_count.zero?
  result_path = "/tmp/#{Time.now.to_i}_#{script_name}_results.txt"
  File.open(result_path, 'w') do |out|
    broken_images.each do |filename, images|
      next if images.empty?
      out.write("#{filename}\n")
      images.each { |image| out.write(" -#{image}\n") }
    end
  end
  puts "Broken image results in #{result_path}..."
end
unless errors.empty?
  error_path = "/tmp/#{Time.now.to_i}_#{script_name}_errors.log"
  File.open(error_path, 'w') do |out|
    errors.each do |error|
      out.write("#{error}\n=====\n")
    end
  end
  puts "Error log in #{error_path}..."
end
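One thing the script does not handle is images served from elsewhere: a src such as http://example.com/a.png or a data: URI can never exist on disk, so it would be reported as missing. A minimal sketch of one way to filter those out is below. The helper name missing_local_images and the constants IMG_SRC and REMOTE are my own and are not part of the original script or gist. It also assumes a query string or fragment (for example ok.png?v=2) should be ignored when checking the disk.

```ruby
# Hypothetical helper, not part of the original script.
# Matches the src value of an <img> tag, case-insensitively.
IMG_SRC = /<img\b[^>]*?\ssrc\s*=\s*["']([^'"]*)/i
# Anything with a URI scheme (http:, https:, data:, ...) or a
# protocol-relative "//host/..." reference cannot be checked on disk.
REMOTE = %r{\A(?:[a-z][a-z0-9+.-]*:|//)}i

# Returns the unique src values in +html+ that point at local files
# which do not exist relative to +base_dir+.
def missing_local_images(html, base_dir)
  html.scan(IMG_SRC).flatten.uniq.reject do |src|
    # Assumption: cache-busting query strings and fragments are not
    # part of the file name on disk.
    local = src.sub(/[?#].*\z/m, '')
    src.empty? || src.match?(REMOTE) || File.exist?(File.join(base_dir, local))
  end
end
```

Inside the main loop this could stand in for the scan block, for example broken_images[file] = missing_local_images(content, File.dirname(file)). Note that it returns the src values as written in the HTML rather than the joined paths the script reports.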
You can find the gist at: https://gist.github.com/4477256
Written by Eduardo Garibay-Frausto