Simple cache when scraping with Ruby
I've been scraping a bunch of websites lately and got bored with using File.write to store cached versions of the pages. While I'm still developing a script I don't want it to hit the real website on every run. A simple way to fix that is the vcr gem. It was made primarily for testing, but it works for this kind of task too.
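For comparison, the hand-rolled File.write approach might look roughly like the sketch below. The helper name file_cache, the cache directory and the .html extension are all hypothetical and only there for illustration.

```ruby
require 'fileutils'

# Hypothetical default directory for cached pages.
CACHE_DIR = 'cache'

# Returns the cached body for `name` if it exists on disk; otherwise runs
# the block, stores its (string) result and returns it.
def file_cache(name, dir: CACHE_DIR)
  path = File.join(dir, "#{name}.html")
  # Reuse the stored copy if this page was already fetched.
  return File.read(path) if File.exist?(path)

  body = yield # only runs on a cache miss
  FileUtils.mkdir_p(dir)
  File.write(path, body)
  body
end
```

You would wrap each fetch in it, for example file_cache('example') { URI.open('https://example.com').read }. It works, but you end up managing file names and formats yourself, which is the bookkeeping vcr takes over by recording the whole HTTP exchange.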
First you need a configuration file that is loaded before your actual script. I keep mine in config/vcr.rb:
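    require 'vcr' # the :webmock hook also needs the webmock gem installed

    VCR.configure do |c|
      c.cassette_library_dir = 'cassettes' # recorded responses are stored here
      c.hook_into :webmock
      # Requests made outside of a cassette still go to the real site
      c.allow_http_connections_when_no_cassette = true
    end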
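One thing to be aware of: with VCR's default record mode (:once), a request that is not already in an existing cassette raises an error instead of hitting the network. If your scraper may request new URLs under a cassette it has already recorded, a variant of the configuration above could set the record mode to :new_episodes. This is a sketch under that assumption, not part of my original setup:

```ruby
require 'vcr'

VCR.configure do |c|
  c.cassette_library_dir = 'cassettes'
  c.hook_into :webmock
  c.allow_http_connections_when_no_cassette = true
  # Assumption: replay known requests, record any new ones into the cassette
  c.default_cassette_options = { record: :new_episodes }
end
```

The trade-off is that a typo in a URL gets silently recorded as a new request rather than failing loudly.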
Then I have a Shared module with a cache method, which I include in any class that needs this functionality:
    module Shared
      def cache(name)
        VCR.use_cassette(name) do
          yield
        end
      end
    end
Now any request made inside a cache block is recorded on the first run and replayed from the cassette afterwards, so the website you're scraping is cached without extra code:
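    require 'open-uri'
    require 'json'

    # Lives in a class that includes Shared
    def github_for(user)
      cache "gh-#{user}" do
        # URI.open: a bare open() no longer accepts URLs on Ruby 3.0+
        response = URI.open("https://api.github.com/users/#{user}").read
        JSON.parse(response)
      end
    end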
Written by Miha Rekar