Saving images with (just) Requests: HTTP for Humans
Note: the 'just code' version of the function below is available at https://gist.github.com/4221658
I've been working on a scraper to download images from URLs that I've collected for another project, using Requests: HTTP for Humans, and the suggested method of saving images from requests was giving me some trouble:
import requests
from PIL import Image
from StringIO import StringIO
r = requests.get('http://server.com/1.jpg')
r.content
i = Image.open(StringIO(r.content))
It turned out to be my own fault, but still - PIL can be a PITA if you don't already have it installed, so it seems potentially helpful to share an alternative that needs nothing outside of requests except the standard library (Python 2.6 or later, since that's where the io module first appears).
Because in reality, images aren't special - they're just files. And if you're downloading them, you don't need software that understands images, you just have to handle the I/O. So besides requests, we only need two standard library imports:
import requests
from io import open as iopen
from urlparse import urlsplit
Notice I brought in io.open as iopen and not open - it isn't strictly necessary, but I wanted to make it obvious that I wasn't using the more common built-in open.
def requests_image(file_url):
suffix_list is a whitelist - only files that have the following suffixes will be saved:
    suffix_list = ['jpg', 'gif', 'png', 'tif', 'svg']
For validation we use the same hack twice. First get the path (index 2) from urlsplit(file_url), e.g.:
urlsplit(file_url)[2]  # => '/image_cache/1354660469986314.jpg'
Then .split() the path on '/' and take the rightmost piece ('1354660469986314.jpg'), and .split() that piece on '.' and again take the rightmost piece to get the file name suffix ('jpg'). Taking the last piece rather than the second means a name like 'photo.final.jpg' still yields 'jpg':
    file_name = urlsplit(file_url)[2].split('/')[-1]
    file_suffix = file_name.split('.')[-1]
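The split-twice hack above can also be written with the standard library's path helpers, which cope with names that have no dot or an upper-case suffix. This is only a rough sketch: url_suffix is a made-up helper name, not something from the gist, and the try/except import is there so it runs on Python 3 as well as the Python 2 used in this post.

```python
import posixpath

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2, as used in this post


def url_suffix(file_url):
    """Hypothetical helper: return (file_name, lowercase suffix) for a URL."""
    path = urlsplit(file_url).path               # same value as urlsplit(file_url)[2]
    file_name = posixpath.basename(path)         # everything after the last '/'
    extension = posixpath.splitext(file_name)[1]  # '.jpg', or '' if there is no dot
    return file_name, extension.lstrip('.').lower()
```

The suffix comes back as an empty string when the file name has no extension, so checking it against suffix_list simply fails instead of raising an error.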
...if you wanted to add error checking you could also do something like
fileType = i.headers['content-type'].split('/')[-1]
# 'image/jpeg' => 'jpeg'
But since you'd have to add 'jpeg' to the suffix list anyway, you might as well make a second list like ctypes = ['image/jpeg', 'image/png', ] and validate i.headers['content-type'] against that. In my case I'm passing links from img src anyway, so I'm not getting bogged down in too much validation.
Moving on, the requests part is dead simple, as designed by kennethreitz:
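For anyone who does want the content-type check, here is a minimal sketch of the idea. The names ALLOWED_TYPES and is_allowed_type are my own inventions for illustration, and the list of media types is an assumption you'd adjust to match your suffix list. It also strips any parameters after a ';', since servers sometimes send values like 'image/png; charset=binary'.

```python
# Assumed whitelist of media types; extend or trim to taste.
ALLOWED_TYPES = frozenset([
    'image/jpeg', 'image/gif', 'image/png', 'image/tiff', 'image/svg+xml',
])


def is_allowed_type(content_type, allowed=ALLOWED_TYPES):
    """Hypothetical helper: True if a Content-Type value is on the whitelist."""
    if not content_type:
        return False  # header missing or empty
    # Drop parameters such as '; charset=utf-8' and normalise the case.
    media_type = content_type.split(';', 1)[0].strip().lower()
    return media_type in allowed
```

With a requests response this would be called as is_allowed_type(i.headers.get('content-type')), which returns False rather than raising if the server sent no such header.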
    i = requests.get(file_url)
The rest is pretty much the example from the io page in the standard library docs - the only thing to mention is that the mode is 'wb' for 'write binary'.
Also for 'warner brothers' because I'm a bugs bunny fan from way back
    if file_suffix in suffix_list and i.status_code == requests.codes.ok:
        with iopen(file_name, 'wb') as file:
            file.write(i.content)
    else:
        return False
This can be tested in the interpreter easily:
requests_image(raw_input())
and then copy/paste image URLs (or non-image URLs if you'd like to see the Python interpreter print "False")
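If you'd like to poke at the whole flow without hitting the network, one option is to pass the fetching function in as an argument. The sketch below is my own variation, not the gist: save_image, SUFFIXES, FakeResponse and fake_fetch are all hypothetical names. It checks the suffix before fetching and returns the saved path instead of None, and it runs on Python 2.6+ or Python 3.

```python
import io
import os
import posixpath
import tempfile
from collections import namedtuple

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2

SUFFIXES = ('jpg', 'gif', 'png', 'tif', 'svg')


def save_image(file_url, fetch, dest_dir='.'):
    """Save file_url into dest_dir; return the saved path, or False if skipped."""
    file_name = posixpath.basename(urlsplit(file_url).path)
    file_suffix = posixpath.splitext(file_name)[1].lstrip('.').lower()
    if file_suffix not in SUFFIXES:
        return False                       # not whitelisted, so don't fetch it
    response = fetch(file_url)             # e.g. requests.get
    if response.status_code != 200:
        return False
    target = os.path.join(dest_dir, file_name)
    with io.open(target, 'wb') as handle:  # 'wb': write the bytes untouched
        handle.write(response.content)
    return target


# Hypothetical stand-in for a requests response, for trying this offline.
FakeResponse = namedtuple('FakeResponse', ['status_code', 'content'])


def fake_fetch(url):
    return FakeResponse(200, b'GIF89a-not-a-real-image')


demo_dir = tempfile.mkdtemp()
saved = save_image('http://server.com/1.gif', fake_fetch, demo_dir)
```

For real downloads you would pass requests.get as the fetch argument, e.g. save_image(url, requests.get), since its response objects expose the same status_code and content attributes.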
Uncommented function is here:
https://gist.github.com/4221658
* Let's face it, the import statements and the else: return False are garnish at best - not the meat.