Collecting a website recursively into a single PDF
The goal of this tip is to turn an entire website into a single PDF. The recipe: download the site recursively into a collection of HTML files, convert each file into a PDF, and concatenate the results into one document. The script below handles the convert-and-concatenate part.
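The download step itself runs before the script. A recursive wget call along these lines produces the HTML tree the script expects; the URL is a placeholder, and the right set of flags depends on the site you're mirroring:

# A rough sketch of the download step; http://example.com/docs/ is a placeholder.
#   --recursive         follow links and fetch the whole tree
#   --no-parent         don't climb above the start URL
#   --adjust-extension  save pages with an .html suffix (older wget: --html-extension)
#   --convert-links     rewrite links so the local copy works offline
wget --recursive --no-parent --adjust-extension --convert-links http://example.com/docs/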
Prerequisites
You'll need pdftk, wget, and wkhtmltopdf.
Make sure you have a wkhtmltopdf version that terminates properly after converting, for example version 0.9.9.
If you're on OS X, you can install all of these tools via Homebrew.
The formula for pdftk can be found here.
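On OS X, a rough install sequence looks like this, assuming Homebrew is already set up (pdftk may need that external formula rather than a core one):

brew install wget
brew install wkhtmltopdf
wkhtmltopdf --version   # confirm you got a build that exits cleanly, e.g. 0.9.9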
The script
#!/bin/bash

# Flatten the mirrored site: move every HTML file out of its subfolder
# into the current directory. -mindepth 2 skips files already here, and
# quoting keeps paths with spaces intact. Files sharing a basename will
# overwrite each other, so watch out for sites with many index.html pages.
echo "Collecting files from subfolders..."
find . -mindepth 2 -type f -name '*.html' | while read -r FILENAME
do
  mv "$FILENAME" "$(basename "$FILENAME")"
done

# Convert each HTML file into a PDF with the same basename.
echo "Converting into PDF files..."
find . -name '*.html' | sed 's/\.html$//' | xargs -I X wkhtmltopdf --quiet X.html X.pdf

# Concatenate all the PDFs into a single book.pdf.
echo "Concatenating the PDF files..."
pdftk *.pdf cat output book.pdf
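Note that the shell expands *.pdf in alphabetical order, so the pages end up sorted by file name. To sanity-check the result, pdftk's dump_data report includes a page count for the combined file:

pdftk book.pdf dump_data | grep NumberOfPages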
Written by Martin Naumann