Last Updated: September 09, 2019
· avgp

Collecting a website recursively into a single PDF

The goal

The following script downloads a website recursively into a collection of HTML files, converts them into PDFs, and then concatenates them into a single PDF.


You'll need pdftk, wget and wkhtmltopdf.
Make sure that you have a wkhtmltopdf version that terminates properly, for example version 0.9.9.

If you're on OS X, you can install all of these tools via Homebrew.
The formula for pdftk can be found here.
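
Before running anything, it can help to confirm that all three tools are actually on your PATH. The snippet below is only a sketch: check_tools is a made-up helper name, not part of any of these tools, and it tests for presence only, not for a particular wkhtmltopdf version.

```shell
# Hypothetical helper: prints each required tool that is not on the PATH
# and returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools wget wkhtmltopdf pdftk || echo "Install the missing tools first."
```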

The script


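echo "Collecting files from subfolders..."
# Move every HTML file into the current directory (files with the same name overwrite each other).
find . -mindepth 2 -type f -name '*.html' | while IFS= read -r FILENAME
do
    mv "$FILENAME" "$(basename "$FILENAME")"
done

echo "Converting into PDF files..."
# Strip the extension, then let xargs substitute X with each base name.
find . -name '*.html' | sed 's/\.html$//' | xargs -I X wkhtmltopdf --quiet X.html X.pdf

echo "Concatenating the PDF files..."
# The shell expands *.pdf in alphabetical order, which becomes the page order.
pdftk *.pdf cat output book.pdf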
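
The listing above starts from HTML files that are already on disk, so the recursive download has to run first. A minimal sketch of that step might look like the following. The fetch_site name and the choice of flags are assumptions, not the author's original command; the flags are standard wget options, and example.com is a placeholder.

```shell
# Hypothetical helper: mirror everything below the given URL into the
# current directory, saving pages with an .html extension so the
# later steps can find them.
fetch_site() {
    url="$1"
    wget --recursive --no-parent \
         --page-requisites --convert-links --adjust-extension \
         "$url"
}

# Usage (placeholder URL):
# fetch_site https://example.com/docs/
```

wget stores the pages in a directory named after the host, which is why the script then collects the files from subfolders. On older wget releases the --adjust-extension option is spelled --html-extension.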

1 Response


Is there a way to do this on Windows?

over 1 year ago