After a while you'll reach the point where you want a list of all URIs of your website, for example to create a sitemap for search engine optimization, to compile a list for validation purposes (W3C Validator), or just to get a quick overview so you can tidy up your site. For all these cases, this simple little one-liner using wget and grep can be a great helper:
wget --no-verbose --recursive --spider --force-html --level=DEPTH_LEVEL --no-directories --reject=jpg,jpeg,png,gif YOUR_DOMAIN 2>&1 | grep -o 'www[^ ]*' | sort -u
This produces a list of all URIs down to the recursion depth you set with --level=DEPTH_LEVEL (e.g. 5). The --reject option skips image files, and --force-html makes wget treat the retrieved files as HTML; the 2>&1 is needed because wget writes its log to stderr. Note that grep runs before sort -u, so deduplication applies to the extracted URIs rather than to whole log lines. To save the output to a file, simply append > result.txt to the command. You can also adapt the matching pattern, for example replacing www with http:// in order to get more suitable results.
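As a quick illustration of the pattern swap, here is a tiny self-contained sketch run against a few made-up wget log lines (the domain is hypothetical, and real wget --spider output varies between versions):

```shell
# Hypothetical sample of wget log lines; the real format varies by version.
log='URL:http://www.example.com/ 200 OK
URL:http://www.example.com/about/ 200 OK
URL:http://www.example.com/ 200 OK'

# Extract everything starting with http:// up to the next space,
# then deduplicate the extracted URIs with sort -u.
printf '%s\n' "$log" | grep -o 'http://[^ ]*' | sort -u
```

This prints each URI exactly once, even though http://www.example.com/ appears twice in the log.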
The script does not save any data or content from the website: thanks to --spider it merely crawls the structure, and --no-directories keeps it from creating any directories.
Take a look at http://rubular.com for help with building regular expressions.
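For instance, if your site serves pages over both http and https, a small alternation in the pattern catches both. A sketch on made-up sample lines:

```shell
# Hypothetical log lines; the https? alternation matches both schemes.
sample='URL:http://www.example.com/a.html ok
URL:https://www.example.com/b.html ok'

# -E enables extended regular expressions so ? works as a quantifier.
printf '%s\n' "$sample" | grep -oE 'https?://[^ ]*'
```

Both the http and the https URI appear in the output, where the plain 'http://' pattern would miss the second one.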
One possible next step is to feed the resulting URIs to a local installation of the W3C Validator API: http://validator.w3.org/docs/api.html
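To sketch what that combination could look like: assuming your URIs are in result.txt and a validator runs at a hypothetical local address, you could build one request per URI like this (the localhost path is an assumption, and strictly speaking each URI should be URL-encoded before going into a query string):

```shell
# Hypothetical result.txt, as produced by redirecting the wget pipeline.
printf 'http://www.example.com/\nhttp://www.example.com/about/\n' > result.txt

# Build one validator request per URI; output=soap12 asks the W3C
# validator for machine-readable SOAP output. A real run would fetch
# each request URL with curl or wget instead of just echoing it.
while read -r uri; do
  echo "http://localhost/w3c-validator/check?uri=${uri}&output=soap12"
done < result.txt
```

Echoing the request URLs first is a cheap way to sanity-check the loop before pointing it at an actual validator instance.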