What's the problem?
When you develop a website with Facebook or Twitter integration, you eventually reach the point where the site is deployed on the server under its final domain, but you don't want it to be publicly accessible yet. So you block access to it, but you still need to test the Facebook integration with its og-tags.
Step 1: Block unwanted guests
Your first step to make the page private would probably be to restrict access by IP address in the .htaccess like this:
order deny,allow
deny from all
allow from 192.168.0.1
allow from 10.0.0.1
But this leads to a problem: you need to maintain an accurate list of the IP addresses of every client that should see the page. With a big customer, this list escalates quickly.
So it's better to also allow Basic Authentication as a fallback:
AuthUserFile /path/to/.htpasswd
AuthName "Restricted Access"
AuthType Basic
require user [username]
satisfy any
order deny,allow
deny from all
allow from 192.168.0.1
allow from 10.0.0.1
This requests a password from every client that does not connect from one of these IP addresses.
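The require user [username] directive needs a matching entry in a .htpasswd password file, which Apache finds via its AuthUserFile directive. A minimal sketch for creating one; the path, user name and password are placeholders. Apache's htpasswd tool is the usual way, but if it is not installed, openssl can generate the same Apache-MD5 hash:

```shell
# Create a .htpasswd entry (path, user name and password are examples).
# Equivalent to: htpasswd -cb /tmp/.htpasswd alice 's3cret'
printf 'alice:%s\n' "$(openssl passwd -apr1 's3cret')" > /tmp/.htpasswd
cat /tmp/.htpasswd
```

Each further user gets one line in the same `name:hash` format.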
Step 2: Open the door to Facebook/Twitter
There are 2 methods to give access to the Facebook crawler, as described by Facebook here.
Method 1: Maintainable, but insecure
If you are not dealing with the kind of big news that has to be kept secret until the site goes public, then whitelisting user agents is a good fit for you.
There are currently 3 user agents that the Facebook scraper uses: facebookexternalhit/1.0, facebookexternalhit/1.1 and Facebot. Twitter uses only one: Twitterbot.
You can whitelist these user agents, but be aware that the user agent can be spoofed with nearly every browser.
Anyway, here is the code to whitelist the user-agent:
SetEnvIfNoCase User-Agent "^facebookexternalhit" facebook
SetEnvIfNoCase User-Agent "Facebot" facebook
SetEnvIfNoCase User-Agent "Twitterbot" twitter
AuthUserFile /path/to/.htpasswd
AuthName "Restricted Access"
AuthType Basic
require user [username]
satisfy any
order deny,allow
deny from all
allow from 192.168.0.1
allow from 10.0.0.1
allow from env=facebook
allow from env=twitter
With SetEnvIfNoCase you set an env-variable whenever the User-Agent header matches the Facebook crawler, and the allow from env=facebook line then lets those requests through. Same for Twitter.
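To see why this is insecure: any HTTP client can send an arbitrary User-Agent header, e.g. `curl -A "facebookexternalhit/1.1" https://example.com/` would sail through the whitelist. The case-insensitive matching that SetEnvIfNoCase performs can be sketched in shell (the sample UA string below is illustrative, not an exact crawler header):

```shell
# Mimic the case-insensitive matching of the SetEnvIfNoCase rules above.
ua='FacebookExternalHit/1.1 (+http://www.facebook.com/externalhit_uatext.php)'

if printf '%s' "$ua" | grep -qiE '^facebookexternalhit|Facebot'; then
  echo "env=facebook set -> request allowed"
fi
if printf '%s' "$ua" | grep -qiE 'Twitterbot'; then
  echo "env=twitter set -> request allowed"
fi
```

Because the match is case-insensitive and only anchored at the start, any client that copies the crawler's UA string gets in.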
Method 2: Whitelist scraper IPs
This method is more secure, because spoofing an IP address is much harder than spoofing a user agent.
Here you just need to get the current list of IP addresses of the Facebook crawler, using the following command:
whois -h whois.radb.net -- '-i origin AS32934' | grep ^route
Then you add an allow from ... entry for every IP range returned.
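Turning the whois output into those entries can be scripted. In the sketch below, routes_to_allow is a hypothetical helper name, and the sample route line only mimics radb's output format; it is not a guaranteed current Facebook range:

```shell
# routes_to_allow: turn "route: x.x.x.x/nn" lines into .htaccess allow entries
routes_to_allow() {
  grep '^route' | awk '{print "allow from " $2}'
}

# Real usage would pipe the whois query through the helper:
#   whois -h whois.radb.net -- '-i origin AS32934' | routes_to_allow >> .htaccess
# Demo with one sample line in radb's format:
printf 'route:      31.13.24.0/21\n' | routes_to_allow
# → allow from 31.13.24.0/21
```

Since Apache's allow from accepts CIDR notation, the routes can be pasted in unchanged.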
Twitter's network is registered under AS13414, so the request to get their addresses is:
whois -h whois.radb.net -- '-i origin AS13414' | grep ^route
although Twitter states here that they currently only use these addresses:
184.108.40.206
220.127.116.11
18.104.22.168
22.214.171.124
126.96.36.199
188.8.131.52
But please be aware that these addresses can change often.