Whitelist Facebook/Twitter to test your Open Graph and Twitter Card implementations
What's the problem?
When you develop a website with a Facebook or Twitter integration, you eventually reach the point where your site is deployed on the server under its final domain, but you don't want it to be open to the public yet. So you block access to it, but you still need to test the Facebook integration with its Open Graph tags.
Step 1: Block unwanted guests
Your first step to make the page private might be to restrict access by IP address in the vhost or .htaccess, like this:
order deny,allow
deny from all
allow from 192.168.0.1
allow from 10.0.0.1
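Note that these directives use the Apache 2.2 access-control syntax (available on 2.4 via mod_access_compat). On a plain Apache 2.4 setup, a sketch of the equivalent would be:
# Apache 2.4 syntax: allow only these client IPs
<RequireAny>
    Require ip 192.168.0.1
    Require ip 10.0.0.1
</RequireAny>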
But this leads to a problem: you need an accurate list of the IP addresses of every client that should see the page. With a large customer, this escalates quickly.
So it's better to also allow Basic Authentication:
AuthUserFile /path/to/.htpasswd
AuthName "Restricted Access"
AuthType Basic
require user [username]
satisfy any
deny from all
allow from 192.168.0.1
allow from 10.0.0.1
This prompts every client that does not come from one of the whitelisted IP addresses for a password.
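The .htpasswd file itself can be created with the htpasswd tool that ships with Apache; path and username are placeholders, as above:
# create the password file and add the first user (-c only on the first run)
htpasswd -c /path/to/.htpasswd username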
Step 2: Open the door to Facebook/Twitter
There are two methods to give the Facebook crawler access, as described here by Facebook.
Method 1: Maintainable, but insecure
If you do not have the kind of big news that has to be kept secret until the site goes public, then whitelisting user agents is the right approach for you.
There are currently three user agents that the Facebook scraper uses:
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit
Facebot
Twitter uses only one:
Twitterbot
You can whitelist these user agents, but be aware that the user agent can be spoofed with nearly every browser (the curl example further below demonstrates this).
Anyway, here is the code to whitelist the user agents:
SetEnvIfNoCase User-Agent "^facebookexternalhit" facebook
SetEnvIfNoCase User-Agent "Facebot" facebook
SetEnvIfNoCase User-Agent "Twitterbot" twitter
AuthUserFile /path/to/.htpasswd
AuthName "Restricted Access"
AuthType Basic
require user [username]
satisfy any
deny from all
allow from 192.168.0.1
allow from 10.0.0.1
allow from env=facebook
allow from env=twitter
You set an env variable facebook whenever the user agent matches one of the Facebook user agents and allow access for this env variable. The same goes for Twitter.
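To verify that the whitelist works, you can spoof the user agent yourself with curl; staging.example.com is a placeholder for your own domain:
# request the page as the Facebook scraper would; -I fetches only the headers
curl -I -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" https://staging.example.com/
# the same request without the spoofed user agent should come back with a 401
curl -I https://staging.example.com/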
Method 2: Whitelist scraper IPs
This method is more secure, because spoofing an IP address is much harder.
Here you just need to get the current list of IP ranges that the Facebook crawler uses, via the following command:
whois -h whois.radb.net -- '-i origin AS32934' | grep ^route
Then you add an allow from ... entry for every range returned (see the sketch below).
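A small pipeline can generate these entries directly. This is a sketch that assumes the CIDR range sits in the second column of the route lines:
# turn every "route: x.x.x.x/nn" line into an Apache allow directive
whois -h whois.radb.net -- '-i origin AS32934' | grep ^route | awk '{print "allow from", $2}'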
Twitter's AS number is AS13414, so the request to get their addresses is:
whois -h whois.radb.net -- '-i origin AS13414' | grep ^route
although Twitter states here that they currently only use these addresses:
199.59.148.209
199.59.148.210
199.59.148.211
199.16.156.124
199.16.156.125
199.16.156.126
But please be aware that these addresses can change often.
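Since both lists can change, it may be worth regenerating them periodically, e.g. via cron. Here is a minimal sketch, assuming you Include the generated file inside the protected section of your vhost; the paths are placeholders:
#!/bin/sh
# regenerate the crawler whitelist and reload Apache gracefully
OUT=/etc/apache2/crawler-whitelist.conf
{
  whois -h whois.radb.net -- '-i origin AS32934' | grep ^route | awk '{print "allow from", $2}'
  whois -h whois.radb.net -- '-i origin AS13414' | grep ^route | awk '{print "allow from", $2}'
} > "$OUT"
apachectl graceful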