Last Updated: February 25, 2016
· slue

Whitelist Facebook/Twitter to test your Open Graph and Twitter Card implementations

What's the problem?

When you develop a website with Facebook or Twitter integration, you eventually reach the point where the site is deployed on the server under its final domain, but you don't want it to be open to the public yet. So you block access to it, but you still need to test the Facebook integration and its og tags.

Step 1: Block unwanted guests

Your first step to make the page private might be to restrict access by IP address in the vhost or .htaccess, like this:

order deny,allow
deny from all
allow from 192.168.0.1
allow from 10.0.0.1

But this means you have to maintain an accurate list of IP addresses for every client that should see the page. With a large customer, this escalates quickly.

So it's better to also allow Basic authentication:

AuthUserFile /path/to/.htpasswd
AuthName "Restricted Access"
AuthType Basic
require user [username]
satisfy any
deny from all
allow from 192.168.0.1
allow from 10.0.0.1

This requests a password from every client that does not connect from one of these IP addresses.

Step 2: Open the door to Facebook/Twitter

There are two methods to give the Facebook crawler access, as described by Facebook.

Method 1: Maintainable, but insecure

If your site does not contain big news that has to be kept secret until it goes public, whitelisting user agents is a good fit for you.

The Facebook scraper currently uses three user agents:
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit
Facebot
Twitter uses only one:
Twitterbot

You can whitelist these user agents, but be aware that the user agent can be spoofed with nearly every browser.

Anyway, here is the code to whitelist the user agents:

SetEnvIfNoCase User-Agent "^facebookexternalhit" facebook
SetEnvIfNoCase User-Agent "Facebot" facebook
SetEnvIfNoCase User-Agent "Twitterbot" twitter
AuthUserFile /path/to/.htpasswd
AuthName "Restricted Access"
AuthType Basic
require user [username]
satisfy any
deny from all
allow from 192.168.0.1
allow from 10.0.0.1
allow from env=facebook
allow from env=twitter

This sets the environment variable facebook when the user agent matches one of the Facebook user agents, and allows access for every request that carries this variable. The same applies to Twitter with the variable twitter.
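To check the whitelist from the command line, one option is to request the page with a crawler user agent and look at the HTTP status code. The following is a minimal sketch, assuming curl is installed; the function name status_for_ua and the URL https://example.com/ are placeholders and not part of the original setup.

```shell
#!/bin/sh
# Print the HTTP status code the server returns for a given user agent.
# Hypothetical helper: the name and the argument order are illustrative.
status_for_ua() {
    ua=$1
    url=$2
    # -s: silent, -o /dev/null: discard the response body,
    # -w '%{http_code}': print only the status code, -A: set the User-Agent
    curl -s -o /dev/null -w '%{http_code}' -A "$ua" "$url"
}

# Example calls (replace the URL with your own site):
#   status_for_ua 'facebookexternalhit/1.1' 'https://example.com/'
#   status_for_ua 'Mozilla/5.0' 'https://example.com/'
```

Called from an IP address that is not on the allow list, the first request should return 200 and the second 401. That this check works at all also shows how easily the user agent can be spoofed.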

Method 2: Whitelist scraper IPs

This method is more secure, because spoofing the IP address is much harder.

Here you just need to get the current list of IP addresses of the Facebook crawler, using the following command:

whois -h whois.radb.net -- '-i origin AS32934' | grep ^route  

Then add an allow from ... entry for every address range returned.

Twitter's AS number is AS13414, so the command to get their addresses is:
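Adding these entries by hand gets tedious, so the conversion can be scripted. This is a minimal sketch, assuming the whois output contains lines of the form "route: <prefix>" or "route6: <prefix>"; the function name routes_to_allow is hypothetical.

```shell
#!/bin/sh
# Read whois output on stdin and print one "allow from" line per route.
routes_to_allow() {
    # Match "route:" and "route6:" lines; the prefix is the second field.
    # sort -u drops prefixes that are listed more than once.
    awk '/^route6?:/ { print "allow from " $2 }' | sort -u
}

# Example call (needs network access):
#   whois -h whois.radb.net -- '-i origin AS32934' | routes_to_allow
```

The resulting lines can be pasted into the vhost or .htaccess next to the existing allow from entries.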

whois -h whois.radb.net -- '-i origin AS13414' | grep ^route

Twitter, however, states that it currently uses only these addresses:

199.59.148.209 
199.59.148.210 
199.59.148.211 
199.16.156.124 
199.16.156.125 
199.16.156.126 

But please be aware that these addresses can change often.