The explosion in third-party services for developers is fantastic! It saves us from things like setting up mail servers when we can use SendGrid or MailGun instead. We don't need a server to host our web app; we can just deploy it to Heroku or AppEngine.
This is great because it lets us spend minimal time building infrastructure and get straight to the business of writing our software. It does, however, introduce several risks, including leaving us exposed to an outage or shutdown of the API service.
As I write this, Yahoo! have once again shut down a service with very little warning: the Upcoming API is being shut down with 11 days' notice!
I'll focus here on what is called residual risk. While this is nowhere near a comprehensive risk management post, it will give you some ideas to help protect yourself.
The first thing we need to do is identify all our risks using something like a five-box risk matrix. Essentially, this weighs the likelihood of a risk occurring against the consequence if it does occur. This gives us a much better idea of where to focus our plans and in how much detail.
Here I am talking about third-party API risks, so you may consider the outage of an email service to have a lower consequence (more support calls) than the outage of a payment gateway service (loss of income). You could further break down your risks to include:
- Shutdown (hello Yahoo!)
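The weighing of likelihood against consequence can be sketched in a few lines of Python. This is only a rough illustration: the 1 to 5 scales, the choice to multiply the two numbers, and the scores given to each risk are all assumptions made for the example, not values from a real assessment.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int   # assumed scale: 1 (rare) to 5 (almost certain)
    consequence: int  # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # One common simplification: multiply the two ratings.
        return self.likelihood * self.consequence


def rank(risks):
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Illustrative numbers only.
risks = [
    Risk("Email service outage", likelihood=3, consequence=2),
    Risk("Payment gateway outage", likelihood=2, consequence=5),
    Risk("Provider shuts down the API", likelihood=1, consequence=4),
]

for risk in rank(risks):
    print(f"{risk.score:>2}  {risk.name}")
```

The risks at the top of the list are the ones that probably deserve the most detailed plans.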
Planning - Mitigation and Response
The second step is to create two plans for each risk you identify:
- The Mitigation Plan
- The Response Plan
The mitigation plan is how you intend to reduce the risk. It's important to note that this is not a plan to stop the risk from occurring, but to reduce the likelihood of it occurring.
Your mitigation plan may be making your app use either SendGrid or MailGun depending on an environment variable.
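As a minimal sketch of that idea, the snippet below picks a mail sender based on an environment variable. The variable name `MAIL_PROVIDER` and the two `send_via_*` functions are hypothetical placeholders; a real app would call each provider's own client library inside them.

```python
import os


def send_via_sendgrid(to, subject, body):
    # Placeholder: a real implementation would call SendGrid here.
    return f"sendgrid -> {to}: {subject}"


def send_via_mailgun(to, subject, body):
    # Placeholder: a real implementation would call MailGun here.
    return f"mailgun -> {to}: {subject}"


PROVIDERS = {
    "sendgrid": send_via_sendgrid,
    "mailgun": send_via_mailgun,
}


def get_sender(env=os.environ):
    """Pick the send function named by MAIL_PROVIDER (assumed variable name)."""
    name = env.get("MAIL_PROVIDER", "sendgrid").lower()
    if name not in PROVIDERS:
        raise ValueError(f"Unknown MAIL_PROVIDER: {name!r}")
    return PROVIDERS[name]


send = get_sender()
print(send("user@example.com", "Welcome", "Thanks for signing up."))
```

Because the rest of the app only ever calls `send`, changing provider becomes a configuration change rather than a code change.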
The response plan is how you are going to deal with the risk once it occurs. The point of risk management is not to try to stop things that are out of your control from happening, but to decide what you are going to do when they happen.
Your response plan might detail how to switch your app from using SendGrid to MailGun in the event of an outage.
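Part of a response plan could even be automated. The sketch below tries a list of senders in order and falls back to the next one when a provider is down. Everything here is hypothetical: the `ProviderDown` exception and the `primary` and `secondary` functions stand in for real provider calls, and the outage is simulated.

```python
class ProviderDown(Exception):
    """Raised by a sender when its provider cannot be reached (hypothetical)."""


def send_with_fallback(senders, to, subject, body):
    """Try each sender in order and return the first successful result."""
    errors = []
    for sender in senders:
        try:
            return sender(to, subject, body)
        except ProviderDown as exc:
            errors.append(str(exc))
    raise ProviderDown(f"all providers failed: {errors}")


def primary(to, subject, body):
    # Simulate an outage at the first provider.
    raise ProviderDown("primary unavailable")


def secondary(to, subject, body):
    return f"secondary delivered '{subject}' to {to}"


print(send_with_fallback([primary, secondary], "user@example.com", "Receipt", "..."))
```

Whether you automate the switch or do it by hand, the written plan should still say who is told about the outage and how you switch back afterwards.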
While this is certainly not a detailed description of risk management, I hope this post gives you an idea of the things that could go wrong and how to cover yourself when they do. Being able to simply flick a switch when a provider is down will show your boss that you have planned for disaster!
Feel free to leave a comment or contact me if you have any questions or would like to discuss anything.