Website Disaster Planning Worksheet

This is a plan for __________________ (name of client or company) for when something goes wrong on our servers.

Who’s responsible

When something goes wrong, we will call: ________ (name of primary contact). If they are not available, we will call: ________ (name of secondary contact).

Primary Contact
Secondary Contact

Inventory

  1. We have a complete list of users and passwords for these computers & services:

    • Domain registry
    • DNS provider
    • SSL certificate issuer
    • Web server
    • Database server
    • Email server
  2. We have an electronic version of the list of passwords, computers, and services.

    • We check it every ________ (weeks/months/quarters) and update it if needed.
  3. We have a diagram or map of our computer setup.

    • We check it every ________ (weeks/months/quarters) and update it if needed.

Backups

  1. Database backups

    • We back up (continuously/nightly/daily/weekly)
    • We transfer the backups offsite…
      • Manually
      • To a different server
      • To a geographically redundant storage service such as Amazon S3
    • We test restore these backups (weekly / monthly / quarterly)
    • (advanced) We have a hot failover / spare database server
      • it’s in a different geographic region
      • it’s with a different hosting company
  2. Content backups (the irreplaceable images, text, etc. uploaded by the site’s users or staff)

    • We back up (continuously/hourly/nightly/daily/weekly)…
      • To a different server
      • To a geographically redundant storage service
    • We test restore these backups (weekly / monthly / quarterly)
  3. Code backups

    • We use a service such as BitBucket or GitHub
    • We have a copy on each developer’s computer
    • We back up some other way (include how often): ____________________________
  4. Server configurations

    • We manually configure our servers.
    • We back up ALL of our server’s files, not just our content.
      • daily / weekly / monthly / quarterly
      • the backup is copied off site
    • We have documentation about our server’s custom settings
      • and we’re confident we could set up another one just the same.
    • (advanced) We use configuration management to make changes to our servers
  5. Email

    • Our email is hosted by a large, redundant email provider such as Google Apps, Outlook365, or Fastmail
    • We use an email backup relay such as DNSMadeEasy
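
A backup schedule only helps if the backups are actually being written. As a rough sketch (not a full backup tool), the Python below checks whether the newest file in a backup folder is recent enough. The function names, the folder path in the usage note, and the 26-hour limit are all made up for illustration; swap in whatever matches the schedule you circled above.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the most recently modified file in backup_dir.

    Returns None if the directory contains no files.
    `now` is a Unix timestamp; it defaults to the current time.
    """
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_fresh(backup_dir, max_age_hours, now=None):
    """Return True if at least one backup file is younger than max_age_hours."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is not None and age <= max_age_hours
```

For a nightly backup, something like backup_is_fresh("/var/backups/db", 26) (a hypothetical path) could run from a scheduled job and send an alert when it returns False. This only shows that a file was written recently. It does not replace test restoring the backup.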

Downtime

  1. How much downtime can we realistically tolerate?

    • 99% uptime: 3.65 days per year / 7.30 hours per month
    • 99.9% uptime: 8.76 hours per year / 43.8 minutes per month
    • Even more strict than 99.9%: ___________________
  2. How will we know when the site’s down?

    • Active monitoring / alerting
    • PagerDuty
    • Manually checking the site / notified by customers
  3. How will we let people know that there’s a problem with our site?

    • Status page (status.oursite.com or an automatic page)
    • Social media
    • Email
    • Phone
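
For anyone wondering what "active monitoring" means in practice, here is a minimal sketch in Python using only the standard library. It is an illustration, not a replacement for a monitoring service: the function names, the 10-second timeout, and the three-failures-in-a-row rule are assumptions chosen for the example.

```python
import urllib.error
import urllib.request


def site_is_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if url answers with an HTTP 2xx or 3xx status in time.

    `opener` exists so the check can be exercised without a network;
    by default it is the standard library's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections, timeouts.
        return False


def should_alert(results, threshold=3):
    """Return True if the most recent `threshold` checks all failed.

    `results` is a list of booleans, oldest first, one per check.
    """
    recent = results[-threshold:]
    return len(recent) == threshold and not any(recent)
```

Run from a scheduled job on a machine outside our own hosting, a check like this could append each result to a list and page someone once should_alert returns True. Waiting for several failures in a row is a design choice to avoid alerting on a single network blip.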

Discuss each of the downtime ranges from the “how much downtime can we tolerate” question and what each might look like to the business. Can we survive for one business day if our email is down? How would our customers reach us? If the website is down, how do we provide value to our clients or customers?

A lot of businesses answer that question with, “We can’t have ANY downtime!” Keep in mind that each added “nine” (going from 99% uptime to 99.9%, for example) cuts the allowed downtime by a factor of ten, and the ongoing operations cost typically rises steeply with it.
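
To see what each extra nine buys, the arithmetic is simple enough to sketch in a few lines of Python. The function name is made up for the example, and the figures assume a 365-day year split into twelve equal months.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget(uptime_percent):
    """Return (hours_per_year, hours_per_month) of allowed downtime."""
    down_fraction = 1 - uptime_percent / 100
    per_year = HOURS_PER_YEAR * down_fraction
    return per_year, per_year / 12


for target in (99.0, 99.9, 99.99):
    per_year, per_month = downtime_budget(target)
    print(f"{target}% uptime: {per_year:.2f} h/year, {per_month * 60:.1f} min/month")
```

At 99% the yearly budget is about 87.6 hours (3.65 days), at 99.9% it is about 8.76 hours, and at 99.99% it drops to under an hour for the whole year.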

If your website can’t go down, you will pay a lot of money to stop it from happening. If your site is providing real-time information for stock trading, critical healthcare systems, or air traffic control, this is the wrong worksheet, sorry. ;)

Execution

  1. We have a clear plan that covers the steps to take if a server fails.
    • The plan includes server setup instructions
    • The plan includes a description of how to restore from backup.
  2. After we have been down for _______ hours, we will execute the plan.
  3. We have an account ready with _____________ (a VPS / cloud provider) in case our primary hosting company goes down.
  4. We are familiar with the contents of this plan.

    • We review and update it (monthly / quarterly / annually)
  5. We have marked on the calendar a date to review and test this plan in the not-too-distant future.


Conclusion

If you’ve made it this far, congratulations! You’re on the way to putting together a robust disaster recovery plan for your business. You should now know how to fill any gaps in your plan, so get to it!

Incidentally, if you have any questions about this checklist, or if you need help setting up disaster recovery for your business, please email me at fred@fredalger.net.