How our new backup system saved 24+ hours of downtime
Remember when we announced our new infrastructure in October last year? Part of that innovation, and one we were particularly proud of, was our backup/restore system, built in-house. A few days ago this system was put to its first critical real-life test and the results were impressive. We were able to restore 3 times more data, 7 times faster, compared to the previous such event, when we were still using the old backup solution. Here is how we did it.
How often do we need massive backup restores?
The short answer is: very rarely. A highly redundant infrastructure with multiple SSDs in RAID almost eliminates the need for such restores. Normally, when an SSD fails, it is seamlessly replaced with new hardware without any noteworthy downtime or data loss. And disk failures are very common: for a provider of our size, it is normal to see such events on an almost daily basis. However, every now and then, an unfortunate coincidence of several hardware and software failures at once can make the standard hardware replacement impossible. Those are the times when we need to restore all the accounts that were on the damaged instance from our backup copies.
Previously, before our new backup system.
The previous time we needed to perform a full backup restore of a whole shared hosting server was more than a year ago. Back then we were using R1Soft backup, which is among the most popular in our industry. Hosting providers like us use this software for two main reasons. First, it is quite reliable. We’ve almost never had any serious issues with missing or corrupt backups. And second, it is very lightweight and does not create significant load on the production servers while creating the backups (a resource-intensive process that takes place every day). With these two features R1Soft works perfectly 99% of the time: when it creates the backups and when individual backup copies are needed.
However, on the rare occasions when a full restore of multiple accounts is necessary, R1Soft has one serious drawback: the recovery process is painfully slow and the affected sites can experience a prolonged outage. In the event in question, all our affected accounts were down for 28 hours. It took this long for two reasons. First, R1Soft does not allow simultaneous restores from and to multiple locations. All the data needs to be recovered through one single network interface, and this is slow. Second, the recovery cannot be incremental, so the server instance is down during the whole restore process. All affected sites can only come back online at the same time, after all the data has been transferred from the backup server to the production machine. Therefore, even the smallest website could not be brought back up until we had restored the full server.
Most shared hosting providers would hardly consider this story a serious problem that requires further action once the restore is over. After all, only a single machine was affected and all customers got their websites back without data loss. The downtime of the sites was also almost negligible on a yearly basis: 28 hours is just 0.3% of the year. However, at SiteGround, we were quite unhappy with the duration of the issue and were determined to prevent it from happening again.
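For readers who want to check the yearly figure, here is a quick sketch of the arithmetic. It assumes a 365-day year; the function name is ours, chosen only for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def downtime_share(hours_down: float, hours_total: float = HOURS_PER_YEAR) -> float:
    """Return downtime as a percentage of the total period."""
    return 100.0 * hours_down / hours_total


share = downtime_share(28)
print(f"{share:.2f}% of the year")     # 0.32% of the year
print(f"{100 - share:.2f}% uptime")    # 99.68% uptime
```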
And now, after our new backup system.
That’s how we set our minds on creating our own backup system to guarantee a faster restore process, and our talented DevOps department started working on it. We launched the new solution in October 2015, but it wasn’t until just a few days ago that we had to use it in an event similar to the one described above. Unlike R1Soft, the solution we used back then, our own system makes distributed backups and allows simultaneous restores from multiple backup instances to multiple production servers. Thus, we were able to recover 4 TB of data (nearly three times more than the previous time) in just 4 hours, compared to the 28 hours from the story above. Moreover, our system allows incremental recovery: the first accounts were up just a few minutes after the issue was identified, and the longest downtime (about 4 hours) affected only a few individual sites. This brought the average downtime for all affected accounts down to less than 2 hours, compared to 28 hours before. Quite an impressive improvement, isn’t it? But…
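To see why incremental, parallel recovery lowers the average downtime so much, here is a toy simulation. It is only a sketch under stated assumptions and not our actual backup system: the account sizes, the per-stream throughput, the number of streams and the smallest-first ordering are all hypothetical, and downtime is measured from the moment the restore starts.

```python
import heapq


def all_or_nothing(sizes_gb, rate_gb_per_hour):
    """One restore stream; every account comes back only when the whole server is done."""
    total_hours = sum(sizes_gb) / rate_gb_per_hour
    return [total_hours] * len(sizes_gb)


def incremental_parallel(sizes_gb, rate_gb_per_hour, streams):
    """Several independent streams; each account goes live once its own data is back.

    Accounts are restored smallest-first (an assumption of this sketch) and
    handed to whichever stream becomes free earliest.
    """
    free_at = [0.0] * streams  # min-heap of times at which each stream is free
    heapq.heapify(free_at)
    downtimes = []
    for size in sorted(sizes_gb):
        start = heapq.heappop(free_at)
        done = start + size / rate_gb_per_hour
        downtimes.append(done)
        heapq.heappush(free_at, done)
    return downtimes


def mean(values):
    return sum(values) / len(values)


# Hypothetical account sizes in GB and a hypothetical 100 GB/hour per stream.
sizes = [1, 2, 5, 10, 20, 50, 100, 200]
old = all_or_nothing(sizes, rate_gb_per_hour=100)
new = incremental_parallel(sizes, rate_gb_per_hour=100, streams=3)

print(f"all-or-nothing: average {mean(old):.2f} h, longest {max(old):.2f} h")
print(f"incremental:    average {mean(new):.2f} h, longest {max(new):.2f} h")
```

In the all-or-nothing model the average downtime equals the longest downtime, which is what happened in the 28-hour event. In the incremental model small accounts return almost immediately, so the average falls well below the longest individual outage.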
Can it get even faster?
Yes, it can! In our latest massive restore case, we were actually not able to use the Infiniband network connectivity between our backup servers and the production ones, as planned for such cases. Thus the data was transferred through the standard 1 Gbit/s network instead of over the 10 Gbit/s Infiniband connection. This, we found, was due to a dormant hardware issue that we were able to discover only during an actual restore. However, we have already made sure that this will not be an issue next time, which will make the restore even faster.
Another thing is that with the new system we can theoretically restore to an unlimited number of production instances simultaneously, but in practice we are limited, not by the backup system itself, but by the way our DNS system works at the moment. We had three instances affected by the issue and each of them had its own DNS. Thus we had to restore to only three new instances using the old IPs, so that domain names that are not registered with us could continue to work as before without additional downtime due to DNS propagation. To avoid this limitation in the future, we plan to work on a brand new central DNS and/or proxy system.
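A back-of-the-envelope estimate shows how much the link speed matters. This is a rough sketch with loudly stated assumptions: decimal terabytes, links running at their full nominal rate, and a hypothetical `parallel_links` parameter, since the exact network topology of the restore is not described here. Real restores also spend time on disk I/O and other overhead, so actual durations will be longer.

```python
def transfer_hours(data_tb, link_gbit_s, parallel_links=1, efficiency=1.0):
    """Idealised time to move data_tb terabytes over one or more network links.

    efficiency is the assumed fraction of nominal line rate actually achieved.
    """
    bits = data_tb * 1e12 * 8  # decimal terabytes to bits
    bits_per_second = link_gbit_s * 1e9 * parallel_links * efficiency
    return bits / bits_per_second / 3600


for gbit in (1, 10):
    for links in (1, 3):
        hours = transfer_hours(4, gbit, parallel_links=links)
        print(f"4 TB over {links} x {gbit} Gbit/s: {hours:.2f} h")
```

Under these assumptions, moving from 1 Gbit/s to 10 Gbit/s cuts the pure transfer time by a factor of ten, so the network stops being the main bottleneck.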
Our backup system story is just another example of how we approach problems. We are never satisfied to just fix the immediate issue and forget about it until the next time. We take each problem as a challenge that needs a unique solution. And if such a solution does not exist at that time, we never shy away from inventing it ourselves.
Comments ( 65 )
Silverblade
Good job, I'm proud to be your customer
kenny
NICE
Alain
Thanks. Happy to be your customer.
Gerald Marsh
Well Done! I was quite grateful for the daily backups a couple of weeks ago, when a configuration problem with my Drupal installation meant that upgrading a few modules completely clobbered 4 websites. I decided that my knowledge is not up to diagnosing exactly what happened, so I chose to restore from the previous backup. The man-machine interface is intuitive and the process completed very quickly. I still have to work out how to overcome the config issue but at least the sites work again. Thank you very much.
Anthony Boyd
Best hosting ever
Carlos Amado
Congratulations!, Happy to be your reseller!
Randy Carson
Great story, and a great company. I recommend your company so often, people think I must work there. Well done.
Paul Westeneng
Great to hear this. I hope my sites will never need this, but if they do, I know I'm in good hands.
Rolf Kenmo - HumanGuide
Well, I don't understand all the technical stuff, but I am very pleased as a customer! Moreover, what I appreciate very much is that you tell us this. Such stories give credibility;-) Thanks!
Camille P
This is great communication! Many companies would shy away from admitting that real life applies to them too. But telling us the real story only builds confidence with us as your customers. Much appreciated and thank you for sharing!
Gail Warnaar
I could not be happier with the service and the confidence in your ability to recover me if I ever need it. I am with you because my former company left me really hanging when THEIR server crashed. When they finally did get me back up, the site had no resemblance to what they had built less than a year before, and no one seemed to have any idea what my site had been 10 months previous! I really should have sued them, but was, and still am, too busy trying to restore my business! I still look for an expert in VirtueMart who can help me!
Shane Poteete
We have been with you for several years, resell your services, and routinely recommend you to our customers. Your customer service and tech support has always been exceptional. It is great to know that you are also developing new systems and solutions to help improve that service even more! Thank you for making our work day less stressful, and a little easier!!!
Philip Wade
Excellent work!
Cari Adamek
You are the best web host I've ever used and my previous one was very good so that's saying a lot. You aren't the best because you don't make mistakes — you're the best because of how you handle your mistakes and customer problems. You could have done what every other web host does and said, "That's how it works. There's nothing we can do about it. Your site will be up in 28 hours. Take a chill pill." Instead, you felt your customers' pain and said, "How can we do it better?" And you did it! Awesome.
Beatrice Johnston
super! ... happy to be your customer!
Brian Mitchell
I love the approach. I love even more the transparency that goes with sharing your process with the world. Most companies pretend problems never happen so as not to undermine customer confidence. Any sensible person knows this for the fallacy that it is. I love companies that show the world the challenges that they face and how hard they work to resolve them and avoid them in the future. I will echo the sentiment from above... I am a proud and happy client.
Paul
I just love this company - such a great pro-active and creative attitude - a fan for life! Keep up the great work sitegrounders!
Frank Smith
Very nicely done! I am in the middle of completing phase 1 of a municipal fiber optic network. I have been preaching the "3 R's" = Redundancy, Restoration, and Resiliency through our first phase. It is great to hear about SiteGround's commitment to keeping things up and running with the ability to bounce back when the proverbial stuff hits the fan.
Craig Bass
THIS is just one reason I love SiteGround! The hosting company I previously used went down for long periods of time (several days in some instances) and all they could say was they had "every man on deck" working on the problem. PLEASE, SiteGround, NEVER SELL OUT TO ENDURANCE INTERNATIONAL!!!
Craig
Outstanding! I appreciate your transparency on this.
Bruce Wilson
Stories like this are why I'm such an enthusiastic customer and promote your service whenever we bid on a project. You are a big selling point for us when we write our proposals. I sleep better at night knowing my two VPS's with you will be working when I wake up. :-)
Paolo
Great Work!!! Keep it going...
Jansen Wendlandt
Thanks for sharing! How you respond to situations like this is a testament not just to how you respond to rare events, but to daily events as well. Thank you.
Stefan Warum
This really shows how dedicated you are in providing the best service for your customers. I've tried a couple of hosts, but none comes even near to the amazing service that SiteGround offers. You are the only one I can recommend wholeheartedly! I'm grateful that I've found you!
David Fraser
SiteGround is not a standstill company. They seem to be ahead of the curve, so to speak. I'm really glad to be one of your customers. Keep up the outstanding work.
Kenneth Shea
Good Job SiteGround and good for us to hear! Have had numerous web host since my web presence started in 2000, SiteGround has been by far the very best, no comparison.
Ron Wilder
Waaaaay better than Lunarpages... My last hosting company. Was down for over a week and a half with multiple websites. You guys and gals are great! Thanks!
Ed Troxell Creative
I can't stop talking about you guys! You are awesome! Keep up the great work!
Trish
Congratulations well done
Kevin Hinchman
Way to be proactive! Good job.
Andrea Gallucci
Since we moved our customers sites with you, finally we found a competent and reliable hosting provider, either for running time, speed, and competency of your tech support. Go on like this. Congratulations. Andrea Gallucci, MD DIGITHAI Software Group.
Howard Kelley
SG never disappoints, whether it's performance or customer service. Every day I am thankful that I have moved some of my domains to SG from one of the largest and supposedly top U.S. hosting firms.
David Hunt
To be static is to be going backwards. Glad to hear that your eyes are on the future and improving your customer's experience.
Keith
WOW. You guys take hosting to a level no one else can touch. Thank you for putting in the time and effort to make this possible.
Yogendra Rawat
Nice to hear that my site is in expert hands....I can focus on selling as always....awesome work guys...
Andrio Suroyo
Great Job! I have yet to have any problems with hosting on Siteground and my website loads perfectly every time I access / work on it. Thank you for your constant effort to get even better.
Karl Steinmann
Brilliant. Keep up the great work, and I will continue to spread the Siteground gospel. ;-)
Riccardo
Really impressive guys. Congratulations for that and thanks for supplying to us the best hosting solution out there. Riccardo R99photography.com
Gerson de Barros
People have already described how happy and secure they feel with your solution. What else can I say! I'm really proud to be one of your new customers... keep it up. Keep your eyes on the future and on visionary solutions that can help us all and protect against downtime and losses.
Ruslan
Sounds cool!
GJ
You guys rock..these pro active measurements keep us all happy.
Brian Wall
Excellent and the report just increases my confidence in you.
Jarold Villanueva
I'm a new customer of siteground and I'm very happy of what they are doing for their customers.... Keep it up Guy's.... :-)
Steve Squeo
Love it!
Ulf Tölle
absolutely great achievement, very inspiring!
Howard
Outstanding work. I switched to SiteGround two months ago and could not be happier. Thank you!!!
Scott Peterson
I have to say, you really can't appreciate a good web host until you've had a bad one. My previous web host provided little support and my Magento installation ran painfully slow. I was apprehensive about switching to Siteground initially because of my bad experience. Since I made the switch to Siteground support has been excellent, Magento runs at least 5 - 10 times faster than before, and I have experienced virtually no down time. My previous host was down several times (sometimes for days) during the year and a half or so that I had them. In other words, Keep up the good work!
FABIO A ESPARZA Z
Some days ago, I deleted (by mistake) some important information from a website and I almost went crazy!!! Some hours later, I had the idea to ask Siteground for help.... In a few minutes, the problem was fixed and I had the info again... It was really quick... and now it is even quicker???? GUAUUUUUUU For this I love Siteground!!!!... Simply the best!
Bradd Graves
I may need to use this feature soon. I'll let you know how it goes!
Lawrence Lim
I have personally tried to restore a few files recently and it is really seamless and fast. Impressive, and I'm proud to be your valued customer.
WordSuccor
This is great.. It is going to help us a lot.
Umesh
Excellent work. I have recommended SiteGround to quite a few clients and stories like this help reiterate their, and my confidence in SiteGround.
Al
I've been recommending SG since I moved all my sites to them and my recommendations are not because I am a customer, but also because I believe every business owner deserves a good home for their business website with a hosting company that not only provides the best hosting plans, hardware, and support, but also that thinks ahead by looking at various ways to improve their service, and making our life easier. Did I mention that I've been with previous hosting companies and SG is the only hosting company that goes above and beyond the call of duty to make sure everything is running smoothly, and provide assistance even when the issue is not their doing? Because of our review, comments and recommendation of SG, all over the web, we are constantly contacted by people asking us why we strongly recommend SG, and whether we are getting some huge kickbacks from SG, and we tell them that our reviews are honest, unbiased, and they can try them for themselves and find out how good SG (the whole team) is. Am I happy with SG? Oh YES!!!
Peter (Santa) Rodaughan
Your story does concern me. 4TB!!! As word gets out you will have to start thinking about 1PB or even 1EB. God help us if it even gets to 1ZB or even 1YB. Keep up the great work and you will get so much bigger. Well done everyone.
Dan
This sounds great. As customers, how do we access the backups to restore?
Daniel Kanchev Siteground Team
Hi, Dan and thanks for the great question :) On our shared hosting servers our clients may use the backup restore tool in cPanel to manage their backups and restore data. For more details check this tutorial: https://www.siteground.com/tutorials/cpanel/backup_restore.htm On our cloud servers, however, only our support team can restore a backup for you. We are in the process of implementing the same backup/restore tool for our cloud customers and once it is ready it will be installed on all cloud servers.
Stephen Goodenough
I was one of the customers who had a website on the affected equipment, and I think must have been near the end of the back-up process, but well done. I'm glad I wasn't at the end of a 28 hour process! I'm always impressed by SG.
Jaswinder Kaur
Glad to know the progress SG is doing and I am quite happy that I moved to SiteGround in January, 2016 to host my Blog Ease Bedding. Now I want my blog to be more secure with daily backups, which I don't have in StartUp Plan. The second problem I am facing is- CPU increase. Please let me know how I can solve this problem, where I can get daily backups and no worries about CPU increase. Thanks.
Marina Yordanova Siteground Team
Hello Jaswinder, thank you for the good words! While on the StartUp plan you could take advantage of daily backups by signing up for our Backup subscription service: https://ua.siteground.com/daily_backup.htm Alternatively you could upgrade to a GrowBig or GoGeek plan, which include by default Backup subscriptions and allow higher CPU executions, as well as give you much more additional features.
Mortiz
Have you ever thought of selling this solution to other hosting companies?
seo learning
Hi Lilyana Congratulations!, Happy to be your reseller!
AJ
I love that you guys share the nerdy stuff!
MS
Good Job ! Legendary support
Woon Fei Lai
Impressed with the backup system, but not sure why I would receive the email about this article today, a year after it was published.
Hristo Pandjarov Siteground Team
The email was about the latest upgrade of the system that we have just applied on all our servers. Check it out, the new interface is way better :)