Dropbox restores full access, explains cause of downtime

Victor Barreiro Jr.


After planned maintenance went awry, the cloud storage provider says user files were not at risk during the downtime

DROPBOX UP. Dropbox explains what went wrong and how it will improve things moving forward. Screenshot from Dropbox

MANILA, Philippines – Following a recent service outage, cloud storage provider Dropbox published a post-mortem explaining what caused the downtime and what it will do to prevent a repeat.

Dropbox’s head of infrastructure, Akhil Gupta, wrote on the Dropbox Tech Blog that planned maintenance had gone awry.

He said the maintenance was meant to upgrade the operating system (OS) on some of the machines they use for their databases, and an upgrade script “checks to make sure there is no active data on the machine before installing the new OS.”

A bug in the upgrade script caused a reinstallation command to run on some machines that were active at the time. Each database has a redundancy system in place, in this case a master machine and two slave machines that act as backups. Even so, some of the databases were affected, causing Dropbox to go down.

Gupta assured the public that files were not at risk, as the affected databases did not contain file data. Rather, they were used to provide specific features, like photo album sharing and camera uploads.

He also explained that basic services were restored within 3 hours by recovering from backups, but the size of some of the databases slowed the recovery process.

The recovery process was completed only at 4:40 pm PT on January 12 (January 13, Philippine time).

The way forward

Dropbox took additional measures to prevent this kind of downtime from happening again.

Gupta wrote that an additional layer of checks was added, making machines verify their state before executing a command. These checks are meant to prevent machines running critical processes from executing commands that could take them out of service.
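The idea of a machine verifying its own state before acting can be illustrated with a minimal Python sketch. This is not Dropbox's code: the `Machine` fields, the `DESTRUCTIVE_COMMANDS` set, and the function names are all hypothetical and only show the general shape of such a guard.

```python
from dataclasses import dataclass

# Hypothetical: commands this sketch treats as destructive.
DESTRUCTIVE_COMMANDS = {"reinstall_os"}


@dataclass
class Machine:
    name: str
    serving_traffic: bool   # True if the machine is an active database host
    has_active_data: bool   # True if live data is still stored on it


def may_execute(machine: Machine, command: str) -> bool:
    """Return True only if the command is safe for the machine's current state."""
    if command not in DESTRUCTIVE_COMMANDS:
        return True
    # A destructive command is allowed only on an idle, empty machine.
    return not (machine.serving_traffic or machine.has_active_data)


def run_command(machine: Machine, command: str) -> str:
    """Check the machine's state locally before acting on a command."""
    if not may_execute(machine, command):
        return f"REFUSED {command} on {machine.name}: machine is active"
    return f"OK {command} on {machine.name}"


if __name__ == "__main__":
    print(run_command(Machine("db-1", True, True), "reinstall_os"))
    print(run_command(Machine("spare-1", False, False), "reinstall_os"))
```

In this sketch the check lives on the receiving machine, so a faulty script that targets the wrong host is still refused by the host itself.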

Dropbox also built a tool that should speed up recovery for large backups, should another downtime occur. The company plans to make this new tool open source “so others can benefit from what we’ve learned.” – Rappler.com

