Dropbox Download Statistics/Graph

Tuesday, July 12th, 2011

Recently I added one of my machines to my Dropbox account. I installed the CLI version on an Ubuntu machine (using this script). While doing so, I noticed I could get statistics about the process as it was happening via the command `./dropbox.py status`, and naturally decided I should gather them and make pretty graphs to satisfy my curiosity. (Note: I’m particularly curious because I’ve been working on my own Dropbox clone, Asink, as of late.)

So, I started a fresh download of my Dropbox folder (containing 10691 files weighing in at 616 MB). The complete download took right at an hour and a half (about 5500 seconds). During that time, I polled Dropbox’s `status` command once a second until the download was done. Each second I recorded the total number of files Dropbox reported it still had to download, the download speed it estimated, and the amount of time it estimated it would take to complete the download. I might add that part of what piqued my interest in this was that for a while Dropbox was telling me it would take nearly 100 days to completely download a mere 616 MB of files.
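For anyone who wants to try something similar, here is a minimal sketch of a once-a-second polling loop in Python. It is not the script used for this post. The function name `poll_status` and its parameters are made up for illustration, and it stores the raw status text as-is rather than guessing at the exact output format of `dropbox.py status`.

```python
import subprocess
import time


def poll_status(command, samples, interval=1.0, sleep=time.sleep):
    """Run `command` `samples` times, waiting `interval` seconds between runs.

    Returns a list of (elapsed_seconds, raw_output) tuples. The output is
    kept as raw text; parsing it is left out because the exact status
    format is not assumed here.
    """
    start = time.monotonic()
    readings = []
    for i in range(samples):
        # Capture stdout as text instead of letting it print to the terminal.
        result = subprocess.run(command, capture_output=True, text=True)
        readings.append((time.monotonic() - start, result.stdout.strip()))
        # No need to wait after the final sample.
        if i < samples - 1:
            sleep(interval)
    return readings
```

Calling `poll_status(["./dropbox.py", "status"], samples=5500)` from the directory holding `dropbox.py` would cover roughly the length of the download described here. Writing each reading to a file as it arrives, instead of collecting them in a list, would keep the data safe if the script dies partway through.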

And so, without further ado, the graph. I apologize for a) the somewhat awkward units (I wanted to overlay all the information on one graph, so I normalized the units to make that work), and b) the gap in the middle of the data (my statistic-gathering script decided to die on me and I didn’t notice for a few minutes):

Dropbox Download Statistics

Several things struck me as interesting about this graph. First is the fact that the download speed appears to begin high, drop off, and then pick up drastically near the end. I find this mildly confusing, and I think it could be caused by one of two things: a) Dropbox throttling download speeds when it detects you’re downloading a lot (hence the faster speeds near the beginning and end) or b) the speed is miscalculated by the Dropbox daemon, and is a bit off as you approach either of the extremes. Some may claim that this is my ISP throttling traffic, which *would* explain the dropoff after the initial burst, but fails to explain the drastic pickup at the end.

Second, Dropbox’s time-to-completion estimate is about as horrid as that of progress bars in Windows. Even though the number of files remaining to be downloaded falls more or less linearly, the time-remaining estimate swings all over the place, going as high as 100 days (it caps out at 100 days × 24 × 4 = 9600 quarter-hours, at the beginning and again at ~2000 seconds). This estimate is also surprisingly bad given the relatively constant estimated download speed. One would think that you could come up with some relatively easy calculation which would account for small variations in connection speed, and arrive at a much more accurate estimate than what they have.
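To make that last point concrete, here is one rough sketch of such a calculation: smooth the observed files-per-second rate with an exponential moving average, then divide the files remaining by it. This is not how Dropbox computes its estimate (I have no idea what it does internally). The function name `estimate_eta` and the smoothing factor `alpha` are my own assumptions, chosen only for illustration.

```python
def estimate_eta(samples, alpha=0.1):
    """Estimate seconds remaining from (elapsed_seconds, files_remaining) pairs.

    `samples` must be in time order. The completion rate is smoothed with
    an exponential moving average so brief stalls or bursts do not swing
    the estimate. Returns None if no net progress has been observed.
    """
    smoothed_rate = None
    prev_t, prev_remaining = samples[0]
    for t, remaining in samples[1:]:
        dt = t - prev_t
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rate = (prev_remaining - remaining) / dt  # files per second
        if smoothed_rate is None:
            smoothed_rate = rate
        else:
            smoothed_rate = alpha * rate + (1 - alpha) * smoothed_rate
        prev_t, prev_remaining = t, remaining
    if not smoothed_rate or smoothed_rate <= 0:
        return None
    return prev_remaining / smoothed_rate
```

For a sense of scale, 10691 files in about 5500 seconds works out to roughly 1.9 files per second on average, so an estimate built on the file count would have started out in the neighborhood of an hour and a half rather than 100 days. A smaller `alpha` gives a steadier estimate that is slower to react to genuine changes in speed.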

Well, there you have it: a highly-unscientific and mildly interesting graph of my Dropbox downloads.