Google Nearline Delivers Some Serious Competition To Amazon Glacier
SpzToid writes Google is offering a new kind of data storage service – and revealing its cloud computing strategy against Amazon Web Services and Microsoft Azure. The company said on Wednesday that it would offer a service called Nearline, for non-essential data. Like an AWS product called Glacier, this storage costs just a penny a month per gigabyte. Microsoft's cheapest listed online storage is about 2.4 cents a gigabyte. While Glacier storage has a retrieval time of several hours, Google said Nearline data will be available in about three seconds. From the announcement: "Today, we're excited to introduce Google Cloud Storage Nearline, a simple, low-cost, fast-response storage service with quick data backup, retrieval and access. Many of you operate a tiered data storage and archival process, in which data moves from expensive online storage to offline cold storage. We know the value of having access to all of your data on demand, so Nearline enables you to easily backup and store limitless amounts of data at a very low cost and access it at any time in a matter of seconds."
It's the webframe, Captain! (Score:2)
It's the webframe, Captain! She kenna take any more data...
Re: (Score:2, Insightful)
And solved for 30 years? Really? I don't recall having a backup service like this 30 years ago with such uptime, and certainly not in my own home.
Re:Oh yeah, this is just what I want (Score:4, Informative)
TFS just has marketing (Score:1)
How do they do it?
Re: (Score:2, Interesting)
Yeah I'd like some more meat to the story as well. Amazon Glacier achieves its pricing by using low-RPM consumer drives plugged into some sort of high-density backplanes; supposedly they are so densely packed that you can only spin up a few drives at once due to power and heat issues. Hence the delay.
I assume Google is doing something similar, maybe with somewhat better power or cooling, since they're offering faster retrieval times, which implies that perhaps they can spin up a higher percentage of drives.
Re: (Score:1)
I thought glacier is tape storage system.
Re: (Score:1)
They marketed glacier with the pretense that it was tape storage, but it's actually idle S3 storage.
Re:TFS just has marketing (Score:4, Funny)
I thought glacier is tape storage system.
Mercury delay lines [wikipedia.org].
Re: (Score:2, Interesting)
https://what-if.xkcd.com/63/ [xkcd.com]
It's common knowledge that Google has already been using consumer-grade drives for all of their servers. Because if a drive fails, "so what, we have another one over there holding the data..."
This is pretty much similar to what happened with GMail. They came out and said, "here, have a gigabyte for free!" and everyone was like, "yeah, right..."
Google has storage leaking out of its ears, and generates massive amounts of new data every day... sticking other people's data into the p
Re: TFS just has marketing (Score:2, Informative)
This service uses Google's normal hard drive architecture, but makes use of the fact that most of their drives have free capacity in terms of gigabytes but are running out of IO bandwidth (i.e., there are so many users trying to read and write the data that if they filled the drives to capacity, not everyone would get good read and write performance).
This product is basically filling the drive to capacity, but giving you the lowest priority for reads and writes. Hence why it takes 3 seconds to read dat
Re: (Score:1)
Re: (Score:3)
How do they do it?
They use idle HDDs. The three seconds is the time it takes for them to spin up. Google pays less than $30/TB for HDDs ($120 for a 4TB HDD). If they charge $0.01/GB/Month that is $120/TB/year. If a drive lasts three years, they make $360/TB off a $30/TB investment.
Re: (Score:1, Interesting)
FWIW, there have never been any studies or hard data backing up the myth that HDDs spinning up/down have reliability concerns. This all came about because it's a S.M.A.R.T. metric (which is also an estimate, and they also fail to provide citations or proof of its validity).
It's very possible they simply are letting the HDDs idle and spinning them up as needed, I would expect Google of all people to have some pretty decent metrics on HDD failure rates/reasons.
They've been researching this stuff for over half a de
Re: (Score:3)
I don't think they are spun down. That would kill their reliability. The 3 seconds is more likely to be network delays if the data are scattered to far flung locations.
Keeping them spun up for rarely accessed data doesn't make a lot of sense to me. This is the sort of data you're probably not constantly accessing, like backups or other data archives. Keeping the hard drives spinning would also increase their power draw and heat.
Besides which, I don't know of any network that would cause three second lag. You can typically send data halfway around the world in under a few hundred milliseconds. So, I'm not sure what else that delay would be.
Re: (Score:1)
Re: (Score:2)
I don't think they are spun down. That would kill their reliability.
Would it? Would it really? I've been spinning down rotating discs since I discovered hd-idle, and haven't lost one yet. The only ones I've lost in that time have been discs I've been keeping running all the time. Why would spinning up a disc cause it to wear unduly? Auto-spindown is the very best feature of MyBooks and GoFlexes, I tell you what. It gets so nice and quiet when both my external discs spin down (I have a backup volume for each) now that all my system volumes are on SSD.
Re: (Score:2)
I don't think they are spun down. That would kill their reliability.
There is no reason to believe that idling a drive shortens its life. Reliability studies from Google [google.com] and Backblaze [pcworld.com] found no failure increase with spin up/down cycles. Total spinning time is a much bigger predictor of failure, so spinning down when not in use likely extends the life.
Biggest factors affecting reliability:
1. Manufacturer: Hitachi is best, Seagate is worst. WD is in the middle.
2. Total spinning time
3. Temperature: Hotter is better (the opposite of what most people believe).
what about redundancy? (Score:3)
In terms of the prediction of "$360/TB off a $30/TB investment", does that take into account redundancy to protect their liability for drive failure? I'm thinking they have at least two copies of everything a customer uploads. Maybe three. It's still great money, but I think the numbers are more like $360/TB off a $60/TB investment.
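A rough sketch of the arithmetic being argued over in this sub-thread. All figures are the commenters' own estimates ($0.01/GB/month, $30/TB drives, a three-year drive life), the function names are made up for illustration, and the model counts drive purchases only, nothing else.

```python
# Back-of-envelope figures quoted in the thread; none of them are official.
PRICE_PER_GB_MONTH = 0.01   # list price quoted in the summary
DRIVE_COST_PER_TB = 30.0    # commenter's estimate ($120 for a 4 TB HDD)
DRIVE_LIFETIME_YEARS = 3    # commenter's assumption
GB_PER_TB = 1000            # decimal terabytes, as used in the thread


def revenue_per_tb(years: int = DRIVE_LIFETIME_YEARS) -> float:
    """Revenue from one customer-visible terabyte over the drive lifetime."""
    return PRICE_PER_GB_MONTH * GB_PER_TB * 12 * years


def hardware_cost_per_tb(copies: int) -> float:
    """Drive cost per customer-visible terabyte when `copies` replicas are kept."""
    return DRIVE_COST_PER_TB * copies


for copies in (1, 2, 3):
    print(f"{copies} copies: ${revenue_per_tb():.0f}/TB revenue "
          f"vs ${hardware_cost_per_tb(copies):.0f}/TB in drives")
```

With two copies this gives the $60/TB figure from the comment above, and $90/TB with three.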
Re: (Score:2)
Actually, not that simple. Neither has egress costs if you use their VMs; the charge only applies to data going out to the internet. Amazon Glacier to internet is free for the first 1 GB, $0.09/GB for the first 10 TB, $0.085/GB until 50 TB (between 10 and 50 TB Nearline is cheaper), then $0.07/GB until 150 TB.
Both charge 1 cent/GB for reads, though AWS is free for the first 5%.
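For anyone who wants to plug in their own numbers, here is a minimal sketch of the tiered Glacier-to-internet pricing as quoted in this comment. The tier table is copied from the comment, not from a current price list. Treating 1 TB as 1000 GB and the tier limits as cumulative totals are assumptions, and `egress_cost` is a made-up name.

```python
# (cumulative upper bound in GB, price per GB) as quoted in the comment above.
# Assumption: 1 TB = 1000 GB and each bound is a running total.
TIERS = [
    (1, 0.0),          # first 1 GB free
    (10_000, 0.09),    # up to 10 TB
    (50_000, 0.085),   # up to 50 TB
    (150_000, 0.07),   # up to 150 TB
]


def egress_cost(gb: float) -> float:
    """Cost in dollars of sending `gb` gigabytes out to the internet."""
    if gb < 0:
        raise ValueError("gb must be non-negative")
    if gb > TIERS[-1][0]:
        raise ValueError("the comment quotes no rate above 150 TB")
    cost = 0.0
    lower = 0.0
    for upper, price in TIERS:
        if gb <= lower:
            break
        # Charge only the slice of the transfer that falls inside this tier.
        cost += (min(gb, upper) - lower) * price
        lower = upper
    return cost


print(f"${egress_cost(20_000):.2f} for 20 TB out")
```

Each tier's rate applies only to the slice that falls inside it, so a 20 TB transfer pays $0.09/GB on most of the first 10 TB and $0.085/GB on the rest.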
I don't get the pricing? (Score:5, Interesting)
Re: (Score:1)
Don't ask me, I don't even know if it means storage or bandwidth usage. Don't both need to be mentioned?
Re: (Score:3)
Re: (Score:3, Interesting)
A penny a month per gigabyte... that's $10/month per terabyte... that is already what Dropbox charges for "fast" storage. So what gives? Why would I pay $10/month for a terabyte of slow storage when I can get the same amount of storage for the same price in a regular, fast format with Dropbox?
Why pay for a terabyte of storage when you are not using it to capacity?
Re: (Score:2)
What if you had more than just 1 TB? If you had more than 1 TB, how is Dropbox going to help you at all? Oh right, now you must purchase Dropbox for Business, and your price just went way up. https://www.dropbox.com/busine... [dropbox.com]
Re: (Score:2)
Heck, for less than $10 a month you get infinite OneDrive storage.
Re:I don't get the pricing? (Score:4, Informative)
Interesting point, so I read up a bit. This only applies to Office365 customers. What about Linux (etc.) users who can't fully utilize Office365? This really seems almost like a consumer option, and there are certainly business use-cases where this just ain't gonna fly. There's a 20,000 file limit, *period*, and the maximum file size is 10 GB, which is limiting for some (especially those folks who roll their own encryption and compression).
For those reasons, Microsoft Office365/OneDrive doesn't seem like a serious competitor to Google Nearline, Amazon Glacier, or Microsoft Azure services.
http://www.techrepublic.com/ar... [techrepublic.com]
Re:I don't get the pricing? (Score:4, Interesting)
A penny a month per gigabyte... that's $10/month per terabyte... that is already what Dropbox charges for "fast" storage. So what gives? Why would I pay $10/month for a terabyte of slow storage when I can get the same amount of storage for the same price in a regular, fast format with Dropbox?
Here is an answer [quora.com] from someone on Quora.
Dropbox offers no Service Level Agreement. Actually they specifically provide no warranties whatsoever about their service (http://www.dropbox.com/terms) [dropbox.com]. This is a non-starter for many CIOs.
Beyond that, the fact that Dropbox doesn't "own" the underlying cloud storage architecture -- Amazon S3 -- could be an issue, although they advertise it as secure via in-transit and on-disk encryption (https://www.dropbox.com/help/27) [dropbox.com].
If it still is the case that Dropbox uses S3 itself, then that wouldn't make business sense for them to pay more for storage than they're charging their own customers (even if they've decided not to offer a Service Level Agreement).
So my guess is that this has to do with the way they count the storage for customers. Assuming that their customers do not encrypt their data before they place it on DropBox (which would make sense because DropBox customers are rarely CIOs themselves), then DropBox is most likely hashing the content and only storing a single copy of a file even if there are a thousand virtual instances of that same file throughout their system.
Also note that in the special case where a company is footing the bill and DropBox can't count the same file multiple times within that same company, otherwise the customer company would complain, then DropBox actually advertises a rate of [dropbox.com] $15 per 5 terabytes per month per user (with no Service Level Agreement of any kind even for business users [dropbox.com]).
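The single-copy storage guessed at in this answer can be sketched as a content-addressed store. This is a toy illustration of file-level deduplication by hash, not a description of how Dropbox actually works; `DedupStore` and its methods are hypothetical names.

```python
import hashlib


class DedupStore:
    """Toy file-level dedup: identical content is stored once, keyed by hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}           # sha256 hex digest -> content
        self._files: dict[tuple[str, str], str] = {}  # (user, path) -> digest

    def put(self, user: str, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Keep the existing blob if this content has been seen before.
        self._blobs.setdefault(digest, data)
        self._files[(user, path)] = digest
        return digest

    def get(self, user: str, path: str) -> bytes:
        return self._blobs[self._files[(user, path)]]

    def logical_bytes(self) -> int:
        """Bytes as counted against users' quotas."""
        return sum(len(self._blobs[d]) for d in self._files.values())

    def physical_bytes(self) -> int:
        """Bytes actually held on disk."""
        return sum(len(b) for b in self._blobs.values())


store = DedupStore()
payload = b"same popular file" * 1000
for n in range(1000):
    store.put(f"user{n}", "/shared.bin", payload)
print(store.logical_bytes(), store.physical_bytes())
```

This only works when the provider sees identical bytes, which is why the guess depends on customers not encrypting their data before uploading it.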
Re: (Score:2)
... then DropBox is most likely hashing the content and only storing a single copy of a file even if there are a thousand virtual instances of that same file throughout their system.
I'm pretty sure they have said as much themselves.
Re: (Score:3)
Assuming that their customers do not encrypt their data before they place it on DropBox (which would make sense because DropBox customers are rarely CIOs themselves), then DropBox is most likely hashing the content and only storing a single copy of a file even if there are a thousand virtual instances of that same file throughout their system.
Wouldn't it make more sense for them to dedupe on some kind of variable block size level than the file level? If someone uploads v1 of some 25 meg powerpoint file and
Re: (Score:2)
One reason I'm about to start using Amazon Glacier for personal backups is specifically because you can't delete files. I want to put up all of my family photos and videos, and know that they will be there even if my kid installs ransomware, our house gets robbed and burns down, and I'm in a coma for six months and can't deal with trying to retrieve deleted files (along with determining the real ones vs ransom ones) in a timely manner from Dropbox or Crashplan.
You can absolutely delete files in Amazon Glacier if the access key you're using has that permission enabled. I imagine there's a surprising number of people who use their AWS root account credentials to access Glacier even though this is strongly discouraged. Even if one creates a new IAM user with access only to Glacier (so a bad guy who compromised your computer can't spin up EC2 instances), the default is for all permissions to be enabled.
Of course, you can disable the permissions to delete files: I've
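As a hedged illustration of the "disable delete" idea in this comment, here is a sketch that builds an IAM policy document allowing uploads and retrievals while explicitly denying deletes. The action names are real Glacier IAM actions, but the overall policy shape, the wildcard resource, and the `build_policy` helper are assumptions made for this example; check it against the IAM documentation before relying on it.

```python
import json


def build_policy(resource: str = "*") -> dict:
    """Return an IAM policy document for an upload-only Glacier user.

    `resource` is a placeholder; in practice it would be narrowed to
    the ARN of a specific vault.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glacier:UploadArchive",
                    "glacier:InitiateJob",
                    "glacier:GetJobOutput",
                ],
                "Resource": resource,
            },
            {
                # An explicit Deny takes precedence over any Allow.
                "Effect": "Deny",
                "Action": ["glacier:DeleteArchive", "glacier:DeleteVault"],
                "Resource": resource,
            },
        ],
    }


print(json.dumps(build_policy(), indent=2))
```

A key with a policy like this attached could still be used to read the backups, so it limits deletion, not disclosure.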
Re: (Score:2)
You might be interested to know about git-annex then. Here's some links to help you
http://git-annex.branchable.co... [branchable.com]
Specifically for Glacier:
http://git-annex.branchable.co... [branchable.com]
Re: (Score:1)
Seems expensive (Score:2)
1 cent per month per GB is $40 for 4TB per month, or 1/5 of what an external 4TB USB3.0 disk costs. As this is "nonessential" data, backup is optional. Sure, the external disk somehow needs to be connected to your server, and there are other factors, but doing this yourself seems to be a lot cheaper.
Yes, I know if you do it yourself, there is cost for the person doing it as well, but you need to manage the cloud-storage also, and over a worse interface and you get less control in the cloud and cannot put an
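The comparison in this comment can be made explicit with a small break-even sketch. The $200 disk price is only implied by the comment ($40 being "1/5" of the disk's cost), the helper names are made up, and everything other than the storage price and the disk price is left out.

```python
PRICE_CENTS_PER_GB_MONTH = 1   # "1 cent per month per GB"
DISK_PRICE_CENTS = 200_00      # implied by the comment: 5 x $40 for a 4 TB disk
DISK_SIZE_GB = 4000            # 4 TB, decimal


def monthly_cloud_cost_cents(size_gb: int) -> int:
    return PRICE_CENTS_PER_GB_MONTH * size_gb


def break_even_months(size_gb: int, disk_price_cents: int) -> int:
    """Months after which cumulative cloud fees reach the disk's purchase price."""
    monthly = monthly_cloud_cost_cents(size_gb)
    return -(-disk_price_cents // monthly)   # ceiling division on integers


months = break_even_months(DISK_SIZE_GB, DISK_PRICE_CENTS)
print(f"cloud fees match the disk price after {months} months")
```

Working in whole cents keeps the arithmetic exact; the replies below argue about what this simple comparison leaves out.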
Re:Seems expensive (Score:5, Informative)
Oh? Have you factored in the cost of ensuring that you always have an offsite and fully up to date copy, not to mention secondary and tertiary copies for transit time in case your primary datacenter/server happens to kick the bucket/get stolen/evaporate?
It's easy to compare the cost of an offered service to what you can pick up seemingly similar equipment for from Amazon or Newegg... the realities though are far more complex.
There are ways to manage even that, see this brief bit of Wikipedia [wikipedia.org] for a start.
I don't know if there are any other commercial or enterprise products out there that do it, but I know this one [microsoft.com] stores all of its data in the cloud (with a local cache) but does all of the encryption on site. Only if you choose does the encryption key leave your site, and then only in a way you choose, making it rather problematic for a TLA or Microsoft to get to your data.
It is an interesting world when you are dealing with data you cannot legally delete for a period of time and simply want to rid yourself of the burden of having to store it locally. Suing Google or Amazon because their cold storage failed is a far better option than having your IT guy tell you that the HD they stored the crucial data to doesn't spin up anymore... and that the backup disk ended up in the secretary's desktop.
Re:Seems expensive (Score:4, Interesting)
> Oh? Have you factored in the cost of ensuring that you always have an offsite and fully up to date copy, not to mention secondary and tertiary copies for transit time in case your primary datacenter/server happens to kick the bucket/get stolen/evaporate?
Assumption: They guarantee that your backups/archives are safe.
Reality: "You are responsible for properly configuring and using the Service Offerings and taking your own steps to maintain appropriate security, protection and backup of Your Content, " Notice the words "and backup". If they lose your data it's your problem, not theirs. http://aws.amazon.com/agreemen... [amazon.com]
> It's easy to compare the cost of an offered service to what you can pick up seemingly similar equipment for from Amazon or Newegg... the realities though are far more complex.
Not to those who are 'skilled in the art'. For example, a copy of CrashPlan, two 3TB drives locally, one 3TB drive at a parent's or friend's house. For the paranoid, two 3TB drives at two people's houses. Assumption: network bandwidth is sufficient and/or not much data change rate and/or happy to shuttle drives backward and forward.
Or, if you don't want to use CrashPlan, use rsync or another such replication technique. Set up md5sum scanning to run every few weeks at each location; it takes a day or so to run and you're 100% certain that bitrot hasn't set in.
Advantages:
* I can touch each physical box.
* It's massively cheaper.
* Recovery is much quicker since I can just grab the physical copy.
* I know how the backup infrastructure is designed. If something goes wrong it's my fault, I can't rail uselessly against the sky gods if suddenly all my data goes away.
Disadvantages:
* You have to maintain it. You can't trust the sky gods to maintain it for you: if a drive fails, you have to buy and replace it. If you forget to configure something or to validate that something was done correctly, it's your own fault.
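The periodic md5sum scan mentioned above could look roughly like the following sketch. It records a checksum per file once, then re-checks later and reports anything missing or changed. The function names and the manifest layout are made up for illustration; MD5 is used only because the comment mentions md5sum, and any hash from `hashlib` would do.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large backups don't fill memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    damaged = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or file_md5(target) != expected:
            damaged.append(rel)
    return damaged
```

A mismatch only says that a file changed since the manifest was written; comparing against the copy at another location is what tells you which one to trust.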
Re: (Score:2)
Spideroak. Encryption client side, reasonable SLA. Not the cheapest by far but you get what you pay for.
Re: (Score:2)
Spideroak. Still closed source five years after they said that they would open it. Never independently audited. All expectations of security and privacy are derived solely from marketing claims. Even the "zero knowledge" claim is more marketing speak than truth. Caveat emptor.
Closed source all-in-one crypto and cloud storage is almost never the right answer.
Re: (Score:1)
There are ways to manage even that, see this brief bit of Wikipedia [wikipedia.org] for a start.
That only works if you do not process the data in the same cloud. But then the price goes through the roof as you have to pay transfer fees and the offer becomes very expensive.
Backup software? (Score:2)
Is there already some personal backup software for GNU/Linux that encrypts all data and can use this as storage?
I'm looking for large offline storage but strong client-side encryption is a must.
Re: (Score:3)
git-annex and Amazon Glacier might serve you well. Encrypting your git-annex/Glacier archive using your PGP key is a one-click-and-save option. With Google's recent announcement of Nearline, I imagine it will be supported over time as well. git-annex came about through a Kickstarter campaign, and you're welcome to support the project.
Here's some links to help you:
http://git-annex.branchable.co... [branchable.com]
Specifically for Glacier:
http://git-annex.branchable.co... [branchable.com]
Re: (Score:2)
I recently 'discovered' duplicity - it's very good for this sort of thing, but it can't use this or Glacier as a store. I can use S3 though, which you can use as staging for Glacier.
Personally, I use Duplicity to back up my NAS to another disk. I then have a script that copies full backups up to Glacier (and then deletes them). I'm working on a nicer Glacier client for this, but the Java one I downloaded from GitHub works well enough to get going.
Keep up the good work Google (Score:1)
Compared to tape (Score:2)