AOL Creates Fully Automated Data Center
miller60 writes with an excerpt from a Data Center Knowledge article: "AOL has begun operations at a new data center that will be completely unmanned, with all monitoring and management being handled remotely. The new 'lights out' facility is part of a broader updating of AOL infrastructure that leverages virtualization and modular design to quickly deploy and manage server capacity. 'These changes have not been easy,' AOL's Mike Manos writes in a blog post about the new facility. 'It's always culturally tough being open to fundamentally changing business as usual.'"
Mike Manos's weblog post provides a look into AOL's internal infrastructure. It's easy to forget that AOL had to tackle scaling to tens of thousands of servers over a decade before the term Cloud was even coined.
So it will take ages for a fix (Score:1)
How long will it take for an engineer to get there to replace a card or server?
no security or maintenance? (Score:2)
Seems like it may take time for anyone to come to the site for anything, versus having a few people on site to get to stuff quicker.
Re:no security or maintenance? (Score:4, Insightful)
The whole idea is not to need to get to stuff quicker at all.
If you are:
1) Completely virtualized.
2) Use power circuits that are monitored for load, with battery backup, power conditioners, and diesel generators to back up the local utility.
3) Use management devices that let you control all your bare metal as if you were standing there, complete with USB-connected storage per device so you can swap out the ISO.
4) Have redundancy in your virtualization setup that allows you to have high availability, live migration, automated backups, etc.
What you get is an infrastructure that allows you to route around failures and schedule hardware swap outs on your own timetable, which can be far more economical.
If you don't have that, then it does involve a costly emergency response at 2am to replace a bare metal server that went down. You either pay somebody you have retained locally to do it, or you are the one driving down to the datacenter at 2am to do the replacement yourself, with who-knows-how-long a repair ahead of you while uptime monitoring sends out emails like crazy to the rest of the admin staff and, heaven help you, some execs who demanded to be in the loop from now on due to an "incident".
Don't know about you... but I would rather be able to relax at 10pm and have a few beers once in a while (to the point I can't drive) without worrying about bare metal servers going down all the time, or who is on call, etc.
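The "route around failures" idea above can be sketched in a few lines. This is a hypothetical illustration, not AOL's actual tooling; the host names, VM names, and data layout are all made up:

```python
# Minimal sketch of routing around failures: VMs on unhealthy hosts are
# planned for migration to the least-loaded healthy host, so the actual
# hardware swap can wait for a scheduled visit. Hypothetical data model.

def plan_migrations(hosts, vms):
    """hosts: {name: is_healthy}; vms: {vm_name: current_host}.
    Returns {vm_name: target_host} for VMs that must move."""
    healthy = [h for h, ok in hosts.items() if ok]
    # Current VM count per healthy host, used as a crude load metric.
    load = {h: sum(1 for v in vms.values() if v == h) for h in healthy}
    plan = {}
    for vm, host in vms.items():
        if hosts.get(host):
            continue                        # host is healthy; VM stays put
        target = min(load, key=load.get)    # least-loaded healthy host
        plan[vm] = target
        load[target] += 1
    return plan

if __name__ == "__main__":
    hosts = {"rack1-a": True, "rack1-b": False, "rack2-a": True}
    vms = {"web01": "rack1-b", "db01": "rack1-a", "web02": "rack1-b"}
    print(plan_migrations(hosts, vms))
```

In practice the migration itself would be a live-migration call to the hypervisor; the point of the sketch is only that nothing here requires a human in the building.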
Re: (Score:2)
About as much time as it takes at most datacenters that are already monitored remotely. With news like this, some would think Nagios or Ganglia didn't provide the admins with a web interface.
PS: They might want to, at least, man it with a security guard to sound the alarm in case of fire or robbery.
Re:So it will take ages for a fix (Score:5, Interesting)
One of the major backbone providers has a lights-out data center not far from my work. I know a guy who has a hosting business there, and he's shown me around to the limits of his access. There is no one on-site from the company or its contractors--not even a security guard. They have biometrics plus PINs for access; it's laced with low-light/IR cameras (it wouldn't surprise me to learn they have microphones); it has motion detectors in case the cameras miss something; and the redundancy is incredible. They maintain contracts with local electricians, plumbers, and a few technical companies should a blade burn out. They manage the entire thing from a few states over, and as of a couple of years ago almost all of their data centers had been converted to run this way. Savings were good, something like a million dollars per DC per year even as unanticipated downtime decreased.
I looked at it and saw the future of IT. I wasn't sure if I was more impressed or scared.
Re: (Score:2)
It's more scary - every field of technology evolves that way.
Early valve computers required technicians to replace burnt-out valves on a daily basis. Each morning, the technicians would go round and replace any that had burnt out or were about to. Now your PC has about 2 billion transistors or more (CPU + GPU), and not one will burn out.
100 years ago, it would take 25 minutes to make a long-distance call between San Francisco and New York due to all the operators involved. Now, it'
Re: (Score:1)
Re: (Score:3)
This isn't scary. This is things getting better.
It's scary if your job is manually maintaining servers.
Re: (Score:1)
I'm not so sure. While individual reliability has increased dramatically, the sheer number of systems in use around the world has increased as well, probably at a similar rate. Will we eventually reach a point at which computer hardware simply does not fail without an external event (power surge, physical damage, etc)? Maybe. But I don't see that happening until performance plateaus.
Re: (Score:2)
Reliability will become nearly 100% if everything moves to solid state. How many electric motors are there in a laptop these days? Cooling fans, hard disk drives, CD drives (auto-eject, play motor) - must be around five or six.
Re: (Score:2)
Those must be some fancy microphones to be of any use inside a DC...
Re: (Score:2)
It depends on the noise level of the DC. Where I work, microphones would be useless, but some of the computer rooms in other buildings are relatively quiet and we've used microphones on NetBotz devices when people have been in the room and we're monitoring what they're talking about while working. (It has sometimes saved a phone call when a configuration looked odd momentarily but they were doing it for a reason.)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Re: (Score:3)
The article states "failed equipment is addressed in a scheduled way using outsourced or vendor partners". They don't care if an individual server is down; they just move the workload elsewhere and wait for a repair. So there actually will be people in their data center doing repairs, they just aren't AOL employees and aren't based in the data center. I could see making a decision that a longer wait time for repairs is justified by labor savings, but it isn't really obvious where those savings come from.
Re: (Score:2)
Re: (Score:2)
No, but if a whole cabinet or row goes out because someone wasn't around to notice the funny smell or magic smoke coming out of the power equipment, or hear that A/C unit fan belt starting to come loose, you just might notice...
Re:So it will take ages for a fix (Score:4, Insightful)
If you have enough spare servers, you can easily get by with engineers only needing to go on site once a month or so, assuming you get your MTBF calculations right, that is. There's a good white paper [google.com] by Google on how 200,000-hour MTBF hard drive failure rates equate to drive failures every few hours when you have a few 100k HDs.
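The arithmetic behind that point is simple: assuming independent failures, a fleet's expected time between drive failures is roughly the per-drive MTBF divided by the drive count. A quick sketch using the numbers mentioned above:

```python
# Back-of-the-envelope fleet failure rate. With independent, identical
# drives, expected time between failures across the fleet is MTBF / N.
# Figures are the ones cited in the comment, not measured data.

def fleet_mtbf_hours(drive_mtbf_hours, drive_count):
    return drive_mtbf_hours / drive_count

# 200,000-hour MTBF drives, 100,000 of them:
print(fleet_mtbf_hours(200_000, 100_000))  # one failure every 2 hours on average
```

At that rate an on-call pager is hopeless anyway; a scheduled batch of swaps is the only sane response.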
Re: (Score:2)
How long will it take for an engineer to get there to replace a card or server?
Much less time than it'll take them to get a user.
Honestly, I was surprised by this article; I thought AOL had already folded.
Re: (Score:2)
You mean over a decade before you heard the term?
Cmon! HP was using the term Cloud five years before "America Online" existed in 1991.
Just because your expertise doesn't extend back before you got that first AOL floppy and went online to type "a/s/l?", it doesn't mean it didn't happen.
Manos, Hands of Fate? (Score:1)
Is now hands-off?
Re: (Score:2)
Uh.... (Score:1)
So they have a fully automated unmanned data center... For their fully unused unpopulated services?
WIN!
Re: (Score:3, Funny)
Re: (Score:2)
Wow .. how '2000'ish (Score:4, Informative)
Yawn
Re:Wow .. how '2000'ish (Score:4, Informative)
Never mind........
Re: (Score:2)
Re: (Score:2)
Eh, machines of that era required constant manual supervision, and uptime was measured in hours, not months or years. That doesn't negate the fact that many new tech fads are poor reimplementations of technology that died for very good reasons.
Re: (Score:3)
Re: (Score:3)
"somebody tried that a long time ago and it wasn't worth it" doesn't necessarily prove anything.
Unless there is some change in technology or technique, past failures are a good indicator of continued inability.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Eh, machines of that era required constant manual supervision, and uptime was measured in hours, not months or years.
I'm not sure what datacenter you were working in, but in general that is quite untrue.
Re: (Score:2)
Telephone exchanges in rural areas are like that. The only time a technician had to enter the premises was to clear out old equipment. There was enough spare capacity in the exchanges that the only work required was to open the local cabinets on the street and pair up a new telephone line.
I luv AOL! (Score:2)
Seriously. AOL keeps my relative's PC experience safe; which, generally, keeps them from bugging me for help. :-)
Who? (Score:4, Insightful)
Seriously though, most telcomm operations operate like this. Their switching centers are all fully automated and unmanned, and usually in the basement of some non descript building. This is nothing new.
Re: (Score:3)
Um, I wouldn't be comfortable with my telcomm's switching centers in basements. Those are most commonly the first rooms to flood when the water comes, and telcomm switches are everywhere their users are.
I see telcomm switches housed above ground, in plain, sometimes unmarked buildings. There's one a quarter mile from my house, and I drive by two others to go to work. If they have basements, I bet that's where they keep stuff that doesn't matter as much.
And the huge switch that used to work in my old hometown,
Re: (Score:2)
The building I am in hosts one such setup in the basement. It never floods at my location.
Re: (Score:1)
It was to be staffed... (Score:2)
.. but there last geek quite, so now the data center must fend for itself.
Re: (Score:1)
Spelling. You fail it.
Re: (Score:3)
You do realize that this story is about AOL, correct spelling would simply be out of plase.
What (Score:4, Funny)
Re: (Score:1)
Instead of $15/hour techs working for AOL doing regular maintenance, they've switched to outside contractors billing at $100-200/hr when the shit hits the fan. I don't think this idea is going to work very well.
Re: (Score:2)
The contractors warranty their work :) Sometimes that makes all the difference; the $15/h tech is usually just miserable.
Re:What (Score:5, Informative)
How often does shit hit the fan in that sort of environment?
As a hybrid techie who does a lot of hardware work, I would much rather go in once a month, fix a batch of issues in one visit, collect my fat cheque and go back to the pub, than spend 40+ hours a week playing Bejeweled, waiting for stuff to break.
I would expect AOL's strategy to greatly reduce costs, because that $15/hr rack monkey costs a lot more than $15/hr in the end. They have benefits, you have to "manage" them, they need human comforts like bathrooms, cleaning, seating, heating/air, lunch room. From an efficiency standpoint, the contractor route is more efficient in both money and time.
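As a rough illustration of that efficiency argument, here's a back-of-the-envelope comparison. Every figure below (the 1.4x overhead multiplier, the visit length, the contractor rate) is a hypothetical assumption for the sketch, not data from the article:

```python
# Illustrative staffing-cost comparison (all figures hypothetical):
# a full-time on-site tech at $15/hr with an overhead multiplier for
# benefits, space, and management, versus a contractor at $150/hr
# doing one 8-hour batch visit per month.

def annual_staff_cost(hourly, overhead_multiplier=1.4, hours_per_year=2080):
    return hourly * overhead_multiplier * hours_per_year

def annual_contractor_cost(hourly, hours_per_visit=8, visits_per_year=12):
    return hourly * hours_per_visit * visits_per_year

print(annual_staff_cost(15))        # fully loaded full-timer: 43680.0
print(annual_contractor_cost(150))  # monthly batch visits: 14400
```

The crossover obviously depends on how often things break; if the contractor is on site weekly instead of monthly, the gap narrows fast.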
Re: (Score:2)
Depends: how confident are you that every eventuality has been planned and provided for by the system? A significant outage can easily eat up an entire year's worth of $15-an-hour salaries if you hit an unforeseen condition that causes the whole data center to go down. Sure, it's unlikely if the people doing the planning know what they're doing, but I'm sure the folks in the WTC weren't expecting their records to be destroyed by a terrorist attack taking the entire building down.
Re: (Score:1)
Depends: how confident are you that every eventuality has been planned and provided for by the system? A significant outage can easily eat up an entire year's worth of $15-an-hour salaries if you hit an unforeseen condition that causes the whole data center to go down. Sure, it's unlikely if the people doing the planning know what they're doing, but I'm sure the folks in the WTC weren't expecting their records to be destroyed by a terrorist attack taking the entire building down.
Of course any number of $15/h techs in the WTC wouldn't have helped them with this problem anyway.
Re: (Score:2)
So, how do you suggest one should plan against supposed terrorists razing the whole building?
More to the point: how is the $15 lackey going to make a difference in that scenario? If nothing else, NOT having the lackey there saves the company from paying out death benefits :D
They're on the Time Warner life support (Score:1)
...and Daddy Warbucks got some dough - in a manner of speaking, as it were, etc und so weiter.
What is AOL again. ..? (Score:2)
I'm from Europe. What is AOL again? And what is its/their significance in 2011/2012 anyway?
- Jesper
Re:What is AOL again. ..? (Score:4, Funny)
I thought everyone knew... AOL is the Internet.
Re: (Score:3)
They suck. They just suck differently now. They've switched from being an ISP to being a content company (and most of their content creators seem rather disgruntled). Mostly US-based, but most slashdotters should recognize names like TechCrunch or, primarily, HuffPo... the rest, not so much.
Re: (Score:1)
AOL is a service that provided you with free CDs for decorative purposes. It was, however, a bad idea to put them into your computer's CD drive.
And yes, they also operated in Europe.
Re: (Score:2)
They are America's former premier manufacturer and distributor of coasters.
Re: (Score:1)
They have decent TV program listings
In other news (Score:3)
Huh? (Score:2)
I apologize in advance, (Score:2)
But I can't resist.
...In Soviet Russia, remote hands are YOURS!
Pretty easy. (Score:2)
Works very well. (Score:3)
Two points. (Score:4, Insightful)
One - If there is redundancy and virtualization, AOL can certainly keep services running while a tech goes in, maybe once a week, and swaps out the failed blades that have already been remotely disabled and had their usual services relocated. This is not a problem. Our outfit here has a lights-out facility that sees a tech maybe every few weeks; other than that, a janitor keeps the dust bunnies at bay and makes sure the locks work daily. And yes, they've asked him to flip power switches and tell them what color the lights were. He's gotten used to this. That center doesn't have state-of-the-art stuff in it, either.
Two - Didn't AOL run on a mainframe (or more than one) in the 90s? It predated anything useful, even the Web I think. Netscape was being launched in 1998, Berners-Lee was making a NeXT browser in 1990, and AOL for Windows existed in 1991. Mosaic and Lynx were out in 1993. AOL sure didn't need any PC infrastructure, it predated even Trumpet Winsock, I think, and Linux. I don't think I could have surfed the Web in 1991 with a Windows machine, but I could use AOL.
Re: (Score:2)
Netscape was founded in 1994. http://en.wikipedia.org/wiki/Netscape [wikipedia.org]
Re: (Score:2)
I was thinking of the browser, not the company.
Re: (Score:2)
Netscape didn't come out in 1998. Netscape Navigator 3 was out in 1997 for instance http://sillydog.org/narchive/full123.php [sillydog.org]
I was using Netscape Navigator 2.x with AOL in 1996. I remember because it was a big deal that AOL finally got 32-bit winsock support for Windows 95. Netscape was definitely out in 1995 as well. I remember "best viewed with Netscape" buttons on websites when I first got on AOL in 1995.
Are you talking about a specific browser version? Like Netscape Communicator 4.0 ?
Both Internet E
Re: (Score:2)
I'm thinking of Mosaic browsers. Before Windows 3.11.
How does redundancy help you when the main power (Score:2)
How does redundancy help you when the main power switch goes down or catches fire and there is no one there? Let's see: firemen make a big mess and nobody is there to start the rebuild; or maybe it just does a safe shutdown, and you send someone out only to find you need to call in this other guy to fix the switch or generator.
Re: (Score:1)
How does redundancy help you when the main power switch goes down or catches fire and there is no one there?
If you are a big enough operation, you have redundancy at the data center level. i.e. you can lose an entire data center and have no loss of service on your production applications. Other than a possible speed/performance degradation, your average customer has no knowledge that anything bad has happened.
that's what geographic redundancy is for (Score:2)
This is why you have a duplicate data center in another city that is kept in standby and is just sitting there ready to take over. (Actually, you normally have a mix of services active at either location.)
The company I work for makes telecom equipment, and supporting geo redundancy is a fairly key requirement for some major customers.
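That failover logic can be sketched very simply: traffic goes to the highest-priority data center that currently reports healthy. The site names and health map below are hypothetical illustrations:

```python
# Minimal sketch of geographic failover: pick the first healthy site
# from a priority-ordered list. In an active/active setup the same idea
# applies per service rather than per site. Site names are made up.

def pick_site(priority, health):
    """priority: ordered list of site names; health: {site: is_healthy}."""
    for site in priority:
        if health.get(site):
            return site
    raise RuntimeError("no healthy data center available")

print(pick_site(["us-east", "us-west"], {"us-east": False, "us-west": True}))
```

Real deployments do this with DNS failover or anycast rather than a loop, but the decision being made is the same.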
Re: (Score:2)
The web was around, and in force, MUCH earlier than you would imagine. Windows 98 had Internet Explorer version 4 inextricably linked to the OS. Not version 1, but version 4. Internet Explorer was conceived as a weapon against Netscape, so there's no way IE v4 predated Netscape...
And before the WWW, the internet was quite useful. Newsgr
Re: (Score:2)
No, you couldn't because NOBODY had Windows in 91.
What in the world are you talking about?
Re: (Score:2)
No, you couldn't because NOBODY had Windows in 91.
'91 was when Win 3.1 came out, and that was when it was becoming obvious that Win really was evolving to becoming a full-time OS. (It wasn't there yet at the time, oh boy it wasn't there, but it was clear that was the way things were going.) Surfing the web at that time (well, info services like gopher) required third-party software, but it definitely existed. I remember using it.
Re: (Score:2)
Nope, March of '92. Others have pointed out that Windows 3.0 was out at that time, but I still maintain practically nobody was running it. In 91 it was very much a DOS world.
Re: (Score:3)
AOL initially ran on a network of Stratus fault-tolerant minicomputers, each running two to eight 680x0 CPUs. Later we added unix boxen, some beefy SGIs and HPs for servers, and Suns for front-end telco interfacing IIRC. By the mid-90s we grew a Tandem fault-tolerant cluster for our critical databases; it did hot component failover, multimaster replication, all the stuff that's common today, but with SQL down in the drive controller for blazing speeds. We didn't really start moving to a PC-based architecture
Re: (Score:2)
Wow. I will never post from an iPhone again...
Re: (Score:2)
Wow. We're still two years from decommissioning our Stratus servers. We're still 6 months from decom of SNA. I gotta talk to the other team about stepping it up.
Re: (Score:2)
Are you running VOS or FTX? I don't know about FTX, but if you're running VOS, and you're (at least) two years out, I highly recommend upgrading to the V-Series. Stuff that used to compile overnight now takes seconds; we stopped building an inverted index of our source code because "display *.pl1 -match x" was instant. More on the port:
http://newsgroups.derkeiler.com/Archive/Comp/comp.sys.stratus/2007-11/msg00005.html [derkeiler.com]
Re: (Score:2)
We're killing all of them. They don't fit into the new software models, and are actually 3 years overdue for decommissioning. We have no redundancy on 75% of them, and their replacements are already online and in production. It's our users who are holding this up; some have put off their work for 5-6 years now, and we don't have the power to compel them to do it. Yet.
Good while they worked, still there, but doomed. They mostly do file transforms and routing, much better done on the RHEL systems replacing them.
Re: (Score:2)
AOL was already famous for being a good source of free floppies in the early 90s, and a search on wikipedia confirms they were renamed to AOL and expanded in '89.
They were doing graphical forums in '86, almost 10 years before Netscape.
AOL Needs a Data Center? (Score:3)
Oh yeah, to house all the dial-up modems...
Amazing! (Score:2)
really? (Score:2)
AOWho?
Me too (Score:1)
n/t /obligatory
Datacenter in a box (Score:2)
At least that way they won't need "heroic support"
Wow, is AOL still around? (Score:3)
Re: (Score:1)
They still serve email. My boss (and much of his family) still use (and pay for) AOL even though they have broadband and AOL provides them with nothing but an email address, as far as I can figure. It's apparently hilariously bad, as he's always talking about how the website doesn't work much of the time and the connection simply times out. I think they also distribute some software that goes with "AOL" but I have no idea what it does. I hear it still crashes a lot though.
Re: (Score:2)
As an Operator currently working inside a DC... (Score:2)
And they named it.... (Score:1)
....wait for it .... Smynet! (Someone typoed)
Hope they don't have rats (Score:2)
To start chewing through wires, causing power outages, starting fires, pooping in the mailbox, that kind of stuff.
One of the early search engines did this. (Score:2)
One of the early search engines, I think Infoseek, worked this way. Machines were installed in blocks of 100 (this was before 1U servers) and never replaced individually. Failed machines were powered off remotely. When some fraction of the block had failed, about 20%, the whole cluster was replaced.
There's a lot to be said for this. You have less maintenance-induced failure. Operating costs are low.
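The replace-at-20% policy described above is easy to sketch. The data layout here is a hypothetical illustration; the threshold comes from the comment:

```python
# Sketch of block-level replacement: failed machines are powered off
# remotely and left in place; once roughly 20% of a block has failed,
# the whole block is scheduled for replacement in one visit.

def block_needs_replacement(block, threshold=0.20):
    """block: list of booleans, True = machine still alive."""
    failed = sum(1 for machine_ok in block if not machine_ok)
    return failed / len(block) >= threshold

block = [True] * 79 + [False] * 21   # 21 of 100 machines dead
print(block_needs_replacement(block))  # True
```

The appeal is exactly what the comment says: no one ever opens a live rack, so there's no maintenance-induced failure, and the truck roll is amortized over ~20 repairs at once.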
Grid (Score:2)
...over a decade before the term Cloud was even coined.
You mean back when it was called 'grid'?
Not much to see here (Score:1)
What they did:
* Modularize/Standardize Infrastructure, e.g. storage & computing power
* Build provisioning systems
* Virtualize everything
When they say that they are flexible, they mean that they have a lot of dark hardware lying around.
Re: (Score:1)
Really? I'm a bit of a hybrid in terms of tasks, but I've gotten...
1. a lot more offers for admin positions (might have more to do w/ my presentation though)
2. better salary offers on coding positions
Thinking of just me, it seems to be better to stay w/ the code, especially web development; those are always in demand.
I'd take being part of an admin team over a coding team anyway, though. Prolly need more experience before I start getting offered those w/o actively seeking them and getting no reply :)
Re: (Score:3, Insightful)
The software still needs to be written. The programs still need to be run somewhere.
Technically not much has changed. The "Cloud" is still made up of servers that have to be administered. The main effect is that the IT and network admins will have to keep up with technology, especially the new virtualization layers between the hardware and the running application. But keeping up to date has always been a part of working in IT.
Re: (Score:2)