Data Storage Power IT Hardware

1 In 3 Data Center Servers Is a Zombie

dcblogs writes with these snippets from a ComputerWorld story about a study that says nearly a third of all data-center servers are comatose ("using energy but delivering no useful information"). What's remarkable is this percentage hasn't changed since 2008, when a separate study showed the same thing. ... A server is considered comatose if it hasn't done anything for at least six months. The high number of such servers "is a massive indictment of how data centers are managed and operated," said Jonathan Koomey, a research fellow at Stanford University, who has done data center energy research for the U.S. Environmental Protection Agency. "It's not a technical issue as much as a management issue."
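
The six-month rule in the summary can be written down as a small check. This is only a sketch under stated assumptions: "six months" is approximated as 182 days, and the inventory (server names and last-activity timestamps) is hypothetical, not data from the study. How "useful activity" is measured is left open, as it is in the summary.

```python
from datetime import datetime, timedelta

# Assumption: "at least six months" is approximated as 182 days.
COMATOSE_AFTER = timedelta(days=182)


def is_comatose(last_useful_activity, now):
    """Return True if the server has done nothing useful for at least six months."""
    return now - last_useful_activity >= COMATOSE_AFTER


def comatose_fraction(last_activity_by_server, now):
    """Fraction of servers in the inventory that count as comatose."""
    if not last_activity_by_server:
        return 0.0
    idle = sum(1 for ts in last_activity_by_server.values() if is_comatose(ts, now))
    return idle / len(last_activity_by_server)


# Hypothetical inventory: server name -> last time it delivered anything useful.
now = datetime(2015, 6, 21)
inventory = {
    "web-01": datetime(2015, 6, 20),
    "db-01": datetime(2015, 6, 1),
    "legacy-07": datetime(2014, 9, 1),
}
print(comatose_fraction(inventory, now))
```

In this made-up inventory one machine out of three crosses the threshold. The hard part in practice is the input: deciding what counts as useful activity and recording it per server.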
  • Money (Score:5, Insightful)

    by 14erCleaner ( 745600 ) <FourteenerCleaner@yahoo.com> on Sunday June 21, 2015 @09:24AM (#49956151) Homepage Journal
    It's not a management issue, either - it's money. People cost more than dead servers.
    • Re: (Score:1, Insightful)

      by ColdWetDog ( 752185 )

      Money (or lack of it) IS a management issue....

      But how hard is it to automate a process that says, in effect, "if no data is going in or out of this server, shut it down"? I suspect that there is a more nefarious purpose here and I propose a corollary to Hanlon's (Heinlein's) Razor:

      This is the 21st Century - "You have attributed conditions to villainy that simply result from villainy". Incompetence is for the proletariat - we're the NSA. You're toast.

      • Re:Money (Score:5, Insightful)

        by gstoddart ( 321705 ) on Sunday June 21, 2015 @10:40AM (#49956487) Homepage

        But how hard is it to automate a process that says, in effect, "if no data is going in or out of this server, shut it down"?

        Why should the data center even care?

        Most of them are essentially charging rent ... as long as the customer keeps paying, WTF do they care if you actually use them for anything?

        This isn't incompetence on the part of the data centers. Maybe it's the companies that have machines whose purpose they've lost track of.

      • Money (or lack of it) IS a management issue....

        But how hard is it to automate a process that says, in effect, "if no data is going in or out of this server, shut it down"? I suspect that there is a more nefarious purpose here and I propose a corollary to Hanlon's (Heinlein's) Razor:

        This is the 21st Century - "You have attributed conditions to villainy that simply result from villainy". Incompetence is for the proletariat - we're the NSA. You're toast.

        If a customer is paying for it to be there and be kept turned on, *maybe* that customer has some use for the server. Oh, I don't know, maybe it's a hot spare in case another server in another data center goes down? So you turn it off, their other server goes down, their service can't fail over, and now your customer has a problem.

    • Comment removed based on user account deletion
      • Where I work, electricity is 0.25 EUR/kWh and a specialist in IT or law costs 1,000 EUR/day or more.

        Assume a server we're planning to shut down is rather old (they usually are), so it will probably fail on its own within 3 years, if not much sooner. It is not doing much anymore, so it's sitting at idle, drawing only idle loads. Assuming the idle load of an old server is 100W, how much specialist's time can we allocate to shutting it down?

        100W * 8760 h/y * 3y = 2,628 kWh. This will cost us about 657 EUR or
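
The arithmetic in this comment can be checked with a few lines. This is a sketch, not the poster's own calculation: it assumes a price of 0.25 EUR/kWh and a remaining life of three years, which are the values that reproduce the 2,628 kWh and 657 EUR figures, and it takes the specialist's day rate of 1,000 EUR from the comment.

```python
HOURS_PER_YEAR = 8760


def idle_energy_cost(idle_watts, years, eur_per_kwh):
    """Return (energy in kWh, cost in EUR) for a server idling over the given period."""
    kwh = idle_watts * HOURS_PER_YEAR * years / 1000
    return kwh, kwh * eur_per_kwh


# Assumed inputs: 100 W idle draw, 3 years of remaining life, 0.25 EUR/kWh.
kwh, cost = idle_energy_cost(idle_watts=100, years=3, eur_per_kwh=0.25)

# How much specialist time that energy bill would pay for, at 1,000 EUR per day.
specialist_days = cost / 1000

print(f"{kwh:.0f} kWh, {cost:.0f} EUR, {specialist_days:.2f} specialist days")
```

On these numbers the electricity saved buys well under one day of specialist time, which is the point the comment is building toward. It leaves out patching effort and rack space.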

        • by jabuzz ( 182671 )

          You failed to account for the system admin time to keep the server patched and secure. Also you assume that everyone is renting rack space and it is infinite in supply.

          These constraints mean that in my experience when a box is no longer doing anything useful it gets issued with a shutdown command to save the power. At this point if it really is required and a user somewhere starts shouting I can power it back up in a couple of minutes.

          Then generally six to 12 months later it gets removed from the rack becau

          • by Skapare ( 16644 )

            You failed to account for the system admin time to keep the server patched and secure. Also you assume that everyone is renting rack space and it is infinite in supply.

            either that or it's running Linux with zero licensing costs to stir management.

          • by jbolden ( 176878 )

            At this point, for almost all companies, good quality colo space is infinite. Most times a company isn't even using a meaningful fraction of their colo's space, so they could double or triple instantly without hassle, let alone add an extra 33%. And even if their colo doesn't, others directly connected to it do have extra space... So consider space infinite once you are willing to rent.

            That being said, I have problems believing the 1/3rd of servers figure from the article. That's not my experience at all.

    • It is a reporting issue: it is perfectly normal.

      Some people do not manage remote servers over long periods.

      You install three identical servers: one running the public-facing web server, one running the database server, connected by a separate, private network. The third one is available for the new version of the software to be installed, and then activated. Once the software is upgraded on all three, you keep it running as a hot standby. If reliable service to clients is not worth more than the cost o

    • by plopez ( 54068 )

      It's a cheap energy problem. If energy were more expensive it would be worth it to take them offline.

  • by Anonymous Coward

    We need enough servers for peak load, not average load.

    • by prefec2 ( 875483 )

      True, but in that case these machines do something sometimes over the year. In a modern data center you would be able to shut down the servers not used for a longer period and restart them automatically when the load rises. A hardware server start may take ten minutes (if there is not much to synchronize), but as you should know your load profile and use load estimation techniques, you can start the servers in advance. Especially in the context of replication of JVM and .Net components, this should be pretty ea

      • by petes_PoV ( 912422 ) on Sunday June 21, 2015 @10:10AM (#49956333)

        In a modern data center you would be able to shutdown the servers not used for a longer period and restart them automatically when the load rises.

        Many businesses that rely on servers (i.e. all of them) will be running hot standby systems - ones that can automatically take load if there's a hardware failure or software problem.

        One major (world-ranked) international company I consulted at was legally required to have 100% failover capacity - so it was inevitable that they would automatically have 50% of their production servers performing no functions - except for the twice a year when they were "flipped" just to make sure that each set of servers worked as expected.

        Although the source paper does specify physical "zombie" servers, if you need failover VMs, the same basis is applied there, too.

        • by tepples ( 727027 )

          One major (world-ranked) international company I consulted at was legally required to have 100% failover capacity - so it was inevitable that they would automatically have 50% of their production servers performing no functions - except for the twice a year when they were "flipped" just to make sure that each set of servers worked as expected.

          Why flip them twice a year and not, say, weekly?

          • Because doing it right involves a full fail-over test including transferring loads or test loads, DNS auto-reconfiguration, and possibly even paying extra to bring up extra capacity elsewhere. You need to make sure it happens right when it's needed. Extra paperwork, overtime, it's all in there.
            • Because doing it right involves a full fail-over test including transferring loads or test loads, DNS auto-reconfiguration, and possibly even paying extra to bring up extra capacity elsewhere. You need to make sure it happens right when it's needed. Extra paperwork, overtime, it's all in there.

              If the system is architected well, shouldn't all of those steps be automated... including monitoring and failover success/failure?

              • I can imagine that this wouldn't be perfectly smooth. It may be automated but it may not be completely bumpless and I don't think a company would be happy if users see a "scheduled maintenance" sign for 15 min or however long it takes every week.

                • I fear you may be right, and that's exactly why they don't do it more often... but I think that also underscores my point a bit. Shouldn't they work to get it to the point where users won't be impacted?

                  Netflix does this pretty aggressively [pagerduty.com] and users don't seem to notice. Though I realize for most companies I am being very idealistic.

              • If the system is architected well, shouldn't all of those steps be automated... including monitoring and failover success/failure?

                In a perfect world, with perfect systems documentation you'd be right. Unfortunately few of us have the pleasure of working in such an environment :)

      • by iamacat ( 583406 )

        A hardware server start may take ten minutes - if it actually comes up successfully. If you are starting a cluster in an emergency outage, you never know how many servers, power supplies and network switches kicked the bucket since you last used them. Plus, your DNS, NFS, db and other dependencies have to be unaffected by the outage and handle the added load of hundreds of servers starting at the same time. If you do a staggered restart of 100 servers in groups of 10, that's an hour and 40 minutes of outage
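
The hour-and-40-minutes figure follows from the numbers in this comment. A minimal sketch, assuming each group of servers takes the full ten minutes to start and that groups are started strictly one after another (the function name is made up for illustration):

```python
import math


def staggered_restart_minutes(servers, group_size, minutes_per_group):
    """Total time to restart all servers when groups start one after another."""
    groups = math.ceil(servers / group_size)
    return groups * minutes_per_group


# Figures from the comment: 100 servers, groups of 10, ten minutes per start.
total = staggered_restart_minutes(servers=100, group_size=10, minutes_per_group=10)
hours, minutes = divmod(total, 60)
print(f"{hours} h {minutes} min")
```

Ten groups at ten minutes each is 100 minutes. That is the best case, since it assumes every machine in every group actually comes up.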

        • You are absolutely right. If these servers provide failover, then they must be present. The server start/stop approach only applies to load management, e.g. for web shops. As I stated earlier (maybe it was in another post), failover servers, like all redundancy-related infrastructure, are not useless. They serve a purpose. Therefore they cannot be stopped without getting into trouble.

      • by mlts ( 1038732 )

        Some servers (IBMs, HP ProLiants) have decent power management capabilities, so the boxes can stay on and be idle... but consume a relatively small amount of electricity and cooling. Add a SSD for local storage and swap (start the OS or hypervisor and let the SAN take it from there), and even the energy usage of spinning disks can be minimized.

        However, with the many ways and layers to do HA, might as well do active/active if possible. On the VMWare side of the house, DRS comes to mind, and it also support

    • by Hadlock ( 143607 )

      In our case, about 20% of our servers are outdated and not kept as well maintained, as they used to host some important service, but their new replacement was built and that service was migrated, but nobody's 100% sure if there were any other latent, less important services running on that machine. So it stays on because everybody has more important things to do than find out what else is running on there, and perhaps more importantly, nobody wants to be the guy who shuts down the server that's still runnin

  • by Anonymous Coward

    One in three people consumes energy and produces nothing interesting.

    • by Anonymous Coward

      Like this comment.
      Crap, now it's 2 out of 3.

      • 3 for 3.
        One for all, and all for one!

        Why is this article (in general) ruffling so many feathers? Because it is a thinly-disguised Malthusian Energy hit-piece specifically targeted at the center of IT's most sacred golden calf, the cloud server industry. The reason that the assumptions made in this study are confusing to many (as in, why are we even on this page? Isn't an overall one-third quiescent portion a sign of a properly engineered critical system?) is that it was not motivated by intelligent resource

  • Bad Title (Score:5, Informative)

    by seven of five ( 578993 ) on Sunday June 21, 2015 @09:51AM (#49956261)
    Reading the title, my first thought was, cripes, those botnets have taken over everything!
  • by Anonymous Coward

    Apparently, the researchers have never heard of business continuity planning. If your primary data center gets knocked offline because your company located it in a hurricane-prone area of the country in order to take advantage of state tax breaks and a cheaper labor force (happens all the time), then you're gonna need another site you can switch over your data/voice traffic instantly when the inevitable hurricane hits. That means maintaining a certain amount of redundant equipment at the failover site that

    • Moreover, the power draw of systems under normal load vs. idle can be three to one.

      Besides failover there are "swing" servers where virtual machines or services are migrated while upgrades are done elsewhere. There are "staging" servers that become busy while new software is being rolled out but might otherwise be idle for months.

      Note the power draw of an idle server can be a third or less what the normal load is.

      The twats that wrote this paper obviously aren't in the business.
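
To put a rough number on the idle-draw point, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that one server in three is idle (the article's headline figure), that an idle server draws one third of the power of a loaded one, that every other server runs at normal load, and that all machines are identical. None of these inputs are measurements.

```python
def idle_power_share(idle_fraction, idle_to_loaded_ratio):
    """Share of total fleet power drawn by idle servers.

    Power is expressed relative to one loaded server (loaded draw = 1.0).
    """
    idle_power = idle_fraction * idle_to_loaded_ratio
    busy_power = (1 - idle_fraction) * 1.0
    return idle_power / (idle_power + busy_power)


# Assumed inputs: one third of servers idle, idle draw one third of loaded draw.
share = idle_power_share(idle_fraction=1 / 3, idle_to_loaded_ratio=1 / 3)
print(f"{share:.1%}")
```

Under these assumptions the idle third of the fleet accounts for one seventh of the power. The share is smaller if idle draw is below a third of loaded draw, and larger if the idle machines are older and less efficient.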

  • Obviously (Score:5, Insightful)

    by penguinoid ( 724646 ) on Sunday June 21, 2015 @10:01AM (#49956313) Homepage Journal

    Those are the servers hosting Slashdot's new "share" button. No one's ever clicked on it.

  • by iamacat ( 583406 ) on Sunday June 21, 2015 @10:16AM (#49956353)

    Modern systems are good at reducing power consumption when idle. It's quite reasonable to have 30% of capacity as spares, reserve for unexpected load, capacity for new apps and so on. They probably consume 3% of the power and nobody is motivated enough to look for more savings. Keeping things completely off is problematic, because you never know how much of the hardware and software will come up in time to handle an emergency unless you run and test it all the time.

    There is certainly room for further environmental/financial improvement, but the 30% figure is sensationalized.

    • Maybe. But on the other hand, even active servers spend a lot of their time idle (the paper says server utilization "rarely exceeds 6%"), and I bet a lot of these "comatose" servers are actually long-forgotten old hardware, or machines that nobody can be bothered to decommission -- it's possible that on average they're older than active servers and thus eating a lot more power.

      • by umghhh ( 965931 )
        In my previous project we had to save costs so much that we never had an updated document/service showing current booking/usage of our development, test and target servers. The result was that we had to negotiate use of some chains, work in shifts etc while some chains were idling. I have not lasted till the end so I do not know how successful the project was. I guess it was very successful - after they switched off all the machines they had more power than they budgeted i.e. managed to get to profit withou
  • How do they judge whether or not a server is contributing useful information? I have two personal VPSs out there that do almost nothing on the public internet. They mostly act as a place where I can store data as a form of backup, but also a place I can access when I need it to test programs, get a really fast download, etc. But most of the time these VPSs just act as central nodes in my private VPN. So by their definition are my servers in the 1/3 "zombie" servers? I pay the rent, so to speak, so I'm

  • ... of purple prose.

    The mere existence of servers on standby is not a problem, let alone a "massive" one.

  • This ratio seems pretty close to the ratio of zombie public servants.
  • Bad terminology (Score:5, Insightful)

    by pubwvj ( 1045960 ) on Sunday June 21, 2015 @10:55AM (#49956597)

    Unfortunate confusion of terminology. "Zombie computers" is a term also used for those taken over by botnets.

    • by Anonymous Coward

      Yes, that's exactly what I thought when I read the title. And I sense it was on purpose. Why, otherwise, use "comatose" everywhere else but the title?

  • by spiritplumber ( 1944222 ) on Sunday June 21, 2015 @12:21PM (#49957019) Homepage
    A bit low, but reasonable. Try making stuff that goes on ships, there's usually double redundancy AND a completely mechanical system in case everything goes to pot.
  • Back when I was a sysadmin for a government department I had been assigned a couple of chassis of HP blades that were bought in one of the famous fiscal year end splurges. For the most part I had no use for them and I didn't even install Linux on them. I think I only ever used a couple of blades and I hated them. It was the first generation and they ran very hot and we had lots of issues with bad RAM. The other three chassis on the rack belonged to the VMWare team and were in heavy use.

    Since I had no n

  • The last time Microsoft had a major Xbox Live outage due to high demand they just spun up a bunch of VMs and everything was fine 4 hours later. You keep them idling so that when you need 'em they're ready at a moment's notice. Also if you're not Microsoft or Oracle this means you're not paying the licensing costs associated with the software being in production non stop.
  • I know the industry I'm in, we have regulations which require 3+ years of data retention which "isn't providing anything useful" until it is. If we have a legal "issue" then that will extend until the legal issue goes away and the judge says we can destroy data. While we can use archive methods, sometimes the live system is really what is needed to retrieve data. It's better to just keep disks spinning than shut them down and hope they spin back up.

    IT has a long tail where I work. Things are planning to

  • People pay for servers all the time and never use them. If they paid for the year, then why should the hoster care?
  • From personal experience, the bureaucracy of our org makes procurement of servers so difficult that section managers tend to hoard them when they get them.

    I'm hoping virtualization will improve this situation, but something tells me it will only create different problems. The bureaucratic culture usually invents new ways to foul up new tools.

    • by Skapare ( 16644 )
      then they will hoard hordes of instances.
    • by jbolden ( 176878 )

      One way to handle that is to not own your infrastructure and just rent month to month from the vendor who provides a pool of servers. What you are likely facing is the problem of how to prevent the administrative cost from going above X% by preventing the IT administrative cost from going above Y% by slowing down acquisitions... Better yet is just to guarantee Y and save the labor.

      • by Tablizer ( 95088 )

        For security reasons, the org in question wants mostly internal servers. But if they ran it kind of like a vendor, it may work in that each section has to pay for any server instance it's using, through the budgeting process. But, the org in question would probably bungle that too.

        • by jbolden ( 176878 )

          The department of defense runs servers out of house. Lockheed Martin runs a cloud provider. Many of the country's banks handle it. There is no question you can buy better security than any company has internally.

          As for running an internal cloud that's pretty easy and they could ask a vendor to run the financial it while keeping all the servers physically on their prem.

  • by Skapare ( 16644 ) on Monday June 22, 2015 @04:34AM (#49960525) Homepage
    turn them into mail servers ... then spammers will keep them active.
  • There's this rumor that when Yahoo expanded its Lockport "chicken coop" data centers in upstate NY they vacated at least two large data centers in Northern VA and because the lease isn't up for another two years they have been mostly empty ever since.

    Yet, Yahoo is saving lots of money by doing this.

  • You do not spec for "average" usage; you spec for *max*. You also have to spec for how many machines (when we're talking about thousands, or tens of thousands of servers) are going to fail today, to be picked up by the "zombie" machines that are, in fact, hot spares.

    And then there are the Big Events, like the shooting in Charleston, or when the SCOTUS announces its rulings on gay marriage or the ACA - how many of those "zombie" machines are going to go live to help carry the traffic load?

  • It costs $50 to get the data chimp to power a server on.
