Cloud IT

Amazon's AWS Logs Its Third Outage this Month, Affecting Slack, Epic Games Store, Asana and More (theverge.com) 66

Amazon's crucial web services business AWS is experiencing problems today, with issues affecting services like Slack, Imgur, and the Epic Games store for some users. From a report: It's not looking good if you're working from home, with some Slack users unable to view or upload images, and work management tool Asana also hit by the outages. In an incident update, Slack said its services are "experiencing issues with file uploads, message editing, and other services." Asana says the problems constitute a "major outage," with "many of our users unable to access Asana." Epic Games Store said "Internet services outages" are "affecting logins, library, purchases, etc." It's the third time in as many weeks that problems with AWS have had a significant effect on online services.
This discussion has been archived. No new comments can be posted.

  • by ranton ( 36917 ) on Wednesday December 22, 2021 @10:48AM (#62105731)

    From the story, it looks like this only affects one availability zone in one region. Shouldn't that be expected to happen from time to time, which is why anyone who wants high availability builds in resiliency by running in multiple availability zones? I'm not an expert in cloud infrastructure, so perhaps someone here can set me straight.

    • On one hand you're correct, on the other hand the whole reason to use cloud services is someone else handles keeping it running and it seems lame that you have to do special stuff to get that. The whole point was supposed to be that you don't care where it's running... but it isn't running at all.

      • by aaarrrgggh ( 9205 ) on Wednesday December 22, 2021 @12:59PM (#62106103)

        You get exactly what you want to pay for. If you want to pay for redundancy, it is extra. If you want to pay for redundant clouds it is a lot extra.

        • Sure, I get what it is and why it costs more to get more. But I also understand feeling disappointed that this is the case. Ultimately that is a thing that needs to be done cheaply if anyone is going to be able to have a viable site which can handle load going forwards.

          • Cloud handles cheap growth well. If you are willing to accept degraded performance and don’t require real-time transaction synchronization then you can get redundancy fairly economically at the same time. It only gets really expensive when the PHBs want things like “seamless customer experience” under all failure modes.

      • by DarkOx ( 621550 )

        Well, therein lies the problem: not understanding the difference between the ASP model, where someone else keeps it running, and the cloud model, where you still build and design it but it runs on someone else's hardware that you (try to) control with clumsy abstract tooling!

        The cloud model is just the old mainframe model. You buy some time to run your stuff, but it's still on you to make sure your job isn't going to ABEND, and while there is a good deal of reliability and redundancy built into a single instance if you rea

    • I know our disaster plan for a meteor strike on the East coast doesn't necessarily mean we will magically roll over to the West coast. Some things might need to be tweaked to ensure traffic's going to the right place; some services might lose a bit of data, etc.

      So, if we have a meteor strike, we're fine.

      But if the shit goes down every week, we haven't actually tooled for that.

    • Re: (Score:2, Informative)

      by AmiMoJo ( 196126 )

      That's spot on. Someone will say they shouldn't trust the cloud, but they probably couldn't build and run better infrastructure themselves for even double what they pay Amazon.

      • by Viol8 ( 599362 ) on Wednesday December 22, 2021 @11:31AM (#62105877) Homepage

        You're right. And wrong. Few companies could roll out a few billion dollars worth of infrastructure like AWS. However plenty could roll out an in house system with 2 failover backups for considerably less than they pay for cloud services long term.

        • by bn-7bc ( 909819 )
          Yea but that would make the bldy bean counters cry about capex vs opex, and no way to "instantly" scale and the CxO types would not get the free cabin trips and dinners (oh sorry I mean fact finding opportunities)
        • For many reasons, I find self-hosting to be an ideal solution. Of course, idealism often isn't allowed to exist in the realm of reality. The biggest problem with building your own redundant, self-hosted system is having multiple databases that are constantly kept in sync and can be easily failed-over. This certainly isn't impossible, but it's often beyond the reach of most small or mid-sized businesses. In that regard, AWS with multiple availability zones is a great, pragmatic solution.

          On a related no
      • > Someone will say they shouldn't trust the cloud, but they probably couldn't build and run better infrastructure themselves for even double what they pay Amazon.

        Why? I've done that several times over.

      • Comment removed based on user account deletion
    • by Burdell ( 228580 )

      I'm not sure about this time, but in some past events, the AWS internal tooling has overloaded when there was a significant zone outage, such that all the automation that's supposed to make things HA across zones/regions fails. That of course defeats the purpose of using all the AWS HA stuff, which is why smarter companies are doing hybrid-cloud (mix of cloud and self-managed servers) and/or multi-cloud (mix of AWS+Azure+GCP servers) setups.

    • No, you are correct, if you want three or four 9s, you need to have different regions available to kick in automatically if your main region goes down. Like in the old days, you had a load balancer with 2 machines and both go down, do you just give up and say the load balancer didn't do its job?
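
To put rough numbers on the "three or four 9s" and "load balancer with 2 machines" points above, here is a minimal sketch of how redundancy changes availability. It assumes that each zone or region fails independently and that any single healthy copy can serve all traffic. Both are assumptions, not claims about AWS: as noted earlier in the thread, shared tooling can fail across zones at once, which breaks the independence assumption. The function names are made up for illustration.

```python
import math


def combined_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` independent copies when any one copy suffices.

    `single` is the availability of one copy as a fraction (0.99 means 99%).
    The system is down only when every copy is down at the same time.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - single) ** replicas


def nines(availability: float) -> float:
    """Express an availability fraction as a number of nines (0.999 is 3 nines)."""
    if not 0.0 <= availability < 1.0:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1.0 - availability)


if __name__ == "__main__":
    for copies in (1, 2, 3):
        a = combined_availability(0.99, copies)
        print(f"{copies} copies at 99% each: {a:.6f} (about {nines(a):.1f} nines)")
```

Under those assumptions, two independent copies at 99% each give 99.99%. Real outages are rarely independent, so treat this as an upper bound rather than a promise.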
  • by IWantMoreSpamPlease ( 571972 ) on Wednesday December 22, 2021 @10:55AM (#62105759) Homepage Journal

    At least where I work. Upper management is absolutely in love with "the cloud" and no amount of logic/screaming from the IT staff will change their mind.
    This is affecting us, and naturally management is screaming at us for a fix.
    The solution, you dolts, is to bring it back in-house, where it was running just fine.
    But then they won't save money. Well, which do you want? A penny in your pocket, or access to your data?

    • At any level of real scale, you'll save wads of cash by not using the cloud.

      • by pagedout ( 1144309 ) on Wednesday December 22, 2021 @11:23AM (#62105843)

        We found it to be a mixed bag.

        A straight lift and shift of a medium/large architecture appears to almost always cost more in AWS (unless you are really bad at running a datacenter). We pegged it at about 25% usually. If you refactor to be more cloud friendly, you can drive that down into substantial savings, presuming you have substantial periods of high and low volumes. If you go whole hog into something more cloud native, then it is really random: everything costs even more, but if it is broken down well and you have high swings, you can save a lot.

        A straight lift and shift of a small architecture is just so random it's hard to tell. We said they tend to be about even. It can be a good savings in labor if what you have is really standard.

         

      • by Anonymous Coward

        Until you need to have resources in other countries. Then the cloud starts getting more attractive.

    • Simple: "We need to have a backup cloud host!" If you need true high availability, then you need multiple clouds: Azure + AWS, GCP + AWS, etc. The only way to really have stability is full redundancy. "The Cloud" is rarely a way to save money, not if it's set up correctly, because the amount of failover you require quickly throws the costs WAY up.
    • The solution is to switch automatically to another region when one goes down, why don't you do that?
    • by Tablizer ( 95088 )

      Upper management is absolutely in love with "the cloud"... and naturally management is screaming at us for a fix [when down].

      Sell them some BozoCoin and then get the hell out.

    • by fermion ( 181285 )
      The internet is not built on the uptime of mainframes. While I understand that some number of teenage suicides will be blamed on the Facebook downtime, most of us can live on 99.5% uptime and will not pay for 4 or 5 nines.

      Professionally, as I depend on constant uptime, these outages are an extreme bother. But any robust system has contingencies. Most people who complain about outages simply will not be bothered to work out the contingencies.

    • by tlhIngan ( 30335 )

      It really depends.

      In-house is great, but the manpower and equipment needed to keep it alive is heavy and can easily outweigh the costs of AWS. When AWS goes down, it takes down a lot, too, but then again, perhaps your infrastructure goes down more often, just that because fewer people use it, it's less noticed.

      We moved email to the cloud, from internal Exchange to Office 365 hosted Exchange. Our IT guy sleeps better at night because chances are, the Office 365 exchange server will be less likely to die over

    • You gotta do what all the other companies are doing, otherwise customers/investors/employees will all ditch you.
    • I would also add that you control your own data. If your business isn't one approved by AWS, you could also lose your data.
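
For scale on the "99.5 uptime" versus "4 or 5 nines" comparison made in this thread, the arithmetic can be sketched as below. It assumes a 365-day year and counts every minute of unavailability equally. The function name is hypothetical and nothing here reflects any provider's actual SLA terms.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year allowed by a given availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for pct in (99.5, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows {hours:.3f} h/year ({hours * 60:.1f} min/year)")
```

By this arithmetic, 99.5% allows roughly 43.8 hours of downtime a year, while four nines allows a bit under an hour and five nines a bit over five minutes, which is the gap people are declining to pay for.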

  • If you are a small organization, then going to the cloud is often a good option, as the occasional outage from a cloud service, while annoying, is much better than if you tried to have your own budget data center that costs 10x as much. Companies like Amazon can offer it to you at such a cheap price because your demand is rather low, and they can use the same computers and employee resources to manage dozens if not hundreds of customers, so you get a better value.
    However if you are a big company like Slack or Epic. Chanc

  • by jhecht ( 143058 ) on Wednesday December 22, 2021 @11:21AM (#62105835)
    On Downdetector.com: at 10:20 a.m. Eastern Hmm. We’re having trouble finding that site. We can’t connect to the server at downdetetector.com.
  • You don't say... (Score:4, Insightful)

    by Rosco P. Coltrane ( 209368 ) on Wednesday December 22, 2021 @11:22AM (#62105837)

    The cloud might not offer the 6-sigma availability and reliability cloud providers promise?

    The cloud operated by only a handful of giant cloud providers is at the mercy of any single one of them going tits up?

    Why, this is such a surprise. So unexpected and so disappointing...

    • I'm not sure it is fair to say "the cloud" doesn't offer a certain reliability. AWS in itself has multiple regions and availability zones to combat this very problem. You can also spread redundancy to other cloud providers to further mitigate risk. Nothing is perfect, but with the proper infrastructure planning, the cloud can provide great reliability. We use AWS's geo-redundancy capabilities, and even though our services are primarily in the region that has been having problems, we have had no downtime
      • by Rosco P. Coltrane ( 209368 ) on Wednesday December 22, 2021 @12:00PM (#62105939)

        we have had no downtime

        Yet.

        The thing with the cloud is, when everything runs fine, it's great. But when things go sour, for technical reasons or because the cloud provider decides they have you trapped and it's time to put the squeeze on you, that's when you realize that infrastructure you built on theirs is locked-in and completely dependent on network availability and you're hosed, because you put all your eggs in that one basket and you have no plan B.

        Roll-your-own is a lot costlier upfront and you know you'll have problems every once in a while if you don't plan great - and you will even if you do. But at least you can do something about it, you're not at the mercy of your internet provider simply to keep your company operating, and you're not some cloud provider's bitch.

        • for technical reasons or because the cloud provider decides they have you trapped and it's time to put the squeeze on you, that's when you realize that infrastructure you built on theirs is locked-in and completely dependent on network availability and you're hosed, because you put all your eggs in that one basket and you have no plan B.

          I've been using a few cloud infrastructure companies for many years. AWS since 2009. I can assure you this is a fallacy. There is plenty of competition in the space, and in all this time, I've never felt a "squeeze". I have only found that all services get cheaper and easier every year.

          If you put all your eggs in one basket, that is purely from your own bad planning, and it is no different if you roll your own; It just costs more. If you roll your own, you are still the ISP's bitch, or the Hardware

          • There is competition for some things, but not exactly for things like DynamoDB. AWS offers many custom, non standard services that either are incompatible or don't exist on other clouds. Also pricing to switch and move your data out is where they get you.

            • That is a straw man argument.

              There are certainly many competitors for DynamoDB that can work in AWS and also on other providers. It is the customer's choice to use those products and decide to tie themselves to AWS. Most AWS products are based on open source products and moving elsewhere would be no problem. For instance, they have been pushing AWS Aurora hard in the past couple of years. This product is a repackaged Postgres or MySQL. Any program that uses Postgres will natively work on Aurora. What

  • by Anonymous Coward

    this is exactly what happens when we let the upper class corrupt everything, shit stops working and civilization collapses again, and always for the exact same reason, unmitigated greed and an out of control, unsustainable and incompetent upper class

    history repeating itself again, will we never learn?

  • by leathered ( 780018 ) on Wednesday December 22, 2021 @11:43AM (#62105901)

    As a sysadmin, whenever something on-prem broke, I'd have the PHBs in my office breathing down my neck telling me that the downtime was unacceptable. Now that we've put most things into the cloud, despite downtime and outages going way up, I can just shrug my shoulders and blame it on the hosting company. Even if it was something I broke.

    Win-win.

    • by Tablizer ( 95088 )

      You're on Cloud 9

    • by e3m4n ( 947977 )
      It's amazing how they won't accept problems that arise when it's run by just one or a few admins; it's deemed entirely unacceptable. But when they outsource it to a company with hundreds, if not thousands, of admins, suddenly it's an unavoidable circumstance. I've seen people waste hours of support time demanding an explanation of why a call dropped from their desk VoIP handset; meanwhile they could drop 2 or 3 calls a day on their cell, and they shrug it off and call right back. They aren't even willing to consider tha
    • by organgtool ( 966989 ) on Wednesday December 22, 2021 @01:56PM (#62106265)
      This is one of the biggest reasons why outsourcing is so popular in both the private and public sectors. It's not just that you're outsourcing the services to an organization that specializes at performing that particular service, it's outsourcing the blame when that service inevitably breaks.
  • I toldja you should have used Win~ &^ #n` [NO CARRIER]

  • Gee, I wonder why they keep getting passed over for government contracts. This sort of thing does draw attention to design and resiliency, even if it's not apples-to-apples. The people at the top make decisions based on perception more than anything. The perception is that AWS is not up to the challenges of a network that can never fail for any reason. Even a 2-minute outage during a skirmish or battle can change the entire outcome. Maybe AWS's design is more than sufficient given the lower volume compared
    • by Entrope ( 68843 )

      They got passed over for JEDI mostly because Donald Trump hates Jeff Bezos for owning the Washington Post (and I say that as someone who voted for Trump last year).
      That's a big part of why the competition had to be re-opened. Amazon operates the US government's Top Secret cloud that was the inspiration for JEDI, and just expanded that to a second region: https://aws.amazon.com/blogs/p... [amazon.com]

    • Re:Pentagon Contract (Score:4, Interesting)

      by david.emery ( 127135 ) on Wednesday December 22, 2021 @01:03PM (#62106117)

      Sigh... I'm not sure whether I should be amused or disgusted when people who don't know anything about military procurement or military operations opine about them.

      The procurement system is based on evaluation against a set of requirements. If they write the wrong requirements, they get the wrong results. There's very little room in the evaluation for the exercise of independent judgement.

      In operations, though, there's a lot of room for constructing actual systems against The Real World (tm). Sometimes the requirements really are helpful, and you get systems that you can integrate as expected. Other times, you have to do a lot of work to fit the round peg into a square hole, -usually- because the shape of the hole changed from the time the requirements were written to the time the resulting system was delivered.

      There's always a big conflict/trade-off between 'requirements' and 'simplicity'. Do you want a complex system that might well have holes in it, where mistakes result in downtime (or worse, wrong answers)? See https://en.wikipedia.org/wiki/... [wikipedia.org] and in particular https://en.wikipedia.org/wiki/Byzantine_fault . Personally, I've always had a bias towards simpler systems with much higher dependability, but often the people writing the requirements want more complexity in the system so the usage of that system is simpler. These are NOT SIMPLE TRADES.

      But one thing I've learned about trying to reason about distributed systems over 40 years. Communications is the weakest part of the distributed system in military operations. We do not have the dependency of fiber-optic communications in most (but not all) combat systems. When the radios don't work, then you can't use anything that is not on your vehicle, aircraft, local installation, etc. Years ago, we knew how to reason about failures in distributed systems (as defined by Leslie Lamport, "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." https://www.cs.ubc.ca/~bestcha... [cs.ubc.ca] .) These days, I'm not sure how much that is actually taught.

      • by e3m4n ( 947977 )
        It really depends on the system. Don't you remember when an MS executive convinced a Navy admiral to put Windows NT in the CIC (Combat Information Center) on the USS New Jersey? (I believe it was the New Jersey; it was the last battleship still in service.) It had to get towed to shore. Then, if that wasn't enough, it happened again in 1998 on the USS Yorktown, CG-48. https://www.wired.com/1998/07/... [wired.com] . None of this seems appropriate or well vetted for an in-service Navy vessel that can be called upon at any time. C
  • Can't wait until AWS has a much bigger outage. I have the popcorn standing by!

  • There's too much hanging on any of the US-EAST Zones, the only big reason for using them is zero cost ingress of network data and they're usually the first zones to get new features.

    • us-east-2 isn't so bad. us-east-1 had better latency from most of South America than sa-east-1 last I checked, so there's also that.

      But yeah, it's good advice to not put too many eggs in the us-east-1 basket.

  • ... people are going to start to realize I was right when I said that "cloud computing" the way these jackasses are selling it doesn't buy you any magical baked-in redundancy. (Should have hired me instead, I guess, huh fuckers?)

  • A lot of comments are critical of overuse of the clouds or AWS specifically. But for me it boils down to a couple of more specific issues that can be improved upon.

    The first thing is within the customers' grasp: let's think critically about our system design in terms of third parties. The more consolidation there is in the SaaS industry, the more we'll see business' interdependence. In other words, if there is a really compelling offering for managed services that run outside your immediate control, ther
