Amazon's Move Off Oracle Caused Prime Day Outage in One of its Biggest Warehouses, Internal Report Says (cnbc.com)
Amazon is learning how hard it can be to move off of Oracle's database software. From a report: On Prime Day, while the e-retailer was dealing with a major website glitch that slowed sales, the company was also dealing with a technical problem in Ohio at one of its biggest warehouses, leading to thousands of delayed package deliveries, according to an internal report obtained by CNBC. The problem was in large part due to Amazon's migration from Oracle's database to its own technology, the documents show. The outage underscores the challenge Amazon faces as it looks to move completely off Oracle's database by 2020, and how difficult it is to re-create that level of reliability. It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software, a point that Oracle will likely emphasize during this week's annual OpenWorld conference in San Francisco.
Really? (Score:5, Insightful)
Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?
Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.
MongoDB is webscale (Score:1, Offtopic)
I use business management products from Oracle with an underlying Oracle database. I feel like sometimes the IT department must not be shoveling enough coal into the boiler or something because this antiquated inflexible interface just stalls all the time and very frequently has to go down for some sort of synchronization. It's slick like Amazon's web site. I don't understand why Oracle even exists given my experience with it.
Re:MongoDB is webscale (Score:5, Insightful)
I don't understand why Oracle even exists given my experience with it.
Because it's a damn good database. The question isn't about its capabilities, it's whether it's worth the cost. As for their other products I agree with you; they're way too sluggish. But I believe Amazon was just using their database.
Now Amazon moving away from Oracle is a good thing; as servers get faster and the open source alternatives get better, Oracle's database is losing its foothold. I for one won't be sad to see that happen.
Re: (Score:2, Informative)
I think most people don't understand that the actual database product is rock solid. It's Oracle middleware that needs to die in a fire. That and their licensing which makes Microsoft look like the good guy. I don't understand how they can make a good dbms but fail so miserably on the middleware. Want a tomcat server that barely works? Get it from Oracle! Otherwise it'll work solid everywhere else.
Re: (Score:3)
I think most people don't understand that the actual database product is rock solid.
You're right, we don't understand that, because we know better.
Re: (Score:2)
I've been working with Oracle databases for a couple of decades now.
"rock solid" is an extremely good description of them.
They're fucking expensive and some of the configuration is a royal pain in the arse but they work, they work well and they keep working.
I wouldn't recommend anybody starting a business to actually use one, but that's completely and entirely due to cost and Oracle's business practices, and fuck all to do with the underlying technology.
Re: (Score:2, Interesting)
I think most people don't understand that the actual database product is rock solid. It's Oracle middleware that needs to die in a fire. That and their licensing which makes Microsoft look like the good guy. I don't understand how they can make a good dbms but fail so miserably on the middleware.
The bulk of Oracle DB was made in the past, at a time when Oracle the company actually employed talented engineers, designers, and programmers.
It really was built to be rock solid and with plenty of features to make heavy workloads a breeze.
Sadly that time has long since passed and is not the Oracle the company of today.
A large portion of their middleware was either a 3rd party acquisition they purchased and had their offshore code monkeys try to integrate, or was actually made by said offshore code monkeys,
Re: (Score:2)
Note that the cost isn't just monetary. If you buy Oracle, you will forever have to fear their licensing antics. You never know when an audit might happen, and the licensing terms are so convoluted that you're likely in breach. Just to make it worse, the terms constantly change.
Re: (Score:2)
Because it's a damn good database. The question isn't about its capabilities
Actually, it is. The Oracle vs. Google lawsuit was about Oracle's wanting to use Java patents to hammer Google into cross-licensing its map-reduce patents so that Oracle could scale to the levels demanded by customers like Amazon. Cringely had a leaker years back confirming this.
Google won that one, and now Amazon has broken free of Oracle.
Personally I like it that my Subscribe-and-Save stopped taking 3 minutes to update an order
Prime Day was worse (Score:2)
I feel like sometimes the IT department must not be shoveling enough coal into the boiler or something because this antiquated inflexible interface just stalls all the time
Ok, so imagine that, but worse. That was Prime Day. Hours on hours of not stalling, but simply not working at all.
What you are describing sounds like maybe the devs aren't as good as they could be at optimizing, or maybe the company is stingy on hardware. What happened to Amazon was a world-class system brought to a halt simply because
Re: Prime Day was worse (Score:2, Insightful)
"What happened to Amazon was a world-class system brought to a halt simply because of too many users and the system fell over. That is something that Oracle is just better at handling (when it's administered right and has some powerful hardware at work, which Amazon has in spades for anything they stand up)."
You seem to have not read the articles about Prime day, such as:
https://www.cnbc.com/2018/07/19/amazon-internal-documents-what-caused-prime-day-crash-company-scramble.html
Sable is:
- Is not an RDBMS
- Is
Re: (Score:3)
It is really easy to screw up your Oracle database server. It's practically an operating system in itself, and there are multiple resource pools that, improperly managed, can starve various back end processes your DBA has barely even heard of. That said, properly managed it should handle heavy workloads for the iron you're running it on.
This is why Oracle *doesn't* make sense for a lot of installations. You need DBAs who either have a great deal of arcane Oracle server management knowledge, or who have
Re:Really? (Score:5, Insightful)
Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?
Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.
This, and the obvious risk of issues anytime you make such a large change. You fix them and move on. "thousands of delayed packages" sounds like a blip for Amazon. Bad weather can do that.
Re:Really? (Score:5, Funny)
Oracle is a silver bullet if your wallet is made from werewolf fur!
Re:Really? (Score:5, Informative)
Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?
Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.
I have some contacts at Amazon and can shed some light on this. Normally, Amazon retail prioritizes "Prime Day prep" above all else. Every team must prove they can stand up to the spike in load, and fill out lots of paperwork demonstrating they did adequate diligence. Rumor is that Prime Day was actually started as a way to do this exercise twice a year (and thus get better at it), rather than only for Christmas shopping.
However, this year is different. Moving off Oracle has been made the first priority of every retail team (well, every one that uses Oracle in any way, which is most). No doubt that shift in priorities is what's at play here: given the thousands of teams, it's no surprise that some team somewhere dropped the ball given the conflicting priorities.
So it's less about "Oracle was a silver bullet" and more about "changing stuff you don't usually change".
Bad things will happen to you! (Score:5, Funny)
Re:Bad things will happen to you! (Score:5, Funny)
Apparently we need a +1 Ominous moderation.
Shilling . (Score:2)
Do you mean -1 shilling?
Re: (Score:1)
Well played. ;-)
My question: Do we also get Erasmus points for slightly "out of the box" thinking too?
Re:Bad things will happen to you! (Score:4, Insightful)
>> thousands of delayed package deliveries
Leading to what...maybe $100K's of losses at a ridiculously inflated top-end? Vs. $100,000K's of savings from not having to write Oracle checks? I think that's a trade-off any smart business would take.
Re: (Score:2)
Oracle: Don't you dare change to a competing product. Bad things will happen to you.
Right, and what "competing product" will they change to?
Re: (Score:1)
I'm betting that Amazon is switching to that. I would hope they are not switching to MySQL/MariaDB/PerconaDB.
Re: (Score:2)
I like Postgres too, but it's not even close to having the features of Oracle. The problem is that using those features ties you to Oracle, so it's something you don't want to do casually (although Oracle likes you to).
More features is not necessarily better, particularly when the features are non-standard, but some of the things Oracle does are actually quite useful. For example it's possible to fork and merge database versions, and the various versions of the database will share common database pages.
Big woop (Score:1)
So the only glitch was a short delay in a single warehouse?
Sounds like a massive success story to me.
Re: (Score:2)
The failure is they don't know root cause, and they need better tools and capacity to manage savepoints with their new system.
It sounds like a secondary failure is insufficient testing prior to rollout...
Their own technology? (Score:3)
That phrase confused me.
I can absolutely understand wanting to move off Oracle. But why would they re-invent the wheel and write their own database? At least, that's what it sounds like they're doing based on the way the article was phrased.
Wouldn't it have been better to just switch to Postgres and use the Oracle compatibility layer if they needed things like PL/SQL support?
Ilsa
Re:Their own technology? (Score:5, Informative)
https://en.wikipedia.org/wiki/... [wikipedia.org]
They're developing their own technology because of implementing RDS. IIRC, RDS was originally a customized MySQL, and then they implemented Aurora.
Re: (Score:2)
I presume that this is DynamoDB that we are talking about, so a Document Store (the typical NoSQL type) rather than a Relational database. At the scale Amazon operates, Postgres is simply not going to compete without having a lot of extra custom logic on top. And once you are doing that, most of the advantage of something like Postgres is lost and Document Stores (and their inherent scalability) start to be better solutions.
Re:Their own technology? (Score:5, Informative)
Look up Amazon Aurora.
They've basically created a new DBMS that runs on top of their cloud infrastructure and is optimized for their EBS (Elastic Block Store). They have Postgres and MySQL flavors of the database, both of which utilize the actual DB "engines"; Amazon has written their own storage backends and added a bunch of other optimizations to the codebase (they've made most messaging asynchronous where possible). Because of the use of the actual database engines they claim 100% compatibility for both Postgres and MySQL. We use the MySQL flavor and haven't run into any compatibility issues with SQL queries or stored procs. Because of the performance optimizations inherent in how it was designed to run in their cloud, we were able to significantly reduce the amount of CPU/RAM utilized to run our application and still retain similar throughput - in essence, we were able to use a smaller RDS instance size, thus reducing our costs.
One of the really nice things about it is virtually instant (and faultless) replication due to the way they rely on EBS itself to replicate data, rather than through a replication system sending queries (or binary data) to another remote system.
Re: (Score:2)
I would mod you +1 informative if I could. Thank you for that! I've seen Aurora but haven't had time to really explore it. And I didn't know they had expanded that to Postgres too.
Re: (Score:2)
Cloud is where Oracle really dropped the ball. I'm a DBA who's worked with Oracle, MySQL and SQL Server (among others) in a production capacity. MySQL and SQL Server both offer superior cloud offerings, whether with Azure or AWS, and make migrating data to the cloud easy (I haven't worked with PostgreSQL enough to offer an opinion on it). Oracle's cloud offerings just can't compete at the same level as those products, and Oracle knows it; this is why they maintain their traditional sales tactic of makin
Re: (Score:2)
Yes I do.......I guess I should clarify that as "MySQL/MariaDB".
Re: (Score:2)
They have RDS, which is just managed Postgres/MySQL/MariaDB.
They also have Aurora, which is (I think) compatible with Postgres/mysql/maria, but designed from the ground up to run in the cloud.
A lot of traditional software is designed to run on a traditional server, and has certain design constraints that follow you when you move to the cloud. Designing something to be both compatible but cloud-native has been an important step and both Amazon/Google have created this type of product, if Micro
I think Oracle sees the writing on the wall... (Score:5, Interesting)
Between Java and their Enterprise platforms, if Oracle spent as much time listening and responding to their customers as they spent threatening them, they might be in a far better position today. Any major platform transition is going to have problems unless you're exceptionally lucky. There are just too many moving parts in Enterprise systems for humans to get everything right on the first try. Oracle won't tout all of the problems people have moving ONTO their software from a competitor, but that transition pain happens too.
Every year that goes by, it seems like Oracle is in a more tenuous position, despite their increased revenue. They've already lost the SME space -- I don't know of a single company anywhere in our client base, or within my sphere of influence, that still uses Oracle software. Organizations are bumping up against the limits of NetSuite -- the costs to integrate 3rd-party or industry-specific components, compared with other ERPs, are turning out to be more significant than expected. So we have clients and vendors migrating ERPs over time.
Oracle is becoming the Comcast of the software world. They treat everyone like crap, but were so deeply embedded that they were hard to dislodge. With every passing year, that is less true, and I think Oracle knows it. Unfortunately, they seem to be choosing to double-down on the "treat everyone like crap" strategy, rather than actually fixing the systemic problems that might eventually sink them...
Re: I think Oracle sees the writing on the wall... (Score:1)
I think Oracle sees the writing on the wall...
Of course they do; that's why they're bullish on pineapples and fighter planes.
Re:I think Oracle sees the writing on the wall... (Score:5, Interesting)
The funny thing is that Oracle could get back into many people's good graces. If they offered ZFS under the GPL and allowed it to become part of the default Linux kernel, this would be one of the biggest enterprise issues that would get solved.
Similarly, if they opened up a lot of their Solaris IP, instead of letting it die a slow death. Zones and LDOMs would be quite useful in Linux, even with it duplicating existing hypervisor functionality.
Re: (Score:2)
Hell will freeze over before Oracle do any good; their corporate culture and legacy have been toxic.
I worked '97 - '07 for someone who is now effectively a VP at Oracle, and he's still as bad as the rest of them. When the Director who replaced him retired, this VP showed up, which wasn't surprising, but then spent the next several hours attempting to persuade me and my somewhat drunken coworkers and managers to throw out all the MS SQL and 'invest' in Oracle. None of us wanted anything to do with Oracle; it's about as welcome as an STD.
I fully believe that this guy would even push that crap at a funeral
Re:I think Oracle sees the writing on the wall... (Score:4, Informative)
They'd do a far better job of returning to customers' good graces by not being such totalitarian get-every-last-dime asshats about their licensing terms.
Ever wonder why Oracle was so slow to get any traction in/among virtual machines?
Re: (Score:2)
the licensing scheme tends to influence architectural decisions
Beautifully articulated.
Stick a couple of racks of fully licensed Exadata appliances into your data centre and you've got a seriously powerful database.
Fuck around for eight months first with design and you've got an adequate one for a tenth of the price - and still using Oracle.
They really do make it too hard to use the best of their technology, and by hard, I mean fiscally irresponsible.
Re: (Score:2)
It's too late. If they had done this before BTRFS became production-worthy, it would have taken the air out of BTRFS. Now it's got momentum.
Re: (Score:3)
Between Java and their Enterprise platforms, if Oracle spent as much time listening and responding to their customers as they spent threatening them, they might be in a far better position today.
Maybe Oracle needs one of those "Codes of Conduct", that seem to be the rage these days . . . ?
Listening to customers is for startups . . . not for established market leaders. Their market dominance leads them to believe that their customers must listen to them.
Re:I think Oracle sees the writing on the wall... (Score:4, Insightful)
Oracle has simply overplayed their hand. For years, they have used the intrinsic difficulty of migrating as a tool to keep customers on-board in spite of constant abuse.
They finally tightened the thumb screws one turn too tight and their customers have decided that the intrinsic pain of migration is less than the pain of staying with Oracle.
Re: (Score:2)
people have moving ONTO their software from a competitor
Does that actually happen though? I mean who would migrate to Oracle from something else at this point?
hmm (Score:2)
The outage underscores the challenge Amazon faces as it looks to move completely off Oracle's database by 2020, and how difficult it is to re-create that level of reliability. It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software, a point that Oracle will likely emphasize during this week's annual OpenWorld conference in San Francisco.
Nothing in the article really supports those conclusions.
Was it due to some actual inferiority in "their own technology" (postgresql?), or was it just a migration issue?
Comment removed (Score:5, Insightful)
Re: (Score:1)
Likely as well, the $90K that this incident cost them is a rounding error in the total budget of the project, and in the long-term savings that the project will provide over the years, plus the additional money coming in from being able to sell this as a service on their AWS platform.
I am sure Amazon probably loses more money per year, maybe even per month, due to damage to products in shipment than this little mishap cost them.
Re: (Score:2)
$90K is likely similar to what the Oracle license costs them per day. If you think I'm joking, that's $30M/year - which wouldn't surprise me for a company the size of Amazon.
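A quick sanity check of that arithmetic, as a minimal sketch: it assumes the $90K figure is treated as a flat per-day cost, which is this comment's guess rather than a reported number, and the function name is made up for illustration.

```python
# Hypothetical helper: annualize a flat daily cost.
# The $90K/day input is the commenter's guess, not a reported license fee.
def annual_cost(daily_cost_usd: int, days_per_year: int = 365) -> int:
    """Return the yearly total for a cost paid every day of the year."""
    return daily_cost_usd * days_per_year


if __name__ == "__main__":
    total = annual_cost(90_000)
    print(f"${total:,} per year")  # a little over the $30M quoted above
```

So $90K a day comes out slightly above the $30M/year figure, close enough for a back-of-the-envelope comparison.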
Re: (Score:2)
$30m/year could go just on Oracle Financials at their scale, let alone the database.
I don't believe this for a second. (Score:2)
Re: I don't believe this for a second. (Score:1)
"Oracle's database is more efficient" (Score:2)
It's certainly unsurpassed in the efficient manner in which it eats all available IT funding. What licensing scheme are they using to rip off their customers this year? By CPU cores? By clock speed? Both?
Amazon could, obviously, have done a better job of testing before flipping the switch on a migration this big. It's not like the company is hurting for the money that could have been used to put together an appropriate environment to prevent a snafu like this.
Re:"Oracle's database is more efficient" (Score:4, Interesting)
I cannot believe I just recommended
Re: (Score:2)
There's OpenJDK. We've been running enterprise stuff on it since 2014, you don't need Oracle.
Of course there will be problems... (Score:2)
To paraphrase Nelson Muntz (Score:1)
I see a flaw in your logic (Score:1)
Outright slow or lack of tuning? (Score:3)
Big databases usually require careful tuning to handle big loads. Could it be the new incarnation has yet to undergo such tuning? The new incarnation may also have a different trade-off profile, such that the porting process moved operations mostly as-is instead of rebalancing the trade-offs to fit the new host. Much of the Oracle DB tuning may come from direct production experience, something the new incarnation won't have by definition.
For a car analogy, suppose you are used to hauling big loads up the mountain in a Ford pickup truck. You switch to a Chevy truck and find your productivity drops. At first you blame the Chevy.
After weeks of experience you find the Chevy less powerful at directly going over boulders; however, it's more maneuverable than the Ford, such that you just learn to swerve around boulders instead of trying to go over them. Once you get used to the Chevy, the haul time is roughly the same.
Re: (Score:2)
Sure would be nice to hear from somebody who has worked with both, whether Postgres really can fill the Oracle boots. I only know about the Oracle apps, somehow popular in enterprise but universally hated. Absolute rubbish. So why am I supposed to believe that Oracle's other products are magically better?
Re: (Score:2)
OEM, I'm pointing at you....
Re: (Score:2)
Ellison made it personal [cloudpro.co.uk] like an idiot. Now Amazon doesn't care about the expense any more. And obviously, if Amazon can use AWS instead of Oracle then other companies can too, so Amazon thanks Larry for providing that extra motivation to just do it.
Ellison taunts Amazon (Score:2)
Larry Ellison taunts Amazon [cloudpro.co.uk] that they still use Oracle and can't do without them, thus ensuring that Amazon will stop at nothing to be rid of Oracle and him.
teething pains (Score:2)
Not for long.
Not for long.
Not for long.
No doubt, forever and ever.
Paid by Oracle (Score:2)
Oh, look, a 'news' article paid for by Oracle.