How Would You Handle a $1,000,000 Coding Error?
theodp writes "The Chicago Tribune's efforts to upgrade its computer system over the weekend turned into a fiasco when the system crashed, halting all printing operations and leaving about half of the Trib's subscribers without papers. The software contained 'a coding error,' according to a spokesman who estimated the cost to resolve the problem at 'under $1 million.' Any advice for the poor schmuck who's going to get the blame?"
Just one (Score:5, Funny)
Re:Just one (Score:4, Funny)
Dod\/ge
|_______________________________________>schmuck
McDonald's (Score:5, Funny)
Re:McDonald's (Score:5, Funny)
Re:McDonald's (Score:5, Funny)
Isn't that what McDonald's food does anyway?
You Slashdotted Illinois (Score:5, Funny)
Re:You Slashdotted Illinois (Score:5, Funny)
Yeah, like anyone will care? Or even notice? *psssh*
Re:Just one (Score:5, Funny)
Tribune's version (Score:5, Informative)
A story we never thought we'd print
By James Coates
Tribune computer columnist
Published July 19, 2004, 6:40 PM CDT
Nothing built by humans can go wrong in as many ways or with as nasty an outcome as a computer system.
The people who create the Chicago Tribune started relearning that fact about 4 p.m. Sunday when they noticed that nothing was getting through as they attempted to beam the stories, artwork and ads from Tribune Tower to the Freedom Center printing plant.
About 13 hours later, they finally started printing a 24-page version of Monday's Tribune that should have already been landing on their readers' porches.
It was a misfortune that most people in the news business don't ever expect to experience. Newspapers do not miss days -- and Monday was close.
The only time the Tribune failed to print was during the Great Chicago Fire of 1871. That time, the lesson was that nature can be fickle and dangerous.
Now, the paper has learned that the same goes for the computer technology that has graced the industry with unparalleled productivity since the 1990s.
Business computer systems are cobbled together as row upon row of workstations, each running an operating system based on an estimated 50 million lines of instructions. In turn, the worker bee desktop computers connect to the queen machines with their own millions of lines of code in a different language.
An endless nest of wires, cables and even radio signals move instructions at light speed between the central computer and the workstations. The main computer also talks to all the peripheral devices needed to accomplish the mission.
The peripherals can be banks of hard drives, storage bays, printers, scanners, cameras and specialty devices as diverse as a pager or a printing press several stories tall.
The certainty that each and every one of these massively complex systems will crash haunts the people charged with keeping this thoroughly digital world up and running.
Those people are engineers, and so they often reduce it to numbers.
An often quoted study by Carnegie Mellon University computer scientists studied 30,000 software programs and found five to six defects per 1,000 lines of code.
And this is for finished software sent to customers.
When writing new programs, there is typically a defect in every 10 lines of code. About a half dozen defects per 1,000 lines remain after a process of checking, rechecking, cross checking, testing, retesting and finger crossing.
The hubris of computing becomes clear as one realizes that each of these errors in code branch out with instructions to millions of other lines of code. Quite often, they find pathways never before taken by that particular program.
Collisions occur on these pathways and trouble is spotted. Maybe it can be fixed or maybe technicians can only perform a "workaround" that can't be guaranteed.
Dick Malone, the Tribune's senior vice president and general manager, said that around 9:30 a.m. on Sunday technology crews started a planned upgrade to increase the newspaper's Sun Microsystems servers from so-called 10K models to 15K machines.
To do this, experts from the company that makes the newspaper's core Windows-based publishing software, Denmark-based CCI Europe A/S, needed to install upgrades of its Newsdesk brand software that the Tribune and other clients use.
Malone noted that they checked and rechecked, tested and retested all day. Everything seemed to be working without a hitch. Then, they punched the button that was supposed to send all of the content for the newspaper to the printing plant.
Nothing arrived.
Frantic hours went by as deadline after deadline slipped while crews struggled to find a fix. Malone said he went so far as to start setting up the newspaper's pages on the art department's Macintosh desktops, hoping to get at least something printed.
Re:Just one (Score:5, Informative)
Re:Just one (Score:5, Interesting)
That, and the story from one of Tom Peters' books about the guy who rented a helicopter on the fly (intended pun) to get up to the top of a mountain to restore clientele service. I consider these to be things we'll never see, only hear about.
Re:Just one (Score:5, Informative)
Re:Just one (Score:5, Interesting)
Or say this incident [209.157.64.200] - blamed on technicians...
Or say you were an air-traffic-controller... - how big a mistake do you want to make.
Re:Just one (Score:4, Insightful)
You know.. being involved in such an accident changes you for life. It's not like most people who get involved in this will ever be able to put it aside and forget about it.
Adding social pressure to that is not going to solve much at all, not for the victims either.
No matter how terrible the results, accidents happen, and we'll have to live with that. Yes, we need to deal with the consequences, but an attitude that results in more people paying for the rest of their lives as a result of accidents is not going to accomplish that. It is only going to generate more 'guilty' people who are too stuck in solving their guilt issue to contribute to society as a whole.
Dogbert Strategy (Score:5, Funny)
I would have to follow Dogbert's Top Secret Management Handbook [amazon.com], and take full responsibility for the bungle. That way when the next job comes up two or three rungs above me, I'll be at the top of the list of people with actual experience with massive projects, and it won't matter that it was a colossal screw-up because I will have jumped two or three pay-grades. Corporate fall-guys, if they take it right, always end up better off than quiet behind the scenes types.
So my advice is that you should take full responsibility and sharpen that resume, but be sure to make it known that you have learned from your mistakes and you worked hard to correct them. Nobody gets anywhere without making big blunders along the way. Be a good sport and you'll jump at least two pay grades for this blunder.
Re:Dogbert Strategy (Score:5, Insightful)
Re: Dogbert Strategy (Score:4, Funny)
> In my experience being honest about your mistakes and having the willingness to learn from them always pays off.
Yes, they'll just pull the lever that instantly drops your seat into the pool of piranhas, skipping those inconvenient steps where they would have to torture a confession out of you first.
Re:Dogbert Strategy (Score:5, Funny)
Re:Dogbert Strategy (Score:4, Interesting)
Money lost is money best spent, since it directly pays off into wisdom.
Bad News, Good News..... (Score:5, Funny)
Good news: Rainforest saved.
Re:Bad News, Good News..... (Score:5, Informative)
Good news: Rainforest saved.
Actually, most of the wood pulp comes from trees grown in managed forests where trees are replanted to replace the old ones.
So it's a bit like growing corn or wheat to eat.
Strangely, we don't see many people shouting "save the corn!".
Re:Bad News, Good News..... (Score:5, Informative)
These days companies like Champion and Plum Creek are finding that it's more profitable to sell the logged areas than to replant them. For example in Maine [maineenvironment.org] and Montana [seeleyswanpathfinder.com].
It's more profitable to sell land (especially waterfront land) and then log the federally subsidized national forests.
Your tax dollars at work!
grow canabis, stupid morons.... (Score:4, Insightful)
A) grows 10000x faster than trees
B) makes 10x more pulp per acre
C) uses 100x less water.
D) stick it to the govt.
But would they ever do that? NOOOO, coz there are no patents in the process to exploit, and oh, the trouble of the govt wackos like Bush n old guys being so anti-cannabis (to protect their buddies' profits).
I guess they wouldn't want 100s of potheads heading up to the 100000s of acres of weed to take a few home, but what is so wrong with that, OTOH?
Re:grow canabis, stupid morons.... (Score:4, Insightful)
A) grows 10000x faster than trees
B) makes 10x more pulp per acre
C) uses 100x less water.
D) stick it to the govt.
I think you forgot your "...profit" clause, except here it would say
D) Use a bunch of arguments of dubious value to misdirect attention from the fact that what you really want is to get stoned
Re:grow canabis, stupid morons.... (Score:4, Informative)
"Fibre hemp is an annual herbaceous plant which flourishes in temperate regions. All cultivars tested in Alberta have been low-THC (delta-9 tetrahydrocannabinol) cultivars. Canada has adopted the 0.3% THC standard established by the European Union as the concentration which separates non-psychoactive strains suitable for legal fibre production from those which are illegally grown for their properties of intoxication. The 0.3% THC designation is very conservative. Most narcotic strains range from 3-5% THC, with cleaned, high potency material reaching as high as 15% THC."
You forgot... (Score:5, Informative)
E) Pulp requires hardly any bleaching, and only a tiny fraction of the toxic chemicals that wood pulp needs for processing.
E.1) No toxic chemicals to expensively dispose of (less pollution).
F) Pulp requires a fraction of the processing compared to wood-pulp.
G) Same (non-THC-producing) hemp grown for rope and clothing can be used... existing/established farming methods.
H) Requires _much_ less fertile ground (no fertilizer) for growing... technically it _is_ a weed (not just a nickname).
H.1)
I) Requires much less expensive processing equipment to farm (ground requires drastically less/no tilling, collection can be done with hay-baling equipment instead of heavy trucks and tree-cutting machinery, etc.).
I'm sure I'm forgetting some.
Note the reference to a non-THC-producing strain... I'm not into pot, but I certainly can see a phenomenal idea when I see one (seen this one many years ago).
Re:You forgot... (Score:5, Informative)
The US actually managed to eradicate a weed that grew on the roadside from their shores by aggressive burning along with a demonisation campaign to try to turn people off the (then popular) drug... a bit like the 'war on terror' but with even fewer facts behind it.
There are many strains of non-THC-containing hemp (given that the social effects of introducing wide availability of another drug are undesirable; alcohol is bad enough). In Europe at least there are fields full of the stuff, as hemp rope and linen are still very popular. Even hemp paper is available, given it's cheap/easy to produce...
Medical hemp (the THC kind) is grown under license and given to selected patients to treat certain conditions, although that's mostly still under trial (and is the motivation for the reclassification of cannabis possession in the UK, so that the drug companies could legally do their trials).
Re:You forgot... (Score:4, Insightful)
My experiences tell me cannabis is a much more desirable drug than alcohol, both from the users and society's point of view.
Use both drugs with some sense and nothing bad will happen. Overdo alcohol and it will make you loud and often aggressive. Overdo cannabis and you will fall asleep (which can be loud but seldom aggressive). Neither is very suitable for driving. (Although I prefer people who smoked over people who drank: they drive more relaxed.)
It is when talking addiction that the large difference arises. Alcohol is a hard drug and physically addictive; cannabis is not. Alcohol demolishes you while it degrades you. Cannabis use over long periods is claimed to deteriorate memory. (So, don't drink to forget, smoke! ;) If you smoke the cannabis (instead of eating it) you get the same risks as with tobacco use.
Here in Belgium, cannabis is more or less legal now (we are allowed to carry up to 3.3 grams on the street and use it in private places and such). It is a good thing, because we did that anyway (I live about 40 kilometers from the closest cannabis shop in the Netherlands where I can buy as much as I like legally).
There were no sudden changes in behaviour. No millions extra addicts, no stepping stones, nothing. The people who are inclined to (ab)use drugs usually do not care about legality.
Re:You forgot... (Score:5, Funny)
I bet I know why....
Re:Bad News, Good News..... (Score:4, Informative)
You couldn't be more wrong. I live near a large paper mill that produces products for newspaper companies. I've lived here all my life. I've seen first hand how they rape the forests, the mountains, etc. Sure, they plant yellow pine because yellow pine grows fast and fits their purposes, but where they plant the yellow pine was once a lush hardwood forest of oaks, maples, etc. They take out the large hardwoods that provide acorns for deer and other small animals and replace them with pine, so now the pines grow unabated. The animal populations suffer. Also, any smaller hardwoods they cannot use they slash or poison so they will die. Next, since there are so many pines we recently had a plague of pine beetles. Huge tracts of pine forest (man-made pine forests) lie in waste in the mountains, hills and along the highways here. This is partly the fault of the paper company. Also, the chemicals they use create an artificial/chemical fog that wreaks havoc. I kid you not. We had one of the largest traffic accidents in US history here some years back where 100s of cars piled up on I-75. It made national news. I think the paper company paid off the victims' families nicely enough though. Finally, the workers in this mill are exposed to harmful chemicals such as chlorine that take a toll over time. Usually, late in life there are massive respiratory problems.
It's easy to armchair-quarterback where your newspaper comes from, but I for one don't subscribe to anything but online sources. You should too....
One-line CODE ERROR $60 million - AT&T phone c (Score:5, Informative)
The scoop (Score:3, Funny)
(It pays to use Splint [splint.org])
Do as any knee-jerk slashdotter would... (Score:5, Funny)
Re:Do as any knee-jerk slashdotter would... (Score:5, Interesting)
So was it Sun or Microsoft?? Or maybe Apple?
Frantic hours went by as deadline after deadline slipped while crews struggled to find a fix. Malone said he went so far as to start setting up the newspaper's pages on the art department's Macintosh desktops, hoping to get at least something printed.
Re:Do as any knee-jerk slashdotter would... (Score:5, Funny)
This post has been approved by the Slashdot Ministry of Truth.
My advice. (Score:3, Funny)
Re:My advice. (Score:5, Funny)
Re:My advice. (Score:4, Funny)
Well, if they're outsourced to India... (Score:5, Funny)
*ducks*
Or South Florida (Score:5, Funny)
Who Was It REALLY? (Score:4, Funny)
It's my first week! (Score:5, Insightful)
Now if the cause was insufficient testing, well then QA has to answer for it.
And if there's no QA, well that's management's fault...
Now if it all comes down to dumb circumstances, it's poor planning on the paper's part for not testing themselves.
That said, fess up. Worst comes to worst, you now have national infamy, and any fame is good fame, right??
Re:It's my first week! (Score:5, Insightful)
I work in newspapers, and have for the past 7 years. The blame for this fiasco should be pinned directly on the project manager. Not the coders, not the people trying to get the thing running, but the project manager. Right in the middle of his fucking forehead.
I've torn the guts out of many newspaper networks upgrading or improving them, but never have I ever put anyone in the position of "If the new system doesn't work, we're fucked." I've always made ab-so-fucking-lutely certain there was a fallback position where the paper would hit the press. I actually had this conversation before:
<Management weenie> What happens if this new server fails?
<me> I haven't touched the old server. If the new one hiccups one whit, we fire up the old box and produce product.
<Management weenie> I don't like that - we've spent a million bucks on the new gear. Delays make me look bad.
<me> Well, if you're willing to man the phones when the advertisers call demanding re-prints of their ads because of human error somewhere, I have no problem with it.
<Management weenie> You're an asshole. I could have you fired.
<me> In this instance, I'm paid to be an asshole. You can't fire me for doing my job.
<Management weenie> Heh. OK, we'll go with your plan.
Not planning some way to get the paper on the press is dereliction of duty, and deserves your professional head to be lopped off.
Is there _no_ professionalism anymore? Fuck, I should be paid more. Morons like that burn me - when you blow up a critical system with no backup, it's not just your livelihood at stake, but that of everyone who depends on that system functioning as needed. Fucking morons.
Soko
Re:It's my first week! (Score:5, Insightful)
"We need to reduce spending in non-core areas!" IT usually ends up being defined as non-core (unless you're an IT company).
Suddenly management questions you if you want to buy so much as a network hub (el-cheapo consumer grade at that - not for infrastructure). You have to justify any expenditure, and so the guys on the bottom just stop asking since it is such a pain.
I'm sure anybody on that failed project could have identified steps that would have yielded a fallback. They could have built a new server, and then switched it out with the old server and kept the old one ready to go in an emergency for a couple of weeks. But that would require a $2000 server requisition - or maybe $3000 since the corporate standard was picked by some idiot on the vendor's kickback list.
For the guy on the bottom, they look bad for asking for money, and chances are that the fix would have worked fine with no failsafes at all - the last 15 upgrades probably did. He has to ask for money each time, and will have nothing to show for it.
On the other hand, every person on that project was probably thinking the same thing. Sure, spending $2k is a good business decision, but upper management wouldn't recognize that, so let's just not ask. We won't point out how much we're saving on server hardware by not having backups - we'll just let our overall expenses speak for themselves and not call attention to our negligence. And then we'll get promoted year after year and if something goes wrong we just all look dumb and nobody understands computers anyway so management will just figure that these costs come up any time you use one.
And you know what? This approach usually works in the end.
The real responsible party is the one which made cost-cutting-at-any-cost the corporate line. Oh, sure, the corporate policies usually have exception clauses, but what bottom-rung employee is going to bother running a request up 12 links of the chain of command just to spend an extra $1000 on hardware? The opportunity to use it would pass before it ever got approved.
The problem is the question-everything approach of corporate fiduciary management. Sure, there is waste out there, but it doesn't take many botched migrations to dwarf what you save by pinching pennies...
Re:It's my first week! (Score:4, Interesting)
A few years ago I worked for a publishing company that sold software to newspapers and magazines for publishing (mostly ad layout stuff). We became the reseller of a piece of content management software that was being customized by us and installed (for the first time ever anywhere) at one of the larger magazines published by one of the largest mega-media companies.
We didn't just rush in headlong and try to install and run the software in production the first time. For a while the system ran in parallel with the production system as a proof of concept (just a few of the pages at a time). Then, when it was deemed ready, those few pages were published live out of the system (we still had other sources if it went bad).
The system worked as designed and we were able to publish the pages out of it. Unfortunately the software wasn't very useful or cost-effective, so the project was ultimately scrapped. Still, this is obviously the way to handle something like this: don't just rush headlong and detach your old software and systems for the new ones. Run them in parallel in a production environment... it's really the only way to be sure.
I would get drunk. (Score:5, Funny)
Well, you asked.
Re:I would get drunk. (Score:5, Insightful)
I think I'd rather debug someone else's assembly language than someone else's perl.
Can't see the forest for all the trees? (Score:4, Interesting)
Take a look at some K [kx.com] code (there are examples in the user manual) and then come back and say that. If K is too exotic, then try looking at some macro-heavy LISP code -- it has the same problem just slightly less so.
Code density can be good when you're trying to see the big picture (fewer screenfulls of code is a good thing in this case), but it can work against you when you're trying to understand the little details.
Regular expressions are nothing more than a hack to make up for the fact that generalized LR parsers were quite inefficient up until a few years ago. Just compare a reasonably complex regular expression to the BNF form of a grammar for parsing the same input to see how much easier GLR is to use -- you can see some examples of just how easy GLR parsing is to use here [sourceforge.net]. And it can actually handle more general patterns with nesting, etc. I really think regexes are really just a question of premature optimization -- with GLR you just start out with an incredibly readable and simple grammar, and if it proves to be slow (i.e. if there are lots of points of ambiguity along certain parse trees) you can optimize it towards a purely LR(k) grammar.
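To make the nesting point concrete, here is a minimal sketch. It is a hand-written recursive-descent check, not a GLR parser, and the names FLAT, parse_group and is_group are made up for illustration. The pattern handles one level of parentheses; the little grammar handles any depth.

```python
import re

# Matches a parenthesised group with NO nested parentheses inside it.
FLAT = re.compile(r"\([^()]*\)")

def parse_group(text, pos=0):
    """Grammar: group := '(' (group | other-char)* ')'.

    Returns the index just past the group that starts at pos.
    """
    if pos >= len(text) or text[pos] != "(":
        raise ValueError("expected '('")
    pos += 1
    while pos < len(text) and text[pos] != ")":
        if text[pos] == "(":
            pos = parse_group(text, pos)  # recurse for a nested group
        else:
            pos += 1
    if pos >= len(text):
        raise ValueError("unclosed group")
    return pos + 1

def is_group(text):
    """True if the whole string is one (possibly nested) group."""
    try:
        return parse_group(text) == len(text)
    except ValueError:
        return False

flat_ok = FLAT.fullmatch("(abc)") is not None
flat_nested = FLAT.fullmatch("(a(b)c)") is not None
grammar_nested = is_group("(a(b)c)")
```

The regex gives up as soon as a second "(" appears, while the recursive rule reads almost exactly like its one-line BNF.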
Advice? (Score:3, Funny)
Well my first advice is to come clean, yes I mean you theodp, I think we all know who this poor schmuck is
Testing? (Score:5, Insightful)
A good test should have identified some errors, especially if it blew up IMMEDIATELY.
Deployment? (Score:5, Insightful)
You don't just change a system like this in a weekend. There WILL be problems, so you have to have ways of dealing with them. Maybe that means flicking the switch back to the old system if it fails, or maybe it means running with degraded capacity for a while, but whatever it is, dead-in-the-water is not your Plan B.
Re:Deployment? (Score:5, Insightful)
Probably in the hands of someone who decided:
(1) Cost of catastrophe: $1,000,000.
(2) Chance of catastrophe: 5%
(3) Cost of setting up parallel system, including hardware, software licenses, system administration: $250,000.
If (1) times (2) is less than (3), then it's actually better not to spend the money on (3).
Of course you can argue with the actual numbers in (1) (2) (3). (1) is the Tribune's own estimate. (2) is estimable by looking at the history of past projects, I'm just guessing 5%. And I just pulled (3) out of the air.
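Plugging in those three guessed figures, as a rough sketch only. The variable and function names are made up here, and the comparison assumes the parallel system would remove the risk entirely, which is generous.

```python
# Expected-loss comparison using the guessed figures (1), (2), (3).
cost_of_catastrophe = 1_000_000     # (1) the Tribune's "under $1 million" estimate
chance_of_catastrophe = 0.05        # (2) a guess, not a measured failure rate
cost_of_parallel_system = 250_000   # (3) pulled out of the air

def expected_loss(cost, probability):
    """Average loss per upgrade if no fallback exists."""
    return cost * probability

def worth_building_fallback(cost, probability, fallback_cost):
    """True when the expected loss exceeds what the fallback costs."""
    return expected_loss(cost, probability) > fallback_cost

loss = expected_loss(cost_of_catastrophe, chance_of_catastrophe)
decision = worth_building_fallback(
    cost_of_catastrophe, chance_of_catastrophe, cost_of_parallel_system
)
# Failure probability at which the two options cost the same.
break_even = cost_of_parallel_system / cost_of_catastrophe

print(f"expected loss: ${loss:,.0f}, build fallback: {decision}, "
      f"break-even chance: {break_even:.0%}")
```

With these numbers the expected loss is about $50,000 against a $250,000 fallback, so the gamble looks rational until the chance of failure reaches 25%. The sum leaves out anything that is hard to price, such as half the subscribers not getting a paper.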
That said, I bet they do have degraded capacity, and that they used it to print half their papers on Monday and all their papers on Tuesday.
planning? (Score:5, Insightful)
Good planning would have had an abort procedure, so the show would go on. Everything changed should be undone if it did not work. They could figure it out after the paper was printed.
Errors are inevitable. Good planning and implementation keep you from falling on your face even when you publish seven days a week. It's not the coder's fault.
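A bare-bones sketch of what an abort procedure looks like in code. Every name here (upgrade_with_rollback, install, restore, pages_reach_plant) is hypothetical, and a real rollback also has to restore data, not just flip a version.

```python
def upgrade_with_rollback(apply_upgrade, smoke_test, roll_back):
    """Apply an upgrade, check it end to end, and undo it on any failure.

    Returns True if the upgrade was kept, False if it was rolled back.
    """
    try:
        apply_upgrade()
        if smoke_test():
            return True
    except Exception:
        pass  # a crash during upgrade or test is treated like a failed test
    roll_back()
    return False

# Toy stand-in for the publishing system.
system = {"version": "old"}

def install():
    system["version"] = "new"

def restore():
    system["version"] = "old"

def pages_reach_plant():
    # Simulates Sunday afternoon: nothing arrives at the printing plant.
    return False

kept = upgrade_with_rollback(install, pages_reach_plant, restore)
```

The point is that the smoke test is the real job (send pages to the plant), and the undo step exists and is wired in before anyone pushes the button.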
Re:planning? (Score:5, Interesting)
I cannot count the number of battles I have fought just to get some time to design an emergency rollback plan.
I wish I had more balls to jump up in an emergency meeting and scream "I TOLD YOU TO GIVE ME A FEW DAYS SO I COULD DESIGN A ROLLBACK PLAN, ASSHOLE. BUT NOW ALL THE DATA IS CORRUPTED, AND WE CAN'T DO ANYTHING ABOUT IT BECAUSE OF YOU!!"
Instead, I just keep a copy of the emails where I made the request and was denied, and then forward them to the CTO.
Re:Testing? (Score:5, Insightful)
Blame the project manager (hopefully there was one) who was responsible for testing the services thoroughly before deployment. Individual coders shouldn't be held to any legal liability.
Any legal action should be directed towards the 'outside provider' (as noted in the article).
Check the Jobs section soon (Score:4, Insightful)
Re-engineer (Score:3, Funny)
Uptime (Score:4, Funny)
No Paper this morning (Score:3, Interesting)
I'm not angry anymore, I'm sympathetic for the poor schmuck as well as all the customer service people who probably got yelled at this morning.
-- Kevin J. Rice
1 million is not that much (Score:5, Insightful)
The culprit should just admit it. Shit happens, it's unavoidable even if you take all precautions. Don't make the same mistake again, though.
Point to EULA (Score:5, Funny)
Software provided as-is. Software developer/company is not liable for any physical, financial, or any other loss or damage arising from use of software.
Doesn't all software come with things like this? (nevertheless, thank-goodness I'm not a software developer)
My advice (Score:5, Funny)
My advice: Prepare three envelopes
Re:My advice (Score:5, Informative)
Re:My advice (Score:4, Informative)
Only one thing to do now... (Score:5, Funny)
And this is why... (Score:4, Funny)
advice to hapless code monkey (Score:5, Funny)
Down, not across. (motto of alt.sysadmin.recovery referring to best method of slashing one's wrists).
slashing one's own wrists (Score:4, Funny)
Blame the users... (Score:3, Funny)
No one person should be at fault (Score:5, Insightful)
How to handle $1,000,000 coding error? (Score:5, Funny)
Re:How to handle $1,000,000 coding error? (Score:3, Funny)
Poor schmuck probably already got that e-mail, and this "coding error" was a last-ditch attempt to generate the FOURTHY-THOUSAND DOLLARS he needed.
Fix it. (Score:5, Interesting)
Take responsibility and ownership of the problem. Don't make excuses, but give real reasons.
Fix it..do whatever it takes, even if it means working over a weekend.
Write a good post mortem, explaining how the fix is different from the original problem.
And hope to god that your management is understanding enough to keep you on.
This is coming from a guy who in 1997 blew a $100,000 test weekend by kicking off the systems tests with the wrong generation of tapes loaded.
I took the blame, and expected to lose my job. But I knew that the right thing to do was to try to recover from the problem. I stayed in the office from 1:00AM Sunday to 10:00AM Monday morning rerunning every job and report and proving out the results.
Not only did I keep my job, but I got promoted a year later. I made a name for myself that weekend....sure I could f*k up, but I work hard to keep things right for the company.
wbs.
I've seen this problem before (Score:5, Funny)
So your choices:
Plan A: Blame managers for forcing you to work under stressful conditions that lead to a workplace hazard (stress) that caused you to make the error. Cite that you had to work a lot of overtime and the lack of breaks and sleep caused you to miss a major bug.
Plan B: find someone like me who takes their time coding and have them look over the code and fix the problem for you. Sometimes another pair of eyes helps to find things you've missed.
Plan C:
Go to work in flip-flops, a Hawaiian shirt, sunglasses and tell everyone you are on vacation. Make Pacman noises, and talk to your invisible friends. Claim insanity and see if that works.
Plan D:
Start looking for another job ASAP.
UAT/QA anyone? (Score:5, Insightful)
The person writing the code can unit test to the best of his or her ability, but it is really the job of someone else to put it through the wringer testing thousands of simulated real-world scenarios. Sure, a coder could do this testing. But a QA guy or gal is doing really well if he makes 3/4 the salary of the guy who wrote the code, so a division of labor only makes sense.
Not to mention the person writing the code makes the worst tester in the world. You only test it the way you THOUGHT people would use it. So, while a coder is perhaps the one who created the original problem, the real fault is in whoever let this slip through to production. Assuming, of course, that it wasn't some kind of time-bomb easter egg that would have been impossible to test. Although, good QA testers should alter their system date/time when testing date sensitive routines.
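One way to test date-sensitive routines without touching the system clock is to pass the time in as a parameter. A minimal sketch, with made-up names (spool_name, latest_job), using the moment Unix time went from nine digits to ten as the boundary to test around:

```python
from datetime import datetime, timezone

def spool_name(now):
    """Hypothetical routine: name a print job after the current Unix time."""
    return f"job-{int(now.timestamp())}"

def latest_job(names):
    """Pick the newest job by the numeric part of its name."""
    return max(names, key=lambda name: int(name.split("-")[1]))

# 1,000,000,000 seconds after the Unix epoch.
rollover = datetime(2001, 9, 9, 1, 46, 40, tzinfo=timezone.utc)
just_before = datetime(2001, 9, 9, 1, 46, 39, tzinfo=timezone.utc)

names = [spool_name(just_before), spool_name(rollover)]
newest = latest_job(names)
# The naive version compares the names as text and picks the OLDER job,
# because "9" sorts after "1".
naive_newest = max(names)
```

A tester who only ever runs the code "today" never feeds it a pair of dates that straddle a boundary like this one.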
My advice (Score:5, Funny)
And send his supervisor too for not testing the system properly before trying to roll it out.
"angry or confused" (Score:3, Informative)
Mind you, here in Perth we only have one daily newspaper and it sucks, so I can't imagine getting worked up about a failed delivery.
Very carefully! (Score:5, Funny)
Frankly, I can't believe anyone would pay $1M for a coding error. Hell, the guys I work with make coding errors all the time, and practically for free!
(That's free, as in beer.)
Nothing to see here (Score:5, Insightful)
No one [in their right mind] orders a brand new paper publishing system from a single consultant. The software probably was priced at several million dollars. Somewhere between the components something broke. For example, the file format that the publisher produced was rev. 2.1, but the software at the presses side was only aware of rev. 1.7 and below... If the coder only tested his code with the "other" piece of latest revision, he would never see any problem; and it is not his fault that in real life the real customer uses some obsolete stuff that isn't compatible...
This kind of problem is clearly of administrative nature, of a system design and of checking which pieces work with which other pieces. Clearly, blame should be assigned to non-existent QA procedures, insufficient unit testing and [obviously] inadequate integration of components. The coder is nowhere here, it's all system design and QA stuff, realm of managers.
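The kind of check that catches this at integration time is tiny. A sketch only: the revision numbers are the made-up ones from the example above, and parse_rev and can_read are hypothetical names.

```python
def parse_rev(text):
    """Turn a revision string like '2.1' into a comparable tuple (2, 1)."""
    return tuple(int(part) for part in text.split("."))

def can_read(file_rev, max_supported_rev):
    """True if a consumer that knows max_supported_rev can read file_rev."""
    return parse_rev(file_rev) <= parse_rev(max_supported_rev)

publisher_output = "2.1"   # what the upgraded publishing side writes
press_supports = "1.7"     # newest revision the press-side software knows

compatible = can_read(publisher_output, press_supports)
# Comparing the raw strings is a trap: "1.10" <= "1.7" is True as text,
# but revision 1.10 is newer than 1.7.
text_trap = "1.10" <= "1.7"
numeric_ok = can_read("1.10", "1.7")
```

Run against the versions the customer actually has installed, a check like this fails on day one instead of at deadline.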
More common than you think... (Score:5, Informative)
Most daily newspapers produce various editions, between two and four, and I've seen a couple of times where only one edition is printed due to "coding errors" (like the 1 billion seconds from the epoch thing - my personal favorite).
Of course the vendor had to be called at the $500/hour emergency rate to fix their own error.
Once I saw a print pre-processor go off line because
They call daily newspapers "the daily miracle" and when you look at some of the computer band-aids they have producing them, you can see why.
Re:More common than you think... (Score:5, Interesting)
We had a reporter screw up and drag a folder into the trash instead of the volume it was in (MacOS is absofuckinglutely retarded for having you unmount volumes by dragging them to the trash).
He went on with his business, and then around 5pm he emptied the trash. He suspected something was wrong when it was taking over 5 minutes to empty the trash.
Turns out the folder he trashed contained *all* the quark documents for the paper (the next day's stories and advance stories).
While there were backups, some people had to scramble to rewrite their stories. Paper was a little light the next day.
That's the problem with OS9 and OSX. The users need permission to delete stories in order to have permission to modify stories.
Set the Unix sticky bit on the directory (Score:5, Informative)
NAME
sticky - sticky text and append-only directories
DESCRIPTION
A special file mode, called the sticky bit (mode S_ISVTX), is used to
indicate special treatment for shareable executable files and directo-
ries. See chmod(2) or the file
tion of file modes.
STICKY DIRECTORIES
A directory whose `sticky bit' is set becomes an append-only directory,
or, more accurately, a directory in which the deletion of files is
restricted. A file in a sticky directory may only be removed or renamed
by a user if the user has write permission for the directory and the user
is the owner of the file, the owner of the directory, or the super-user.
This feature is usefully applied to directories such as
be publicly writable but should deny users the license to arbitrarily
delete or rename each others' files.
Any user may create a sticky directory. See chmod(1) for details about
modifying file modes.
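As a rough sketch of what the man page describes, here is one way to set the sticky bit on a shared directory from Python, using the standard `os.chmod` and `stat.S_ISVTX`. The helper name `make_sticky` and the temporary directory are stand-ins for illustration; on a real file server you would point it at the shared stories folder, and whether a Mac client honours the bit over a network share is not something this example checks.

```python
import os
import stat
import tempfile


def make_sticky(path: str) -> int:
    """Add the sticky bit to a directory, keeping its other mode bits.

    Returns the resulting permission bits.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_ISVTX)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    # A temporary directory stands in for the shared folder.
    with tempfile.TemporaryDirectory() as shared:
        os.chmod(shared, 0o777)      # world-writable, like a shared drop folder
        mode = make_sticky(shared)   # same effect as `chmod +t` on the directory
        print(oct(mode))
        print(bool(mode & stat.S_ISVTX))
```

With the bit set, users can still create and edit their own files in the directory, but only a file's owner, the directory's owner, or the super-user can delete or rename it.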
1 Million? That's nothing! (Score:5, Interesting)
I was "allowed" to resign gracefully, quietly, and have learned a valuable lesson about software testing: It's not whether you miss something, it's whether or not someone else will find it in time to cost you your job. (nods sagely)
Re:1 Million? That's nothing! (Score:5, Funny)
X = Will accept any date 1975-Present.
Z = *.*
Y = Will accept any product made in the history of Microsoft. The Fabric of Space-Time is also an acceptable answer.
I don't worry about it (Score:5, Funny)
As long as I keep checking in my code as someone else, I won't have to.
computer + printing press = computer (Score:5, Insightful)
So the paper can deliver every day for 158 yrs using mechanical printing presses ~ except where natural disasters occur ....
The printing problems at the Chicago Tribune were related to efforts to upgrade computer equipment used to produce the newspaper, Malone said. The Tribune acquired customized software for the upgrade from an outside provider, and it contained a "coding error," he said.
But as soon as computers are involved, their printing press has morphed into a computer system. I wonder what provisions were made to *test* the upgrade before use?
Fail to recognise newspaper as computer system?
It would be easy to blame the developers and their company, and there should be some recognition of responsibility for technical accuracy. But what about the newspaper? They made a fundamental mistake in not recognising that printing press + computer = computer, and left their newspaper system at the mercy of a coding mistake.
It seems that while the paper can handle *mechanical* failure (158 yrs, 1 non-delivery), it has yet to grasp *software* failure.
You've never seen a modern web press, have you? (Score:5, Informative)
This is necessary too, if we wish to efficiently print the massive quantity we desire. There are a lot of daily newspapers. Even in my small city there are at least 8 I know of. An old mechanical press simply wouldn't be able to keep up. Never mind printing speed or anything else, setup time was a bitch. You had to have plates made to stamp your text on the page. These then had to be loaded and calibrated for each run that was to be done.
Now it's all electronic. At the minimum, you place the reference prints under a camera, and normally the layout files themselves are loaded into the press. It then can go to work right away.
I know it's kind of retro-geek cool to bag on how much harder technology makes everything and how much better it was in "The good ol' days" but that's not usually the case. Old mechanical presses simply cannot compete with the speed of computerised presses, which are necessary to operate with the speed and efficiency that is demanded today.
Where I work... (Score:4, Insightful)
"If you say 'oops', it's OK."
Did he say Oops?
Seriously though...shit happens. That's why you don't bill employees directly for the mistakes they make. Suck it up, learn, and move on.
--
BMO
Bah (Score:5, Interesting)
Testing is Boring (Score:5, Insightful)
These days programmers have a Sword of Damocles hanging over them. Once they finish a major piece of code they may have a hard time finding new work. The economy has not lived up to forecasts of more jobs. Outsourcing has reduced computer opportunities. Management at many companies does not see new uses for computers. Off-the-shelf programs abound for almost every aspect of computerized work.
Stress may distract software engineers enough that someone will make a major mistake.
Been there (Score:5, Interesting)
A mistake I made in May (discovered this very day, by yours truly) had backed up about $400,000 per week.
Did I get stomped?
No.
A bottleneck had been identified, repaired, and eliminated!
Behold the power of positive thinking.
$1mil is nothing (Score:4, Insightful)
What's that you say, this is all the same person? No wonder you had the bug to begin with...
You really only have 1 choice (Score:5, Insightful)
When they come after you, present it as if you were trying to do it right, but somebody wouldn't let you.
If they fire you, sue.
Unless:
a) you work for one of the few companies that actually supports a real team atmosphere, or
b) Everything was done by the book, and you still screwed up.
When someone in an industrial field is forced to work 16 hours a day, 7 days a week, and makes a mistake, the company suffers the ramifications, not the worker (or the worker's family).
Quality is a management issue (Score:5, Interesting)
"Time / Quality / Functionality: Choose Two"
"You can't test quality into a system"
"Measure twice, cut once"
"We need to parallel run the UT system"
"Engineers shouldn't be testing their own code!"
"I wouldn't be using NT for that, mate"
and so on.
These are the words technical people use to warn management of impending doom. Managers on the other hand have other things to worry about like delivery dates, sales, penalty ratchets and so on. When the "go" decision was made it will have been made by senior managers who get paid the big bucks to take the big decisions and the big sh*t when it all goes pear shaped.
The question is how the management handled mitigation by way of backups to manual processing, rollbacks to the old system or risk analysis during project planning.
Automation of an entire printing plant is a big job and it is probable they planned for a failure as a worst case scenario and will just put the 1M loss down to experience.
Re: How Would You Handle a $1,000,000 Coding Error (Score:5, Interesting)
Most of the time you don't actually break something (be it product or be it equipment), but fixing the bug and getting everything rolling again takes time.
And since the "value" of the product that is running on the line is about $5000 a minute, time is indeed money.
I've probably had a couple of 1+ hour breakdowns, but this doesn't even compare to the time my buddy's plant went down for three days x 2 shifts per day ($14M).
They were Lear-jetting parts in on a daily basis (they kept blowing up the new stuff and didn't seem to have the sense to order spares). Ron would show up at the service entrance at the airport to pick them up and it got to the point where the guys would just open the gates when he drove up.
My most recent one was when we changed the line speed of the skillet line and a faulty thumbwheel switch opened up the 8's bit in the ten's digit, so that instead of running at 42 jobs an hour it was trying to run at 80 JPH (it would have tried to run at 122, but it's limited in the software to 80 JPH).
Zoom zoom.
Oh wait, that's the other guys
John
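The thumbwheel arithmetic above works out if the switch is BCD-encoded, which is an assumption here (the comment only mentions an "8's bit" in the ten's digit). A tens digit of 4 is 0100 in binary; with the 8's bit reading as set it becomes 1100, i.e. 12, and 12 x 10 + 2 = 122. A small sketch, with hypothetical function names and only the 42, 122 and 80 JPH figures taken from the comment:

```python
MAX_JPH = 80  # software limit quoted in the comment


def decode_bcd(tens_bits: int, ones_bits: int) -> int:
    """Combine two 4-bit thumbwheel digits into one number (tens * 10 + ones)."""
    return tens_bits * 10 + ones_bits


def is_valid_bcd_digit(bits: int) -> bool:
    """A healthy BCD digit can only read 0 through 9."""
    return 0 <= bits <= 9


def commanded_speed(tens_bits: int, ones_bits: int, limit: int = MAX_JPH) -> int:
    """Decode the thumbwheel and clamp the result to the software limit."""
    return min(decode_bcd(tens_bits, ones_bits), limit)


if __name__ == "__main__":
    good_tens, ones = 0b0100, 0b0010       # operator dialed 4 and 2
    faulty_tens = good_tens | 0b1000       # 8's bit wrongly reads as set

    print(decode_bcd(good_tens, ones))       # 42
    print(decode_bcd(faulty_tens, ones))     # 122
    print(commanded_speed(faulty_tens, ones))  # 80
    print(is_valid_bcd_digit(faulty_tens))   # False
```

The clamp is what kept the line at 80 instead of 122. Rejecting any digit above 9 outright, as `is_valid_bcd_digit` would allow, is an alternative design that flags the faulty switch instead of running at the limit.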
The problem with management (Score:4, Interesting)
For 5 years we have worked on cutting costs instead of doing what we originally did; produce a newspaper. This has led to a lot of cut corners, patchy systems and above all stupid decisions. Now we have to spend most of our time with our hands tied behind our backs because there's no way to prove a _direct_ profit we can put on the price-tag we show to a (non-technical) customer when we are suggesting a change. It's always cost > functionality.
A company that only sells services to customers has no goal, and that does not work. There has to be something you produce, something to live for instead of just being a money-making machine.
Management cannot be just management to be management. A good manager is someone involved working with something they have a passion for. My boss didn't create this newspaper, nor did the boss of the actual newspaper and they probably don't have a special interest in media, it's just a career pushing money making machine for them.
Oh, I guess this turned into a rant
Re:from Office Space (Score:4, Informative)
#56 Michael Bolton [myfavmovies.com]: I must have put a decimal point in the wrong place or something. Shit, I always do that, I always mess up some mundane detail.
Re:The Coder? Nothing... (Score:5, Interesting)
Most project managers (especially ones with no technical experience... who shouldn't be let near a technical project) plan their projects and timelines through rose-coloured glasses. They assume there will be no coding issues discovered in testing. Or worse, they do, but then let scope creep come into it, and borrow time from testing for the new items introduced in the scope creep. Bye bye testing time.
Mind you, I have also seen QA managers who believe that the testers only need to understand the software, and not the business where the software is to be used. This sometimes leads to problems in end use. In any case, I tend to blame poor management before I blame the little guy. Projects like this are big enough that the process should have been able to catch things like this... unless the process was flawed.
My opinion... ready, set, slag away!
Re: The Coder? Nothing... (Score:5, Interesting)
As you pointed out, QA should have caught something this basic. There had to be a lot of careless decisions made here, and none of them are necessarily any one coder's fault. Blaming a "coding error" is simple, and makes people forget that a manager didn't do their job correctly. I've seen this particular scenario played out a dozen times before:
Last Monday Suzy Manager shouted at her team, "The schedule says we install on July 18th, so this damned product damned well better be installed on July 18th, you all got that?!"
But the vendor's ship dates slipped, and testing dates got pushed back, even though there was nothing particularly important about July 18th; except for Suzy Manager's promise to the CIO that she'd get WhizBang 2.0 installed by July 18th. And she would, too -- she had 25 points on her review riding on that very promise.
By the 14th, when a new patched version arrived that fixed the bug they discovered on the 10th, Suzy was visibly distressed. "They damn well better have that transmit bug fixed, they've been dragging their feet long enough."
Perhaps the testers just kept testing the version from the 10th instead of upgrading to the version of the 14th. It was beautiful on Saturday, so maybe the tester called in with a bad case of 'weekend flu.' Perhaps they got the patch late Friday afternoon, and the vendor swore up and down that it was just one little bug, our guy knows it's fixed, don't worry, it's better now. Whatever -- Suzy was under the gun, so she simply said "ship it."
Regardless, some nameless coder is flapping in the breeze today. Suzy is probably running around the IT department at the Tribune screaming, "we'll never buy code from those bastards again, I swear!" in a vain attempt to deflect criticism from her department.
But the CIO usually knows better, and Suzy knows the CIO knows better, and she's already sent out her interview suit to the cleaners. Even so, she'll feign total surprise to her department as she boxes up the little wooden carving she picked up during a drinking cruise to Mazatlan a couple years ago. A couple of tears later, she's interviewing over at Microsoft Consulting Services.
Or, maybe I'm completely off the mark. Perhaps they've been testing the code for a month and it's worked fine, but they installed the new code with the old libraries, or the new libraries with the old code, or the destinations were SP2 with some new security turned on. Of course, the QA department should be testing the installation packages as well, but we all know that in hindsight, right? As Yogi Berra might once have said (were he an IT manager,) "In theory, there's no difference between the lab and production, but in production there is."
Re:revel in the publicity! (Score:5, Interesting)