How Would You Handle a $1,000,000 Coding Error?
theodp writes "The Chicago Tribune's efforts to upgrade its computer system over the weekend turned into a fiasco when the system crashed, halting all printing operations and leaving about half of the Trib's subscribers without papers. The software contained 'a coding error,' according to a spokesman who estimated the cost to resolve the problem at 'under $1 million.' Any advice for the poor schmuck who's going to get the blame?"
why wasn't this caught in testing? (Score:1, Informative)
1) 100% unit test coverage, verified by a second outside team using whatever tools are appropriate for whatever language they are using (i.e., something like Jester for Java).
2) Reduction of existing code at each iteration. After the project implements the basic features, I would demand reduction of logical code lines (i.e., actual code statements, not comments or multiple statements on each line) at each iteration. In other words, existing code must *shrink* before you can add new code.
3) full source code and copyrights
If they couldn't meet these requirements as well as the actual project requirements, I would fire them, not pay them, and find someone else. Cost and deadline would be secondary concerns to these.
And on the flip side, if I were the programmer and I couldn't get clear requirements and enthusiasm from the customer, I'd drop the project.
Ahh.. a man can dream can't he?
"angry or confused" (Score:3, Informative)
Mind you, here in Perth we only have one daily newspaper and it sucks, so I can't imagine getting worked up about a failed delivery.
More common than you think... (Score:5, Informative)
Most daily newspapers produce several editions, between two and four, and a couple of times I've seen only one edition printed due to "coding errors" (like the 1 billion seconds from the epoch thing - my personal favorite).
Of course the vendor had to be called at the $500/hour emergency rate to fix their own error.
Once I saw a print pre-processor go off line because
They call daily newspapers "the daily miracle" and when you look at some of the computer band-aids they have producing them, you can see why.
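For anyone who missed the "1 billion seconds from the epoch" thing: Unix time reached 1,000,000,000 in September 2001, growing from 9 digits to 10. The comment above doesn't say what actually broke at the newspaper, so the sketch below only shows one plausible failure mode (an assumption on my part): timestamps stored as text and compared as text.

```python
from datetime import datetime, timezone

BILLION = 1_000_000_000

# The moment Unix time went from 9 digits to 10.
rollover = datetime.fromtimestamp(BILLION, tz=timezone.utc)

# Hypothetical scenario: timestamps kept as strings (e.g. in filenames or a text column).
just_before = str(BILLION - 1)   # "999999999"
just_after = str(BILLION)        # "1000000000"

# Text comparison goes character by character, so '1...' sorts before '9...'.
sorted_as_text = sorted([just_before, just_after])

# Comparing as integers restores chronological order.
sorted_as_numbers = sorted([just_before, just_after], key=int)

print(rollover.isoformat())
print(sorted_as_text)     # newer stamp wrongly comes first
print(sorted_as_numbers)  # older stamp first, as intended
```

Anything that picked "the latest file" by sorting names like these would suddenly start grabbing stale data.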
Re:advice to hapless code monkey (Score:2, Informative)
Re:Just one (Score:5, Informative)
Re:from Office Space (Score:4, Informative)
#56 Michael Bolton [myfavmovies.com]: I must have put a decimal point in the wrong place or something. Shit, I always do that, I always mess up some mundane detail.
Advice (Score:2, Informative)
Re:My advice (Score:4, Informative)
Educate Management and Test First (Score:1, Informative)
First, buy one of Kent Beck's books and start testing. Secondly, you need to get management's buy-in to the philosophy of test-driven development. You need to get them to realize that the cost of a single failure of this magnitude (both technical costs and costs to the company's reputation) is far, far greater than the cost of taking the time to test properly. Human-based testing is not adequate; we make far too many assumptions and subjective decisions. If you're unsure, then wait. Don't deploy and pray for the best. As an added benefit, test-driven development actually builds developer confidence and allows for much more rapid development.
I haven't seen all of the details of the problem, but it's possible this was an issue with business and/or technical requirements, failure to review a third party's work or perhaps worrying about a deadline and the implications to one's invoice. I've personally seen far too many cases of management bringing in consultants to do the work and then not reviewing the product that's been delivered. Those mistakes have cost companies millions of dollars, forced them to sell to new owners and, most importantly, disrupted families.
In any case, I really hope the Tribune does a thorough root cause analysis, identifies issues with their process and implements real change. Don't look to place blame (I know, it's a public corp); identify assumptions, locate weaknesses and come up with solutions.
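To make "test first" concrete, here is a minimal sketch using Python's standard unittest module. The function name `undelivered_pages` and the page IDs are hypothetical, made up for illustration; nothing here reflects the Tribune's or any vendor's actual code. The idea is that the tests describe the failure you fear (pages sent but never received) before any deployment depends on the code.

```python
import unittest


def undelivered_pages(sent, received):
    """Return the page IDs that were sent but never arrived, in sent order."""
    arrived = set(received)
    return [page for page in sent if page not in arrived]


class UndeliveredPagesTest(unittest.TestCase):
    def test_everything_arrived(self):
        # Arrival order does not matter, only that each page shows up.
        self.assertEqual(undelivered_pages(["A1", "A2"], ["A2", "A1"]), [])

    def test_nothing_arrived(self):
        # The nightmare case: the sender reports success, the receiver got nothing.
        self.assertEqual(undelivered_pages(["A1", "A2"], []), ["A1", "A2"])

    def test_partial_delivery(self):
        self.assertEqual(undelivered_pages(["A1", "A2", "A3"], ["A2"]), ["A1", "A3"])
```

Run it with `python -m unittest` against the file. A check like this is cheap to automate, which is the point: it runs every time, without anyone's assumptions getting in the way.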
You've never seen a modern web press, have you? (Score:5, Informative)
This is necessary too, if we wish to efficiently print the massive quantities we desire. There are a lot of daily newspapers. Even in my small city there are at least 8 that I know of. An old mechanical press simply wouldn't be able to keep up. Never mind printing speed or anything else, setup time was a bitch. You had to have plates made to stamp your text on the page. These then had to be loaded and calibrated for each run that was to be done.
Now it's all electronic. At the minimum, you place the reference prints under a camera, and normally the layout files themselves are loaded into the press. It then can go to work right away.
I know it's kind of retro-geek cool to bag on how much harder technology makes everything and how much better it was in "The good ol' days" but that's not usually the case. Old mechanical presses simply cannot compete with the speed of computerised presses, which are necessary to operate with the speed and efficiency that is demanded today.
Re:My advice (Score:5, Informative)
Re:Bad News, Good News..... (Score:5, Informative)
Good news: Rainforest saved.
Actually, most of the wood pulp comes from trees grown in managed forests where trees are replanted to replace the old ones.
So it's a bit like growing corn or wheat to eat.
Strangely, we don't see many people shouting "save the corn!".
Re:Just one (Score:5, Informative)
Re:Fix it. (Score:3, Informative)
1AM Sunday to 10AM Monday = 33 hours not 9 hours.
More realistically, it would have been something like this: he was in on Friday for the test, got everything set up and left at about 8pm Saturday evening. Got home, went to bed, and then got a page to come in because the system failed a test. Came in to check it out, then spent the next 33 hours trying to get the system into a working state by the time everyone arrived. Waited a couple of hours to make sure nothing went wrong, then went home and crashed.
Mine wasn't a disaster, but I put in the time just in case it was.
Tribune's version (Score:5, Informative)
A story we never thought we'd print
By James Coates
Tribune computer columnist
Published July 19, 2004, 6:40 PM CDT
Nothing built by humans can go wrong in as many ways or with as nasty an outcome as a computer system.
The people who create the Chicago Tribune started relearning that fact about 4 p.m. Sunday when they noticed that nothing was getting through as they attempted to beam the stories, artwork and ads from Tribune Tower to the Freedom Center printing plant.
About 13 hours later, they finally started printing a 24-page version of Monday's Tribune that should have already been landing on their readers' porches.
It was a misfortune that most people in the news business don't ever expect to experience. Newspapers do not miss days -- and Monday was close.
The only time the Tribune failed to print was during the Great Chicago Fire of 1871. That time, the lesson was that nature can be fickle and dangerous.
Now, the paper has learned that the same goes for the computer technology that has graced the industry with unparalleled productivity since the 1990s.
Business computer systems are cobbled together as row upon row of workstations, each running an operating system based on an estimated 50 million lines of instructions. In turn, the worker bee desktop computers connect to the queen machines with their own millions of lines of code in a different language.
An endless nest of wires, cables and even radio signals move instructions at light speed between the central computer and the workstations. The main computer also talks to all the peripheral devices needed to accomplish the mission.
The peripherals can be banks of hard drives, storage bays, printers, scanners, cameras and specialty devices as diverse as a pager or a printing press several stories tall.
The certainty that each and every one of these massively complex systems will crash haunts the people charged with keeping this thoroughly digital world up and running.
Those people are engineers, and so they often reduce it to numbers.
An often quoted study by Carnegie Mellon University computer scientists studied 30,000 software programs and found five to six defects per 1,000 lines of code.
And this is for finished software sent to customers.
When writing new programs, there is typically a defect in every 10 lines of code. About a half dozen defects per 1,000 lines remain after a process of checking, rechecking, cross checking, testing, retesting and finger crossing.
The hubris of computing becomes clear as one realizes that each of these errors in code branch out with instructions to millions of other lines of code. Quite often, they find pathways never before taken by that particular program.
Collisions occur on these pathways and trouble is spotted. Maybe it can be fixed or maybe technicians can only perform a "workaround" that can't be guaranteed.
Dick Malone, the Tribune's senior vice president and general manager, said that around 9:30 a.m. on Sunday technology crews started a planned upgrade to increase the newspaper's Sun Microsystems servers from so-called 10K models to 15K machines.
To do this, experts from the company that makes the newspaper's core Windows-based publishing software, Denmark-based CCI Europe A/S, needed to install upgrades of its Newsdesk brand software that the Tribune and other clients use.
Malone noted that they checked and rechecked, tested and retested all day. Everything seemed to be working without a hitch. Then, they punched the button that was supposed to send all of the content for the newspaper to the printing plant.
Nothing arrived.
Frantic hours went by as deadline after deadline slipped while crews struggled to find a fix. Malone said he went so far as to start setting up the newspaper's pages on the art department's Macintosh desktops, hoping to get at least something printed.
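Taking the article's figures at face value (50 million lines per operating system, one defect per 10 lines when first written, five to six defects per 1,000 lines after testing), the arithmetic works out roughly as follows. This is a back-of-the-envelope sketch, not a claim about any particular product.

```python
LINES_IN_OS = 50_000_000          # the article's estimate for one operating system

# "A defect in every 10 lines" expressed per 1,000 lines (KLOC).
NEW_CODE_DEFECTS_PER_KLOC = 1000 / 10          # 100 defects per KLOC

# "Five to six defects per 1,000 lines" in finished software.
SHIPPED_LOW, SHIPPED_HIGH = 5, 6

# Share of the original defects that checking and testing removed (worst case).
removed_fraction = 1 - SHIPPED_HIGH / NEW_CODE_DEFECTS_PER_KLOC

# Defects still present in a 50-million-line system.
remaining_low = LINES_IN_OS // 1000 * SHIPPED_LOW
remaining_high = LINES_IN_OS // 1000 * SHIPPED_HIGH

print(f"removed before shipping: about {removed_fraction:.0%}")
print(f"still in the shipped system: {remaining_low:,} to {remaining_high:,}")
```

So even with about 94% of defects removed before release, a system of that size would still ship with a quarter of a million or more of them.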
One-line CODE ERROR $60 million - AT&T phone c (Score:5, Informative)
Re:Bad News, Good News..... (Score:5, Informative)
These days companies like Champion and Plum Creek are finding that it's more profitable to sell the logged areas than to replant them. For example in Maine [maineenvironment.org] and Montana [seeleyswanpathfinder.com].
It's more profitable to sell land (especially waterfront land) and then log the federally subsidized national forests.
Your tax dollars at work!
Re:More common than you think... (Score:3, Informative)
That's actually not quite true. But it might as well be, under any OS - you can always modify the file by truncating it to zero bytes. Just as effective. Someone will always be stupid or malevolent enough to do this. (Make it idiot-proof and someone will make a better idiot.)
The real solution? Revision control. Imagine if a day's paper were stored in a Subversion system. Make it accessible to everyone through WebDAV + automatic versioning. (OS X has slick native support for this.) They'd never notice the difference...but you could pull any old version you want, in case something like this happens. Or any number of more minor disasters.
CCI is the vendor. (Score:1, Informative)
Re:Deployment? (Score:1, Informative)
You forgot... (Score:5, Informative)
E) Pulp requires hardly any bleaching, and only a tiny fraction of the toxic chemicals wood-pulp requires to process.
E.1) No toxic chemicals to expensively dispose of (less pollution).
F) Pulp requires a fraction of the processing compared to wood-pulp.
G) Same (non-THC-producing) hemp grown for rope and clothing can be used... existing/established farming methods.
H) Requires _much_ less fertile ground (no fertilizer) for growing... technically it _is_ a weed (not just a nickname).
H.1)
I) Requires much less expensive processing equipment to farm (ground requires drastically less/no tilling, collection can be done with hay-baling equipment instead of heavy trucks and tree-cutting machinery, etc.).
I'm sure I'm forgetting some.
Note the reference to a non-THC-producing strain... I'm not into pot, but I certainly can see a phenomenal idea when I see one (I saw this one many years ago).
Re:You forgot... (Score:5, Informative)
The US actually managed to eradicate a weed that grew on the roadside from their shores by aggressive burning along with a demonisation campaign to try to turn people off the (then popular) drug... a bit like the 'war on terror' but with even fewer facts behind it.
There are many strains of non-THC-containing hemp (given that the social effects of introducing wide availability of another drug are undesirable - alcohol is bad enough). In Europe at least there are fields full of the stuff, as hemp rope and linen are still very popular. Even hemp paper is available, given it's cheap/easy to produce...
Medical hemp (the THC kind) is grown under license and given to selected patients to treat certain conditions, although that's mostly still under trial (and is the motivation for the reclassification of cannabis possession in the UK, so that the drug companies could legally do their trials).
Re:I would immediately fire anyone (Score:2, Informative)
The use of both printf and strlen in this case is absolutely fine. I've never heard of strlen leading to a security problem, and have rarely seen a non-trivial C program that doesn't use it. The only security problem I'm aware of with printf is that you shouldn't ever pass unchecked content to the format parameter, as this can lead to arbitrary code execution on some systems, apparently (I don't understand the precise mechanism involved).
So, either you're trolling against the entire C language, or are just ignorant.
Set the Unix sticky bit on the directory (Score:5, Informative)
NAME
sticky - sticky text and append-only directories
DESCRIPTION
A special file mode, called the sticky bit (mode S_ISVTX), is used to
indicate special treatment for shareable executable files and directo-
ries. See chmod(2) or the file
tion of file modes.
STICKY DIRECTORIES
A directory whose `sticky bit' is set becomes an append-only directory,
or, more accurately, a directory in which the deletion of files is
restricted. A file in a sticky directory may only be removed or renamed
by a user if the user has write permission for the directory and the user
is the owner of the file, the owner of the directory, or the super-user.
This feature is usefully applied to directories such as
be publicly writable but should deny users the license to arbitrarily
delete or rename each others' files.
Any user may create a sticky directory. See chmod(1) for details about
modifying file modes.
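For illustration, setting and checking the sticky bit from a script might look like the sketch below. It uses Python's standard os and stat modules, assumes a Unix-like system, and works on a throwaway temporary directory; the directory name "shared" is made up.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as parent:
    shared = os.path.join(parent, "shared")   # hypothetical shared drop directory
    os.mkdir(shared)

    # World-writable plus the sticky bit (S_ISVTX), i.e. mode 1777 like /tmp.
    os.chmod(shared, 0o777 | stat.S_ISVTX)

    mode = os.stat(shared).st_mode
    sticky_set = bool(mode & stat.S_ISVTX)    # True when the bit took effect
    mode_string = stat.filemode(mode)         # ls-style string; trailing 't' marks sticky

    print(sticky_set, mode_string)
```

With the bit set, anyone can still create files in the directory, but only a file's owner, the directory's owner, or the super-user can delete or rename them. Note that it does not stop someone with write permission on a file from emptying it.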
Re:grow canabis, stupid morons.... (Score:4, Informative)
"Fibre hemp is an annual herbaceous plant which flourishes in temperate regions. All cultivars tested in Alberta have been low-THC (delta-9 tetrahydrocannabinol) cultivars. Canada has adopted the 0.3% THC standard established by the European Union as the concentration which separates non-psychoactive strains suitable for legal fibre production from those which are illegally grown for their properties of intoxication. The 0.3% THC designation is very conservative. Most narcotic strains range from 3-5% THC, with cleaned, high potency material reaching as high as 15% THC."
Testing often overlooked (Score:2, Informative)
Translation: not enough testing.
Testers are often looked upon as the bottom rung of the overall software life cycle. Their duties are perceived by many to be hum-drum and easy to take out of the cycle. Unfortunately, cases like this show exactly why testing is one of the most important facets of the software life cycle.
Remove, or severely limit, testing in your product, and you have only yourself to blame when problems arise out in the field. For this particular melee, if testing was removed from the project, I would blame the project manager and whoever made the decision to remove it. If testing was to blame, I would instil better procedures, beefed-up test cases, and possibly hire test engineers who ARE test engineers and not some developer who has a few cycles to burn.
It was DuPont, actually (Score:2, Informative)
Re:Bad News, Good News..... (Score:4, Informative)
You couldn't be more wrong. I live near a large paper mill that produces products for newspaper companies. I've lived here all my life. I've seen first hand how they rape the forests, the mountains, etc. Sure, they plant yellow pine because yellow pine grows fast and fits their purposes, but where they plant the yellow pine was once a lush hardwood forest of oaks, maples, etc. They take out the large hardwoods that provide acorns for deer and other small animals and replace them with pine, so now the pines grow unabated. The animal populations suffer. Also, any smaller hardwoods they cannot use they slash or poison so they will die. Next, since there are so many pines we recently had a plague of pine beetles. Huge tracts of pine forest (man-made pine forests) lie in waste in the mountains, hills and along the highways here. This is partly the fault of the paper company. Also, the chemicals they use create an artificial/chemical fog that wreaks havoc. I kid you not. We had one of the largest traffic accidents in US history here some years back where 100s of cars piled up on I-75. It made national news. I think the paper company paid off the victims' families nicely enough though. Finally, the workers in this mill are exposed to harmful chemicals such as chlorine that take a toll over time. Usually, late in life there are massive respiratory problems.
It's easy to armchair-quarterback where your newspaper comes from, but I for one don't subscribe to anything but online sources. You should too....
Re:It's my first week! (Score:2, Informative)
Is there _no_ professionalism anymore?
One problem is that most people outside the newspaper industry don't understand the problems of meeting multiple daily deadlines. Missing a deadline cascades to other deadlines and there is no way to make up for lost time.
If a page goes to plate-making late, the press starts late, the trucks that deliver the paper leave late, the readers have left for work and never read the paper, and a whole day's effort is wasted.
The newspaper industry is almost unique in this regard. Other industries, like the medical industry, require high precision and accuracy but, outside of the operating room, if the computer fails, you just reschedule the test.
Senior IT people at newspapers who did not rise through the ranks often fail to appreciate the need for redundancy and fall-back options that producing a newspaper requires. There's something visceral about meeting those deadlines. You can only appreciate it by doing it day after day without fail for years. Nothing in computer science prepares you for it.
Re:You Slashdotted Illinois (Score:3, Informative)
It may be rehab propaganda, but... (Score:2, Informative)
Then, days later, they repeated the tests/questions. The alcohol recipients said they felt normal and tested normal. But the pot group said they felt normal, but still tested as impaired.
This is why the teacher said MJ will never be legalized in the US - too difficult to set a legal limit on a DUI level the way BAC% works for ethanol consumption.
Re:Just one (Score:3, Informative)
Here's another word for you, digitalgiblet: Prometheus [nasa.gov]!
Most people don't know Challenger was supposed to be transporting a satellite powered by 46.7 pounds of plutonium in its very next trip after the one where it was destroyed. Had the disaster occurred on that next trip, a whole lot more people would have died of lung cancer and plutonium poisoning.
The Challenger disaster and Chernobyl, both the same year, were together enough to persuade NASA to give up its dangerous desire for nuclear fission powered engines (then Project Ulysses). For a time at least...
As the Columbia disaster happened, NASA was pushing for a new nuclear fission engine program: Project Prometheus. This time, NASA doesn't seem to be stopping or even slowing down its plans, despite its current safety problems, or the newly available high-energy solar power, which is far, far safer.
Prometheus of old stole fire from heaven, and was punished for his crime by Zeus, who sent an eagle every day to rip out his liver. This new Prometheus steals fire from the heart of the atom to fly into the heavens. One stupid mistake (and human stupidity, the topic of this thread, is always the cause in nuclear accidents), and the radioactive ancestor (from the Mesozoic) of the eagle will be there to attack your liver, or any other organ he can get, with cancer.
Assuming, of course, that the reactor doesn't do something spectacular: like falling intact, while heating up enough to get fission going. Don't look now, but Chernobyl just landed in your back yard!
Extra credit for the Slashdot geek who can slap a coolant system on that puppy before it causes a disaster, and hook it up to power his home. ;)
Shinoda: "Is Godzilla showing his hatred toward man-made energy?"
Godzilla: "Human! Impertinent! I rule the Atom!"
"Godzilla 2000 Millennium" (Japanese version)