RHN Bind Update Brings Down RHEL Named 312

Posted by kdawson on Friday July 18, 2008 @08:14AM from the remind-me-of-your-name-again dept.

alexs writes "Red Hat's BIND update, pushed through RHN to patch the DNS hole, contained a fatal error that reverts all name servers to caching-only servers. This meant that anyone running their own DNS service promptly lost all of the DNS records for which they were acting as primary or secondary name servers. Expect quite a few services provided by servers running RHEL to, errr, die until their system administrators can restore their named.conf. Instead of installing the new default as etc/named.conf.rpmnew, Red Hat moved the current etc/named.conf to etc/named.conf.rpmsave and replaced etc/named.conf with the default caching-only configuration. The fix is easy enough, but this is a schoolboy error which I am surprised Red Hat made. Unfortunately we were hit and our servers went down overnight while RHN dropped its bomb and I am frankly surprised there has not been more of an uproar about this."

You didn't test before deploying an update? (Score:5, Insightful)

by Anonymous Coward writes: on Friday July 18, 2008 @08:17AM (#24240283)

So, you didn't test the update on a non-production server? Just install any old patch and let it take your network down? Who do you work for again? I have to make sure not to do business with that.

- Re:You didn't test before deploying an update? (Score:5, Insightful)
  
  by suso ( 153703 ) * writes: on Friday July 18, 2008 @09:03AM (#24240753) Journal
  
  Actually, I caught the error just from looking at the output of up2date/yum. It clearly said named.conf saved to named.conf.rpmsave. So all you have to do is compare what changed, implement any changes and copy named.conf.rpmsave over named.conf.
  Just as I said on the day of the release, be careful, don't just blindly update things.
  
- Re:You didn't test before deploying an update? (Score:4, Insightful)
  
  by illumin8 ( 148082 ) writes: on Friday July 18, 2008 @09:26AM (#24241067) Journal
  
  So, you didn't test the update on a non-production server? Just install any old patch and let it take your network down? Who do you work for again? I have to make sure not to do business with that.
  No kidding. The only "schoolboy error" as the submitter put it, was not testing the patch on a non-production server before deploying it on a production DNS server.
  
  - Re: (Score:2)
    
    by xalorous ( 883991 ) writes:
    
    People make mistakes. Even _________ (insert linux vendor here) packagers. This patch should be remembered and trotted out every time a new sysadmin is taught how to do the job. Remember the DNS bind patch of '08? That's why you test before patching production servers.
    Oh, and the 'I don't have enough money for a spare of every server' excuse won't play. That's one reason to buy consistent models. Your test equipment can serve as emergency replacements or vice versa. And if all else fails, testing on
  - Re: (Score:3, Interesting)
    
    by mi ( 197448 ) writes:
    
    The only "schoolboy error" [...] was not testing the patch on a non-production server before deploying it on a production
    Can the same line be used to defend Microsoft the next time they screw up a bug-fix or "service pack"?
- Re: (Score:3, Insightful)
  
  by jocknerd ( 29758 ) writes:
  
  You know, not everyone has non-production servers. Every server we have IS production. And if you are paying for Red Hat Enterprise, you expect Red Hat to have tested these updates themselves. If this was a Microsoft error, Slashdot would be all over Microsoft for allowing this to happen.
  - Re:You didn't test before deploying an update? (Score:4, Insightful)
    
    by numbsafari ( 139135 ) writes: <swilson.bsd4us@org> on Friday July 18, 2008 @10:15AM (#24241807)
    
    And the rest of slashdot would be all over MS admins who blindly update their systems from AutoUpdate.
    I find it really hard to believe you don't have at the very least a strawman test system. The fact that you don't says volumes.
    
  - Re: (Score:2)
    
    by bucky0 ( 229117 ) writes:
    
    To be sure, red hat ballsed it up, but if you're running a service that "can't go down", you HAVE to test your patches out. If you don't have spare physical machines, test them in a virtual machine or repartition your workstation to have enough room for a server install.
    If it's important enough to not go down, it's important enough to test.
    - Re: (Score:2, Insightful)
      
      by GleeBot ( 1301227 ) writes:
      
      And contrariwise, if it's not important enough to test, then it's not important enough to not go down. So grin and bear it.
  - Re:You didn't test before deploying an update? (Score:5, Insightful)
    
    by Sleepy ( 4551 ) writes: on Friday July 18, 2008 @10:27AM (#24242003) Homepage
    
    >You know, not everyone has non-production servers. Every server we have IS production. And if you are paying for Red Hat Enterprise, you expect Red Hat to have tested these updates themselves. If this was a Microsoft error, Slashdot would be all over Microsoft for allowing this to happen.
    You are wrong; stop whining. You're just painting yourself as misinformed.
    1) The updates WERE tested.
    2) The admin installed "caching-nameserver", then configured his install to act far outside the default.
    3) He allows automatic updates straight into production. So do you it seems. Good luck with that! RHEL documentation says to not do this, but you're a bigshot "paying" for something different. I suggest you get a sidekick, and stick to the Windows side of your "enterprise".
    4) He didn't revert his .conf file, as is usually needed when some new line is added to a server .conf. This is SO NORMAL you'd have to be a n00b to get bitten!
    Your MS comparison is apples and oranges. If this guy did TEN MINUTES worth of testing he'd realize something's up, and he could revert the rpm package. How many MS updates prohibit uninstall? Quite a few!
    In Windows, you can't diff the before & after config, since Windows admins would rather be blind to what they're installing, since that's the norm and it's accepted.
    
  - Re:You didn't test before deploying an update? (Score:4, Insightful)
    
    by poot_rootbeer ( 188613 ) writes: on Friday July 18, 2008 @10:31AM (#24242063)
    
    You know, not everyone has non-production servers. Every server we have IS production.
    Well, there's your problem right there...
    
    - Re:You didn't test before deploying an update? (Score:5, Insightful)
      
      by Dr Caleb ( 121505 ) writes: on Friday July 18, 2008 @10:59AM (#24242531) Homepage Journal
      
      Perhaps it is his problem, but not his fault. Sounds like he's in the dreaded zone where IT is a necessary evil, not a department that can help leverage the business.
      He gets what he needs, or just barely what he needs. When management hands you crap, you learn to make crapade.
      
      - Re: (Score:3, Insightful)
        
        by ebuck ( 585470 ) writes:
        
        You are right about making do with what you get, but exactly how did he lack resources in this case? He already has RHEL (and updates, so I'm guessing his support contract is up to date).
        It's not like they're charging more for a non-caching domain name services server. In fact, he took a perfectly good non-caching name server, and then installed pre-packaged configuration files to make it a caching-nameserver. Then he started hacking away at the config file. Small wonder that fixes to the caching-namese
      - Re: (Score:3, Informative)
        
        by Bryansix ( 761547 ) writes:
        
        What the fuck is wrong with you people? You think every System Admin out there had just one job to do and that's administer the servers? In my job I do everything. VOIP Phones, new employee setup, updates, backups, desktop support, fix the copier, follow up with accounting and executive assistant as to why we ran out of paper yet again etc. etc. etc. The point is the company SHOULD hire another IT person but they can't afford it and there is no freakin way I could ever test every update that comes out. Of c
  - Re: (Score:2, Insightful)
    
    by IchNiSan ( 526249 ) writes:
    
    You mean to tell me you don't even have an old desktop machine sitting around with RHEL on it to "play" with? Come on, pull the other leg. Or maybe find a new line of work. Not being able to afford non-production servers and a test lab is one thing, but not taking the old computer you replaced on the secretary's desk and using that to do some basic testing for mission-critical updates is ridiculous. Or hell, just dual boot your machine if it comes to that. You have to do SOME testing of SOME things.
  - Re: (Score:2)
    
    by The Cisco Kid ( 31490 ) writes:
    
    So run it on another machine that isn't an authoritative nameserver?
    And/or, if your company is so broke it can't afford a couple hundred bucks to put together a low-end box to run update tests on, then you are doomed anyway.
    "Free Software", even when serviced and supported by a corporation such as RedHat, is about knowing WTF you are doing and being responsible for your own stuff, as opposed to being a drooling button-pusher assuming everyone else will take care of you and suing them when they don't.
    And if
- - Re: (Score:2)
    
    by gormanly ( 134067 ) writes:
    
    also, this does not happen if you're running BIND in a chroot jail. So those falling to this are doubly dumb.
    - - Re: (Score:3, Insightful)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
A schoolboy error? (Score:5, Insightful)

by something_wicked_thi ( 918168 ) writes: on Friday July 18, 2008 @08:17AM (#24240285)

What? And isn't it an error of similar proportion to upgrade your primary DNS servers without first testing the new install?

- Re:A schoolboy error? (Score:5, Informative)
  
  by imipak ( 254310 ) writes: on Friday July 18, 2008 @08:38AM (#24240499) Journal
  
  Note as well that the initial release included a default conf file which specified a fixed source port, which of course breaks the fix.
  [Updated 10th July 2008] We have updated the Enterprise Linux 5 packages in this advisory. The default and sample caching-nameserver configuration files have been updated so that they do not specify a fixed query-source port. Administrators wishing to take advantage of randomized UDP source ports should check their configuration file to ensure they have not specified fixed query-source ports.
  
  Personally I'm surprised there's not been more uproar about the requirement to move internal DNS servers (yes, that means your Windows Domain Controllers in most corporate environments) outside any NAT'ing devices (eg: firewalls), as many NATs also break the fix by rewriting outbound UDP DNS queries to use the same or incremental source ports. Anyone here moved their AD outside the firewall?
  
  - Mod parent up! (Score:4, Insightful)
    
    by Chrisq ( 894406 ) writes: on Friday July 18, 2008 @09:06AM (#24240783)
    
    I am sure that many people do not realise that going through a NAT device usually means that predictable port numbers will be allocated.
    
    Of course until we get details of the hole and fix we cannot be 100% sure but it is very likely that exposing predictable port numbers (which the fix randomised) reintroduces the hole.
    
    If DNS software vendors had a year's notice then why didn't the NAT firewall vendors? They could have introduced a patch at the same time.
    
    - Re: (Score:2)
      
      by afidel ( 530433 ) writes:
      
      Apparently [checkpoint.com] they did, either that or Checkpoint's protection feature for the general class of DNS poisoning attacks just happened to protect against this one too. However, even if it did protect against it I doubt they could have released a same-day press release stating it did if they hadn't been notified of the vulnerability ahead of time.
    - Re: (Score:3, Informative)
      
      by something_wicked_thi ( 918168 ) writes:
      
      Judging by the CERT details, it sounds like there are two things you need to do. You need to be able to predict the 16-bit random number, and the 16-bit random port. My reading (and this was very brief, so someone *please* correct me if I'm wrong here) is that the older DNS servers had two flaws: a flaw in the RNG for the 16-bit transaction number, and they used fixed or predictable ports.
      A NAT will reintroduce only the second problem because it gives you predictable ports, but obviously, relying solely on
      - Re: (Score:2, Informative)
        
        by Anonymous Coward writes:
        
        Judging by the CERT details, it sounds like there are two things you need to do. You need to be able to predict the 16-bit random number, and the 16-bit random port. My reading (and this was very brief, so someone *please* correct me if I'm wrong here) is that the older DNS servers had two flaws: a flaw in the RNG for the 16-bit transaction number, and they used fixed or predictable ports.
        A NAT will reintroduce only the second problem because it gives you predictable ports, but obviously, relying solely on the unpredictability of a 16-bit transaction id is a little scary. Because of the birthday paradox, (assuming the attacker has perfect knowledge about which port you're choosing) an attacker would need to send only something on the order of 2^8 packets to poison the cache.
        No, the birthday problem doesn't apply when you are trying to match a specific person's birthday.
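        To put rough numbers on that distinction, here is a back-of-the-envelope calculation. It assumes the attacker already knows the source port and sends n forged replies with distinct 16-bit IDs against a single outstanding query; it only illustrates the arithmetic and says nothing about how the actual exploit works.

        ```latex
        \begin{align*}
        % Matching one fixed transaction ID (the "specific person's birthday" case)
        P_{\text{specific}}(n) &= \frac{n}{2^{16}},
          & P_{\text{specific}}(2^{8}) &= \frac{1}{256} \approx 0.4\%,
          & P_{\text{specific}}(2^{15}) &= 50\% \\
        % Any two of n uniformly random IDs colliding (the classic birthday problem)
        P_{\text{collision}}(n) &\approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{16}}\right),
          & P_{\text{collision}}(2^{8}) &\approx 39\%
        \end{align*}
        ```

        Under these assumptions, something on the order of 2^8 packets only approaches even odds when any pair of values is allowed to collide; hitting one fixed ID takes on the order of 2^15 guesses.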
    - Re:Mod parent up! (Score:4, Informative)
      
      by Effugas ( 2378 ) * writes: on Friday July 18, 2008 @02:02PM (#24245331) Homepage
      
      [This is Dan Kaminsky]
      The NAT vendors didn't get as much notice because we didn't realize so many of them were doing this.
      If we had, they'd have been brought in from the start.
      Now they're scrambling, to their credit. It's a bit of a facepalm for me.
      
  - Re: (Score:2, Insightful)
    
    by evilviper ( 135110 ) writes:
    
    Personally I'm surprised there's not been more uproar about the requirement to move internal DNS servers (yes, that means your Windows Domain Controllers in most corporate environments) outside any NAT'ing devices (eg: firewalls),
    NAT is not a firewall.
    A firewall is not NAT.
    I wouldn't think that practically any major sites are running their public-facing DNS servers from behind a NAT (though I expect most are behind a firewall).
    - Re: (Score:2)
      
      by oyenstikker ( 536040 ) writes:
      
      What is wrong with running behind a NAT?
      w.x.y.z -> heavy duty router -> 10.x.y.z
  - Re: (Score:2)
    
    by db32 ( 862117 ) writes:
    
    Personally I'm surprised that this is getting modded Informative. I suppose the NAT piece is informative, but I think "Anyone here moved their AD outside the firewall?" qualifies as either -1 Job or +5 Funny.
  - Re:A schoolboy error? (Score:4, Informative)
    
    by igb ( 28052 ) writes: on Friday July 18, 2008 @09:36AM (#24241185)
    
    Hand off DNS queries emerging from AD servers inside your firewall to caching-only servers in your DMZ. I have all my AD servers on RFC1918 IP numbers with no NAT, because they strike me as devices I'd prefer to keep as far away from the big bad Internet as possible.
    ian
    
- - Re:A schoolboy error? (Score:4, Insightful)
    
    by something_wicked_thi ( 918168 ) writes: on Friday July 18, 2008 @08:54AM (#24240675)
    
    IMHO, rhel should have tested this.
    'Course they should. Nobody said otherwise.
    I'm not sure what you're getting at with building from sources. Seems like overkill and doesn't solve the main problem because you can still screw it up. All anyone's saying is that you should test this on a server that you don't care about, or at least test it on one, before upgrading all of them.
    
MS (Score:4, Insightful)

by FozE_Bear ( 1093167 ) writes: on Friday July 18, 2008 @08:17AM (#24240297)

If it was a Microsoft product, we'd all be carrying pitchforks and torches....

- Re: (Score:2)
  
  by hughesjr ( 734512 ) writes:
  
  sure, if it was really a bug ... in this case, it is totally user error and the installing of a package called caching-nameserver.
- yea smartypants (Score:2)
  
  by unity100 ( 970058 ) writes:
  
  the thing you're forgetting is that microsoft REGULARLY does that, and even with irrelevant minor updates. that's why people get so worked up about microsoft. they're gonna let this red hat incident slip by, because red hat doesn't have a track record of messing it up.
- Re: (Score:2)
  
  by debatem1 ( 1087307 ) writes:
  
  Yup. I've got all the torches, can you grab an extra pitchfork? I'll pay you back when we get to the castle.
  
  Seems like the general opinion is that no admin worthy of avoiding the boiling oil treatment would have applied the patch blindly to a production environment, but that still doesn't let RedHat off the hook.
- - Re:MS (Score:5, Informative)
    
    by prandal ( 87280 ) writes: on Friday July 18, 2008 @09:01AM (#24240729)
    
    MS08-037 was released on the same day, and was much loved by ZoneAlarm users :-)
    
- - Re: (Score:2)
    
    by unity100 ( 970058 ) writes:
    
    thousands of web hosting companies use RH to serve millions of websites. you are clueless about the market pal. rh is one of the most popular web host platform choices.
    - Re: (Score:2)
      
      by debatem1 ( 1087307 ) writes:
      
      looks like you fell into the sarchasm.
bug details (Score:5, Informative)

by tommis ( 1328303 ) writes: on Friday July 18, 2008 @08:23AM (#24240353)

Here's the bug details: https://bugzilla.redhat.com/show_bug.cgi?id=453340 [redhat.com]
One of the bug comments says: "Latest caching-nameserver renamed my named.conf to named.conf.rpmsave in /var/named/chroot/etc" - so this should mean that you can still restore the lost conf file.
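For anyone who wants to script that restore, here is a minimal sketch in Python. The function name and the ".packaged" suffix are made up for illustration; the only thing taken from the bug comment is that the old file sits next to named.conf with an .rpmsave suffix.

```python
import shutil
from pathlib import Path


def restore_rpmsave(conf_path):
    """Copy <conf>.rpmsave back over <conf>, keeping the packaged default.

    Returns the path where the replaced file was kept, or None when no
    .rpmsave file exists next to conf_path (or there was nothing to keep).
    """
    conf = Path(conf_path)
    saved = conf.with_name(conf.name + ".rpmsave")
    if not saved.exists():
        return None
    kept = None
    if conf.exists():
        # Keep the package's default around so it can still be diffed later.
        # The ".packaged" suffix is a hypothetical choice, not an RPM convention.
        kept = conf.with_name(conf.name + ".packaged")
        shutil.copy2(conf, kept)
    shutil.copy2(saved, conf)
    return kept
```

Something like restore_rpmsave("/var/named/chroot/etc/named.conf") would cover the chroot case from the bug comment. named would presumably still need a reload or restart afterwards, and it is worth diffing the two files first in case the new default contains changes you want.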

- Re: (Score:3, Informative)
  
  by Chris Mattern ( 191822 ) writes:
  
  In other words, as somebody else posted, he installed the "caching-nameserver" package and got, surprise, a caching nameserver. Shocking.
- Re:bug details (Score:5, Informative)
  
  by hughesjr ( 734512 ) writes: on Friday July 18, 2008 @08:50AM (#24240641) Homepage
  
  it is not a bug to get a caching nameserver if you install caching-nameserver ... it would be a bug to install caching-nameserver and NOT GET a caching nameserver.
  A caching name server IS one that does not have any zones and only looks up zones from the DNS root servers. It is a configuration error to install the caching-nameserver package on a machine that is doing anything other than being a caching name server.
  Stupid admins have been complaining about this for 5 years ... but the documentation and bug entries all make it clear NOT to install the caching-nameserver package on DNS servers that control zones.
  
  - Parent should be modded up (Score:2, Informative)
    
    by dstech ( 807139 ) writes:
    
    I wish I had mod points with which to mod you up. This is NOT a bug, and a few RHEL test machines I have here updated just fine, keeping their zone files as expected.
  - Re: (Score:2)
    
    by DaveAtFraud ( 460127 ) writes:
    
    Yes but a caching name server is (or at least was for a long time) the Red Hat default. Go figure. That's all most people want or need. Any bets that lots of the people who got bit with this built the machine internally with the caching name server RPM installed and then just edited or copied over the production named.conf file to turn it into their "real" name server when the box went into production?
    Cheers,
    Dave
  - - Re:bug details (Score:4, Insightful)
      
      by _Sprocket_ ( 42527 ) writes: on Friday July 18, 2008 @10:55AM (#24242451)
      
      It is a bug when an update overwrites your configuration file.
      Normally I'd say you've got a valid point. The problem here is that the config file seems to be part of the intent of the package (please correct me if I'm wrong).
      A rough example would be if someone replaced a packaged binary with a custom-compiled version and then complained when the package update overwrote that modified binary.
      
Um... (Score:5, Funny)

by wellingtonsteve ( 892855 ) writes: <wellingtonsteve.gmail@com> on Friday July 18, 2008 @08:23AM (#24240355)

"I am frankly surprised there has not been more of an uproar about this"
That's because the entire Internets are now broken!

- Re: (Score:2)
  
  by exi1ed0ne ( 647852 ) writes:
  
  That's because the entire Internets are now broken!
  It was as if millions of p0rn sites cried out in terror, and were suddenly silenced.
argh (Score:2, Insightful)

by __aardcx5948 ( 913248 ) writes:

I guess the sysadmins could put in an option in a configuration file somewhere on what files to "keep untouched" when doing package upgrades, no? So that the configuration file wouldn't be overwritten. I think I've seen something similar in Debian distros. Anyway when I install a new (custom) kernel in Ubuntu for example, synaptic asks me if I want to overwrite GRUB's menu.lst with the newly generated one, view the differences or keep my old one etc. Surely there's something similar in Redhat?
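RPM does have something along these lines, though the choice belongs to the packager rather than the admin: a file can be listed in the spec file as %config or as %config(noreplace). The sketch below is a simplified model of my understanding of how an upgrade treats such a file; the function and its return strings are invented for illustration and are not RPM's own code or output.

```python
def config_action(local_modified, default_changed, noreplace):
    """Model what happens to one packaged config file during an upgrade.

    local_modified:  the admin edited the file after the package installed it.
    default_changed: the new package ships a different default than the old one.
    noreplace:       the file is marked %config(noreplace) instead of plain %config.
    """
    if not local_modified:
        # Nothing local to lose, so the new default simply replaces the old one.
        return "install new default"
    if not default_changed:
        # The packaged default did not change, so local edits are left alone.
        return "keep local file"
    if noreplace:
        return "keep local file, write new default as .rpmnew"
    return "install new default, save local file as .rpmsave"
```

The last branch is the one that matches what the submitter describes: an edited named.conf, a changed default, and a file not marked noreplace.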
That's why they get paid (Score:3, Insightful)

by nicolas.kassis ( 875270 ) writes: on Friday July 18, 2008 @08:24AM (#24240367)

Half of the whole point of a subscription to RHEL is to ensure that the patches they put out are properly QAed. The other half is support, but I never had a chance to test that part out.

- Re: (Score:3, Insightful)
  
  by MikeDawg ( 721537 ) writes:
  
  Umm. . . I disagree completely. The only way I would consider a patch "put out properly" if it was tested in my exact, or near exact environment. I can only assume that I'm not important enough for that.
No worries (Score:2, Interesting)

by FlyingBishop ( 1293238 ) writes:

I don't need to worry about that, I run Debian
Also, I don't run my own DNS. But if I were paying someone to make sure my patches weren't idiotic, I'd be pretty pissed, whether the patch was for something I used or not.
- Re: (Score:2, Insightful)
  
  by larry bagina ( 561269 ) writes:
  
  Idiotic.... like Debian's openssl "enhancements" that made the random number generator not so random?
- What about Debian's OpenSSL bug (Score:2)
  
  by dfcamara ( 1268174 ) writes:
  
  Recent Debian's OpenSSL bug was orders of magnitude worse...
You are WRONG :D (Score:5, Interesting)

by hughesjr ( 734512 ) writes: on Friday July 18, 2008 @08:26AM (#24240385) Homepage

This article is absolutely wrong.

The user has misconfigured their DNS and has installed a package called, SURPRISE, caching-nameserver along with the other bind packages.

caching-nameserver IS just that, a caching-nameserver. It SHOULD NEVER BE installed on a DNS server that is used for Primary or Secondary DNS control. The bind packages do not in any way modify named.conf, but if you want a caching nameserver and if you have installed the caching-nameserver package, then you would EXPECT that it would replace the named.conf file.

The real question is, how does crap like this get posted as a feature article on slashdot.
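A quick way to check a box for this combination is to ask rpm whether the package is there. Below is a rough Python sketch; is_installed and zones_at_risk are hypothetical helper names, and the only external call is `rpm -q <package>`, which exits with status 0 when the package is installed.

```python
import subprocess


def is_installed(package, run=subprocess.run):
    """Return True when `rpm -q <package>` exits with status 0."""
    result = run(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def zones_at_risk(has_own_zones, package="caching-nameserver", run=subprocess.run):
    """Flag a server that hosts its own zones and also has the caching package."""
    return has_own_zones and is_installed(package, run)
```

The run parameter exists only so the check can be exercised on machines without rpm; on a real RHEL box, zones_at_risk(True) is enough.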

- Re: (Score:2)
  
  by MicktheMech ( 697533 ) writes:
  
  /. QA is worse than the poster's?
- Re: (Score:2)
  
  by Kamokazi ( 1080091 ) writes:
  
  There is a reason a lot of articles used to get tagged 'kdawsonfud'
- - Re: (Score:2, Insightful)
    
    by hughesjr ( 734512 ) writes:
    
    BUT ... how can you create a caching-nameserver without changing that file???
    If you do not change that file, you do NOT have a caching-nameserver ... which was the whole point of installing that package.
    - - Re:You are WRONG :D (Score:5, Informative)
        
        by _Sprocket_ ( 42527 ) writes: on Friday July 18, 2008 @11:13AM (#24242791)
        
        I'm not familiar with the package in question, but I assume it also installed some binaries. If it found that there already was a configfile of that name, it should have asked what to do.
        If setting up the caching-nameserver was a matter of changing config options, you don't need a package for that, you need a HOWTO.
        I would hazard to guess that unfamiliarity with the package is the real root cause of this. From the package description for caching-nameserver-7.3-3 (which could be a very old version):
        The caching-nameserver package includes the configuration files which
        will make BIND, the DNS name server, act as a simple caching nameserver.
        Many users on dialup connections use this package along with BIND for
        such a purpose.
        If you would like to set up a caching name server, you'll need to install
        the caching-nameserver package; you'll also need to install bind.
        The file contents show:
        Copyright.caching-nameserver
        caching-nameserver.spec
        localdomain.zone
        localhost.zone
        named.broadcast
        named.conf
        named.ip6.local
        named.local
        named.root
        named.zero
        rfc1912.txt
        And so there we have it - a package designed to install and maintain the very generic files needed to configure a caching DNS server. DNS server not included.
        And sure - this could be a HOWTO. But making a package allows for quick-and-simple configuration. And since this kind of thing is so generic, it really lends itself to packaging. I disagree that it should only be a HOWTO.
        
Test your patches (Score:3, Insightful)

by MikeDawg ( 721537 ) writes: on Friday July 18, 2008 @08:27AM (#24240395) Homepage Journal

What kind of environment are you in where you don't first test your patches that are going out to live production machines? Regardless of the fact that it is linux and not windows, you should always test your patches before you roll them into production.

- Re:Test your patches (Score:5, Insightful)
  
  by Just Some Guy ( 3352 ) writes: <kirk+slashdot@strauser.com> on Friday July 18, 2008 @09:38AM (#24241217) Homepage Journal
  
  What kind of environment are you in where you don't first test your patches that are going out to live production machines? Regardless of the fact that it is linux and not windows, you should always test your patches before you roll them into production.
  Disclaimer: I test first.
  You know, a lot of people work in small shops that can't afford multiple redundant servers. I suspect that businesses with a single DNS/web/mailserver are a lot more common than Slashdotters this morning seem to think. What are those admins supposed to do? They're receiving a critical security patch from a trusted vendor, and I imagine a lot of them feel pretty safe applying that to their sole production server. This doesn't make them stupid or incompetent.
  I have the luxury of lots of hardware that can fill in for other gear in a pinch, but lots of people don't. They don't deserve scorn for it.
  
  - Re: (Score:2)
    
    by radish ( 98371 ) writes:
    
    Couple of points. Firstly, the smart but under resourced admin should use this incident as evidence of what happens when management won't cough up for required equipment. A test environment is not a luxury, it's a necessity for a reliable system. Secondly, something like vmware can let you set up and test an environment on whatever hardware you do have lying around - you could even run it on the prod box at a pinch.
  - What are those admins supposed to do .. (Score:2)
    
    by rs232 ( 849320 ) writes:
    
    "lot of people work in small shops that can't afford multiple redundant servers .. What are those admins supposed to do?"
    
    Keep two harddrives in the same machine and keep a clone of the DNS server on the second HD, if an upgrade borks, just swap the drives.
    - Re: (Score:3, Funny)
      
      by Just Some Guy ( 3352 ) writes:
      
      Summary: keep backups. :-)
  - Re: (Score:2)
    
    by Sleepy ( 4551 ) writes:
    
    You nailed the demographic, but these are EXACTLY the group that should not be running their OWN exposed servers.
    This would be true for any server, but "double" so for DNS.
    DNS hosting is cheap, and the expenses are unpredictable (unlike the commotion raised when you have "Windows IT" people whining about "what they pay Redhat for"). UNIX does what you ask. If you want 10 meters of rope, tie it to a beam and stick your head in... it will let you.
    Automatic updates are FINE for the desktop, in shops like you de
- - Re: (Score:2)
    
    by MikeDawg ( 721537 ) writes:
    
    . . . and DNS servers not responding doesn't?
  - Re: (Score:2)
    
    by jeiler ( 1106393 ) writes:
    
    Not to mention that it would involve actually doing things correctly.
Red Hat's been kind of iffy lately (Score:2, Informative)

by propanol ( 1223344 ) writes:

A few months prior to the release of RHEL 5.2, they released a kernel update (2.6.18-53.1.6.el5) in which they had added a patch for an issue that could make a system oops when files with names of a certain character were present on NFS shares. However, this patch also contained a bug which broke NFS lookup caching and subsequently crippled NFS performance to the point of NFS being completely unusable when working with multiple smaller files. They released a patch for it, but it would only apply cleanl
Experienced Monkeys... (Score:2, Insightful)

by spankymm ( 1327643 ) writes:

...check for rpm mouse droppings by running find.
RH may have made a small coding mistake - you made an even bigger one.
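The same sweep can be done from Python if that is easier to wire into a post-update check. This is only a sketch of the idea; find_rpm_leftovers is a made-up name and the /etc default is an assumption (chrooted BIND keeps its config elsewhere).

```python
import os


def find_rpm_leftovers(root="/etc", suffixes=(".rpmsave", ".rpmnew")):
    """Return sorted paths under root whose names end in one of the suffixes."""
    wanted = tuple(suffixes)  # str.endswith needs a tuple, not a list
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(wanted):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Running it before and after an update and comparing the two lists shows which config files the update touched.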
Common Red Hat Mistake (Score:3, Interesting)

by Spazmania ( 174582 ) writes: on Friday July 18, 2008 @08:33AM (#24240445) Homepage

Red Hat makes this mistake a LOT. It makes the update process very unreliable. SuSE isn't as bad but they still have problems if you customize a piece of software's configuration in an unexpected way.
Debian is king here. The incremental patches almost never break a configuration and the major release upgrades tend to work; they often change package names if the new "version" has a major incompatible change in the configuration.

- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - Re: (Score:2)
    
    by Spazmania ( 174582 ) writes:
    
    openSUSE has warnings in those files NOT to use them and tell you which one you need to use.
    True enough. Doesn't help when I want the application to do something it is capable of but which wasn't envisioned by the SuSE packagers. Like binding sendmail to a non-privileged port as a non-privileged user and then using iptables to redirect port 25 up to that port.
    Debian won't automatically overwrite my modified init.d script during an upgrade. It'll ask permission with the default set to "no." SuSE and Red Hat
  - - Re: (Score:2)
      
      by andrewd18 ( 989408 ) writes:
      
      If that's not possible, how about just sticking with manual editing.
      Because 98% of all users hate having to edit a configuration file just to get the maximum resolution out of their video card.
    - Re: (Score:2)
      
      by X0563511 ( 793323 ) writes:
      
      Then suse is probably not for you. You should try something like Slackware (slamd64 is a good amd64 port) if you need that kind of flexibility/control.
      Hell, if you tried hard enough, I'm sure you could port YaST to Slackware!
Well (Score:4, Insightful)

by ledow ( 319597 ) writes: on Friday July 18, 2008 @08:44AM (#24240569) Homepage

Yeah, it's a silly mistake.
But you should be testing things like this first, and whenever you upgrade you should really be looking at/for all .rpmsave or equivalent files first to make sure nothing has changed in the meantime. Otherwise, you're just removing your config and replacing it with the default whatever happens. You should also be checking .rpmnew (or equivalent) each time to check that it hasn't changed in terms of syntax, defaults etc. (which, let's be honest, is quite likely for such an important update - especially given that we hardly know what the exact problem is yet). I wouldn't go so far as to suggest intimate analysis of packages while they are still packed unless the systems you are running are quite critical to the operation of a business.
Part human-error on RH's part (it happens). Part incompetence in not testing the updates yourself first. Chances are that if I were affected by this, I would catch it as part of "right, what did that package change?", or notice as part of usual testing later, and then just move the file. I probably wouldn't even bother to send RH a note.
If you have a DNS server, that suggests that there are reliant computers. As a courtesy to all those reliant computers you HAVE to test changes and check carefully what they are doing first. If you were "stung" by it, it suggests you hit this problem on ALL your DNS servers and/or that you only have one DNS server anyway. To deploy packages like this on such a setup is just asking for trouble.

- Re: (Score:2, Informative)
  
  by LeneJ ( 190881 ) writes:
  
  I just want to clarify a bit about rpmnew vs rpmsave.
  Red Hat will create an rpmsave file when we make a significant change to the configuration file, or a mandatory change. Other than that, we keep the original config file, and store the rpm-config as rpmnew.
Don't forget! (Score:5, Informative)

by prandal ( 87280 ) writes: on Friday July 18, 2008 @08:45AM (#24240579)

Don't forget to check your named.conf on RHEL 5.x (and CentOS 5.x).
Make sure that any lines like
query-source port 53;
query-source-v6 port 53;
are commented out or deleted so that forwarded DNS queries come from random ports.
Restart BIND if necessary.
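A rough way to automate that check is sketched here. The helper name `find_pinned_query_source` is made up for illustration, and the matching is deliberately simple: it flags uncommented `query-source` or `query-source-v6` lines that pin a numeric port, skips lines starting with `#` or `//`, and does not understand `/* ... */` block comments or statements split across lines.

```python
import re

# An uncommented query-source / query-source-v6 statement with a fixed numeric port.
PINNED_PORT = re.compile(r"^\s*query-source(-v6)?\b[^;#]*\bport\s+\d+", re.IGNORECASE)


def find_pinned_query_source(conf_text):
    """Return (line_number, stripped_line) pairs that lock the query source port."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip single-line comments; block comments are not handled in this sketch.
        if stripped.startswith(("#", "//")):
            continue
        if PINNED_PORT.match(line):
            hits.append((number, stripped))
    return hits
```

Feeding it the contents of named.conf and getting back an empty list suggests no fixed-port lines remain, but a manual read of the file is still the safer check.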

Configuration management (Score:3, Interesting)

by cluening ( 6626 ) writes: on Friday July 18, 2008 @08:58AM (#24240703) Homepage

Have you considered using a configuration management tool such as Bcfg2 [bcfg2.org] or cfengine to make sure your own config files are restored after package updates are made? You can never really trust those package maintainers...

- Re: (Score:2)
  
  by Just Some Guy ( 3352 ) writes:
  
  Or even version control. As much love as Git and Subversion get around here, I'm surprised you don't hear more people advocating using them for config files.
Surprised... (Score:2, Redundant)

by kenh ( 9056 ) writes:

I must say that I am very surprised that this patch acted one way in the poster's test environment and another when it was installed on their production machine... That's very odd.
What, he didn't test it before placing it in production? Never mind, move along - nothing to see here.
If the poster made an error (as suggested by a previous post), or if he installed a patch without testing it, bad on the original poster - but if the patch truly was bad (a possibility), then bad on RHN for letting something bad ou
At least... (Score:2)

by xalorous ( 883991 ) writes:

...the file was backed up and not deleted.
you were hit ? (Score:2)

by rs232 ( 849320 ) writes:

"Unfortunately we were hit and our servers went down overnight while RHN dropped its bomb and I am frankly surprised there has not been more of an uproar about this"

You mean you installed a patch that you didn't know was faulty and didn't have a rollback option in place, shame on *you* Mr. Sysadmin .. :)
question (Score:2)

by bucky0 ( 229117 ) writes:

Does the red hat version of apt-get (yum? I've used debian exclusively for a long time, so I forget what command it was) not prompt you when it wants to overwrite a config file? On any debian (or debian derived) machine I've used, apt-get always asks what you want to do if your config file is different than the package's.
If yum(?) blows away configs without prompting, that's pretty bad.
Welcome to third party packaging... (Score:3, Insightful)

by Venotar ( 233363 ) writes: on Friday July 18, 2008 @10:50AM (#24242367) Homepage

This is news? Red Hat (like every OS vendor I've ever dealt with) has been pushing out updates with broken assumptions for years.
In fact, this isn't even the first time they've done something similar when updating bind:
back in 2004 they released RHEL 3 update 4 and many people had precisely the same experience. Additionally, when applied, Update 4 removed the /etc/rc*.d/S*named and /etc/rc*.d/K*named and then shut named off.
As a quick glance at Red Hat's bugzilla [redhat.com] shows, the first problem (the same one you experienced in this release) wasn't a schoolboy mistake on the packager's part, or a bug. It was the result of a poorly understood choice on the part of the person who originally provisioned the machine.
Rather than installing just the original bind-9.2.4, the people who had their named.conf overwritten had installed bind plus a package called caching-nameserver. It's that package that, when updated, backed up and overwrote their bind config. The "caching-nameserver" package should only be installed if you want to run a caching nameserver, because the caching-nameserver package isn't an application at all - it's simply a named.conf file.
The real bug (back in 2004) wasn't actually in Update 4's bind package. As it turns out, the package it replaced incorrectly contained a `chkconfig --del named` in its uninstall script.
Anyone without proper alerting and a good QA process found that one out the hard way. I had customers who'd gotten so blasé about performing nighttime maintenance without proper reversion testing that they scheduled nightly cronjobs that ran up2date at midnight and rebooted the production machine. Naturally, they woke up in the morning to find they'd just suffered 8 hours of downtime.
Lesson? Don't trust the vendor's QC work, don't install unnecessary packages, and make sure to QC your own work! Ask any experienced Windows admin about unintended consequences from "trusted" vendor patches...

On Blindly Updating - OS X Server 10.3.x (Score:3, Interesting)

by not_hylas( ) ( 703994 ) writes: on Friday July 18, 2008 @10:55AM (#24242447) Homepage Journal

GUILTY.
Seems the person that prepared the patch is a new hire at Red Hat.
Beware Latest 10.3.x security update - it replaces /etc/named.conf:
http://discussions.apple.com/message.jspa?messageID=5876624 [apple.com]

This is why I don't run Windows... (Score:3, Funny)

by Blimey85 ( 609949 ) writes: on Friday July 18, 2008 @10:55AM (#24242455)

Crap like this is why I lost faith in Microsoft and quit running Windows years ago. Thankfully my RHEL box isn't affected by this sort of... oh... wait... really? Shit.

Or you will just lose the file unless you backup (Score:2)

by Roberto ( 1777 ) writes:

Check here:
http://lateral.netmanagers.com.ar/weblog/2008/07/16.html#BB701 [netmanagers.com.ar]
In at least one very common config, named.conf is a symlink, so copying it doesn't avoid it being overwritten.
The named update script "copies" symlinks by making another symlink, not by copying the underlying file.
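If you take your own backup before updating, the symlink problem can be sidestepped by copying the file the link points at rather than the link itself. The sketch below assumes a hypothetical helper `backup_config` and a backup directory of your choosing; neither comes from the linked post.

```python
import os
import shutil


def backup_config(path, backup_dir):
    """Copy the real file behind path (following symlinks) into backup_dir."""
    # Resolve the symlink chain so the backup holds file contents, not another link.
    real_path = os.path.realpath(path)
    os.makedirs(backup_dir, exist_ok=True)
    dest = os.path.join(backup_dir, os.path.basename(path) + ".backup")
    # copy2 copies the data plus timestamps and permission bits.
    shutil.copy2(real_path, dest)
    return dest
```

For example, `backup_config("/etc/named.conf", "/root/config-backups")` would leave a regular file named `named.conf.backup` in that directory even when `/etc/named.conf` is a symlink.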
Not a bug, expected behavior (Score:5, Informative)

by Todd Knarr ( 15451 ) writes: on Friday July 18, 2008 @11:17AM (#24242855) Homepage

This sounds like how RPM has behaved for as long as I can remember. It looks at three versions of a config file: #1 the one from the old package, #2 the one currently on disk and #3 the one in the new package. If the config file hasn't been customized (1 and 2 are identical), it moves the old file to .rpmold (if 1 and 3 differ) and puts #3 into place. If the config file has been customized, it checks whether 1 and 3 differ. If they don't, then nothing's changed, the customized config file's still valid and it drops #3 in with the .rpmnew extension. But if 1 and 3 differ, then something in the config file may have changed and the customized config file may no longer be valid. But it's got customizations in it that the admin may need to refer to. So it outputs a warning message about what it's doing, moves the customized config file to .rpmsave and installs #3, and the admin's expected to have seen the warning and to merge their customizations into the new config file. You do watch for warnings and errors during the update, right?
In this case RPM is right, old named.conf files aren't valid. If they're based off RH's old stock config files, they have the source port locked and that disables much of the security fix. So the admins do have to check and modify their customized files before the system's finally ready (or at least RPM has to assume they do, since it can't know exactly what their changes were). That's exacerbated by probably having caching-nameserver installed, but I think a stock BIND install has a similar named.conf until you add your own zones to it.
I'd chalk this one up to admins who a) don't understand an inherent limitation of package-management systems (namely, it doesn't know why you changed something, only that you changed it), b) didn't watch the update process for errors, and c) didn't check the systems for functionality after the update.
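The three-way comparison described in this comment can be written down as a small decision function. This is a sketch of the logic as the comment states it, not RPM's actual code: the function name and the returned labels are invented for illustration, and spec-file flags such as `%config(noreplace)` are not modeled.

```python
def config_action(old_pkg, on_disk, new_pkg):
    """Decide what happens to a config file, per the comment's description.

    old_pkg: contents shipped by the old package (#1)
    on_disk: contents currently on disk (#2)
    new_pkg: contents shipped by the new package (#3)
    """
    customized = on_disk != old_pkg
    pkg_changed = new_pkg != old_pkg
    if not customized:
        # Admin never touched it: take the new file if the package changed it.
        return "install-new" if pkg_changed else "keep"
    if not pkg_changed:
        # Customized, and the packaged default is unchanged: keep the admin's file.
        return "keep-and-write-rpmnew"
    # Customized and the packaged default changed: the case in this story.
    return "move-to-rpmsave-and-install-new"
```

With a customized named.conf and a changed packaged default, `config_action("stock", "my zones", "new stock")` lands in the last branch, which is the .rpmsave outcome the submitter ran into.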

You'd Think the Slashdot Editors Would Know... (Score:2)

by afabbro ( 33948 ) writes:

...that neither 'bind' nor 'named' should be capitalized. Then again, they're not very technical people.
Oh so LATE (Score:3, Interesting)

by alexborges ( 313924 ) writes: on Friday July 18, 2008 @12:03PM (#24243613)

Thanks /., I've known about this for TWO WEEKS.
And no one died.
So there.

Don't put all of your eggs... (Score:3, Informative)

by mi ( 197448 ) writes: <slashdot-2025q2@virtual-estates.net> on Friday July 18, 2008 @12:15PM (#24243793) Homepage Journal

Don't entrust a function like DNS to a single vendor. With some services it is hard, as authors support a limited range of OSes/hardware or charge too high a price for each installation to make redundancy affordable.
But not DNS. Free solutions abound, and the commercial ones are quite cheap too. They are available for all imaginable "server-grade" OS/hardware combinations. If you use more than one server for DNS in your enterprise, and all of them use the same platform, you aren't doing your job.
Mind you, I don't blame the victims here — Red Hat screwed up royally, and that's that. Just advising on how to avoid being hit by such (inevitable) mistakes — from any vendor — in the future.

- Re: (Score:2)
  
  by msuarezalvarez ( 667058 ) writes:
  
  The accountants doing the computation just called and they need a little more time before they have a good estimate on the cost of this: they are waiting for their computers to finish defragmenting their disks and for their antivirus apps to scan every word file for vba worms.
- - Re: (Score:2)
    
    by MikeDawg ( 721537 ) writes:
    
    If you don't check out how neat the RHN satellite server, or the new spacewalk server is, you're really missing out. It is really nice in the enterprise environment.
- Re: (Score:2, Insightful)
  
  by CrackerJackz ( 152930 ) writes:
  
  Because the named.conf file gets stomped, the 'backup' RPMSAVE file it creates is the caching-only file, not the original named.conf file.
  I caught this a couple of weeks ago on a test server (where *all* patches should be tested first, Microsoft or otherwise) best way to fix? cp /etc/named.conf /root/named.conf.backup ; up2date-nox -u ; cp /root/named.conf.backup /etc/named.conf ; /etc/init.d/named restart
  Little to no downtime on the prod servers :)
- Re: (Score:3, Funny)
  
  by Anonymous Coward writes:
  
  Yes, as an official red hat representative, I can say that we can. All you need to do at this time is respond posting your server addresses and login credentials. We will fix it from there.
  - Re:New update? (Score:5, Funny)
    
    by I cant believe its n ( 1103137 ) writes: on Friday July 18, 2008 @09:02AM (#24240743) Journal
    
    Yes, as an official red hat representative, I can say that we can. All you need to do at this time is respond posting your server addresses and login credentials. We will fix it from there.
    Ok, the login name is root and I use the default password: password for all our production machines.
    Oh, I almost forgot. Our IP is 207.46.19.254
    
    Please let our CEO know that I was the one who gave you this information.
    
- Re:What kind of an idiot would...? (Score:4, Interesting)
  
  by ThePhilips ( 752041 ) writes: on Friday July 18, 2008 @09:02AM (#24240741) Homepage Journal
  
  On most (all?) other distros it works perfectly. I had Debian for ages in production (supporting piles of services) with apt-get update/upgrade running regularly. SuSE and Gentoo also do a good job keeping you informed about changes in updates and whether post-update human interaction is needed.
  The crucial difference here is the mindset of RH. It hasn't changed one damn iota in a decade. The very same problem that made me throw RH6/7 out of production in the past, the very same stupidity of RH, is still there.
  RH is the only distro I have ever tried - and I tried many of them - that would silently, without any warning or prompt, replace your config files with the shipped version. It took them ages to learn that files can be renamed - yet it seems that lesson didn't sink in completely.
  This is not a single mistake. This has been happening for more than a decade now: RH during maintenance can and does override your configuration. The RH folks simply have no basic respect for their users...
  [/rants]
  
  - Re: (Score:2, Insightful)
    
    by nabsltd ( 1313397 ) writes:
    
    RH is the only distro I have ever tried - and I tried many of them - that would silently, without any warning or prompt, replace your config files with the shipped version.
    First, it doesn't do this without any warning...the output of rpm (which does the actual install) is forwarded to yum, or rhn, or whatever is running the "figure out everything I need and get it" process, and that is displayed to you when you are applying the patch. It clearly states in that output what happened with the file.
    Second, for some updates (particularly security updates like this one), it is appropriate to save the old config file and load a default one, especially if that default one helps provid
- Re: (Score:2)
  
  by hughesjr ( 734512 ) writes:
  
  I just pulled up the SRPM and looked, and bind-chroot has:
  %ghost %config(noreplace) %prefix/etc/named.conf
  %ghost %config(noreplace) %prefix/etc/named.caching-nameserver.conf
  %ghost %config(noreplace) %prefix/etc/rndc.key
  It should not replace that file with an .rpmsave file
- Re: (Score:2)
  
  by MichaelSmith ( 789609 ) writes:
  
  I use netbsd but a current openbsd or freebsd would be fine I am sure.
- Re: (Score:2)
  
  by betterunixthanunix ( 980855 ) writes:
  
  There is a good reason this was tagged, "kdawsonfud." The person who reported the problem had caching-nameserver installed, not just bind, which explains why we aren't seeing widespread outages; most people don't install caching-nameserver when they don't want a caching nameserver.
- Re: (Score:2)
  
  by X0563511 ( 793323 ) writes:
  
  Because I really have the time or want to work with a hand-installed package system, and build all the packages myself.
  The idea is that you TRUST the vendor. If you don't trust them, why the hell would you use their software at all?
- Re: (Score:2)
  
  by Sleepy ( 4551 ) writes:
  
  Did the OP have the package caching-nameserver installed? If so, that package's whole point is to change the bind configuration into doing just caching.
  I PAY REDHAT GOOD MONEY FOR THIS!
  I don't need you implying that they can't prevent my mistakes, or read my mind.
  (joking, but look at all the "if this were Microsoft.." people skimming right OVER this fact. You saw it, I saw it, and that should be enough to shut them up... and never mind the bad practices by the submitter. He installed the WRONG PACKAGE, folks
