RHN Bind Update Brings Down RHEL Named
Posted by
kdawson
on Fri Jul 18, 2008 08:14 AM
from the remind-me-of-your-name-again dept.
alexs writes "Red Hat's response to update bind through RHN, patching the DNS hole, made a fatal error which will revert all name servers to caching only servers. This meant that anyone running their own DNS service promptly lost all of their DNS records for which they were acting as primary or secondary name servers. Expect quite a few services provided by servers running RHEL to, errr, die until their system administrators can restore their named.conf. Instead of installing etc/named.conf to etc/named.rpmnew, Red Hat moved the current etc/named.conf to etc/named.conf.rpmsave and replaced etc/named.conf with the default caching only configuration. The fix is easy enough, but this is a schoolboy error which I am surprised Red Hat made. Unfortunately we were hit and our servers went down overnight while RHN dropped its bomb and I am frankly surprised there has not been more of an uproar about this."
Related Stories
Technology: Paul Vixie Responds To DNS Hole Skeptics 147 comments
syncro writes "The recent massive, multi-vendor DNS patch advisory related to DNS cache poisoning vulnerability, discovered by Dan Kaminsky, has made headline news. However, the secretive preparation prior to the July 8th announcement and hype around a promised full disclosure of the flaw by Dan on August 7 at the Black Hat conference has generated a fair amount of backlash and skepticism among hackers and the security research community. In a post on CircleID, Paul Vixie offers his usual straightforward response to these allegations. The conclusion: 'Please do the following. First, take the advisory seriously — we're not just a bunch of n00b alarmists, if we tell you your DNS house is on fire, and we hand you a fire hose, take it. Second, take Secure DNS seriously, even though there are intractable problems in its business and governance model — deploy it locally and push on your vendors for the tools and services you need. Third, stop complaining, we've all got a lot of work to do by August 7 and it's a little silly to spend any time arguing when we need to be patching.'"
You didn't test before deploying an update? (Score:5, Insightful)
So, you didn't test the update on a non-production server? Just install any old patch and let it take your network down? Who do you work for again? I have to make sure not to do business with that.
Re:You didn't test before deploying an update? (Score:5, Insightful)
Actually, I caught the error just from looking at the output of up2date/yum. It clearly said named.conf saved to named.conf.rpmsave. So all you have to do is compare what changed, implement any changes and copy named.conf.rpmsave over named.conf.
Just as I said on the day of the release, be careful, don't just blindly update things.
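For anyone who wants to script the "compare what changed" step, here is a minimal sketch in Python using the standard difflib module. The file locations are assumptions (on a chrooted BIND they may live under /var/named/chroot/etc instead), and the function name is made up for illustration.

```python
import difflib
from pathlib import Path


def config_diff(saved_path, current_path):
    """Return unified-diff lines between the saved config and the current one."""
    saved = Path(saved_path).read_text().splitlines(keepends=True)
    current = Path(current_path).read_text().splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            saved, current, fromfile=str(saved_path), tofile=str(current_path)
        )
    )


# Example call (paths are assumptions, adjust for your layout):
# print("".join(config_diff("/etc/named.conf.rpmsave", "/etc/named.conf")))
```

An empty result means the two files are identical; anything else is what needs to be merged by hand before copying the saved file back.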
Re:You didn't test before deploying an update? (Score:4, Insightful)
No kidding. The only "schoolboy error" as the submitter put it, was not testing the patch on a non-production server before deploying it on a production DNS server.
Re:You didn't test before deploying an update? (Score:4, Insightful)
And the rest of slashdot would be all over MS admins who blindly update their systems from AutoUpdate.
I find it really hard to believe you don't have at the very least a strawman test system. The fact that you don't says volumes.
Re:You didn't test before deploying an update? (Score:5, Insightful)
>You know, not everyone has non-production servers. Every server we have IS production. And if you are paying for Red Hat Enterprise, you expect Red Hat to have tested these updates themselves. If this was a Microsoft error, Slashdot would be all over Microsoft for allowing this to happen.
You are wrong; stop whining. You're just painting yourself as misinformed.
1) The updates WERE tested.
2) The admin installed "caching-nameserver", then configured his install to act far outside the default.
3) He allows automatic updates straight into production. So do you, it seems. Good luck with that! RHEL documentation says to not do this, but you're a bigshot "paying" for something different. I suggest you get a sidekick, and stick to the Windows side of your "enterprise".
4) He didn't revert his .conf file, as is usually needed when some new line is added to a server .conf. This is SO NORMAL you'd have to be a n00b to get bitten!
Your MS comparison is apples and oranges. If this guy did TEN MINUTES worth of testing he'd realize something's up, and he could revert the rpm package. How many MS updates prohibit uninstall? Quite a few!
In Windows, you can't diff the before & after config, since Windows admins would rather be blind to what they're installing, since that's the norm and it's accepted.
Re:You didn't test before deploying an update? (Score:4, Insightful)
You know, not everyone has non-production servers. Every server we have IS production.
Well, there's your problem right there...
Re:You didn't test before deploying an update? (Score:5, Insightful)
Perhaps it is his problem, but not his fault. Sounds like he's in the dreaded zone where IT is a necessary evil, not a department that can help leverage the business.
He gets what he needs, or just barely what he needs. When management hands you crap, you learn to make crapade.
A schoolboy error? (Score:5, Insightful)
What? And isn't it an error of similar proportion to upgrade your primary DNS servers without first testing the new install?
Re:A schoolboy error? (Score:5, Informative)
Note as well that the initial release included a default conf file which specified a fixed source port, which of course breaks the fix.
[Updated 10th July 2008] We have updated the Enterprise Linux 5 packages in this advisory. The default and sample caching-nameserver configuration files have been updated so that they do not specify a fixed query-source port. Administrators wishing to take advantage of randomized UDP source ports should check their configuration file to ensure they have not specified fixed query-source ports.
Personally I'm surprised there's not been more uproar about the requirement to move internal DNS servers (yes, that means your Windows Domain Controllers in most corporate environments) outside any NAT'ing devices (eg: firewalls), as many NATs break the fix by rewriting outbound UDP DNS queries to use the same or incremental source ports. Anyone here moved their AD outside the firewall?
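A rough way to see whether a NAT is flattening the source ports is to look at the ports actually leaving the network. The sketch below is only an illustration: it assumes you have already collected a list of observed source ports (say, from a packet capture taken outside the NAT), and the function name and the `max_step` threshold are made up here. "varied" just means neither constant nor slowly counting up; it is not a statistical test of randomness.

```python
def classify_source_ports(ports, max_step=8):
    """Roughly classify observed UDP source ports of outbound DNS queries.

    Returns "fixed", "incremental", or "varied".
    """
    if len(ports) < 3:
        raise ValueError("need at least three samples")
    if len(set(ports)) == 1:
        # Every query left from the same port.
        return "fixed"
    steps = [b - a for a, b in zip(ports, ports[1:])]
    if all(0 < step <= max_step for step in steps):
        # Ports only ever creep upward by a small amount.
        return "incremental"
    return "varied"
```

If the result is "fixed" or "incremental" for traffic seen outside the NAT while the resolver itself is patched, the NAT device is the likely culprit.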
Mod parent up! (Score:4, Insightful)
Of course until we get details of the hole and fix we cannot be 100% sure but it is very likely that exposing predictable port numbers (which the fix randomised) reintroduces the hole.
If DNS software vendors had a year's notice, then why didn't the NAT firewall vendors? They could have introduced a patch at the same time.
Re:A schoolboy error? (Score:4, Informative)
Hand off DNS queries emerging from AD servers inside your firewall to caching-only servers in your DMZ. I have all my AD servers on RFC1918 IP numbers with no NAT, because they strike me as devices I'd prefer to keep as far away from the big bad Internet as possible.
ian
Re:A schoolboy error? (Score:4, Insightful)
IMHO, rhel should have tested this.
'Course they should. Nobody said otherwise.
I'm not sure what you're getting at with building from sources. Seems like overkill and doesn't solve the main problem because you can still screw it up. All anyone's saying is that you should test this on a server that you don't care about, or at least test it on one, before upgrading all of them.
MS (Score:4, Insightful)
If it was a Microsoft product, we'd all be carrying pitchforks and torches....
Re:MS (Score:5, Informative)
MS08-037 was released on the same day, and was much loved by ZoneAlarm users :-)
bug details (Score:5, Informative)
Here's the bug details: https://bugzilla.redhat.com/show_bug.cgi?id=453340 [redhat.com]
One of the bug comments says: "Latest caching-nameserver renamed my named.conf to named.conf.rpmsave in /var/named/chroot/etc" - so this should mean that you can still restore the lost conf file.
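If the .rpmsave file is there, putting it back can be scripted. A minimal sketch in Python, assuming the saved file sits next to the config with ".rpmsave" appended; the function name and the ".rpm-default" suffix are invented for this example and are not rpm conventions.

```python
import shutil
from pathlib import Path


def restore_rpmsave(conf_path, default_suffix=".rpm-default"):
    """Copy <conf>.rpmsave back over <conf>, keeping the packaged default.

    `default_suffix` is a made-up name for this sketch, not an rpm convention.
    """
    conf = Path(conf_path)
    saved = conf.with_name(conf.name + ".rpmsave")
    if not saved.exists():
        raise FileNotFoundError(f"no saved config at {saved}")
    if conf.exists():
        # Keep the file the package installed so it can still be diffed later.
        shutil.copy2(conf, conf.with_name(conf.name + default_suffix))
    # Copy rather than move, so the .rpmsave stays around as a backup.
    shutil.copy2(saved, conf)
    return conf


# Example call (path taken from the bug comment, adjust as needed):
# restore_rpmsave("/var/named/chroot/etc/named.conf")
```

The restored file is your old config as-is, so it is worth checking it for fixed query-source ports before restarting named.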
Re:bug details (Score:5, Informative)
A caching name server IS one that does not have any zones and only looks up zones from the DNS root servers. It is a configuration error to install the caching-nameserver package on a machine that is doing anything other than being a caching name server.
Stupid admins have been complaining about this for 5 years
Um... (Score:5, Funny)
"I am frankly surprised there has not been more of an uproar about this"
That's because the entire Internets are now broken!
You are WRONG :D (Score:5, Interesting)
The user has misconfigured their DNS and has installed a package called, SURPRISE, caching-nameserver along with the other bind packages.
caching-nameserver IS just that, a caching-nameserver. It SHOULD NEVER BE installed on a DNS server that is used for Primary or Secondary DNS control. The bind packages do not in any way modify named.conf, but if you want a caching nameserver and if you have installed the caching-nameserver package, then you would EXPECT that it would replace the named.conf file.
The real question is, how does crap like this get posted as a feature article on slashdot?
Re:You are WRONG :D (Score:5, Informative)
I'm not familiar with the package in question, but I assume it also installed some binaries. If it found that there already was a configfile of that name, it should have asked what to do.
If setting up the caching-nameserver was a matter of changing config options, you don't need a package for that, you need a HOWTO.
I would hazard to guess that unfamiliarity with the package is the real root cause of this. From the package description for caching-nameserver-7.3-3 (which could be a very old version):
The file contents show:
And so there we have it - a package designed to install and maintain the very generic files needed to configure a caching DNS server. DNS server not included.
And sure - this could be a HOWTO. But making a package allows for quick-and-simple configuration. And since this kind of thing is so generic, it really lends itself to packaging. I disagree that it should only be a HOWTO.
Well (Score:4, Insightful)
Yeah, it's a silly mistake.
But you should be testing things like this first, and whenever you upgrade you should really be looking at/for all .rpmsave or equivalent files first to make sure nothing has changed in the meantime. Otherwise, you're just removing your config and replacing it with the default whatever happens. You should also be checking .rpmnew (or equivalent) each time to check that it hasn't changed in terms of syntax, defaults etc. (which, let's be honest, is quite likely for such an important update - especially given that we hardly know what the exact problem is yet). I wouldn't go so far as to suggest intimate analysis of packages while they are still packed unless the systems you are running are quite critical to the operation of a business.
Part human-error on RH's part (it happens). Part incompetence in not testing the updates yourself first. Chances are that if I were affected by this, I would catch it as part of "right, what did that package change?", or notice as part of usual testing later, and then just move the file. I probably wouldn't even bother to send RH a note.
If you have a DNS server, that suggests that there are reliant computers. As courtesy to all those reliant computers you HAVE to test changes and check carefully what they are doing first. If you were "stung" by it, it suggests you hit this problem on ALL your DNS servers and/or that you only have one DNS server anyway. To deploy packages like this on such a setup is just asking for trouble.
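Looking for leftover .rpmsave and .rpmnew files after an update is easy to automate. A minimal sketch in Python, assuming the configs of interest live under /etc (a chrooted BIND would need /var/named/chroot/etc as well); the function name is made up for illustration.

```python
import os

SUFFIXES = (".rpmsave", ".rpmnew")


def find_rpm_leftovers(root="/etc"):
    """Return sorted paths under `root` whose names end in .rpmsave or .rpmnew."""
    found = []
    # os.walk silently skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(SUFFIXES):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Running it before and after an update and comparing the two lists shows exactly which config files the update touched.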
Don't forget! (Score:5, Informative)
Don't forget to check your named.conf on RHEL 5.x (and CentOS 5.x).
Make sure that any lines like
query-source port 53;
query-source-v6 port 53;
are commented out or deleted so that forwarded DNS queries come from random ports.
Restart BIND if necessary.
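The check can be scripted too. A minimal sketch in Python that flags active query-source lines pinning a numeric port; the function name is made up, and it only understands single-line statements with # or // comments, so /* ... */ blocks and statements split across lines are not handled.

```python
import re

# An active "query-source" or "query-source-v6" statement with a numeric port.
# "port *" is deliberately not matched, since that leaves the port unpinned.
FIXED_PORT = re.compile(
    r"^\s*query-source(?:-v6)?\b[^;#]*\bport\s+\d+", re.IGNORECASE
)


def fixed_query_source_lines(text):
    """Return (line_number, line) pairs for lines that pin the query source port."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(("#", "//")):
            continue  # already commented out
        if FIXED_PORT.match(line):
            hits.append((number, line.strip()))
    return hits


# Example call (path is an assumption):
# print(fixed_query_source_lines(open("/etc/named.conf").read()))
```

An empty list means nothing in the file obviously pins the port; it does not prove the resolver is sending from random ports.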
Not a bug, expected behavior (Score:5, Informative)
This sounds like how RPM has behaved for as long as I can remember. It looks at three versions of a config file: #1 the one from the old package, #2 the one currently on disk and #3 the one in the new package. If the config file hasn't been customized (1 and 2 are identical), it moves the old file to .rpmold (if 1 and 3 differ) and puts #3 into place. If the config file has been customized, it checks whether 1 and 3 differ. If they don't, then nothing's changed, the customized config file's still valid and it drops #3 in with the .rpmnew extension. But if 1 and 3 differ, then something in the config file may have changed and the customized config file may no longer be valid. But it's got customizations in it that the admin may need to refer to. So it outputs a warning message about what it's doing, moves the customized config file to .rpmsave and installs #3, and the admin's expected to have seen the warning and to merge their customizations into the new config file. You do watch for warnings and errors during the update, right?
In this case RPM is right, old named.conf files aren't valid. If they're based off RH's old stock config files, they have the source port locked and that disables much of the security fix. So the admins do have to check and modify their customized files before the system's finally ready (or at least RPM has to assume they do, since it can't know exactly what their changes were). That's exacerbated by probably having caching-nameserver installed, but I think a stock BIND install has a similar named.conf until you add your own zones to it.
I'd chalk this one up to admins who a) don't understand an inherent limitation of package-management systems (namely, it doesn't know why you changed something, only that you changed it), b) didn't watch the update process for errors, and c) didn't check the systems for functionality after the update.
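The three-way comparison described above can be written down as a small decision function. This is a sketch of the logic as this comment describes it, not rpm's actual code; the names and return strings are invented for illustration. Real rpm behavior also depends on whether the package marks the file %config or %config(noreplace), which this model ignores.

```python
def config_action(old_pkg, on_disk, new_pkg):
    """Model the upgrade decision for one config file, per the comment above.

    Arguments are the contents (or checksums) of the old package's copy,
    the copy currently on disk, and the new package's copy.
    """
    customized = on_disk != old_pkg
    package_changed = new_pkg != old_pkg
    if not customized:
        # Admin never touched the file, so the new default can take over.
        return "replace" if package_changed else "keep"
    if not package_changed:
        # Custom file is still valid; the new default goes alongside it.
        return "keep, write new as .rpmnew"
    # Both sides changed: the custom file is set aside for a manual merge.
    return "move current to .rpmsave, install new"
```

The case in the story is the last branch: a customized named.conf plus a changed packaged default.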
Re:What kind of an idiot would...? (Score:4, Interesting)
On most (all?) other distros it works perfectly. I had Debian in production for ages (supporting piles of services) with apt-get update/upgrade running regularly. SuSE and Gentoo also do a good job of keeping you informed about changes in updates and whether post-update human interaction is needed.
The crucial difference here is the mindset of RH. It hasn't changed one damn iota in a decade. The very same problem that made me throw RH6/7 out of production in the past, the very same stupidity of RH, is still there.
RH is the only distro I have ever tried - and I tried many of them - that would silently, without any warning or prompt, replace your config files with the shipped version. It took them ages to learn that files can be renamed - yet it seems the lesson didn't go through completely.
This is not a single mistake. This has been happening for more than a decade: RH during maintenance can and does override your configuration. The RH folks simply don't have even basic respect for their users...
[/rants]
Re:New update? (Score:5, Funny)
Yes, as an official red hat representative, I can say that we can. All you need to do at this time is respond posting your server addresses and login credentials. We will fix it from there.
Ok, the login name is root and I use the default password: password for all our production machines.
Oh, I almost forgot. Our IP is 207.46.19.254
Please let our CEO know that I was the one who gave you this information.
Re:Test your patches (Score:5, Insightful)
What kind of environment are you in where you don't first test your patches that are going out to live production machines? Regardless of the fact that it is linux and not windows, you should always test your patches before you roll them into production.
Disclaimer: I test first.
You know, a lot of people work in small shops that can't afford multiple redundant servers. I suspect that businesses with a single DNS/web/mailserver are a lot more common than Slashdotters this morning seem to think. What are those admins supposed to do? They're receiving a critical security patch from a trusted vendor, and I imagine a lot of them feel pretty safe applying that to their sole production server. This doesn't make them stupid or incompetent.
I have the luxury of lots of hardware that can fill in for other gear in a pinch, but lots of people don't. They don't deserve scorn for it.