Microsoft IT

The Setup Behind Microsoft.com 412

Toreo asesino writes "Jeff Alexander gives an insight into how Microsoft runs its main sites. Interesting details include having no firewall, having to manage 650 GB of IIS logs every day, and the use of their as-yet-unreleased Windows Server 2008 in a production environment."
This discussion has been archived. No new comments can be posted.

  • by 140Mandak262Jamuna ( 970587 ) on Thursday December 13, 2007 @12:35PM (#21684805) Journal
    I vaguely recall MSFT had to outsource load balancing to Akamai which used Linux boxes to redistribute the incoming traffic at some point in the past. Looking at Netcraft.com, it shows some subdomains of microsoft.com resolved to Linux boxes before the year 2000. So it is able to get out of the sandbox now? Is that the main story?
  • by Anonymous Coward on Thursday December 13, 2007 @12:41PM (#21684905)
    Anyway it looks quite impressive. I still don't understand how to handle 650 GB of logs :-).

    My question is why are the logs in ASCII text format? When all you want is say the IP [4 bytes], time of day [4 bytes], URI, referrer and return code [do you really care about their browser strings? You are MS after all, just assume it's IE].

    Storing an IP as text requires on average 15 bytes, so right there you can shave off 11 bytes with a binary IP. Time of day is worse, a date+time string is like 25 chars. Doesn't seem like much, but multiply the 32 bytes per entry you save by say 50 million hits and that's 1.5Gbyte you saved. That's not counting the white space you can remove, and a simple huffman code you could apply to the URL/referrer.

    Heck, just piping the binary IP/date and ASCII URL/referrer through gzip [or use libz's gzprintf() etc...] could make a large difference as well.

    Point is, bragging about 650GB/day logs is not really impressive when you're "doing it wrong" (tm). That's like bragging about how much you cut your face while shaving.
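    A minimal sketch of the per-entry saving estimated above, assuming a 4-byte IPv4 address and a 4-byte Unix timestamp packed with Python's `struct` module. The layout, the helper names (`pack_header`, `unpack_header`) and the sample values are illustrative assumptions, not anything IIS actually does. With these figures the saving over 50 million hits works out to 1.6 billion bytes, or roughly 1.5 GiB.

```python
import gzip
import ipaddress
import struct
from datetime import datetime, timezone

# Assumed layout (not from the thread): IPv4 address and Unix timestamp,
# each stored as a 4-byte big-endian unsigned integer.
HEADER = struct.Struct("!II")


def pack_header(ip_text, when):
    """Pack a dotted-quad IPv4 address and an aware datetime into 8 bytes."""
    return HEADER.pack(int(ipaddress.IPv4Address(ip_text)), int(when.timestamp()))


def unpack_header(blob):
    """Reverse pack_header: return (ip_text, UTC datetime)."""
    ip_int, seconds = HEADER.unpack(blob)
    return str(ipaddress.IPv4Address(ip_int)), datetime.fromtimestamp(seconds, tz=timezone.utc)


# Illustrative sample values: the longest dotted-quad form (15 characters)
# and a 25-character date+time string.
ip_text = "192.168.100.200"
time_text = "2007-12-13 12:41:00 +0000"
when = datetime(2007, 12, 13, 12, 41, 0, tzinfo=timezone.utc)

binary_form = pack_header(ip_text, when)
saved_per_entry = len(ip_text) + len(time_text) - len(binary_form)
saved_total = saved_per_entry * 50_000_000

# Binary header plus an ASCII path, piped through gzip as the comment suggests.
record = binary_form + b"/path/to/page.html\n"
compressed = gzip.compress(record * 1000)

print(saved_per_entry, "bytes saved per entry")
print(saved_total, "bytes saved over 50 million hits")
print(len(record) * 1000, "->", len(compressed), "bytes after gzip")
```

    The gzip figure here says little about real logs, since the sample repeats one identical record; it only shows that the binary header and the ASCII fields can share one compressed stream.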
  • by loconet ( 415875 ) on Thursday December 13, 2007 @12:43PM (#21684945) Homepage
    Interesting, I thought I was the only one. Why is it that every time I read about Microsoft-related technology it's always an acronym salad? Not even commonly used acronyms, either; they use acronyms for their own names for technology xyz. It's almost like they do it on purpose...
  • by Amouth ( 879122 ) on Thursday December 13, 2007 @01:22PM (#21685549)
    i resent that - i personally feel that xp and server 2003 have next to nothing in common with each other - XP is annoying crap - server 2003 on the other hand is quite nice and one of the first server implementations i have seen MS push out that i actually look forward to installing on something - because it really does just work. 2008 seems interesting but i am going to hold off migration till 2003 is in the stages to stop receiving updates.
  • by ashridah ( 72567 ) on Thursday December 13, 2007 @01:24PM (#21685569)
    Which we do on a regular basis. Every few weeks I see emails going around from higher-ups asking us to test their team's RC or beta stuff at home for them, and the project I'm working on has been dependent on VS2008 since beta2. Everyone here has their favourite project they like to keep tabs on. I've got longhorn server 2008 running on one of my machines here.

    That said, the choice to use longhorn server in production isn't actually a bad one. It's really, REALLY stable. I keep hearing (from people both inside and outside the company) that it's more stable than 2003 is (and 2003 has the benefits of multiple service packs). It's also a lot more configurable about what it runs, and how much of it it enables when it's installed. I wouldn't bet the entire stable on it, but I'd be willing to put money on it getting a place.

    All in all, it's pretty sweet, if you look at it from the sysadmin perspective. Also, the stuff you can set up when you couple it with vista is really nice (from a security standpoint, particularly). That said, some of that functionality is being backported to XP with SP3 or whatever.

  • Re:But generally.. (Score:3, Interesting)

    by Junta ( 36770 ) on Thursday December 13, 2007 @01:29PM (#21685621)
    The thing that's really troublesome here is, I don't think the person writing the article would care to mention that detail, at least not outside the ports IIS serves users on, which are the only ones he thinks matter. On the externally available ports that should be publicly available, there is *zero* applicability for stateful rules, particularly when you have external parties already tracking obvious DoS for you. For other ports (for example a port out of the IANA range), I wouldn't be surprised to find out they do have stateful inspection to allow traffic associated with an outbound connection in. The problem being their networking equipment might make it a transparent default. Of course, if they are running 100% Microsoft software bottom to top, they may never even need to contact an external update server and forgo that entirely, something >90% of the world can't do, and is still a moot point with respect to how 'bulletproof' their server setup is.
  • Re:But generally.. (Score:2, Interesting)

    by AK Marc ( 707885 ) on Thursday December 13, 2007 @01:34PM (#21685701)
    If you tried to sell a stateless filter as a "firewall" today, you'd be laughed out of the market.

    Most of the low end routers claimed "firewall" when they did nothing other than NAT. Though now someone else wrote code that runs on their Linux core so they have firewalls they didn't have to pay for. But what you are saying is that a filter firewall is a firewall under every documented definition of the word, but wouldn't sell well because people expect stateful operation. That sounds like you are violently agreeing. The first thing that comes to one's mind isn't the only correct answer. Otherwise, horses can no longer be mustangs, since if you mention someone went out and bought a mustang, nearly all people would picture a car and not a horse. Language doesn't work that way.
  • by misleb ( 129952 ) on Thursday December 13, 2007 @01:44PM (#21685865)
    Ok, but is the OS *still* organized like crap? I mean, is C:\Windows still a dumping ground for a bunch of arbitrarily named data files, log files, drivers, and libraries using, for the most part, the old 8.3 naming convention?
  • by module0000 ( 882745 ) on Thursday December 13, 2007 @01:44PM (#21685869)

    Isn't that just you announcing you're ignorant of which tools to use? Are you that kid in gym class that was always trying to put his shoes back on without untying them, who rather than take the seconds to untie/re-tie would stomp himself around the locker room for minutes until they fit right? Oh and, how long would it take you to create and print a tri-fold pamphlet using sed? Perhaps you're the problem, not the app.
    Damn straight. It would have taken him just as long to attempt the same operation in Linux, using OpenOffice. He's a tard for using a "full featured word processor" for a "simple find and replace". That's like using a pneumatic jack hammer to put in my 2-man camping tent spikes, and complaining that the setup and take down of my "spike-putter-in device" was far too excessive compared to the linux-rubber-mallet. What a fucking retard.

    The sad part is that despite your perfectly good retort and explanation to the gym-class idiot, he probably read a quarter of your post, mentally tagged you as a MS fanboy, and kept giggling. Makes all the non-idiotic GNU/Linux advocates look like idiots standing next to him.
  • Back In The Days (Score:1, Interesting)

    by Anonymous Coward on Thursday December 13, 2007 @02:11PM (#21686271)
    I heard that Back In The Days, Microsoft were using FreeBSD for their outward-facing servers, hacked-up to look exactly like Windows NT (for that was the product they were selling at the time).
  • by Tacvek ( 948259 ) on Thursday December 13, 2007 @04:45PM (#21687868) Journal

    "Home" is really more of a "Workstation lite", with a lot of the workstation features disabled

    Alternately, you can think of "Home" as the successor to Windows ME, with an NT kernel. I'll try to do this schematically (WKS = Workstation, SVR = Server, and some other weird abbreviations used to make the alignment work):

    Wind. 98 --> Wind. ME --> XP Home --> Vista Home
    NT 4 WKS --> 2000 WKS --> XP Prof --> Vista Ultimate
    NT 4 SVR --> 2000 SVR --> SVR 2K3 --> SVR 2008
    In reality, things are a lot more complicated, because there are other editions, Win 2K Advanced Server, x64 editions, and God knows how many variants of Vista. (Maybe "Vista Business" is a better fit than "Ultimate" above too.) In addition, a lot of people who were or would have been in the 95/98 line moved to the "Pro" line for XP. But, for most people, things probably progressed as indicated.

    While that is more or less true, consider that there are really only three main OS codebases in Microsoft now. Windows NT (non server, the current offering is various forms of Vista, as well as XP until they discontinue it). Windows server (a very close relative to the NT series, but optimized for server environments, and multi-processor usage.) Those two code bases are close enough that they share binaries (when on the same architecture) and they could even be used for the opposite purposes with only minor difficulty.

    However the Windows CE codebase is a bit different. It is still distinctly Windows, but executable compatibility with the NT series is rare. (That is due in large part to the fact that most CE devices seem to be platforms other than x86.) Interestingly it is possible to create .NET apps that run under CE and modern NT. Since the desktop Framework is largely a superset of the Compact Framework, the desktop assemblies get used, so code using only the .NET Compact Framework and no CE specific assemblies will run just fine on a desktop system.

    Now you may notice that there are also some special sub-codebases. For example there is the NT Embedded codebase (seen as Windows XP Embedded), and the NT PE versions

  • by lena_10326 ( 1100441 ) on Thursday December 13, 2007 @04:51PM (#21687970) Homepage
    I should have included this in my previous post. A real world example (1 KB for storing a URI path and 2 KB for a full URI) would drive home the point even more. Just for shits and giggles let's do something closer to a real example.

    Fixed binary

    [IP address] [Timestamp] [Method] [Path(/path/to/script.cgi)] [HTTP Version] [Return Code] [Referrer(http://from.domain.com?file.html)]

    4 + 8 + 1 + 1024 + 1 + 2 + 2048 = 3088 bytes * 1000 = 3,088,000 bytes

    Variable text

    [IP address] [Timestamp] [Method] [Path(/path/to/script.cgi)] [HTTP Version] [Return Code] [Referrer(http://from.domain.com?file.html)] [EOL]

    16 + 15 + 5 + 512 + 3 + 3 + 1024 + 1 = 1579 bytes * 1000 = 1,579,000 bytes

    Let's add one more variation: variable length binary records. Maybe that will offer some savings.

    Variable binary format

    [IP address] [Timestamp] [Method] [Path Len] [Path] [HTTP Version] [Return Code] [Referrer Len] [Referrer]

    4 + 8 + 1 + 2 + 512 + 1 + 2 + 2 + 1024 = 1556 bytes * 1000 = 1,556,000 bytes

    Pretty good, some savings over variable text; however, we now lost the ability to edit, head, tail, or do anything useful with command line tools. Not exactly worth it for a 1% gain. Oh yes, don't forget gzip will compress ASCII text better than binary because it'll drop the 8th bit on every byte so you'll automatically pick up a built-in 12.5% gain with ASCII files which blows away the 1% gain of variable binary format.
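    The three totals above can be re-added mechanically. This is only a sketch: the field widths are copied from the layouts in this comment, and the names (`LAYOUTS`, `totals`, `saving_percent`) are made up for illustration.

```python
# Field widths in bytes, copied from the three record layouts above.
LAYOUTS = {
    "fixed binary": [4, 8, 1, 1024, 1, 2, 2048],
    "variable text": [16, 15, 5, 512, 3, 3, 1024, 1],
    "variable binary": [4, 8, 1, 2, 512, 1, 2, 2, 1024],
}
RECORDS = 1000

# Total size of RECORDS entries for each layout.
totals = {name: sum(widths) * RECORDS for name, widths in LAYOUTS.items()}


def saving_percent(baseline, candidate):
    """Percentage by which `candidate` is smaller than `baseline`."""
    return 100.0 * (totals[baseline] - totals[candidate]) / totals[baseline]


for name, total in totals.items():
    print(f"{name:16s} {total:>10,d} bytes")

gain = saving_percent("variable text", "variable binary")
print(f"variable binary vs. variable text: {gain:.2f}% smaller")
```

    On these widths the variable binary layout comes out a little under 1.5% smaller than variable text, which matches the "1% gain" ballpark quoted above.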

  • by nuckfuts ( 690967 ) on Thursday December 13, 2007 @05:15PM (#21688400)
    Interestingly, I noticed that when pre-GUI disk checking occurs on Server 2008 it says "Windows Vista" at the top of the screen.

    At least this is true with the version I'm testing - June 2007 CTP (Community Technology Preview). I expect in later versions this will be obscured.

  • by ashridah ( 72567 ) on Thursday December 13, 2007 @06:56PM (#21690072)
    Because at least Unix has conventions.

    Conventions are a nice way of saying "that's the way it's always been, so that's the way it stays." Windows has similar problems left over from legacy, going all the way back to CP/M. Yes, this sucks, but so do some conventions in unixland. Just ask a Solaris 10 admin how much it sucks when your upstream vendor breaks decades-long convention.

    Really? Ok, let's open up C:\Windows on one of our Windows servers. Hmmm a folder named "$hf_mig$". I suppose you know what that means or what convention that follows? Or C:\Windows\adam. Kinda looks like it might be some directory tools. Maybe ADAM = Active Directory AdMinistration? What's that doing there anyway? I could keep going down the list. I suppose there is a very good reason why there are .BMP files in C:\Windows? Desktop wallpapers? Come on. I wonder if they're related to the other brilliantly named files such as SET2.tmp and SET3.tmp in that same directory. And don't get me started on the insanity that is C:\Windows\System32. Hardly a single file/folder that doesn't use 8.3 naming. I haven't a clue what half that stuff is doing there.

    You're not looking in the right place. Microsoft, love it or hate it, worked out a long time ago that 'filename' and 'metadata' aren't necessarily the same thing. The filename and path are just handy locational indexes, and don't necessarily need to mean *anything*. Sure, a DLL can, and often, for newer stuff, IS far longer than 8.3, but it wasn't until later versions of NT (3.5/4.0, I don't remember my history too well) that support for it kicked in well enough, and there's some legacy stuff around. You don't break legacy just because it's fun. Microsoft gets this right, even if they had to tread over it a fair bit in vista, and add some nasty hacks to deal with most of the fallout.

    Anyway, as I was saying, you're not looking in the right place. Case study: C:\windows\system32\apss.dll: Microsoft(r) InfoTech Storage System Library.
    Problem solved. (it's not at all difficult to use something like powershell (or possibly other tools) to just print this out in a souped up version of ls with a little scripting, I might add, just like I can do a few similar scripting tricks on my debian system to tell you who owns the copyright to 90% of .so's in /usr/lib.)

    Want another one?

    c:\windows\System32\bitsigd.dll: Background Intelligent Transfer Service IGD Support

    Oh look, another one, fully named.

    Of course, this starts to fall down when the file doesn't contain metadata, but that's a problem for, say, XML schema files in /usr/share/ on linux too. The organisation might be a bit better, but not by much. The saving grace there is that I have dpkg to work shit out for me. .NET goes even further. You can register as many different versions of a namespace as you like, and .NET will do the mapping for you if you request a specific version.

    First of all, I was only talking about superficial organization. And if you want to see something nice, have a look at OS X some time. Not only is the System (/System) well organized, but most applications are neatly self contained in /Applications/Some.app. They usually don't spew files all over the place when installed. You know where the term DLL Hell comes from, don't you?

    Yes. I do. .NET does a good job of solving this quite nicely. Adds public/private keys into the mix too, plus a bunch of other mechanisms. .NET isn't just for C# either. It deals with VB, C++, and (ahahahha) J# too.
    I will admit that the mac platform is neatly arranged, but their QA seems to have gone to the toilet right now. A place that windows' QA has emerged from rather nicely, I should mention.

    As for random stuff appearing in random places, try dealing with commercial software. Even on linux, the developers will put shit in strange places. Open
  • by misleb ( 129952 ) on Thursday December 13, 2007 @07:39PM (#21690724)

    You're not looking in the right place. Microsoft, love it or hate it, worked out a long time ago that 'filename' and 'metadata' aren't necessarily the same thing. The filename and path are just handy locational indexes, and don't necessarily need to mean *anything*.


    But you can have both... Metadata and reasonably named "locational indexes". Is it so strange to think that people, particularly administrators, might want to have some idea what a file does and why it is there just by noting its "locational index?" I see this as a significant flaw in the design of Windows. And then there is the Registry, of course. Who would have guessed that users might actually want/need to edit it manually? Certainly not Microsoft. That is just poor planning on their part and I won't excuse it.

    You don't break legacy just because it's fun. Microsoft gets this right, even if they had to tread over it a fair bit in vista, and add some nasty hacks to deal with most of the fallout.


    You can break legacy. It isn't fun, but it doesn't have to be disastrous either. Apple did it with OS X. And then they did it again when moving from PPC to x86. The only reason Microsoft can't do it is because they've got so much inertia. And it will be their downfall. Though it would probably help if Microsoft didn't wait 4-5 years between major releases (more granular change). Even if Microsoft did want to break legacy, everyone has gotten so used to the old flaws that they can't change. Vista might well be awesome. But the reality is that many people will still be running XP even 5 years from now. Apple, on the other hand, has gotten people accustomed to significant changes.

    As for random stuff appearing in random places, try dealing with commercial software.


    Fortunately I don't have to much on Linux. I will admit that much of the mess in Windows is as much the fault of developers as it is with Microsoft. But that distribution of responsibility doesn't make using and administering Windows any more pleasant.

    We can't be responsible for what third parties do, however. Neither can apple (I just *love* dealing with adobe's software on apples, btw. Or Zend Developer Framework. mmmhm. ) Nor you. Install maya on linux sometime. Or matlab, or something else that you can't fuck with the organisational structure of, because the licensing server would crack the shits.


    Indeed, Adobe does make a mess out of a Mac, that is for sure. Fortunately, the majority of applications I use on the Mac just drop right into /Applications without having to run installers or uninstallers or worry about random libraries and temp files showing up in /System/Library. Apple has done a MUCH better job of encouraging reasonable software design... at least as far as logical distribution of application data. Microsoft could learn a lot from Apple, methinks.

    -matthew
