How Microsoft Could Kill the Client

Don Jones

I’ve recently felt that Microsoft wasn’t investing as heavily in the Windows client OS as in the server OS. I know a lot happens under the hood that isn’t immediately visible, but at the same time, apart from Windows 8’s massive UI flip, it seemed to me that the client OS just wasn’t going anywhere.

I’ve also been a bit (pleasantly) surprised at Microsoft’s embrace of non-Windows operating systems, especially when it comes to mobile. Office on my iPhone! Mac subscriptions for Office 365! It’s as if these products were told, “look, don’t rely on the Windows monopoly to make your money – go make it wherever you can find it.” A great decision, but not what the company’s best known for.

Then I’ll see technologies like Desired State Configuration come along, which are being engineered from an almost purely server-focused point of view. It occurred to me that Microsoft’s new “cloud first” engineering, wherein they build features for Azure and other cloud-based services first, and later migrate those features to our on-premises products, might have an interesting connotation for the client. After all, Azure isn’t used to run Windows 8. It can, but that’s not its big deal.

So I started wondering, “what would a world look like in which Microsoft owned the back-end, and didn’t really care about the client?” I came up with a pretty plausible scenario.

This is going to take some explaining.

First, What We’ve Done Already

The client has had attempts made on its life in the past. “Thin client” was basically a way of saying, “the client doesn’t matter; we’ll run it all in the back-end.” Thin client as it has been done, however, has two problems: first, it doesn’t work well for disconnected machines like laptops on airplanes; second, it really just moved the computing power from the desk to the datacenter, without really changing what was happening. In other words, we couldn’t reliably get better computing density with thin client. So it tends to be used for niche applications within an environment.

Density is the big deal there. Remote Desktop Services (RDS) is great, but it can’t run every app. Virtual Desktop Infrastructure (VDI) can be great, but it ends up using more-expensive computing resources to do the same thing, mainly because now you’ve not only got to run the desktop, you’ve got to emulate the hardware as well. More on why what’s coming is so different a bit further down.

Add a Drawbridge

That’s why the Microsoft Research Drawbridge project is so fascinating to me. I’ve taken a whack at explaining this before, but let me try at a higher level. This is necessarily going to involve some oversimplification, but for this discussion we’re concerned about the net result more than how it’s accomplished.

Traditional computers of all kinds have had a single consistent low-level API that interfaces with hardware: the BIOS. Linux, Unix, Windows, and Mac all run on essentially the same hardware, because they’re all programmed to talk to the same super-low-level APIs. Those operating systems all take wildly divergent approaches from there, but they all have a single common goal: to make it easier for developers to write applications.

Look, let’s be honest. Nobody cares much about what we call an OS. We care about what’s running on the OS. We want email servers, Web servers, database servers, word processors, spreadsheets, and whatnot. Yes, the user interface differs, but UI is really just another application running atop the OS. There’s no reason a Linux build couldn’t look exactly like Windows – and folks have made ones that come really close. So we need to define some new terminology:

  • The OS is the stuff we don’t actually care about, like how bits get on and off of disk, in and out of memory, and so on.
  • The personality is the stuff we do care about, including application frameworks, APIs, UI, and so forth.

Today’s operating systems combine both OS and personality: when you install Windows, you not only get the low-level stuff, but also the visuals, the frameworks, the APIs, and all that. Under the hood, there’s actually something of a separation between the two. For example, most of the personality is provided by the 100k+ Win32 APIs, which themselves talk to the lower-level don’t-care-but-need-to-have-it bits.

Drawbridge is an attempt to more firmly separate the two. As an OS, you can think of it as a kind of souped-up BIOS firmware package. It knows how to talk to hardware, and it might know how to do things like authenticate you and hold auth tickets in memory. It could probably talk to the network and perform other important “I don’t care how you do it, just get it done” tasks. It then exposes those capabilities as a set of services using a standardized API – just like the BIOS of today exposes its services via an API. Frankly, most developers probably wouldn’t want to write to an API at that low a level, any more than they do today.

So atop that core OS, you build personalities. Your Windows personality might include all of the Win32 APIs, reprogrammed to talk to the Drawbridge OS instead of to whatever’s underneath them today. You could make a Linux build that did the same thing. When an application runs, it simply links to the personalities it was built for.

A modern Web server provides a good example of this. Think of the Web server as the low-level OS, exposing its capabilities through APIs like FastCGI. On top of that, you could add personalities like PHP, Python, or ASP.NET. Developers would then build apps on those. When you need to “run” a page like Users.php, the Web server loads up the PHP “personality” and then runs the Web page. Drawbridge is the same idea, only on a larger scale. Linux processes run next to Windows ones, without the need for a traditional virtual machine. Yes, it would be a specialized version of Linux or Windows, but there’s no reason the Drawbridge APIs couldn’t be made public, so that all “personalities” could be written for it. Provided Drawbridge stayed low-level enough, everyone could probably agree on the core services it would provide to upper-level personality stacks. And frankly, from Microsoft’s perspective, it wouldn’t matter terribly much if Red Hat or Apple got on board; Microsoft could write personalities for whatever API stacks their customers wanted. Nothing stopping MS from refactoring, say, CentOS, right?
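
If you like, here’s how that analogy might look in code: a minimal C sketch (every name in it is invented for illustration; none of this is actual Drawbridge code) of a host that doesn’t understand any page itself. It just matches each request to a personality and hands over the work.

    /* Minimal sketch of the web-server analogy: the "low-level OS"
       (the server) picks a "personality" (language runtime) per request.
       Every name here is hypothetical, purely for illustration. */
    #include <stdio.h>
    #include <string.h>

    typedef void (*personality_fn)(const char *page);

    static void run_php(const char *page)    { printf("PHP personality runs %s\n", page); }
    static void run_python(const char *page) { printf("Python personality runs %s\n", page); }

    struct personality { const char *ext; personality_fn run; };

    static const struct personality personalities[] = {
        { ".php", run_php },
        { ".py",  run_python },
    };

    /* The server doesn't know PHP from Python; it just matches the
       request to a personality and hands the page over. */
    static void serve(const char *page)
    {
        const char *ext = strrchr(page, '.');
        for (size_t i = 0; ext && i < sizeof personalities / sizeof *personalities; i++) {
            if (strcmp(ext, personalities[i].ext) == 0) {
                personalities[i].run(page);
                return;
            }
        }
        printf("no personality installed for %s\n", page);
    }

    int main(void)
    {
        serve("Users.php");  /* PHP personality handles this one */
        serve("report.py");  /* Python personality runs right alongside */
        return 0;
    }

Swap “page” for “process” and “language runtime” for “API stack,” and you’ve got the Drawbridge picture.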

Talk About Flexible

One service a Drawbridge OS could provide is the ability to skim off the user interface of any running process, serialize that into a data stream, and send it to a remote client. This is basically what Remote Desktop Protocol (RDP) does today. After all, at a low level, application data eventually has to make it to a device driver for display on the screen. Intercept at that level, send the “drawing instructions” to a remote client, and let the remote client draw the user interface instead of the machine running the process. That’s exactly how RDP works, in fact (and so do ICA and other remote-control protocols). Sound and other services could be done the same way. After all, they all end up interacting with hardware, and if the low-level OS is in control of the hardware, it could simply send the instructions to remote hardware.
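
Here’s a hedged sketch of that choke point in C. The types and call names are invented (this isn’t the real RDP or Drawbridge interface); the point is that one routing decision, made at the layer that owns the hardware, turns local drawing into remote drawing.

    /* Hedged sketch of "skimming off the UI" at the layer where draw
       calls would normally hit a device driver. The types and names
       are invented illustrations, not real RDP or Drawbridge APIs. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* A drawing instruction, already reduced to primitives. */
    struct draw_op {
        uint8_t  opcode;        /* e.g. 1 = fill_rect, 2 = blit_glyphs */
        uint16_t x, y, w, h;
        uint32_t argb;
    };

    /* Where the OS sends this process's display traffic. */
    enum sink { SINK_LOCAL_DRIVER, SINK_REMOTE_STREAM };
    static enum sink display_sink = SINK_REMOTE_STREAM;

    static void driver_draw(const struct draw_op *op) {
        printf("driver: drew opcode %u on local hardware\n", op->opcode);
    }

    /* Serialize the op into a byte stream bound for the remote client,
       which replays it on its own hardware. */
    static void stream_draw(const struct draw_op *op) {
        uint8_t wire[sizeof *op];
        memcpy(wire, op, sizeof wire);   /* real wire format elided */
        printf("stream: sent %zu bytes to the remote client\n", sizeof wire);
    }

    /* The single choke point: every draw call lands here, and the OS
       decides whether it hits local hardware or goes over the wire. */
    void os_draw(const struct draw_op *op) {
        if (display_sink == SINK_LOCAL_DRIVER) driver_draw(op);
        else stream_draw(op);
    }

    int main(void) {
        struct draw_op op = { 1, 10, 10, 100, 20, 0xFF0000FFu };
        os_draw(&op);   /* the process thinks it drew; the client actually draws */
        return 0;
    }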

That means individual processes become less tied to the hardware they’re running on. Just as we migrate VMs today, we might migrate processes tomorrow. API stacks like Win32 might need to be modified to understand that the “user profile” wasn’t a set of local folders, but was rather a data store on a SAN someplace – or maybe even spread across several places. We have the beginnings of that today with the folder redirection stuff, right? The APIs take care of finding the profile’s physical location, so that apps just ask for a “Documents” folder and get what they want with no worry.
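
A minimal sketch of that resolution layer, assuming a hypothetical resolve_known_folder helper and made-up SAN paths:

    /* Sketch of folder redirection: the API stack resolves a logical
       name ("Documents") to wherever the profile actually lives. The
       helper and paths are made up for illustration. */
    #include <stdio.h>
    #include <string.h>

    struct folder_mapping { const char *logical; const char *physical; };

    /* In this sketch the profile lives on a SAN share, not a local
       disk; the app never sees that detail. */
    static const struct folder_mapping profile[] = {
        { "Documents", "\\\\san01\\profiles\\don\\Documents" },
        { "Desktop",   "\\\\san01\\profiles\\don\\Desktop"   },
    };

    static const char *resolve_known_folder(const char *logical)
    {
        for (size_t i = 0; i < sizeof profile / sizeof *profile; i++)
            if (strcmp(profile[i].logical, logical) == 0)
                return profile[i].physical;
        return NULL;
    }

    int main(void)
    {
        /* The app just asks for "Documents" and gets what it wants. */
        printf("Documents -> %s\n", resolve_known_folder("Documents"));
        return 0;
    }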

Think about it: you might have some global application directory where users could fire up the apps they needed. Office applications run from a set of servers in your datacenter, while other apps might run in a hosted provider. They’re all real apps, with their UI skimmed off and sent to the user’s computer. This is massively different from VDI, because each user’s activity could be spread across a huge number of physical process hosts; it’s better than RDS, because each process would be more self-contained. You wouldn’t be constrained to a single “OS,” because a single user could run apps from many different “personalities” side by side.

And There Endeth the Client

And at that point, the “client” is pretty much just a screen, keyboard, mouse, and so forth. It doesn’t necessarily run an “OS.” It’s thicker than today’s thin clients, but not as thick as today’s client operating systems. The client needs to know how to get a list of applications, how to authenticate the user, and a few other basic tasks. From Microsoft’s perspective, it might as well be iOS as Windows – wouldn’t matter, because neither client OS would really be contributing much in the way of functionality. The personalities would still be important, but only admins and developers would worry about those. “Bring Your Own Device” suddenly becomes a lot less scary.

And what about disconnected clients, like a laptop on an airplane? Well, with a capable enough device – think “laptop” instead of “smartphone” – you’d simply run the small, low-level “library OS” on the client. You’d migrate the user’s processes to that host, perhaps migrating over some profile data to local storage. Logically, not that different from what some VDI schemes do today, except that, again, you’re moving processes. The underlying Drawbridge OS provides security and control (that’s an inherent part of its design, if you read up on it), but you’re not running anything like the full Windows OS of today. And you could still run multiple different personalities side by side. It’s essentially a mashup of today’s VDI and stuff like App-V – but without all the caveats those imposed, if it’s done right (and all evidence in the past couple of years is that Drawbridge is being done right).

Microsoft’s focus would doubtless shift over to “use our personalities,” right? That way they maintain their Windows lock-in? Well, maybe, maybe not. Personality API stacks could become less important. As a developer, you’d simply pick the one you were most familiar with and that best suited the task at hand. I could see a proliferation of task-specific personalities, each smaller and more specialized than today’s more general-purpose client operating systems. One “personality” might simply be the equivalent of a Web browser, for example, capable of running HTML+JS applications. To the user, apps could all look the same. It wouldn’t be like starting up a Linux VM, and a Windows VM, and a Mac OS X VM – the apps would all run next to each other, and likely have better interaction.

Imagine every decently equipped computer being able to run every client operating system ever built, side-by-side, without the overhead that today’s hypervisors impose. When you can have all the clients, and their applications can interact (through the low-level OS) as needed, then which client you choose doesn’t matter. You choose “all.”

But Years Away…

I think it’s a fascinating possibility. It’s obviously years away, and I’m doubtful it’d look exactly like this. But it’s an interesting strategy, isn’t it? If they went through with it, Microsoft could focus on controlling the back end – an area where they’ve obviously been making massive investments in the past few years. The “desktop” goes away, and the client just becomes a delivery mechanism. It’s thin-client computing all over again, but in a way that could actually make sense. The “Windows vs OS X vs Linux” argument would become kind of meaningless, because you could have it all.

Administrators wouldn’t have to worry about desktop management. There wouldn’t be anything to manage. That’s very much unlike today’s VDI approach, which simply moves the desktop without markedly changing how it’s managed. Many of today’s apps would run unaltered, provided the underlying personality API stack looked the same to the app. That’s what MS Research has done with Drawbridge, in fact – refactored enough Win32 APIs to get Office running on it.

…or Closer Than You Think

In fact, Drawbridge might be closer than you think. In early 2013, Microsoft said they were moving ahead with implementing Drawbridge on Azure. Now that makes a buttload of sense. Remember, when Azure first launched, you were meant to run web sites on it, not virtual machines. Problem is, bigger implementations needed the “full control over the machine” that a VM offers and that a mere web site lacks. But if Drawbridge were a base OS in Azure… wow.

You could run any application as a process. Sure, MS might need to provide the personalities. Windows would be straightforward and obvious; they could choose to do Linux personalities if they wanted. They’d get much better application density per host without the overhead of emulating hardware and multiple low-level OSs, meaning they could have more competitive pricing than VM-hosting services. That’d lead to on-premises Drawbridge, giving you the ability to migrate individual processes from your datacenter to the cloud.

Let’s Do Some Comparison

I feel compelled to contrast the Drawbridge approach with VDI and RDS, because both of those have been, at one time, strong contenders in the “thin client” space… contenders that have, so far, seen only fairly limited implementations.

Let’s tackle VDI first, because it’s never, ever, ever been “thin client” or “no client.” You still have a completely thick client, just running someplace else. You also have to have a thin client to receive the UI. That thick client is the thickest possible client, in fact – it’s not only a full client OS and applications, but also emulated client hardware. VDI doesn’t even do a fantastic job of getting applications “on every device,” because you’re just remoting into a VM. If you’ve ever tried to use Windows 7 via RDP on an iPad, you know how non-compelling it is. I know VDI has some value in certain scenarios where relocating the hardware is really what you’re after – college labs, kiosks, and so forth come to mind – but it was never a play to minimize the client.

With RDS, you’re still running a full, thick OS+personality. You get minimal sandboxing between applications, something that’s vexed Terminal Services admins for years. Some apps simply won’t run. A significant problem with RDS is that you’re only skimming the GUI off an entire OS session. It took successive generations to get USB redirection and other things to work well. Of course, what Microsoft learned making all that happen will benefit Drawbridge: all of that redirection can be implemented in Drawbridge, at the low-level OS that’s actually touching the hardware. In other words, when a process says, “hey, I need to get to the USB port,” you don’t have to hack the OS to redirect that connection. The OS is the layer that was going to pass the data off to the hardware USB port anyway; there’s no reason it can’t simply direct that traffic elsewhere. In fact, servers could potentially need a lot less hardware, since processes could seamlessly use the hardware of the client machine that initiated the process in the first place. “Redirection of hardware signaling” at the super-low-level OS layer would be a core part of this.
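
To make that concrete, here’s a sketch (all names invented, purely illustrative) of what the routing decision could look like if it lives in the low-level OS: the process carries the session that launched it, and the layer that would have handed data to physical USB consults that session instead of being hacked.

    /* Sketch of device redirection decided at the low-level OS. All
       names are hypothetical. Every process remembers the session
       that initiated it, and device access is routed accordingly. */
    #include <stdio.h>

    enum origin { ORIGIN_LOCAL, ORIGIN_REMOTE };

    struct session { enum origin origin; const char *client; };

    struct process { const char *name; struct session session; };

    static void open_usb(const struct process *p, int port)
    {
        if (p->session.origin == ORIGIN_LOCAL)
            printf("%s: USB port %d bound to server hardware\n", p->name, port);
        else
            printf("%s: USB port %d tunneled to %s\n", p->name, port, p->session.client);
    }

    int main(void)
    {
        struct process app = { "scanner-app", { ORIGIN_REMOTE, "dons-tablet" } };
        open_usb(&app, 1);   /* no OS hacks: the host just routes the traffic */
        return 0;
    }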

A Bit Deeper

Drawbridge is philosophically a lot like a traditional VM. After all, the hypervisor intercepts hardware calls inside the VM and mediates that traffic to the physical hardware. That’s often done through synthetic device drivers, since it all has to happen inside the faked-out VM. Drawbridge proposes the abstraction at the thread (process) level, rather than at the virtual CPU level. It utilizes I/O streams, not virtual device drivers. In other words, when an application says, “display this on the monitor,” that call falls through whatever API stack the application is written on (say, .NET) and eventually bottoms out in Drawbridge. Instead of drawing pixels on the screen via a device driver, Drawbridge… sends the traffic elsewhere, to a client machine that can draw it instead.

Microsoft is currently saying Drawbridge has fewer than 50 downcalls and around 3 upcalls, meaning its API consists of around 50 total things. That’s low level indeed. It’s actually close to the number of API calls in a BIOS, which is what operating systems are already used to running on… which is probably why it only took a couple of years to get a partially-refactored Win7 running on it.
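
To give a feel for an API surface that small, here’s a hedged sketch in C. The call names are invented, not the real Drawbridge ABI, but the published descriptions suggest roughly this flavor: streams, memory, threads, and not much else.

    /* Hedged sketch of a host ABI that tops out around 50 calls.
       Names invented for illustration, not the real Drawbridge ABI. */
    #include <stdio.h>
    #include <string.h>

    typedef long db_handle;

    /* Downcalls: the library OS asking the host for low-level services.
       Note there's no "draw pixel" or "USB" call; everything that moves
       data, display traffic included, is just a stream. */
    static db_handle db_stream_open(const char *uri) {
        printf("host: opened stream %s\n", uri);
        return 1;
    }

    static long db_stream_write(db_handle s, const void *buf, size_t n) {
        (void)s; (void)buf;
        printf("host: wrote %zu bytes\n", n);
        return (long)n;
    }
    /* ...plus a few dozen more calls for memory, threads, and time... */

    int main(void) {
        /* A library OS "displaying" something: the call falls down the
           API stack and bottoms out as a stream write, which the host
           can route to local hardware or to a remote client. */
        db_handle display = db_stream_open("pipe://display0");
        const char *ops = "fill_rect 0 0 640 480";
        db_stream_write(display, ops, strlen(ops));
        return 0;
    }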

There’s a great video if you’d like to learn a bit more. Here’s a key quote:

While Drawbridge can run many possible library OSes, a key contribution of Drawbridge is a version of Windows that has been enlightened to run within a single Drawbridge picoprocess. The Drawbridge Windows library OS consists of a user-mode NT kernel–informally referred to as NTUM–which runs within the picoprocess. NTUM provides the same NT API as the traditional NT kernel that runs on bare hardware and in hardware VMs, but is much smaller as it uses the higher-level abstractions exposed by the Drawbridge ABI. In addition to NTUM, Drawbridge includes a version of the Win32 subsystem that runs as a user-mode library within the picoprocess.

Upon the base services of NTUM and the user-mode Win32 subsystem, Drawbridge can run many of the DLLs and services from the hardware-based versions of Windows. As a result, the Drawbridge prototype can run large classes of Windows desktop and server applications with no modifications to the applications.

In other words, it’s still the Windows kernel, without the bits that talk to hardware, because Drawbridge handles that. So it’s smaller. But it runs Windows apps without modification, because it’s still Windows.

Think of the Compatibility… and How it Kills the Client

Remember XP Mode in Windows 7? This basically gives any machine the capability to run any OS that has been refactored; Microsoft could certainly go back and do older versions of Windows. After all, there are only about 50 API calls they’d have to seek out and modify, right? So again, the client stops mattering. Have an app that needs XP? Fine. You’re not really “running XP;” you’re running a process that calls on some code from XP. Not quite the same thing. You could also run, in parallel, Windows 7, Windows Vista, and Windows 8 apps. And Linux apps, potentially. All without needing to spin up resource-intensive VMs.

So you start to get to a world where Microsoft has a lot less incentive to continually revise the client OS… and where such revisions take more the form of a new .NET Framework version than of an entire OS. With core services provided by a smaller, more lightweight “bottom layer OS,” the upper-level APIs become less fragile. Drawbridge itself would likely need fewer updates than a full OS, too – after all, how often do you have to flash the BIOS on your servers? More than zero, but less than every Tuesday, I bet.

Caveats

Again, I’ve indulged in some oversimplifications, and offered some projections based on Microsoft’s direction rather than their results to date. For example, Drawbridge itself isn’t currently a standalone OS. Rather, it’s been implemented to run – experimentally – on Windows and on Barrelfish, a new from-scratch OS being created by Microsoft Research. That’s partially because Drawbridge is, as yet, very new; you can certainly see how factoring some of what we currently call “OS” into it would make it the OS, with the “library OSes” (what I’ve been calling “personalities”) atop it. Today, Microsoft describes Drawbridge as running apps in a “picoprocess,” meaning a traditional OS process (on Windows or Barrelfish or something else) with the traditional OS services removed – Drawbridge provides those services instead. The technical details get a bit esoteric past a certain point; for this article I was aiming more for the high-level vision.

Oh, and the Hardware

Speaking of Barrelfish, and of Drawbridge, we might take a moment to think about what’s happening to hardware to make both of those things so compelling. Intel has been hard at work developing new buses and technologies that can more or less separate all of our traditional resource elements: CPU, RAM, network, and storage.

We’ve always had storage separated, because it’s always used an independent bus. From short-run copper SCSI arrays it really hasn’t been a massive leap to independent SANs reached by running the SCSI protocol over Ethernet – that is, iSCSI. Now, storage is a “black box,” and you add to it as needed.

Intel wants to do the same for processors and memory, two resources that have always been more tightly coupled. Buses like Light Peak will eventually offer enough bandwidth that processors and memory can be more physically disconnected. Slap a controller between them – à la the front-end controller of a SAN – and you can have giant pools of CPUs talking to a giant black box of RAM. Dynamically assigning RAM from CPU to CPU becomes easy. New incoming tasks are passed to a front-end controller, which selects a CPU with some free time. Migrating processes becomes instant, because you basically just assign a process’s memory to another CPU. You don’t move anything per se; you just have another CPU start “working” that section of the memory farm.
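
Model it in code and “migration” barely qualifies as an operation. A toy sketch, assuming a purely hypothetical pooled-memory architecture (not any shipping Intel interface):

    /* Toy sketch of "migration as reassignment" in a pooled-CPU,
       pooled-RAM world. A hypothetical model for illustration only. */
    #include <stdio.h>
    #include <stddef.h>

    struct process { const char *name; int memory_region; };  /* region id in the RAM pool */
    struct cpu     { int id; struct process *running; };

    /* "Migration": nothing is copied. Ownership of the memory region
       simply passes to another CPU in the pool. */
    static void migrate(struct process *p, struct cpu *from, struct cpu *to)
    {
        from->running = NULL;
        to->running = p;
        printf("%s: region %d now worked by CPU %d (was CPU %d)\n",
               p->name, p->memory_region, to->id, from->id);
    }

    int main(void)
    {
        struct process mine = { "minesweeper", 42 };
        struct cpu a = { 0, &mine }, b = { 1, NULL };

        migrate(&mine, &a, &b);   /* instant: no memory is moved */
        return 0;
    }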

Technologies like Barrelfish are designed to deal with that level of scalability in more novel ways than past operating systems. Approaches like Drawbridge, which start to isolate processes rather than entire VMs, make for more granular workload assignment. Instead of needing a CPU that can run an entire VM just to boot your copy of Minesweeper, the system need only find one capable of running that process and its library OS.

Fascinating, Ain’t It?

I just think all this stuff is incredibly cool. You know, for a long time, IT has felt a little boring. The technical details have gotten more… detailed… but we haven’t had a revolution in a long time. We’re nosing up to the edge of a revolution.

The cloud wasn’t really a revolution, it was an acknowledgement of everywhere-connectivity and of purchasing scale. We put some nice management layers on it, but it’s not, for me, a revolution. Whittling things down from the VM to a process, and then building out processing farms consisting of racks of CPUs connected to racks of RAM connected to racks of SSD drives… that’s getting sexy. It’s new, it’s different, and I can’t wait to see where it goes.

