Why Engineering Workarounds is Stupid

Back when TV was first invented, cathode ray tubes weren’t terribly well designed. They curved a lot, especially at the edges, making the picture at the edges useless. To compensate – to work around the problem – manufacturers buried the edges of the tube in the TV cabinet, so you could only see the relatively flat middle portion of the tube.

As a result, broadcasters had to be careful not to put useful information near the edges of the picture. This concept became known as overscan, meaning the tube was scanning its electron beam(s) across a larger surface than could be seen.

Improvements in manufacturing and electron beam control happened almost immediately, though, making overscan less necessary. Unfortunately, it was built into the broadcast standards by that point, and so everyone kept playing along. Today, overscan is completely and utterly meaningless in our world of readily-available, cheap flat panels. But overscan persists, with broadcasters identifying a “safe zone” for content, and more or less ignoring the “edges” of the screen. The problem is starting to fade as content shifts over to all-HD (which presumes a flat panel type of display), but it’s been an annoying problem for decades.

This should be an object lesson in not engineering workarounds as permanent solutions. If something feels hacky or kludgy, either don’t do it, or at the very least don’t bake it into a standard that will be difficult to change later.

For example, those early TVs should have been built with a circuit that took a non-overscanned image and shrank it down to the flatter, reliable portion of the screen – with a switch that could turn that feature off. It’s not unlike the scaling modern flat panels can all do, in fact. That way, the workaround can be put into use, but the hack – overscanning – doesn’t get built into the broadcast standard.
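To make the arithmetic concrete, here’s a rough sketch in Python of what such a scaling step would compute. The 640x480 raster and the 90% “reliable” figure are assumptions I picked for illustration, not numbers from any broadcast standard, and `safe_rect` is a made-up helper name.

```python
def safe_rect(width, height, visible_fraction):
    """Return (x, y, w, h) of a centered rectangle that covers
    visible_fraction of each dimension of a width x height raster."""
    if not 0 < visible_fraction <= 1:
        raise ValueError("visible_fraction must be in (0, 1]")
    w = round(width * visible_fraction)
    h = round(height * visible_fraction)
    # Center the shrunken picture inside the full raster.
    x = (width - w) // 2
    y = (height - h) // 2
    return x, y, w, h


# Assumed values: a 640x480 raster where the middle 90% is reliable.
shrunk = safe_rect(640, 480, 0.9)   # workaround switched on
full = safe_rect(640, 480, 1.0)     # workaround switched off
print(shrunk, full)
```

The point of the sketch is that the shrinking lives entirely in the set and hangs off a single parameter, so switching it off needs no change to the incoming picture.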

We build workarounds every day in IT, but you should pay attention to them. Identify things that you know are hacky, and schedule them for a mental re-visit every few months. Decide if things have improved in other areas, and if the problem can now be solved in a better-engineered fashion. Kludgy scripts should be replaced with more sensible, maintainable, reliable solutions.

Don’t make workarounds part of a permanent standard!

Whatcha Wanna Know?

I’m starting to line up my writing for next year (I plan ahead) and was wondering what y’all would like to read about here on DonJones.com. Not PowerShell – I’ll keep that on PowerShell.org – but other topics are wide open. Any suggestions?

DSC Summer Camp at My Place

So, we’re going to do DSC Camp at my place the weekend of August 21st, 2015. Yeah, NEXT year.

We’ll ask folks to arrive Friday the 21st sometime in the afternoon, and we’ll go down to the Strip for an informal meet-up. On the mornings of Saturday and Sunday, we’ll have classroom time at a Hilton Garden Inn near my house. Saturday afternoon and evening will be “unstructured wet discussion time” (e.g., pool day) at my place, possibly with Outdoor Movie Night thrown in for fun. Sunday evening, we’ll have more informal discussion time for folks who stick around.

Topic-wise, we’re going to spend a day on DSC planning and infrastructure, and a day on writing custom resources – fully expecting to be doing so the v5 way, since it should have been out for a while by then, I’m guessing.

We’ll probably limit this to something under 20 total people, and that’ll include a couple of specially invited guests. Cost-wise, you’re probably looking at $1500. We’re going to cover all the food and beverages and whatnot, as well as rent meeting space with that money. You probably won’t need a car (we’ll work out a shuttle bus) and you’ll need to stay at the Hilton Garden Inn on South Las Vegas Blvd in Las Vegas (it’s about half a mile from my house). There’s a second hotel that’s actually a bit closer (but doesn’t have meeting space) in case the HGI fills up, although we’ll encourage you to book a room as soon as you register with us.

If you’re interested, drop an e-mail note to DSCCamp over at PowerShell.org. That goes to Chris, who will record your e-mail address, and let you know when we open registration. Please only e-mail if you’re pretty serious about attending – we’re not going to use that as a general “we’ll send more information later” kind of advertising service. We’ll be opening registration in early 2015, and we’ll e-mail you when that happens (and probably won’t e-mail you otherwise).

Seriously, this isn’t a “send us an email if you’re vaguely interested in the concept.” We’re gonna do this, and just want to be able to e-mail folks who genuinely plan to register when the time comes. This is all being managed manually, so thanks for helping us keep it simple <grin>.

(Quick update – thanks for the many suggestions, but I won’t be doing this as some kind of visit-every-city roadshow; the main point was to simplify logistics and do something simple and unstructured; you’re welcome to do one at your own house and I’ll be happy to tweet about it, but I don’t have the bandwidth to take this on the road)

Do You Know How it Works Under the Hood? Really, Really?

Quick quiz:

Can you create an Active Directory user account that has a blank samAccountName?

Answer: Yes. Oh, not in ADUC, but using almost any other tool, sure. A blank samAccountName is legal so long as it’s unique.

I use this example in classes all the time, because it illustrates one of the difficulties in the Microsoft admin universe: we know our tools pretty well, but not necessarily the underlying technology so well, mainly because the tools have provided a layer of insulation for our entire careers. But without knowing the technology, you’re not as good at planning, troubleshooting, architecture, operations – well, all of it, really.

Here’s another one: do you know how the “dynamic memory” or “memory overcommit” features in VMware and Hyper-V work? If not, and if you’re using that feature, you might be using it in cases where it does more harm than good.

Think about it: in a physical server, you can’t simply yank memory out of a running machine, nor can you just pop in more memory. The guest OS in a VM thinks it’s on a physical machine, so it operates under the same restriction. So how does overcommit work?

The trick with VM memory is that every byte of memory actually being used by a VM must be backed by physical RAM in the host. Traditionally, the hypervisor had no way of knowing what memory was in use, and what wasn’t, because the guest OS is free to rearrange that stuff constantly. Ergo, every byte assigned to a VM needed to be backed by physical RAM.

Overcommit requires the installation of a special device driver, called the balloon driver, in the guest VM. Device drivers operate in Windows’ kernel mode, which means if they ask for memory, they get it. The assumption by Windows is that device drivers don’t need much RAM, and that denying them RAM will make hardware not work, and so they get what they ask for. So when the hypervisor host asks the balloon driver to release some RAM, the balloon driver asks the guest OS for memory. Whatever the driver gets, it erases, setting the memory contents to 0. That means a known portion of guest memory isn’t in use, so the hypervisor doesn’t have to back that memory with physical RAM. Thus, the VM always thinks it has 4 GB or whatever, but not all of that is available to the guest’s applications, because the balloon driver “locks” part of it.

The operating presumption is that running applications will only release RAM they don’t absolutely, positively need, so the balloon driver won’t impact system operations. It’ll essentially just “gather up” memory that wasn’t really in use.
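Here’s a toy model of that bookkeeping in Python. It’s a sketch under big simplifying assumptions, not how VMware or Hyper-V actually implement anything: the `Host` and `GuestVM` classes, their fields, and the megabyte figures are all invented for illustration, and the guest only ever hands over memory that is sitting idle.

```python
class GuestVM:
    """Toy guest: believes it owns assigned_mb no matter what."""

    def __init__(self, name, assigned_mb, in_use_mb):
        self.name = name
        self.assigned_mb = assigned_mb  # what the guest OS thinks it has
        self.in_use_mb = in_use_mb      # memory guest apps actually hold
        self.balloon_mb = 0             # memory held by the balloon driver

    def inflate_balloon(self, request_mb):
        """Balloon driver asks the guest OS for RAM; returns MB granted.
        Assumption: only idle memory is handed over."""
        idle_mb = self.assigned_mb - self.in_use_mb - self.balloon_mb
        granted = min(request_mb, idle_mb)
        self.balloon_mb += granted
        return granted


class Host:
    """Toy hypervisor host tracking how much physical RAM backs its VMs."""

    def __init__(self, physical_mb):
        self.physical_mb = physical_mb
        self.vms = []

    def backed_mb(self):
        # Memory inside a balloon is known to be unused, so it needs
        # no physical backing.
        return sum(vm.assigned_mb - vm.balloon_mb for vm in self.vms)


host = Host(physical_mb=8192)
vm = GuestVM("app01", assigned_mb=4096, in_use_mb=1024)
host.vms.append(vm)

backed_before = host.backed_mb()    # every assigned MB is backed
granted = vm.inflate_balloon(2048)  # host asks the balloon for 2048 MB
backed_after = host.backed_mb()     # balloon memory no longer needs backing
print(backed_before, granted, backed_after, vm.assigned_mb)
```

The guest’s `assigned_mb` never changes, which is the whole trick: the VM still thinks it has all of its RAM, while the host backs less of it.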

The problem is that assumptions can sometimes be wrong. For example, some applications maintain large data caches in RAM, basically seeking to use all the RAM they can – the same approach as the balloon driver. When the OS sends a memory panic, because the balloon driver is requesting RAM that isn’t available, these user-mode applications will give up some memory. Their assumption is that the app is better off with less-than-optimal RAM than with the server crashing due to insufficient memory. So app performance suffers – sometimes significantly, depending on the effort involved in rearranging those data caches so that memory can be freed up.
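A small Python sketch shows how ugly that can get. This is an illustration only: `LRUCache` and `hit_rate` are hypothetical helpers, the cache sizes are made up, and the workload (looping over 100 keys in order) is deliberately unkind to a least-recently-used cache. Real applications and real caches will behave differently.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache that only tracks keys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def resize(self, capacity):
        # Simulates the app giving memory back under pressure.
        self.capacity = capacity
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

    def access(self, key):
        """Return True on a hit; on a miss, load the key and return False."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)
        return False


def hit_rate(cache, keys):
    hits = sum(cache.access(k) for k in keys)
    return hits / len(keys)


# Assumed workload: the app loops over the same 100 keys again and again.
workload = [i % 100 for i in range(1000)]

cache = LRUCache(capacity=100)
hit_rate(cache, workload)            # warm-up pass, result ignored
before = hit_rate(cache, workload)   # whole working set fits in the cache

cache.resize(50)                     # give up half the cache
after = hit_rate(cache, workload)
print(before, after)
```

With this particular access pattern the hit rate goes from 1.0 to 0.0, because each key is evicted just before it is needed again. That’s a worst case, but it shows why “just give back some RAM” is not always a cheap request.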

The point is, you can’t make intelligent decisions about these features unless you know how they work, how they interact with other applications, and what the consequences might be. Knowing how things work under the hood is a crucial part of being an effective IT person.

And that “knowing” requires an insatiable curiosity. First-level documentation never discusses these under-the-hood secrets. In most cases, vendor marketing doesn’t either, because they simply want you to believe the feature is a no-brainer to use. So you have to be constantly curious, constantly asking “why” and “how,” and constantly seeking out the answers on your own. Yeah, it’s a lot to keep up with – but it’s what separates the true IT professional from the IT operator who simply pushes buttons and hopes for the best.