Why Engineering Workarounds is Stupid

Back when TV was first invented, cathode ray tubes (CRTs) weren’t terribly well-designed. They curved a lot, especially at the edges, making the picture there useless. To compensate – to work around the problem – manufacturers buried the edges of the tube in the TV cabinet, so you could only see the relatively flatter middle portion of the tube.

As a result, broadcasters had to be careful not to put useful information near the edges of the picture. This concept became known as overscan, meaning the tube was scanning its electron beam(s) across a larger surface than could be seen.

Improvements in manufacturing and electron beam control happened almost immediately, though, making overscan less necessary. Unfortunately, it was built into the broadcast standards by that point, and so everyone kept playing along. Today, overscan is completely and utterly meaningless in our world of readily-available, cheap flat panels. But overscan persists, with broadcasters identifying a “safe zone” for content, and more or less ignoring the “edges” of the screen. The problem is starting to fade as content shifts over to all-HD (which presumes a flat panel type of display), but it’s been an annoying problem for decades.

This should be an object lesson in not engineering workarounds as permanent solutions. If something feels hacky or kludgy, either don’t do it, or at the very least don’t bake it into a standard that will be difficult to change later.

For example, those early TVs should have been built with a circuit that took a non-overscanned image and shrunk it down to the flatter, reliable portion of the screen – with a switch that could turn that feature off. It’s not unlike the scaling modern flat panels can all do, in fact. That way, the workaround can be put into use, but the hack – overscanning – doesn’t get built into the broadcast standard.

We build workarounds every day in IT, but you should pay attention to them. Identify things that you know are hacky, and schedule them for a mental re-visit every few months. Decide if things have improved in other areas, and if the problem can now be solved in a better-engineered fashion. Kludgy scripts should be replaced with more sensible, maintainable, reliable solutions.

Don’t make workarounds part of a permanent standard!

DSC Summer Camp at My Place

So, we’re going to do DSC Camp at my place the weekend of August 21st, 2015. Yeah, NEXT year.

We’ll ask folks to arrive Friday the 21st sometime in the afternoon, and we’ll go down to the Strip for an informal meet-up. On the mornings of Saturday and Sunday, we’ll have classroom time at a Hilton Garden Inn near my house. Saturday afternoon and evening will be “unstructured wet discussion time” (e.g., pool day) at my place, possibly with Outdoor Movie Night thrown in for fun. Sunday evening, we’ll have more informal discussion time for folks who stick around.

Topic-wise, we’re going to spend a day on DSC planning and infrastructure, and a day on writing custom resources – fully expecting to be doing so the v5 way, since it should have been out for a while by then, I’m guessing.

We’ll probably limit this to something under 20 total people, and that’ll include a couple of specially invited guests. Cost-wise, you’re probably looking at $1500. We’re going to cover all the food and beverages and whatnot, as well as rent meeting space with that money. You probably won’t need a car (we’ll work out a shuttle bus) and you’ll need to stay at the Hilton Garden Inn on South Las Vegas Blvd in Las Vegas (it’s about half a mile from my house). There’s a second hotel that’s actually a bit closer (but doesn’t have meeting space) in case the HGI fills up, although we’ll encourage you to book a room as soon as you register with us.

If you’re interested, drop an e-mail note to DSCCamp over at PowerShell.org. That goes to Chris, who will record your e-mail address, and let you know when we open registration. Please only e-mail if you’re pretty serious about attending – we’re not going to use that as a general “we’ll send more information later” kind of advertising service. We’ll be opening registration in early 2015, and we’ll e-mail you when that happens (and probably won’t e-mail you otherwise).

Seriously, this isn’t a “send us an email if you’re vaguely interested in the concept” situation. We’re gonna do this, and we just want to be able to e-mail folks who genuinely plan to register when the time comes. This is all being managed manually, so thanks for helping us keep it simple <grin>.

(Quick update – thanks for the many suggestions, but I won’t be doing this as some kind of visit-every-city roadshow; the main point was to simplify logistics and do something simple and unstructured. You’re welcome to do one at your own house and I’ll be happy to tweet about it, but I don’t have the bandwidth to take this on the road.)

Do You Know How it Works Under the Hood? Really, Really?

Quick quiz:

Can you create an Active Directory user account that has a blank samAccountName?

Answer: Yes. Oh, not in ADUC, but using almost any other tool, sure. A blank samAccountName is legal so long as it’s unique.
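If you want to prove it to yourself, here’s a minimal, lab-only sketch in PowerShell using ADSI – the OU path and user name are made up, and you’d only run this in a test domain where you have rights:

    # Hypothetical lab sketch: create a user via ADSI without ever touching
    # sAMAccountName. ADUC forces you to fill that box in; ADSI doesn't care.
    $ou   = [ADSI]'LDAP://OU=Lab,DC=example,DC=com'   # assumed test OU
    $user = $ou.Create('user', 'CN=Blank Sam Test')
    # Note what's missing: no $user.Put('sAMAccountName', ...) before the commit.
    $user.SetInfo()   # writes the object to the directory

The tool was the safety net, not the directory – which is exactly the point.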

I use this example in classes all the time, because it illustrates one of the difficulties in the Microsoft admin universe: we know our tools pretty well, but not necessarily the underlying technology so well, mainly because the tools have provided a layer of insulation for our entire careers. But without knowing the technology, you’re not as good at planning, troubleshooting, architecture, operations – well, all of it, really.

Here’s another one: do you know how the “dynamic memory” or “memory overcommit” features in VMware and Hyper-V work? If not, and if you’re using that feature, you might be using it in cases where it does more harm than good.

Think about it: in a physical server, you can’t simply yank memory out of a running machine, nor can you just pop in more memory. The guest OS in a VM thinks it’s on a physical machine, so it operates under the same restriction. So how does overcommit work?

The trick with VM memory is that every byte of memory actually being used by a VM must be backed by physical RAM in the host. Traditionally, the hypervisor had no way of knowing what guest memory was in use and what wasn’t, because the guest OS is free to rearrange that stuff constantly. Ergo, every byte assigned to a VM needed to be backed by physical RAM.

Overcommit requires the installation of a special device driver, called the balloon driver, in the guest VM. Device drivers operate in Windows’ kernel mode, which means that when they ask for memory, they get it. Windows assumes device drivers don’t need much RAM, and that denying them RAM will make hardware stop working, so they get what they ask for. When the hypervisor needs RAM back, it asks the balloon driver to “inflate”: the driver requests memory from the guest OS, and whatever it gets, it erases, setting the contents to 0. That means a known portion of guest memory isn’t in use, so the hypervisor doesn’t have to back that memory with physical RAM. Thus, the VM still thinks it has 4GB or whatever, but not all of that will be in use, because the balloon driver “locks” it.
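You can watch this play out on a Hyper-V host, by the way. Here’s a hedged sketch using the Hyper-V PowerShell module – the VM name and sizes are illustrative, not recommendations; the interesting bit is the gap between what a VM has been assigned and what it actually demands:

    # Compare assigned memory to actual demand for every VM on this host.
    Get-VM |
        Select-Object Name,
            @{n='AssignedGB'; e={ [math]::Round($_.MemoryAssigned / 1GB, 1) }},
            @{n='DemandGB';   e={ [math]::Round($_.MemoryDemand   / 1GB, 1) }}

    # Dynamic memory is a per-VM setting (the VM must be off to change it).
    # 'Web01' and these sizes are purely illustrative.
    Set-VMMemory -VMName 'Web01' -DynamicMemoryEnabled $true `
        -StartupBytes 1GB -MinimumBytes 512MB -MaximumBytes 4GB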

The operating presumption is that running applications will only release RAM they don’t absolutely, positively need, so the balloon driver won’t impact system operations. It’ll essentially just “gather up” memory that wasn’t really in use.

The problem is that assumptions can sometimes be wrong. For example, some applications maintain large data caches in RAM, basically seeking to use all the RAM they can – the same approach the balloon driver takes. When the OS signals memory pressure, because the balloon driver is requesting RAM that isn’t available, these user-mode applications will give up some memory, on the assumption that the app is better off running with less-than-optimal RAM than with the server crashing due to insufficient memory. So app performance suffers – sometimes significantly, depending on the effort involved in rearranging those data caches so that memory can be freed up.

The point is, you can’t make intelligent decisions about these features unless you know how they work, how they interact with other applications, and what the consequences might be. Knowing how things work under the hood is a crucial part of being an effective IT person.

And that “knowing” requires an insatiable curiosity. First-level documentation never discusses these under-the-hood secrets. In most cases, vendor marketing doesn’t either, because they simply want you to believe the feature is a no-brainer to use. So you have to be constantly curious, constantly asking “why” and “how,” and constantly seeking out the answers on your own. Yeah, it’s a lot to keep up with – but it’s what separates the true IT professional from the IT operator who simply pushes buttons and hopes for the best.

IT: The Only “Supportive” Function That Isn’t Overhead, and Isn’t (Usually) Regulated

Most companies of almost any size these days have several functions that are considered overhead. Now, it’s important to understand, from a business perspective, what “overhead” actually means.

The expenses of a business that are not attributable directly to the production or sale of goods or services.

For example:

  • Real estate (e.g., rent, property taxes, building maintenance, and so on)
  • Facilities (custodial services – you might drop building maintenance into this category instead – usually including basic utilities like power and water)
  • Finance
  • Human Resources / Personnel
  • Legal

You get the idea. I’d argue that some of these shouldn’t be considered overhead in the usual sense. For example, you should be able to allocate some utilities, like power, to the production and sale of goods or services.

I’ll point out that many, although not all, of these overhead items are in regulated industries. Finance (bookkeeping, at its simplest) is a highly regulated industry, and finance leaders will usually have a CPA – an industry-maintained license. HR is also highly regulated, and in fact spends more time dealing with regulations and laws than almost anyone else. Legal is obviously a highly regulated, independently licensed profession.

Which brings us to IT.

For most organizations, IT is considered overhead. That is, mainly due to lazy management and a lack of tools, the costs of IT aren’t allocated back to the goods or services that IT helps produce. Nor is IT allocated back to other overhead functions that consume IT. IT is also almost completely unregulated in most industries. There may be laws (HIPAA, SOX, GLB) that place certain restrictions and responsibilities on a company’s information-handling practices, but those don’t target IT specifically. IT has no professional licensing, just vendor-based certifications.

You can make much the same argument with other “overhead” functions, like HR. Most HR costs – payroll, benefits, and so on – can be allocated on a per-person basis pretty easily, which means they can and should be allocated back to the cost of producing or selling goods or services.

So why all the overhead?

Simple: lazy management. It’s easier just to dump all of these functions into an “overhead” bucket than to spend the time allocating them out to individual business functions. But that lazy management means that, for some companies, these overhead functions account for the company’s biggest expenses. That’s like dividing your household monthly budget into “groceries,” “bills,” and “other.” Should you need to cut back, you really can’t do so intelligently without digging deeper into that “other” category.

Overhead categories encourage poor decision-making. In reality, absolutely everything IT does either directly leads to a sellable product or service, or directly supports someone who does. If you have any IT that doesn’t directly support a business function, you should get rid of it – but that’s hard to do when it’s all one big lump of “overhead.”

And that overhead is getting bigger. It’s also getting more diversified: storage, communications, virtualization, infrastructure – they’re all becoming increasingly specialized. And, because it’s all lumped into “overhead,” it’s difficult to determine if a particular function can be outsourced, moved to a cloud platform, etc. Simply making smart decisions about IT is difficult when you can’t tie cost and benefit back to a revenue-producing effort.

In the coming years, the companies that will do the best will be the ones who know exactly how every IT penny is being spent, and why. Not to micromanage that spend, but rather to maximize it in the places where it will contribute the most to revenue-producing activity. The companies most effective at doing this will be small and medium-sized businesses, and they’ll be the ones able to disrupt their much larger peers simply through smarter management and smarter allocation of resources. They’ll maintain better profit margins because they’ll have a better handle on expenses, and because they won’t settle for “overhead” anymore.

What could you do to encourage your organization’s leaders to start thinking of IT as an attributable cost of doing business, rather than as lump-sum overhead? Do you do any of that today?

Customer Retention

I’m sure you’ve read the Comcast horror story about customer retention; if not, give it a quick skim.

I contrast that with the Quicken Bill Pay service I recently cancelled. They were polite, immediately assuring me that they’d cancel my service and add a free month so I could transition bills as needed, and welcoming any feedback I wanted to offer. Only then did they ask why I was leaving, and if there was anything they could fix that would change my mind. Once I said “no,” they immediately processed the cancellation as promised. I was off the phone in minutes, with a very positive feeling about the company.

As a result, I’d do business with them again, if I had need. Comcast? Shudder. I’m glad I don’t have coax in the ground out where I live, so I can’t even be tempted.

But the Comcast story is a good one when it comes to considering pay incentives. Management guru W. Edwards Deming was famously anti-incentive, and the Comcast story is pretty much the reason why. Incentives always have downsides, sometimes major ones, and it’s virtually impossible to design an incentive program that doesn’t. You can’t blame the rep in this one, because he was doing exactly what Comcast trained him and paid him to do – it’s just a shame that the company’s parting shot to a customer was to virtually ensure they’d never come back.

Why “Private Cloud” is Actually Important

The vendor community – Microsoft included – has done a horrible job of explaining what “private cloud” is supposed to mean. So horrible, in fact, that the term has become yet another hated marketing buzzword. But in reality, “private cloud” means something important, innovative, and disruptive.

It means the end of our users hating IT.

Unfortunately, as you’ll see by the end of this article, “private cloud” will take a long, long time to be truly implemented, because most companies lack mature-enough management to do so.

What is “private cloud,” then?

When the term was first launched, most people responded with something like, “so this is just the datacenter I’ve had all along, with some virtualization, right?” And from a technical perspective, they’re almost right. A private cloud-ish datacenter really doesn’t look a lot different, technically, from what you’ve got today. A private cloud datacenter is almost, if not fully, virtualized. You’d have the ability to quickly move loads from host to host to accommodate workload changes. Most of you probably have something close to that.

The private cloud also has a lot of automation built in. When you need to spin up a new virtual machine to be a web server, or SQL Server, or domain controller, or whatever, you really just click a button. The infrastructure deploys the VM, finds a place for it to live, provisions the operating system, and so on. Zero manual effort. Some of you may be close to that, but most of you are miles away. Literally anything beyond filling out a single form (what kind of server, what it will be named, and so on) and clicking a button is too much work. In fact, if your stupidest, most entry-level tech can’t do the job, then it isn’t automated enough.
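To make that concrete, here’s a purely hypothetical sketch of what “one form, one button” might look like behind the scenes on a Hyper-V host. Every name and path in it is an assumption – a template disk per role, one standard VM size – not a real product:

    # Hypothetical self-service wrapper: three form fields in, one running VM out.
    function New-StandardVM {
        param(
            [Parameter(Mandatory)][string]$Name,
            [ValidateSet('Web','SQL','DC')][string]$Role = 'Web',
            [long]$MemoryBytes = 2GB
        )
        # Clone an assumed golden-image disk so every VM starts from a known baseline.
        $vhd = "D:\VMs\$Name.vhdx"
        Copy-Item -Path "D:\Templates\$Role.vhdx" -Destination $vhd
        # Create and start the VM; OS provisioning, monitoring, etc. would hang off this.
        New-VM -Name $Name -MemoryStartupBytes $MemoryBytes -VHDPath $vhd -Generation 2 |
            Start-VM
    }

    # Usage: New-StandardVM -Name 'Web07' -Role Web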

You see, that automation is the key.

This is a Management Play

Once your dumbest technician can deploy new VMs, any user can technically do so. So the difference between “users deploy their own stuff” vs. “IT deploys it for them” becomes a matter of who’s authorized to do the task, not who’s capable of doing the task.

When you log into Azure and spin up a new VM instance, nobody at Azure checks to see if you’re authorized or not. They don’t open a help desk ticket and route it to someone for approval. Because they have your credit card on file, they know you’re approved to make the decision, and the VM spins up. Nobody else can do that with your account, unless you authorize them in advance. In other words, you, the CEO of your little world, get to designate who may make those decisions. The IT department – the folks working at Azure – don’t care.

That’s what private cloud means: moving the authorization-and-approval point from IT to the business. This is good for two reasons.

IT Shouldn’t Be a Gatekeeper Anyway

When exactly did IT get signed up to be “gatekeeper” on stuff? Well, in the beginning, only we were capable of doing the job, so it made sense to make us a sort of bottleneck to ensure resources weren’t being wasted and that things were being set up properly. The company set the rules, and we enforced them. The problem is, that turned us into the “naysayers” a lot of the time, which made us a speed bump – something to be circumvented. We spend a lot of our time policing the infrastructure, and it shouldn’t be our job.

The biggest problem with IT being the gatekeeper is that it allows the company to conveniently write us off as “overhead.” Because we’re enforcing the company’s business rules for technology, the company gets to be lazy about accounting for IT usage. The problem is that IT isn’t overhead. Everything we do should either directly support a revenue-generating business function, or itself generate revenue. But because we don’t make the revenue, and we spend all the money on blinky-light toys, we’re “overhead.”

That’s Why Private Cloud is a Management Play

In private cloud, IT doesn’t spend any money, and it doesn’t authorize anyone to do anything. Instead, the company authorizes people to make specific IT decisions on their own. Line workers might have zero authority; their boss might be able to order new laptops when needed. His boss might be able to spin up new web servers at need, and her boss might be able to order entire new infrastructure elements like an extranet for partner communications. They do all that (ideally, but this is a maturity step) through a self-service portal, which IT sets up to run automation in the background.

Look, you’re the VP of Marketing. You want a website? Push this button. You’ll get a website, and it’ll be consistent with our standards. There’s a cost for it, and it comes out of your budget. If you overspend your budget, that’s someone else’s problem, not IT’s. IT will provide Finance with a list of charges each month, and everyone pays for their share of the infrastructure.
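That monthly list of charges doesn’t require exotic tooling, either. Here’s a hedged sketch, assuming – purely for illustration – that each Hyper-V VM’s Notes field holds a cost center and that you bill a flat, made-up rate per GB of assigned memory:

    # Illustrative chargeback: group VMs by cost center, bill per GB of RAM.
    $ratePerGB = 15.00   # assumed monthly rate, not a real number
    Get-VM |
        Group-Object -Property Notes |
        ForEach-Object {
            [pscustomobject]@{
                CostCenter    = $_.Name
                MonthlyCharge = ($_.Group |
                    Measure-Object -Property MemoryAssigned -Sum).Sum / 1GB * $ratePerGB
            }
        } |
        Export-Csv -Path .\IT-Chargeback.csv -NoTypeInformation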

You see, when business leaders who already have a bottom-line responsibility make the IT decisions, we aren’t overhead anymore. Business leaders make more careful business decisions about IT expenditures, because every one directly impacts their own bottom line, instead of being buried in some shared “overhead” category that everyone just ignores.

“OMG, let the VP of Marketing spin up web sites on demand?!?!” Yeah, let him or her. If that VP is incompetent, it’ll be a lot more obvious when the costs are visible. This doesn’t magically make every company better managed, but it puts crucial evidence in place to help make that happen. IT has no place enforcing the business’ financial or management goals. It’s not our job, it’s not what we’re good at, and it doesn’t serve the business. We need to move those decision points where they belong, and if that means people are allowed to make stupid business decisions, so be it. At least you’ll be able to see who’s making the stupid decisions, instead of merely having anecdotal evidence of the fact. You already have crappy managers; let’s give them some rope and a spotlight.

But Here’s the Problem

The difficulty is that this requires a lot more managerial maturity than most companies possess. On the IT management side, you need to know what your resources cost. Newer tools – System Center Ops Manager is gaining some of these capabilities, for example – make it easier to send usage-based cost reports to Finance. But you also have to know what your administrators cost, because their “overhead” has to get rolled into the price of the resources you offer to the company. Most companies can’t tell you what a given administrator actually costs them, without turning to HR and doing some investigation on salaries and benefits and overhead and whatnot. It’s easier to lump us in as overhead… but it isn’t beneficial.

Private cloud is not a technological thing. It’s a management thing. We have most of the technologies needed to implement it, but we don’t have the management.