Fortune: You would do well in the field of computer technology.
Well, duh!

I received this fortune in a cookie from lunch the other day.

  Lunch was catered Chinese

    in the office of my current employer;

      an Internet technology start-up

        where I stand in front of computer screens

          For large portions of the day.

I manage a team of computer people.

  More often called System Administrators, or

    Network Administrators, or

      Site Reliability Engineers, or

        Site Reliability Operators, or

          IT, Ops, or DevOps*.

Some of my time involves meetings

  with software developers and engineers

    solving software and service architectural problems

      and coming to agreements on how to progress forward.

Sometimes I even write code

  something I’ve been doing since 1983.

    Other times I set up, configure, and maintain

      large scale Internet infrastructures and web services.

        The combination of those two functions has been my profession since 1993.

Over 20 years in this field.

  It doesn’t really seem that long.

    Probably because I’ve done well by it

      And for the most part enjoyed the work very much.

I guess this fortune cookie turned out true

  If maybe a bit after the fact had been well settled.

    Had I received it 30 years ago, though,

      I would’ve just said, “Duh, I’m going to do that anyway.”

* I consider DevOps to be an organizational structure and philosophy focused on development and operational practices. It is not IMHO a job title or position. Using it as such is frequently a sign that someone doesn’t know what they are talking about.

Alcohol content: a wee dram (Macallan Fine Oak 21 Year)

I’ve spent my life recording numbers to generate graphs to make sense of the world.  The numbers are what’s important, but humans do not really gain comprehension staring at rows and columns of numbers.  The human brain is made for visual processing.  A glance at a well plotted graph can present more value than attempting to read through the millions of data points that made up the graph.  The plot illuminates the data and should allow a reasonable person to intuit meaning.  I’ve probably tossed many kilograms of notebooks filled with data sets and plotted graphs.  Almost no one ever cared about the numbers, just the resulting graphs.

Figures at Work

I’ve used GIFs generated with ancient versions of gnuplot to analyze performance data.  The pain of creating the graphs proved justified when insights were gained from simple visualizations.

File transfer comparisons
From: File Distribution Efficiencies: cfengine vs. rsync

Certainly, in my line of work pretty pictures help explain what’s going on to a wider audience than the raw numbers.  You can immediately look at a picture and notice when something appears out of the ordinary.

Server CPU Utilization
Maybe we should look at why the CPU load just spiked there.

Or…

Elastic Search EPS
Maybe things are happily cycling along.

Other times, I’ve needed to translate the data to justify costs or find ways to lower costs.  The following graph helped start a conversation with finance to purchase Amazon AWS Reserved Instances.  A full multi-sheet spreadsheet explaining the financial breakdown was still needed for full justification.  But without this and a few other pretty pictures, the discussion and the approval of the up-front costs would never have happened.

AWS Instances
How many EC2 Instances are we running?

But that’s just work and part of the expected drudgery of justifying your department’s daily existence.  You do them like annual reviews and PowerPoint presentations; get them done to prove your point, then try to get back to real work…  Unless you are a scientist (data or otherwise), in which case, collecting data points and graphing the results might be your real job.

Zen and the Art of Keeping Track of My Car

But even before I ever had to plot a graph in school, I watched my dad keep a small notebook in every car to record the mileage, fuel added, fuel costs, etc.  He’d calculate the MPG since the last fuel stop and record that, and then reset the trip meter.  He’d also indicate when he’d perform an oil change or some other service on the car.  It was a valuable service record for the car.  Noticing a change in MPG could also indicate that something was wrong.  I continued the tradition when I started driving my own cars.  And it didn’t take me long to realize there was so much more information in those numbers if I could plot them.
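A minimal sketch of that per-fill-up arithmetic, assuming the trip meter reading and the gallons added are the two numbers written in the notebook. The function name mpg is made up for illustration; it is not something from the original logbook.

```shell
#!/bin/sh
# mpg MILES GALLONS   (hypothetical helper name)
# Prints miles per gallon, to one decimal place, for one tank of fuel.
# MILES is the trip meter reading at the pump; GALLONS is the fuel added.
mpg() {
    awk -v miles="$1" -v gallons="$2" 'BEGIN {
        if (gallons <= 0) exit 1      # refuse to divide by zero
        printf "%.1f\n", miles / gallons
    }'
}
```

For example, mpg 300 10 prints 30.0. Resetting the trip meter after each fill-up is what makes MILES a per-tank number.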

Automotive gas prices clearly cycle over time, but over the years have been trending down in Northern California.  The slight uptick at the end could mean summer prices are coming.  But in this case, it’s more a matter of which town I needed to get gas in.

Premium Gas Prices ($/gal) Over Time

How’s my driving?  So as I drive more, a higher percentage of my time spent in this car is commuting.  Commuting seems to be so much more fuel efficient than track days and auto-crossing.

MPG
Fuel Efficiency (or lack thereof)

I’m driving this car a whole lot more now.

Trip-o-meter vs Odometer readings
Trips vs Odometer

Example

Google Sheets Logbook

Always Look on the Bright Side of Life

Maybe all this tracking, accounting, and graphing is a sign of some compulsion on my part.  Certainly when it comes to the question of my personal health, I become obsessive about tracking as many variables as possible.  There have been a few times when I’ve been not entirely healthy and the cause was not immediately apparent.  When a doctor or group of doctors become interested in you, life becomes less than pleasant.  On more than one occasion, after months of specialists and testing, my body has sorted itself out.  Whatever was wrong resolved itself with no definitive answer for a cause.  The interminable wait between lab results and doctors’ visits, though, left far too much time for fidgeting and speculation.

Until wearable medical devices came around, I kept log books.  During one of these months-long investigative sessions my log filled dozens of composition book pages.  Over three months I documented when symptoms would occur, the severity, duration, what remediation steps were taken, and their efficacy.  One specialist, upon reading through my notes, suggested I was adding to my general stress levels.  He did not come right out and suggest this additional stress was the cause of my ills, but I suspect that was on his mind.  I won’t discount that possibility entirely.  But sitting around waiting for something to happen and someone else to do something felt worse.

With a wearable, especially one that can track heart rates, the paper log book becomes a bit less necessary.

Resting Pulse over Months

There’s still a lot of manual data entry that has to go on, but apps like Apple Health and My Fitness Pal make it less tedious.  Apple’s Health app has horrible graphing in my opinion, and without paying for it, it’s difficult to get all your data out of Fitbit.  Using QS Access on the iPhone helps to move the data points into a spreadsheet like Google Apps or Excel where you can manipulate and visualize the data.

Blood Pressure Measurements over Months

I still don’t know what’s going on with my current malady.  At least by recording data points and making graphs I feel like I’m doing something for my treatment while I wait for the next test or visit.

Alcohol content: measurements not recorded


Even before BYOD further blurred the lines between work and home life, employees used their corporate-supplied devices for personal use. Regardless, keeping personal browsing separated from work-related browsing is a good idea. Perfect hygiene would dictate using different computers for personal and corporate use, but that rarely happens. There are a number of different ways you can use the same computer for both personal and corporate browsing, including having separate accounts on the machine (if you have permissions), running personal stuff in a virtual machine, or running different browsers for each purpose.

The last is probably the easiest and I have a co-worker who claims to run 5 different browsers on his laptop just for this purpose. Mind you, there are a lot of other reasons for wanting to run different browsers for different purposes. You could be running different browsers for testing, performance, and security reasons. I’m not going to delve into that. I’m just focusing on using the same browser for different tasks that are isolated from each other.

In this case I’m using Google Chrome and I want to use it for both personal and work-related browsing. I also need to make sure to keep the activities of one isolated from the other. Using private browsing for personal stuff might suffice in many cases. Since I sync stuff among personal machines, though, private browsing won’t work. Instead, here’s how I run multiple instances of Chrome on the same machine at the same time.

The following are instructions for OS X, but should work similarly on a Linux desktop adjusting paths as necessary. I haven’t touched a Microsoft Windows PC in nearly a decade, so I have no idea how to pull this off there.

Initializing a New User Data Directory

Chrome stores user configurations, extensions, bookmarks, cookies, caches, stored forms, and other transient data in a user data directory. Knowing this, you can set up different data directories for each type of “user” while having just one Chrome binary installed. Here are the steps to get that going.

The default location for Chrome to store the user data is /Users/[username]/Library/Application Support/Google/Chrome/. I use this default for my work account. To create and initialize a new user directory perform the following tasks in a Terminal window.

  • export chromeApp="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
  • mkdir -p ~/Library/"Application Support"/Google/Chrome-personal
  • "${chromeApp}" --user-data-dir="/Users/andy/Library/Application Support/Google/Chrome-personal" >/dev/null 2>&1 &

Notice that in the quoted line in item 3 you need to use the full path to the data directory. Tilde (~) expansion will not work within the quoted string. In item 2 the tilde sits outside the quotes, and only the directory name containing a space is quoted.


The last command should start up a new Chrome browser. First Chrome will pop up a “Welcome to Google Chrome” window. Upon clicking “Start Google Chrome” it will populate the Chrome-personal directory with initial user data, and then bring forward a browser window. This will ask you to log into Chrome. Whether you do that is up to you.

With the new Chrome window open, you will need to install any extensions, bookmarks, etc. you want to customize this browser. I would highly suggest setting a theme for this window so that it is easy to tell apart visually from any other copies of Chrome you might run at the same time.

Lather, rinse, repeat for any additional entities you’d like to create.
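If you end up repeating this for several entities, the directory step can be wrapped in a small helper. This is only a sketch under my own assumptions: the names CHROME_BASE and new_profile_dir are hypothetical, and the Chrome-NAME layout simply follows the Chrome-personal example above.

```shell
#!/bin/sh
# Base directory holding Chrome's user data directories on OS X.
# CHROME_BASE is a made-up variable; override it to experiment somewhere harmless.
CHROME_BASE="${CHROME_BASE:-$HOME/Library/Application Support/Google}"

# new_profile_dir NAME   (hypothetical helper name)
# Creates "Chrome-NAME" under CHROME_BASE if it is missing,
# then prints the full path, which is what --user-data-dir needs.
new_profile_dir() {
    dir="${CHROME_BASE}/Chrome-$1"
    mkdir -p "${dir}" && printf '%s\n' "${dir}"
}
```

With chromeApp exported as above, something like "${chromeApp}" --user-data-dir="$(new_profile_dir testing)" >/dev/null 2>&1 & should then start a fresh instance. Because the helper builds the path from $HOME, the tilde problem never comes up.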

Startup Script

I wrote a simple shell script to start my browsers up with the requisite user directories. Since Chrome is set as the default browser on my system, it is important to note that the first Chrome process started will be the one that gets used for opening new links from other applications. In the example below, this makes my personal session the default for new tabs/windows from other applications.

Chromes.sh – gist link

#!/bin/sh
## Chrome App
app="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"

## Non-default user data directories
usrDir="/Users/andy/Library/Application Support/Google/Chrome-personal"

## Fire up Chrome
"${app}" --user-data-dir="${usrDir}" >/dev/null 2>&1 &
"${app}" >/dev/null 2>&1 &

## If Chrome is default browser, then the first process fired will be
## where new links/tabs are spawned from other applications.

Remember to chmod u+x Chromes.sh after you are done editing.

You can now fire off this script from any open Terminal window. Or you can do the following to make it run when double-clicked in the Finder.

Make a Shell Script Run Like a Native App

In your Terminal window perform the following:

  • mkdir -p ~/Applications/Chromes.app/Contents/MacOS
  • copy your shell script to ~/Applications/Chromes.app/Contents/MacOS/

    e.g. cp Chromes.sh ~/Applications/Chromes.app/Contents/MacOS/

  • open ~/Applications/Chromes.app/Contents/MacOS/
    Chromes script in Finder
  • In the Finder window that opens, select Chromes.sh and type ⌘-i to open the information pane for the shell script. Then change Open with to Terminal.app.
    Open with Terminal.app
  • Run open ~/Applications to open a Finder window with the local user Applications.
    Chromes.app in Finder
  • Select Chromes.app in Finder and type ⌘-i to open the information pane for the Application. Download the following png file and open it in Preview. In Preview select all (⌘-a) and copy (⌘-c) the image. Then select the Icon at the top of the Application Information pane and paste the image (⌘-v). You will see the size of the application change and after a few seconds the Icon for Chromes.app will change as well.

    Chromes.app Finder Info

  • You can now double-click the Chromes.app and fire up your multi-Chrome browsers. You can also drag the Application Icon from the Finder into your Dock for future use.
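
The Terminal half of the steps above (creating the bundle directory and copying the script in) can be sketched as one function. The name make_script_app is hypothetical, and the Finder steps (Open with, the icon paste) still have to be done by hand.

```shell
#!/bin/sh
# make_script_app SCRIPT APPS_DIR   (hypothetical helper name)
# Copies SCRIPT into a bare APPS_DIR/Chromes.app/Contents/MacOS directory
# and makes the copy executable for the current user.
make_script_app() {
    script="$1"
    macos_dir="$2/Chromes.app/Contents/MacOS"
    mkdir -p "${macos_dir}" &&
        cp "${script}" "${macos_dir}/" &&
        chmod u+x "${macos_dir}/$(basename "${script}")"
}
```

Running make_script_app Chromes.sh ~/Applications should leave you with the same layout as the mkdir and cp steps in the list.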


Example Icon


The Results:

Two Different Chrome Processes

Alcohol content: low (clearly huffing glue)



I was parking at one of the Veritas/Symantec buildings in Mountain View when I saw this posted on a nearby dumpster enclosure.  Considering that this particular piece of real estate has cycled through Fairchild, National Semiconductor, Cisco, Netscape, AOL, VeriSign, Symantec, and Veritas, and is now owned by Google, I was surprised and depressed that this is about the only relic from that first dot-com boom.

I probably already have co-workers who look at Netscape as a historical item in the same way a PDP-11 was to me.

Alcohol content: stone code sobriety (a 40 poured out in memorial)


1999… One day I worked for a company I liked and the next day I was part of a major corporate conglomerate. In the days before the deal, everyone was told from the highest levels that things would not change so much, at least not for a while. Reassurances were made that certain people would keep their jobs and that those jobs would be crucial to the new merged company. Mostly we were told that this was a merger and not a buyout. All that changed the day the deal was completed.

The HR department ended up spending the entire night processing the list of people who would get laid off the next morning. Then there was the paperwork that needed to be generated for the people who would be allowed to stay. No longer would you be an employee of Netscape, but an AOL drone. The offers to stay were boilerplate and everyone was given a gift of 100 stock options. The options, though, were completely worthless at a strike price of $108.00 on an already sinking stock. Many people left after the deal even though they had not been laid off. Many people left before the deal was completed, on sheer principle. Others, like yourself, stayed on, still believing that there really wouldn’t be that much change and that things could not get that bad.

Initially, after the shock of the layoffs was over, things had stabilized and were fine. Life was nearly normal. For a few months nothing really changed except some additional responsibilities and the addition of a new project. The drink machine remained free and the price of snacks didn’t change, so overall, everything was fine.

Work continued and a new project lent itself to discovering more and more disturbing things about the “merger.” Since the computer systems that would run this new service you were about to bring up would be housed in AOL datacenter space, you ended up having to deal with more AOL personnel than would normally be the case. You discovered their internal disorganization and the great inertia to change. There was one way to operate and that was the AOL way. Even when that way was old and outdated and served no function, that was the only way to operate. Groups within AOL were highly compartmentalized and there was little to no communication between groups. Attempting to get the AOL NOC to get in touch with the persons responsible for a machine was a nightmare of fruitless hunting. They didn’t have the answers and it took them hours to find the answer. Then, typically, the person in question would be completely unreachable except from their desk.

The event that broke the camel’s back, though, happened one lovely Thursday morning when a PDU (Power Distribution Unit) failed back at AOL’s main datacenter. Even though all your machines had dual power and had been specified as being powered through redundant PDUs, they were down. The first call to the AOL NOC went something like this:

“Hi, this is Andy from Netscape Operations, and the servers in section Z, row 24, cabinets 4 through 9 are unreachable. What’s going on?”

The NOC drone responds after a pause, “We had a PDU fail, sir.”

“Those machines are dual-power. They should be on a redundant PDU.”

“Uh, I don’t know. I’ll look into it. What’s your number so I can call you back in a half hour?”

A half-hour turned out to be the minimum turnaround time for any NOC request, I would come to find out.

“Wait,” I asked, “What’s the ETA on the PDU getting fixed?”

“Not sure, sir,” the drone responded, “I’ll get back to you on that.”

For thirty minutes, I ping and retest to see if the site has returned. At the fifteen-minute mark, per Netscape protocol, I escalate up the chain of command to let folks know what’s down and why. The lack of an ETA doesn’t make anyone happy. A few minutes after the 30 pass, I get my call back.

“What’s the story,” I ask without allowing for a greeting.

“The PDU will take another 4 to 5 hours to replace. Then everything should come up.”

“WHAT,” I shout down the line. “These are production systems. They shouldn’t have been allowed to be down this long. Why aren’t the secondary power plugs in a different PDU?”

“I don’t know. The ticket on the order for power was closed as completed.”

“Well, can we get someone to change the plugs now,” I ask attempting to calm down.

“I could create you another ticket. But since all the power folks are working on the PDU, I doubt they’ll be able to look at this until Monday,” the drone responds without emotion.

“Make the ticket and escalate it sev 1,” I demand.

The ticket gets made and I’m told I’ll get a call back in another half hour on the status of that and the original PDU. Through the regular half hour calls, I badger my management chain to bug their AOL peers. The lack of availability by desk phone, mobile phone, email, or pager strains credulity. While contacts are made and home phone numbers exchanged, no real progress is made. In slightly more than four hours, the replacement PDU comes online and my servers start lighting up. But not all of them.

I call the NOC, “Has power been completely restored?”

“Yes, sir.”

“Some of my servers are not coming up. Can you check if they have power?”

“Sure, let me get back to you in a half hour,” comes the typical, and not unexpected, response.

At least at this point I have more work I can do to continue restoring the system. By the time I get the call back to tell me that the machines have power and show as on, I have a theory as to the problem.

“Can you get someone on the console and have them type boot on the following machines,” I ask the NOC drone.

“Sorry, I can’t do that.”

“That’s fine. Just find someone who can.”

“No, sir, I can’t have anyone do that until after the change freeze.”

A change freeze is a period during which nothing in a system is modified, in order to ensure stability. It’s fairly standard practice, but break-fix work is usually exempt.

“What do you mean there’s a change freeze? This is a production system which is down. It should’ve been back up hours ago. When can someone get in there to fix it?”

Without a sense of urgency the drone responds, “The freeze is in effect from four pm Thursday through Monday at 9.”

In a fit of apoplexy, I hang up my cell phone and then start slamming the receiver of my desk phone, the latter being so much more satisfying. Running to find my VP, I explain the inanity of the situation with far too many expletives to be politically correct. A few calls and 45 minutes later, some monkey is in the datacenter typing boot on about a dozen servers and validating that they come up. After I give the all clear, my boss and VP come over to tell me to go home.

“You can write up the post-mortem later in the week. Get some rest.”

“Yeah, you look like you might kill someone.”

I apologize for my behavior and slouch in my chair.

“It’s fine,” explains the VP, “They need to learn from us. It is completely unacceptable how much of their infrastructure is down at any time. And the lack of response is really bad.”

I shrug as they leave and pack up my gear and go home.

As much talk as there was about AOL learning from us, the fiefdoms and bureaucracy were too well entrenched to really change. Netscape was just another conquered bit of territory. Bits of it were shut down or given away to partners. Over time almost everything was moved from California to headquarters in Dulles, Virginia. Before that happened, though, I left for a startup. It was a better choice than waiting to get laid off some other week. It was a better choice than walking around the empty sad halls of Netscape.

A startup. That was the ticket…

Alcohol content: None (They killed Beer Friday)
