Every Company is a Learning Company

(Originally published on O'Reilly Radar).

Today our livelihoods and our very lives depend on software. With all the benefits of a software-rich world comes unprecedented complexity. Our ability to reason about the systems that we’re working with (and are part of) diminishes as their scale and interdependence increase. We can no longer rely solely on past experience, and instead have to continuously discover how systems are functioning or failing, and adapt accordingly.

This continuous adaptation requires an ability to learn deeply. Learning is not optional; it is the lifeblood of complex systems, including modern companies. And yet, we often default to the comfortable but shallow learning that is undermined by blame and biases.

Take, for example, a recent security incident at Symantec, a Fortune 500 technology company. The investigation revealed the “root cause [to be] a violation by specific individuals of established policies.” In other words, the root cause is a “few outstanding employees”, the bad apples who, despite “stringent on-boarding and security trainings,” failed to follow processes and screwed things up.

What about the recent Volkswagen diesel emissions scandal?

"This was a couple of software engineers who put [the cheating software] in for whatever reason," Michael Horn, VW's U.S. chief executive, told a House subcommittee hearing. "To my understanding, this was not a corporate decision. This was something individuals did."

And how do we deal with these folks? In the above cases, we fire them. In other cases, we suspend them, transfer them, demote them, dock their pay, prevent them from doing their previous jobs…

How comfortable do these two stories feel so far? At first glance, they’re quite coherent: we found why this incident happened, and who is responsible. It feels that we’ve dealt with the perpetrators in a way that’s proportional to their transgressions. Justice has been served, and trust has been restored. And let’s not forget the clear message we’re sending to our organizations (and the world), because after all, as the VW executive says, “the findings of this investigation do not reflect the values or who we are as a company.” Both of these are open-and-shut cases, right?

If you’ve been working with complex systems, you might feel a slight discomfort right about now, a suspicion that, comfortable as they might first feel, both of these stories are too simplistic, and far from complete. You might ask: Have we figured out how these folks were able to violate the policy or get the cheating “lines of software code” to 11 million cars? How often does that happen? Are such actions still possible—are there safeguards or are the risks of error or misconduct inherent in the system? Have we captured what the employees were thinking, how working around policy or safeguards made sense given the information they had at the time? What other tradeoffs do employees make during the normal course of doing their jobs? In Symantec’s case, do we understand how they were able to achieve such a low error rate (only 0.023% since 1995) despite having the ability to disregard the same established policies all along? In VW’s case, how was the aberrant software not detected for so long?

The stories that emerge from these questions (“the infinite hows”) are far more nuanced and richer. In this version, there is no simple single “root cause”, but many conditions, each necessary but only jointly sufficient. These stories go beyond the obvious (the known knowns) to uncover the less obvious (the known unknowns). They also highlight the fact that there are likely unknown unknowns present—unpredictable risks that might be hidden until the next incident. In fact, we might even feel anxious because we’ve just escorted out of the building the individuals most familiar with these particular errors and systems, who could help discover the unknowns. And we’ve sent another clear message to the organization: whatever you do, don’t get caught.

Daniel Kahneman writes in “Thinking, Fast and Slow” that the mere “achievement of coherence and of the cognitive ease ... causes us to accept a statement as true.” Quickly jumping to conclusions is an amazing human ability, and it serves us well most of the time. But when we blame or fall under the influence of cognitive biases, all we have is a story that feels good, not one that’s realistic or helpful. Blame and biases are errors in judgment that severely limit our learning, and contribute to the fragility of our complex systems.

To paraphrase David Kirkpatrick’s insight on a world being consumed by software, regardless of industry your company is now a learning company, and pretending that it’s not spells serious peril. Given this, we cannot continue to construct comfortable stories—essentially fairy tales, complete with villains we blame and punish, and simple conclusions we quickly jump to—because these simplistic stories short circuit our ability to learn. Instead, we can choose to build richer, more realistic narratives for the sake of learning.

How do we go beyond blame or bias to build true learning organizations? That is the central question of my book, "Beyond Blame: Learning from Failure and Success." It’s a story of an incident that threatens the very existence of a large financial institution, and the counterintuitive steps its leadership took to stop the downward spiral. Their approach relies on complexity science, resilience engineering, human factors, cognitive science, and organizational psychology. This approach allows us to identify more underlying conditions for failure, and make our systems (and organizations) safer and more resilient. It also enables us to turn our companies into learning companies.

DevOps keeps it cool with ICE

How inclusivity, complexity, and empathy are shaping DevOps.

Over the next five years, three ideas will be central to DevOps: the need for the DevOps community to become more Inclusive; the realization that increasing Complexity of systems is the underlying reason for DevOps; and the critical role of Empathy in the growth and adoption of DevOps. Channeling John Willis, I’ll coin my own DevOps acronym, ICE, which is shorthand for Inclusivity, Complexity, Empathy.

Inclusivity

There is a major expansion of the DevOps community underway, and it’s taking DevOps far beyond its roots in agile systems administration at “unicorn” companies (e.g., Etsy or Netflix). For instance, a significant majority (80-90%) of participants at the Ghent conference were first-time attendees, and this was also the case for many of the devopsdays in 2014 (NYC, Chicago, Minneapolis, Pittsburgh, and others). Moreover, although areas outside development and operations were still underrepresented, there was a more even split between developers and operations folks than at previous events. It’s also not an accident that the DevOps Enterprise conference took place the week prior to the fifth anniversary devopsdays and included talks about the DevOps journeys at large “traditional” organizations like Blackboard, Disney, GE, Macy’s, Nordstrom, Raytheon, Target, UK.gov, US DHS, and many others.

The DevOps community has always been open and inclusive, and that’s one of the reasons why in the five years since the word “DevOps” was coined, no single, widely accepted definition or practice has emerged. The lack of definition is more of a blessing than a curse, as DevOps continues to be an open conversation about ways of making our organizations better. Within the DevOps community, old-time practitioners and “newbies” have much to learn from each other.

The inclusivity of the DevOps community extends beyond embracing different job roles or industries, as evidenced by the recent open conversations about gender diversity. The organizers of devopsdays events are actively reaching out to currently underrepresented populations (e.g., students, women, people of different ethnic backgrounds, LGBTQ+, folks outside IT). It’s a virtuous cycle: the more diverse points of view that DevOps includes, the richer and more widely applicable it becomes. Inclusivity is clearly the path for DevOps to meaningfully expand beyond just dev and ops to impact all parts of the organization (for instance, security), for all organizations.

Complex systems

More than ever, software is eating the world, and many companies are now building and operating systems of unprecedented scale. Systems of such complexity cannot be managed manually, which has led to a wider adoption of modern configuration management and monitoring tools. This is the reason that automation and measurement have emerged as two of the key themes in DevOps, the “A” and “M” in CAMS. (It might also be the case that early DevOps practitioners naturally placed more emphasis on the technical aspects of DevOps, as opposed to the “softer” Culture and Sharing elements.)

More fundamentally, the very reason that DevOps came into existence is that we are now working with (and in) complex, adaptive systems, which cannot be reasoned about in simplistic, linearly causal ways. In fact, they are often beyond human ability to comprehend: how complex systems function (and break) cannot be predicted based on past experience. Complex systems are constantly changing, and working with them requires constant experimentation and continuous learning.

This is why DevOps places such a heavy emphasis on culture: without the ability to iterate on our organizations (e.g., by increasing communication between typically siloed groups), we lose our ability to successfully operate and evolve our products. Without the ability to learn from both our failures and successes (e.g., via blameless postmortems), we cannot improve the resilience of our complex systems.

Empathy

Empathy can seem out of place in a discussion about technology and organizations. However, empathy is not only about feeling what others are feeling; it is not just commiserating or sympathizing. Empathy is a two-way conversation, a way to resolve conflict and to meet people’s needs. Without an empathetic conversation, we cannot understand the needs of all the participants in our complex systems (e.g., devs, ops, finance, customers), and therefore we cannot possibly improve our systems.

We can certainly try to brute-force our way to DevOps — for instance, we can ban silos, mandate hourly deploys, and insist on automation and monitoring of “all the things.” However, if there is anything to gain from this approach, it will be short-lived. We cannot expect a wider adoption of DevOps without first understanding why the (often painful) status quo makes sense to people, and why DevOps might not initially make sense for them.

Conclusion

Empathy is at the core of many design- and user-focused disciplines and approaches (e.g., Design Thinking, User Experience and User-Centered Design, Service Design, Human Factors, Impact Mapping, etc.). It’s not surprising that empathy has been called the essence of DevOps, as it’s required for the other two emerging themes of inclusivity and complex systems. Only with empathy can we expand and build a more inclusive DevOps community. Only by having open conversations — by understanding each other’s needs — can the siloed teams resolve their conflicts and begin to work together. Empathy is also the first step in moving away from a blame-oriented, command-and-control company culture towards the blame-free, resilient learning organizations that are best suited to work with complex systems.

More fundamentally, only with empathy can we build and operate products that people need and companies where people want to work. And those are worthwhile goals for the next five years of DevOps.

Acknowledgements

I’d like to thank Patrick Debois, Bob Marshall, Bridget Kromhout, David Mortman, Dave Mangot, Yves Hanoulle, James Turnbull, Katherine Daniels, and Will Maier for their contributions to this article in particular, and to DevOps in general.

[Originally posted on O'Reilly Radar.]

Peer 1:1s

Using Randomness to Strengthen Your Team

For many of the most inspiring leaders that I know, 1:1s with folks on their teams are sacred: the same time each week, at least 30 minutes, without fail. But what happens when you’re unavailable (e.g., traveling, or on an extended vacation) and still want folks to have someone to talk to? What if you also want to deepen relationships and improve the flow of feedback on your team? Peer 1:1s, set up randomly, can help!

Yet another 1:1 meeting?

Some of the most valuable feedback comes directly from the people that you work with most closely, ideally on a daily basis. If you are a developer or designer, you may already be getting timely, actionable feedback if your team practices peer code or design reviews. However, these kinds of work-product reviews typically don’t offer an opportunity to ask for and receive the kind of individual feedback that we could all use, for instance:

  • What are my strengths (what should I do more)?
  • Where can I improve?
  • How can I communicate better?
  • On a scale of 1-10, how much do you trust me?
  • What can I do to improve your work and life?

In fact, in most companies, there is no venue for such direct and timely feedback between peers. Managers are often conduits for feedback between individuals; however, this can easily devolve into a game of broken telephone. Your company may be collecting some of this feedback from your peers as part of the review process. Unfortunately, this happens far too infrequently to be truly useful--feedback, like milk, has an expiration date, after which it turns sour and is best discarded. (That reviews happen so infrequently is also good news, because the vast majority of review processes are broken beyond repair and should be abandoned. I’ll explore alternatives to the typical review process in an upcoming post.)

Setting up peer 1:1s

If you let a little randomness help you, it’s simple:

  1. pick two people at random;
  2. find an open 30-minute time slot on their calendars;
  3. find (and book) a place for them to meet, in person or virtually; and
  4. pick a few questions at random from Jason Evanish’s list of 101 1:1 questions or Seth Godin's What's Next?

You may also choose to have an executive assistant or the people themselves do steps 2 and 3. To make things even simpler for whoever is running the peer 1:1 process, I’ve open-sourced a Google Apps Script that we’ve used at Next Big Sound along with a Google Spreadsheet that can get you to randomly pair people in no time. (Pull requests are most welcome!)
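As a rough illustration of steps 1 and 4 above (picking pairs and questions at random), here is a minimal Python sketch. It is not the Google Apps Script mentioned above, and it does not touch calendars or room booking (steps 2 and 3). The function name, its parameters, and the sample names and questions are all hypothetical, chosen only for this example.

```python
import random


def pair_randomly(people, questions, questions_per_pair=2, seed=None):
    """Randomly pair people and pick a few random questions for each pair.

    Hypothetical helper for illustration only. Returns (pairs, leftover),
    where leftover is the one person left unpaired when the count is odd,
    or None when everyone has a partner.
    """
    rng = random.Random(seed)  # a seed makes a run repeatable, if desired
    shuffled = list(people)    # copy so the caller's list is not reordered
    rng.shuffle(shuffled)

    pairs = []
    # Walk the shuffled list two at a time; an odd person out is skipped here.
    for i in range(0, len(shuffled) - 1, 2):
        count = min(questions_per_pair, len(questions))
        pairs.append({
            "pair": (shuffled[i], shuffled[i + 1]),
            "questions": rng.sample(questions, count),
        })

    leftover = shuffled[-1] if len(shuffled) % 2 else None
    return pairs, leftover


# Example usage with made-up names and placeholder questions.
team = ["Ana", "Ben", "Chloe", "Dev", "Elif"]
sample_questions = [
    "What are your biggest time wasters?",
    "What are your tips for getting unstuck?",
    "Where can I improve?",
]
pairings, unpaired = pair_randomly(team, sample_questions)
for entry in pairings:
    print(entry["pair"], entry["questions"])
print("Unpaired this round:", unpaired)
```

If a pairing or a question does not feel right, calling the function again simply produces a new random draw.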

Why use randomness?

First, it’s much quicker than manually optimizing the pairings, especially in larger teams. Second, the risk of randomly selecting folks who under no circumstances should be in the same room--or randomly selecting questions that should never be asked--is also fairly low in most organizations. Of course, if something is not right, you can always re-run the randomizer. Most important, randomness often produces inspired choices (of both people and questions) that you would not think to make. Here are some recent peer 1:1 pairings and questions at Next Big Sound:

  • two engineers who’ve been working quite closely recently discussing what “the company [is] not doing today that we should do to better compete in the market”;
  • a UX Designer and a Systems Engineer who’ll be talking about the latter’s “biggest time wasters”; and
  • a Data Journalist and our VP of Operations who might share their tips “for getting unstuck”.

The benefits of peer 1:1s

Peer 1:1s may be a little uncomfortable at first, especially for folks who may not know each other well. This is why we only pair people who opt into the process, and also make what is discussed during these 1:1s confidential. This is also one of the main benefits of peer 1:1s--they help connect individuals who may not have an opportunity to frequently interact at work in a relaxed, low-risk setting. While these folks may not share much (yet), they do have in common the experience of working at the same company, or perhaps even with the same manager. The few random leading questions--and the shared discomfort of being selected at random--are great starting points for the conversation!

If the paired individuals do happen to work together more closely, peer 1:1s turn into a forum to offer each other direct and timely feedback, and strengthen existing relationships. Finally, peer 1:1s give folks some experience of what it’s like to run 1:1s as a manager, and may inform their decision to pursue a career in management (or run away screaming).

Of course, peer 1:1s are not meant to be substitutes for “regular” 1:1s. (Rands’ The Update, The Vent, and The Disaster is the classic post on why such 1:1s are so important to the health of the organization.) But they do give people who opt in a meaningful opportunity to strengthen the organization in only 30 minutes a week.

UPDATE (October 30, 2015): After I originally published this blog post, I found out that a number of organizations are practicing random peer 1:1s. Most notably, Etsy does this across the entire organization, and has open-sourced a helpful tool, Mixer, for making such "assisted serendipity" happen.

Global Retrospective: devops, the First 5 Years

Devops is officially 5 years old. In the time since the inaugural devopsdays event in Ghent in 2009, it has evolved from an idea about agile infrastructure to an emerging organizational philosophy (or practice), one that even huge, mainstream enterprises are adopting. Devops is also an open, vibrant, and diverse community of practitioners (or philosophers), who are actively debating culture, automation, measurement, and sharing both their successes and failures openly. The theme for the 5-year-anniversary devopsdays gathering in Ghent, Belgium is “the future of #devops”. This is a natural place and time to pause and reflect on how far we’ve come and where we’re going. To that end, part of this devopsdays will be devoted to a retrospective (a blameless postmortem of sorts). On the first day, October 27, Yves Hanoulle and I will set up a space where attendees can write down their observations and ideas about the past and the future of devops on post-it notes, placing them into one of 3 areas: stop, start, or continue.

  • What hasn’t worked well in the devops movement that we should stop doing? Place it in the “stop” area.
  • What could we do in the future to make devops even more successful (by some measure of success)? Add it to the “start” area.
  • What has devops gotten right that practitioners should keep doing? Add it to the “continue” area.

Can’t make it to Ghent? No worries! You can participate in the historic global devops retrospective on twitter, by using #devopsstop, #devopsstart, and #devopscontinue hashtags. Yves and I will collate and summarize all the ideas received by 17:00 (5PM) Belgium time on October 27, and will present the results at the conference and in a blog post on this site on the following day.

On the second day of the conference (October 28), there will be 3 open spaces devoted to “fleshing out” one (or more) of the ideas that we’ve all come up with during the retrospective. These ideas will likely come from the “stop” or “start” categories--we’ll have the what and the why, and the open spaces will help us brainstorm how we get there and who will be leading the way. In addition, each open space will conduct a premortem to identify potential problems with these ideas.

Finally, each of the groups will produce and share a blog post about the results of their open space, and nominate one or more people to represent the open space during the combined Retrospective Podcast with Devops Cafe, FoodFightShow, Arrested DevOps, and The Ship Show.

Devops is a global phenomenon, continually shaped by its far-flung and inclusive community. We hope you take this opportunity to participate in the retrospective--in person or on twitter--and to make the next 5 years of devops even more awesome!

Update [October 27]: The raw results are here.

How devopsdays NYC built a well for a village in Cambodia. (A #devopsWater update)

Last year, the attendees of the devopsdays NYC conference used the money usually spent on t-shirts to drill a deep-bore water well for a village in Cambodia. They donated $2500 ($12/person) to Lotus Outreach, which quickly set out to find a suitable location, and a local partner organization to oversee the construction of the well. The result?

On July 5, 2014, clean, safe water started flowing from a newly-built well in the isolated Brormoay Commune in the Rike Reay Village, Veal Veng district, Pursat province, Cambodia. The well now serves 81 villagers, and even more people from the surrounding area during the dry months. The 36 village children, of whom 15 are girls, will no longer have to miss school and risk their lives to fetch water far away from their homes. The families no longer have to spend money to purchase water instead of paying their kids’ school fees.

Those of us who attend technology conferences are some of the most well-paid and financially secure people in the world. We can certainly afford to buy our own t-shirts, and spare the landfills the other “swag” routinely given out for free at conferences.

So the next time you register for a tech conference, ask the organizers to donate the money they would otherwise use for t-shirts to a worthy cause. If you’re organizing a conference, give your attendees the option to donate part of their registration fee to charity. In a small way, the world will be a better place.

Here's the full report on the devopsdays NYC well.

The importance of attribution in nascent fields and communities

In the rush to be original, innovative, provocative, or first-to-market, we often forget to acknowledge “prior art” or provide the context for the new ideas that we’re espousing.  The resulting lack of credibility is one of the most serious threats to emergent fields and their practitioner communities (such as devops or systems safety). Would devops exist without ITIL or the work of Deming? Would Agile exist without Waterfall? Would the all-electric Tesla Model S exist without the hybrid Prius and the gas-guzzling Hummer?

That is not to say that there’s nothing new under the sun. However, even the most groundbreaking ideas do not exist in a vacuum, but only in relation to previous ideas. They build on—or refute—what came before.  Humans suffer from a built-in resistance to change, and when new ideas are presented without proper context or attribution, they risk becoming just someone's brilliant ideas, too easy to dismiss or accept, depending on the person’s popularity, without full and critical evaluation.

In science, it’s simply not enough to receive new ideas in dreams or visions; ideas that stick must have solid foundations, and often come with bibliographies many pages long.

Want to build your or your idea's credibility? Want to strengthen your nascent field or emerging community? Emphasize their lineage, and give full attribution.

An open letter to #1 Recruiter From #1 Hedge Fund In The World

Recently, a recruiter (who I'm lovingly calling "#1 Recruiter") sent this gem on LinkedIn with the subject "I would like to talk to you":

I work at [Company]  (#1 Hedge Fund in the world), reviewed your profile and I would like to talk to you. Please let me know your availability to connect next week.

I tweeted and ignored the SPAM, but a few days later, #1 Recruiter followed up:

I am following up with you because I work at [Company] (#1 Hedge Fund in the world), reviewed your profile and I would like to talk to you. Please let me know your availability to connect next week.

Notice the expert use of copy-paste. To be fair, he did include a few extra links with information about the company, including their "Culture and Principles" web page. Nice touch!

This interaction neatly summarizes just about everything that's wrong with recruiting (and LinkedIn). So instead of ignoring, this time I wrote a brief reply, cc:ing the CEO of #1 hedge fund in the world:

[#1 Recruiter],

If you actually reviewed my profile, you would see that I know at least half a dozen people who currently work at [Company]. I am *quite* familiar with the company, and appreciate its culture.

One of the core tenets of your company's culture is radical openness and honesty. With that in mind, I'd like to be open and honest with you. What you've sent is SPAM. It reeks of mediocrity, the opposite of your company's "overriding objective [of] excellence". Stating only that you work for a company with money ("#1 hedge fund in the world") as the reason to connect will not net you people who "value independent thinking and innovation". I'd be wary of anyone who actually responds to your message (and I'd guess only about 1 or 2 out of 100 do); they, like you, hate their jobs and are just looking for money.

If you truly are looking for people who seek and can create "meaningful work and meaningful relationships", why do you approach those you're trying to recruit in such an utterly meaningless, repulsive way?

Why not take the time to actually tell candidates what working at [Company] would look like? Why not take 5 minutes to highlight the specific parts of the person's background that stood out to you, and that would be especially relevant at [Company]? It would save you time in the long run, and help you find amazing candidates.

It's not difficult, but it does require you to make a decision about which business you'd like to be in: selling counterfeit Viagra, or representing (favorably) the #1 hedge fund in the world.

prescription-strength offline mode

It might be surprising--ironic, even--for the first post of any blog to be about taking time to be offline. Being offline is hard, even if your work doesn’t require you to be online 24x7. But being offline appears to have similar benefits to taking medical cannabis. In an opinion piece in The New York Times, Mark Wolfe describes great improvements in his parenting abilities after being prescribed pot-infused brownies. He became less distracted, more patient, and more engaged with his kids, as evidenced by the following before and after interactions:

Here is what a typical weekday evening exchange between me and my oldest daughter once looked like:

Child: Daddy, can you show me how to make a Q?

Father: (sipping bourbon and soda, not looking up from iPad) Just make a circle and put a little squiggle at the bottom.

Child: No, show me!

Father: Sweetie, not now, O.K.? Daddy’s tired.

It’s different now:

Child: Daddy, can you show me how to make a Q?

Father: (getting down on the floor) Here, I’ll hold your hand while you hold the pen and we’ll make one together. There! We made a Q! Isn’t it fantastic?

Child: Thanks, Daddy!

Father: Don’t you just love the shape of this pen?

In my experience, many of the benefits of prescription-strength brownies that Mark describes are available without a prescription by simply being offline for a while.

At the end of August, after a particularly grueling few months at work, I took a vacation with my family. Although the house was perfectly wired, I decided to disconnect completely: no phone, no SMS, no e-mail, no twitter, no web. (I didn’t have to worry about Facebook because I committed “Facebook suicide” a few years ago).

I was completely offline for 10 days.

The first few days were rough. It was hard to not have my phone with me at all times--my whole life was, seemingly, on this device. Every time I would get bored or impatient with the kids, I would instinctively reach for it. Every time I would go to the bathroom, I’d feel the urge to do as nature intended, i.e., read Hacker News. Or to check work e-mail, to make sure things were running smoothly.

I didn’t give in, and after a few days, the withdrawal symptoms started to subside. I was becoming calmer, more engaged and connected with my family. Not surprisingly, the kids responded in kind--they were noticeably more relaxed, happier, and there were far fewer tantrums. (Well, there was one notable tantrum in which my oldest raged about the injustice of wearing not the optimally esthetically pleasing shorts on the way home from the beach. But even under couture-induced stress, I remained more open and present than usual).

Coming back from vacation, I hesitated before getting back online--I really appreciated the lack of distraction that I experienced after disconnecting for 10 days. After checking if anything required immediate attention, I took about a week to ease back into it. And I’m happy to report that some of my vacation-time habits stuck: when I’m at home, I put away the phone, and generally stay offline as much as possible. After all, isn’t lingering for an extra minute with a tired child worth not being completely caught up on twitter? And sweeter than a brownie?