jtimberman's Code Blog

Chef, Ops, Ruby, Linux/Unix. Opinions are mine, not my employer's (CHEF).

Evolution of Cookbook Development

In this post, I will explore some development patterns that I’ve seen (and done!) with Chef cookbooks, and then explain how we can evolve to a new level of cookbook development. The examples here come from Chef’s new chef-splunk cookbook, which is a refactored version of an old splunk42 cookbook. While there is a public splunk cookbook on the Chef community site, it shares some of the issues that I saw with our old one, and those issues are part of the subject matter of this post.

Anyway, on to the evolution!

Sub-optimal patterns

These are the general patterns I’m going to address.

  • Composing URLs from multiple local variables or attributes
  • Large conditional logic branches like case statements in recipes
  • Not using definitions when it is best to do so
  • Knowledge of how node run lists are composed for search, or searching for “role:some-server”
  • Repeated resources across multiple orthogonal recipes
  • Plaintext secrets in attributes or data bag items

Cookbook development is a wide and varied topic, so there are many other patterns to consider, but these are the ones most relevant to the refactored cookbook.

Composing URLs

It may seem like a good idea to compose URL strings as attributes or local variables in a recipe, based on other attributes and local variables. For example, in our splunk42 cookbook we have this:

splunk_root = "http://download.splunk.com/releases/"
splunk_version = "4.2.1"
splunk_build = "98164"
splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-linux-2.6-amd64.deb"
os = node['os'].gsub(/\d*/, '')

These get used in the following remote_file resource:

remote_file "/opt/#{splunk_file}" do
  source "#{splunk_root}/#{splunk_version}/universalforwarder/#{os}/#{splunk_file}"
  action :create_if_missing
end

We reused the filename variable, and composed the URL to the file to download. Then to upgrade, we can simply modify the splunk_version and splunk_build, as Splunk uses a consistent naming theme for their package URLs (thanks, Splunk!). The filename itself is built from a case statement (more on that in the next section). We could further make the version and build attributes, so users can update to newer versions by simply changing the attribute.
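
For example, hypothetical attributes for that might look like this (attribute names assumed for illustration, not the actual splunk42 code):

default['splunk']['version'] = '4.2.1'
default['splunk']['build']   = '98164'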

So what is bad about this? Two things.

  1. This is in the splunk42::client recipe, and repeated again in the splunk42::server recipe with only minor differences (the package name, splunk vs splunkforwarder).
  2. Ruby has excellent libraries for manipulating URIs and paths as strings, and it is easier to break up a string than compose a new one.

How can this be improved? First, we can set attributes for the full URL. The actual code for that is below, but suffice to say, it will look like this (note the version is different because the new cookbook installs a new Splunk version).

default['splunk']['forwarder']['url'] = 'http://download.splunk.com/releases/6.0.1/universalforwarder/linux/splunkforwarder-6.0.1-189883-linux-2.6-amd64.deb'

Second, we have helper libraries distributed with the cookbook that break up the URI so we can return just the package filename.

def splunk_file(uri)
  require 'pathname'
  require 'uri'
  Pathname.new(URI.parse(uri).path).basename.to_s
end

The previous remote_file resource is rewritten like this:

remote_file "/opt/#{splunk_file(node['splunk']['forwarder']['url'])}" do
  source node['splunk']['forwarder']['url']
  action :create_if_missing
end

As a bonus, the helper methods are available in other places, like other cookbooks and recipes, rather than being confined to the local scope of recipe-local variables.
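
For example, given the URL attribute above, the helper returns just the filename:

splunk_file(node['splunk']['forwarder']['url'])
# => "splunkforwarder-6.0.1-189883-linux-2.6-amd64.deb"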

Conditional Logic Branches

One of the wonderful things about Chef is that simple Ruby conditionals can be used in recipes to selectively set values for resource attributes, define resources that should be used, and other decisions. One of the horrible things about Chef is that simple Ruby conditionals can be used in recipes and often end up being far more complicated than originally intended, especially when handling multiple platforms and versions.

In the earlier example, we had a splunk_file local variable set in a recipe. I mentioned it was built from a case statement, which looks like this, in full:

splunk_file = case node['platform_family']
  when "rhel"
    if node['kernel']['machine'] == "x86_64"
      splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-linux-2.6-x86_64.rpm"
    else
      splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}.i386.rpm"
    end
  when "debian"
    if node['kernel']['machine'] == "x86_64"
      splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-linux-2.6-amd64.deb"
    else
      splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-linux-2.6-intel.deb"
    end
  when "omnios"
    splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-solaris-10-intel.pkg.Z"
  end

Splunk itself supports many platforms, and not all of them are covered by this conditional, so it’s easy to imagine how this can get further out of control and make the recipe even harder to follow. Also consider that this is just the client portion for the splunkforwarder package; this same block is repeated in the server recipe for the splunk package.

So why is this bad? There are three reasons.

  1. We have a large block of conditionals that sit in front of a user reading a recipe.
  2. This logic isn’t reusable elsewhere, so it has to be duplicated in the other recipe.
  3. This is only the logic for the package filename, but we care about the entire URL. I’ve also covered that composing URLs isn’t delightful.

What is a better approach? Use the full URL as I mentioned before, and set it as an attribute. We will still have the gnarly case statement, but it will be tucked away in the attributes/default.rb file, and hidden from anyone reading the recipe (which is the thing they probably care most about reading).

case node['platform_family']
when 'rhel'
  if node['kernel']['machine'] == 'x86_64'
    default['splunk']['forwarder']['url'] = 'http://download.splunk.com/releases/6.0.1/universalforwarder/linux/splunkforwarder-6.0.1-189883-linux-2.6-x86_64.rpm'
    default['splunk']['server']['url'] = 'http://download.splunk.com/releases/6.0.1/splunk/linux/splunk-6.0.1-189883-linux-2.6-x86_64.rpm'
  else
    default['splunk']['forwarder']['url'] = 'http://download.splunk.com/releases/6.0.1/universalforwarder/linux/splunkforwarder-6.0.1-189883.i386.rpm'
    default['splunk']['server']['url'] = 'http://download.splunk.com/releases/6.0.1/splunk/linux/splunk-6.0.1-189883.i386.rpm'
  end
when 'debian'
  # ...

The complete case block can be viewed in the repository. Also, since this is an attribute, consumers of this cookbook can set the URL to whatever they want, including a local HTTP server.

Another example of gnarly conditional logic looks like this, also from the splunk42::client recipe.

case node['platform_family']
when "rhel"
  rpm_package "/opt/#{splunk_file}" do
    source "/opt/#{splunk_file}"
  end
when "debian"
  dpkg_package "/opt/#{splunk_file}" do
    source "/opt/#{splunk_file}"
  end
when "omnios"
  # tl;dr, this was more lines than you want to read, and
  # will be covered in the next section.
end

Why is this bad? After all, we’re selecting the proper package resource to install from a local file on disk. The main issue is the conditional creates different resources that can’t be looked up in the resource collection. Our recipe doesn’t do this, but perhaps a wrapper cookbook would. The consumer wrapping the cookbook has to duplicate this logic in their own. Instead, it is better to select the provider for a single package resource.

package "/opt/#{splunk_file(node['splunk']['forwarder']['url'])}" do
  case node['platform_family']
  when 'rhel'
    provider Chef::Provider::Package::Rpm
  when 'debian'
    provider Chef::Provider::Package::Dpkg
  when 'omnios'
    provider Chef::Provider::Package::Solaris
  end
end

Definitions Aren’t Bad

Definitions are simply defined as recipe “macros.” They are not actually Chef resources themselves; they just look like them, and contain their own Chef resources. This has some disadvantages, such as lack of metaparameters (like action), which has led people to prefer the “Lightweight Resource/Provider” (LWRP) DSL instead. In fact, some feel that definitions are bad, and that one should feel bad for using them. I argue that they have their place. One advantage is their relative simplicity.

In our splunk42 cookbook, the client and server recipes duplicate a lot of logic. As mentioned, a lot of this is case statements for the Splunk package file. They also repeat the same logic for choosing the provider to install the package. I snipped the content from the when "omnios" block, but it looks like this:

cache_dir = Chef::Config[:file_cache_path]
splunk_pkg = splunk_file.gsub(/\.Z/, '')

execute "uncompress /opt/#{splunk_file}" do
  not_if { ::File.exists?(splunk_cmd) }
end

cookbook_file "#{cache_dir}/splunk-nocheck" do
  source "splunk-nocheck"
end

file "#{cache_dir}/splunkforwarder-response" do
  content "BASEDIR=/opt"
end

pkgopts = ["-a #{cache_dir}/splunk-nocheck",
           "-r #{cache_dir}/splunkforwarder-response"]

package "splunkforwarder" do
  source "/opt/#{splunk_pkg}"
  options pkgopts.join(' ')
  provider Chef::Provider::Package::Solaris
end

(Note: the logic for setting the provider is required since we’re not using the default over-the-network package providers, and are installing from a local file on the system.)

This isn’t too bad on its own, but it needs to be repeated in the server recipe if one wants to run a Splunk server on OmniOS. The actual difference between the client and server package installation is the package name, splunkforwarder vs splunk. The earlier URL attribute example established a forwarder and server attribute. Using a definition, named splunk_installer, allows us to simplify the package installation used by the client and server recipes to look like this:

splunk_installer 'splunkforwarder' do
  url node['splunk']['forwarder']['url']
end
splunk_installer 'splunk' do
  url node['splunk']['server']['url']
end

How is this better than an LWRP? Simply that there was less ceremony in creating it. There is less cognitive load for a cookbook developer to worry about. Definitions, by their very nature of containing resources, are already idempotent and convergent with no additional effort. They also automatically support why-run mode, whereas in an LWRP that support must be implemented by the developer. Finally, notifications may be sent between resources in the definition and the rest of the Chef run.

Contrast this with an LWRP: we need resources and providers directories, and the attributes of the resource need to be defined in the resource. Then the action methods need to be written in the provider. If we’re using inline resources (which we are), we need to declare those so any notifications work. Finally, we should ensure that why-run works properly.

The actual definition is ~40 lines, and can be viewed in the cookbook repository. I don’t have a comparable LWRP for this, but suffice to say that it would be longer and more complicated than the definition.
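
To sketch the idea (this is not the actual definition, which also handles OmniOS and notifications; the parameter handling here is illustrative), a minimal splunk_installer might look like:

define :splunk_installer, :url => nil do
  # Derive the package filename from the URL with the helper shown earlier
  package_file = splunk_file(params[:url])

  remote_file "/opt/#{package_file}" do
    source params[:url]
    action :create_if_missing
  end

  # params[:name] is the name the caller gave the definition,
  # e.g. 'splunkforwarder' or 'splunk'
  package params[:name] do
    source "/opt/#{package_file}"
    case node['platform_family']
    when 'rhel'
      provider Chef::Provider::Package::Rpm
    when 'debian'
      provider Chef::Provider::Package::Dpkg
    end
  end
end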

Reasonability About Search

Search is one of the killer features of running a Chef Server. Dynamically configuring load balancer configuration, or finding the master database server is simple with a search. Because we often think about the functionality a service provides based on the role it serves, we end up doing searches that look like this:

splunk_servers = search(:node, "role:splunk-server")

Then we do something with splunk_servers, like send it to a template. What if someone doesn’t like the role name? Then we have to do something like this:

splunk_servers = search(:node, "role:#{node['splunk']['server_role']}")

Then consumers of the cookbook can use whatever server role name they want, and just update the attribute for it. But, the internet has said that roles are bad, so we shouldn’t use them (even though they aren’t ;)). So instead, we need something like one of these queries:

splunk_servers = search(:node, "recipes:splunk42\:\:server")
#or
splunk_servers = search(:node, "#{node['splunk']['server_search_query']}")

The problem with the first is similar to the problem with the role search (role:splunk-server): we need knowledge about the run list in order to search properly. The problem with the second is that we now have to worry about constructing a query properly as a string that gets interpolated correctly.

How can we improve this? I think it is more “Chef-like” to use an attribute on the server’s node object itself that expresses the intention that the node is, in fact, a Splunk server. In our chef-splunk cookbook, we use node['splunk']['is_server']. The query looks like this:

splunk_servers = search(:node, "splunk_is_server:true")

This reads clearly, and the is_server attribute can be set in one of 15 places (for good or bad, but that’s a different post).
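
Setting it is then a one-line default attribute that a role, environment, or wrapper cookbook can override (a sketch of the attributes file; the actual cookbook may differ slightly):

# attributes/default.rb
default['splunk']['is_server'] = false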

Repeating Resources, Composable Recipes

In the past, it was deemed okay to repeat resources across recipes when those recipes were not included on the same node. For example, client and server recipes that have similar resource requirements, but may pass in separate data. Another example is in the haproxy cookbook I wrote, where one recipe statically manages the configuration files, and the other uses a Chef search to populate the configuration.

As I have mentioned above, a lot of code was duplicated between the client and server recipes for our splunk42 cookbook: user and group, the case statements, package resources, execute statements (that haven’t been shared here), and the service resource. It is definitely important to ensure that all the resources needed to converge a recipe are defined, particularly when using notifications. That is why sometimes a recipe will have a service resource with no actions like this:

service 'mything'

However, Chef 11 will generate a warning about cloned resources when they are repeated in the same Chef run.

Why is this bad? Well, CHEF-3694 explains that particular issue of cloned resources in more detail. The other reason is that it makes recipes harder to reuse when they have a larger scope than absolutely necessary. How can we make this better? A solution is to write small, composable recipes that contain resources that may be optional for certain use cases. For example, we can put the service resource in a recipe and include that:

service 'splunk' do
  supports :status => true, :restart => true
  provider Chef::Provider::Service::Init
  action :start
end

Then when we need to make sure we have the service resource available (e.g., for notifications):

template "#{splunk_dir}/etc/system/local/outputs.conf" do
  source 'outputs.conf.erb'
  mode 0644
  variables :splunk_servers => splunk_servers
  notifies :restart, 'service[splunk]'
end
include_recipe 'chef-splunk::service'

Note that the service is included after the resource that notifies it. This is a feature of the notification system, where the notified resource can appear anywhere in the resource collection, and brings up another excellent practice, which is to declare service resources after other resources which affect their configuration. This prevents a race condition where, if a bad config is deployed, the service would attempt to start, fail, and cause the Chef run to exit before the config file could correct the problem.

Making recipes composable in this way means that users can pick and choose the ones they want. Our chef-splunk cookbook has a prescriptive default recipe, but the client and server recipes mainly include the others they need. If someone doesn’t share our opinion on this for their use case, they can pick and choose the ones they want. Perhaps they have the splunk user and group created on systems through some other means. They won’t need the chef-splunk::user recipe, and can write their own wrapper to handle that. Overall this is good, though it does mean there are multiple places where a user must look to follow a recipe.
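
As an illustration, a client recipe composed this way becomes mostly includes (recipe names here are hypothetical, not the exact chef-splunk recipe names):

include_recipe 'chef-splunk::user'
include_recipe 'chef-splunk::install_forwarder'
include_recipe 'chef-splunk::service'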

Plaintext Secrets

Managing secrets is one of the hardest problems to solve in system administration and configuration management. In Chef, it is very easy to simply set attributes, or use data bag items for authentication credentials. Our old splunk42 cookbook had this:

splunk_password = node[:splunk][:auth].split(':')[1]

Where node[:splunk][:auth] was set in a role with the username:password. This isn’t particularly bad since our Chef server runs on a private network and is secured with HTTPS and RSA keys, but a defense in depth security posture has more controls in place for secrets.

How can this be improved? At Chef, we started using Chef Vault to manage secrets. I wrote a post about chef-vault a few months ago, so I won’t dig too deep into the details here. The current chef-splunk cookbook loads the authentication information like this:

splunk_auth_info = chef_vault_item(:vault, "splunk_#{node.chef_environment}")['auth']
user, pw = splunk_auth_info.split(':')

execute "#{splunk_cmd} edit user #{user} -password '#{pw}' -role admin -auth admin:changeme" do
  not_if { ::File.exists?("#{splunk_dir}/etc/.setup_#{user}_password") }
end

file "#{splunk_dir}/etc/.setup_#{user}_password" do
  content 'true\n'
  owner 'root'
  group 'root'
  mode 00600
end

The first line loads the authentication information from the encrypted-with-chef-vault data bag item. Then we make a couple of convenient local variables, and change the password from Splunk’s built-in default. Then, we control convergence of the execute by writing a file that indicates that the password has been set.

The advantage of this over attributes or data bag items is that the content is encrypted. The advantage over regular encrypted data bags is that we don’t need to distribute the secret key out to every system, we can update the list of nodes that have access with a knife command.

Conclusion

Neither Chef (the company), nor I, are here to tell anyone how to write cookbooks. One of the benefits of Chef (the product) is its flexibility, allowing users to write blocks of Ruby code in recipes that quickly solve an immediate problem. That’s how we got to where we were with splunk42, and we certainly have other cookbooks that can be refactored similarly. When it comes to sharing cookbooks with the community, well-factored code that is easy to follow, understand, and use is preferred.

Many of the ideas here came from community members like Miah Johnson, Noah Kantrowitz, Jamie Winsor, and Mike Fiedler. I owe them thanks for challenging me over the years on a lot of the older patterns that I held onto. Together we can build better automation through cookbooks, and a strong collaborative community. I hope this information is helpful to those goals.

Managing Multiple AWS Account Credentials

UPDATE: All non-default profiles must have their profile name start with “profile.” Below, this is “profile nondefault.” The ruby code is updated to reflect this.

In this post, I will describe my local setup for using the AWS CLI, the AWS Ruby SDK, and of course the Knife EC2 plugin.

The general practice I’ve used is to set the appropriate shell environment variables that are used by default by these tools (and the “legacy” ec2-api-tools, the java-based CLI). Over time and between tools, there have been several environment variables set:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
AWS_SSH_KEY
AMAZON_ACCESS_KEY_ID
AMAZON_SECRET_ACCESS_KEY
AWS_ACCESS_KEY
AWS_SECRET_KEY

There is now a config file (ini-flavored) that can be used to set credentials, ~/.aws/config. Each ini section in this file is a different account’s credentials. For example:

[default]
aws_access_key_id=MY_DEFAULT_KEY
aws_secret_access_key=MY_DEFAULT_SECRET
region=us-east-1
[profile nondefault]
aws_access_key_id=NOT_MY_DEFAULT_KEY
aws_secret_access_key=NOT_MY_DEFAULT_SECRET
region=us-east-1

I have two accounts listed here. Obviously, the actual keys are not listed :). I source a shell script that sets the environment variables with these values. Before, I maintained a separate script for each account. Now, I install the inifile RubyGem and use a one-liner for each of the keys.

export AWS_ACCESS_KEY_ID=`ruby -rinifile -e "puts IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))['default']['aws_access_key_id']"`
export AWS_SECRET_ACCESS_KEY=`ruby -rinifile -e "puts IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))['default']['aws_secret_access_key']"`
export AWS_DEFAULT_REGION="us-east-1"
export AWS_SSH_KEY='jtimberman'

This will load the specified file, ~/.aws/config with the IniFile.load method, retrieving the default section’s aws_access_key_id value. Then repeat the same for the aws_secret_access_key.
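
The same lookup can be written as a standalone Ruby script, which may be easier to read than the one-liner (assuming the inifile gem is installed):

require 'inifile'

# Load ~/.aws/config and read a key from the [default] section
config = IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))
puts config['default']['aws_access_key_id']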

To use the nondefault profile:

export AWS_ACCESS_KEY_ID=`ruby -rinifile -e "puts IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))['profile nondefault']['aws_access_key_id']"`
export AWS_SECRET_ACCESS_KEY=`ruby -rinifile -e "puts IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))['profile nondefault']['aws_secret_access_key']"`

Note that this uses ['profile nondefault'].

Since different tools historically have used slightly different environment variables, I export those too:

export AMAZON_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AMAZON_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
export AWS_ACCESS_KEY=$AWS_ACCESS_KEY_ID
export AWS_SECRET_KEY=$AWS_SECRET_ACCESS_KEY

I create a separate config script for each account.

The AWS CLI tool will automatically use ~/.aws/config, and can load different profiles with the --profile option. The aws-sdk Ruby library, however, will use the environment variables, so authentication in a Ruby script is automatically set up.

require 'aws-sdk'
iam = AWS::IAM.new

Without this, it would be:

require 'aws-sdk'
iam = AWS::IAM.new(:access_key_id => 'YOUR_ACCESS_KEY_ID',
                   :secret_access_key => 'YOUR_SECRET_ACCESS_KEY')

Which is a little onerous.

To use this with knife-ec2, I have the following in my .chef/knife.rb:

knife[:aws_access_key_id]      = ENV['AWS_ACCESS_KEY_ID']
knife[:aws_secret_access_key]  = ENV['AWS_SECRET_ACCESS_KEY']

Naturally, since knife.rb is Ruby, I could use IniFile.load there, but I only started using that library recently, and I have my knife configuration set up already.
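
For reference, a knife.rb that read the ini file directly might look like this (an untested sketch using the same inifile gem):

require 'inifile'

aws = IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))['default']
knife[:aws_access_key_id]     = aws['aws_access_key_id']
knife[:aws_secret_access_key] = aws['aws_secret_access_key']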

Preview Chef Client Local Mode

Opscode Developer John Keiser mentioned that a feature for Chef Zero he’s been working on, “local mode,” is now in Chef’s master branch. This means it should be in the next release (11.8). I took the liberty of checking out this unreleased feature.

Let’s just say, it’s super awesome and John has done some amazing work here.

PREVIEW

This is a preview of an unreleased feature in Chef. All standard disclaimers apply :).

Install

This is in the master branch of Chef, not released as a gem yet. You’ll need to get the source and build a gem locally. This totally assumes you’ve installed a sane ruby and bundler on your system.

git clone git://github.com/opscode/chef.git
cd chef
bundle install
bundle exec rake gem
gem install pkg/chef-11.8.0.alpha.0.gem

Note Alpha!

Setup

Next, point it at a local repository. I’ll use a simple example.

git clone git://github.com/opscode/chef-repo.git
cd chef-repo
knife cookbook create zero -o ./cookbooks
vi cookbooks/zero/recipes/default.rb

I created a fairly trivial example recipe to show that this will support search, and data bag items:

a = search(:node, "*:*")
b = data_bag_item("zero", "fluff")

file "/tmp/zerofiles" do
  content a[0].to_s
end

file "/tmp/fluff" do
  content b.to_s
end

This simply searches for all nodes, and uses the content of the first node (presumably the one we’re running on) for a file in /tmp. It also loads a data bag item (which I created) and uses it for the content of another file in /tmp.

mkdir -p data_bags/zero
vi data_bags/zero/fluff.json

The data bag item:

{
  "id": "fluff",
  "clouds": "Are fluffy"
}

Converge!

Now, converge the node:

chef-client -z -o zero

The -z, or --local-mode, argument is the magic that sets up Chef Zero and loads all the contents of the repository. The -o zero tells Chef to use a one-time run list of the “zero” recipe.

[2013-10-10T23:53:32-06:00] WARN: No config file found or specified on command line, not loading.
Starting Chef Client, version 11.8.0.alpha.0
[2013-10-10T23:53:36-06:00] WARN: Run List override has been provided.
[2013-10-10T23:53:36-06:00] WARN: Original Run List: [recipe[zero]]
[2013-10-10T23:53:36-06:00] WARN: Overridden Run List: [recipe[zero]]
resolving cookbooks for run list: ["zero"]
Synchronizing Cookbooks:
  - zero
Compiling Cookbooks...
Converging 2 resources
Recipe: zero::default
  * file[/tmp/zerofiles] action create
    - create new file /tmp/zerofiles
    - update content in file /tmp/zerofiles from none to 0a038a
        --- /tmp/zerofiles      2013-10-10 23:53:36.368059768 -0600
        +++ /tmp/.zerofiles20131010-6903-10cvytu        2013-10-10 23:53:36.368059768 -0600
        @@ -1 +1,2 @@
        +node[jenkins.int.housepub.org]
  * file[/tmp/fluff] action create
    - create new file /tmp/fluff
    - update content in file /tmp/fluff from none to d46bab
        --- /tmp/fluff  2013-10-10 23:53:36.372059683 -0600
        +++ /tmp/.fluff20131010-6903-1l3i1h     2013-10-10 23:53:36.372059683 -0600
        @@ -1 +1,2 @@
        +data_bag_item[fluff]
Chef Client finished, 2 resources updated

The diff output from each of the file resources shows that the content does in fact come from the search (a node object was returned) and a data bag item (a data bag item object was returned).

What’s Next?

Since this is a feature of Chef, it will be documented and released, so look for that in the next version of Chef.

I can see this used for testing purposes, especially for recipes that make use of combinations of data bags and search, such as Opscode’s nagios cookbook.

Questions

  • Does it work with Berkshelf?

I don’t know. Probably not (yet).

  • Does it work with Test Kitchen?

I don’t know. Probably not (yet). Provisioners in test-kitchen would need to be (re)written.

  • Should I use this in production?

This is an unreleased feature in the master branch. What do you think? :)

  • When will this be released?

I don’t know the schedule for 11.8.0. Soon?

  • Where do I find out more, or get involved?

Join #chef-hacking in irc.freenode.net, the chef-dev mailing list, or attend the Chef Community Summit (November 12-13, 2013 in Seattle).

Switching MyOpenID to Google OpenID

You may be aware that MyOpenID is shutting down in February 2014.

The next best thing to use, IMO, is Google’s OpenID, since they have 2-factor authentication. Google doesn’t really expose the OpenID URL in a way that makes it as easy to use as “username.myopenid.com.” Fortunately, it’s relatively simple to add to a custom domain hosted by, for example, GitHub Pages. My coworker, Stephen Delano, pointed me to this pro-tip.

The requirement is to put a <link> tag in the HTML header of the site. It should look like this:

<link rel="openid2.provider" href="https://www.google.com/accounts/o8/ud?source=profiles" />
<link rel="openid2.local_id" href="http://www.google.com/profiles/A_UNIQUE_GOOGLE_PROFILE_ID />

Obviously you need a Google Profile, but anyone interested in doing this probably has a Google+ account for Google Hangouts anyway :).

If you’re like me and have your custom domain hosted as an Octopress blog, this goes in source/_includes/custom/head.html. Then deploy the site and in a few moments you’ll be able to start using your site as an OpenID.

Managing Secrets With Chef Vault

Two years ago, I wrote a post about using Chef encrypted data bags for SASL authentication with Postfix. At the time, my ISP didn’t allow non-authenticated SMTP, so I had to find a solution so I could get cronspam and other vital email from my servers at home. I’ve since switched ISPs to one that doesn’t care so much about this, so I’m not using any of that code anymore.

However, that doesn’t mean I don’t have secrets to manage! I actually don’t for my personal systems due to what I’m managing with Chef now, but we certainly do for Opscode’s hosted Enterprise Chef environment. The usual suspects for any web application are required: database passwords, SSL certificates, service API tokens, etc.

We’re evaluating chef-vault as a possible solution. This blog post will serve as notes for me so I can remember what I did when my terminal history is gone, and hopefully information for you to be able to use in your own environment.

Chef Vault

Chef Vault is an open source project published by Nordstrom. It is distributed as a RubyGem. You’ll need it installed on your local workstation so you can encrypt sensitive secrets, and on any systems that need to decrypt said secrets. Since the workstation is where we’re going to start, install the gem. I’ll talk about using this in a recipe later.

% gem install chef-vault

Use Cases

Now, for the use cases, I’m going to take two fairly simple examples, and explain how chef-vault works along the way.

  1. A username/password combination. The vaultuser will be created on the system with Chef’s built-in user resource.
  2. A file with sensitive content. In this case, I’m going to use a junk RSA private key for vaultuser.

Secrets are generally one of these things. Either a value passed into a command-line program (like useradd) or a file that should live on disk (like an SSL certificate or RSA key).

Command-line Structure

Chef Vault includes knife plugins to allow you to manage the secrets from your workstation, uploading them to the Chef Server just like normal data bags. The secrets themselves live in Data Bags on the Chef Server. The “bag” is called the “vault” for chef-vault.

After installation, the encrypt and decrypt sub-commands will be available for knife.

knife encrypt create [VAULT] [ITEM] [VALUES] --mode MODE --search SEARCH --admins ADMINS --json FILE
knife encrypt delete [VAULT] [ITEM] --mode MODE
knife encrypt remove [VAULT] [ITEM] [VALUES] --mode MODE --search SEARCH --admins ADMINS
knife rotate secret [VAULT] [ITEM] --mode MODE
knife encrypt update [VAULT] [ITEM] [VALUES] --mode MODE --search SEARCH --admins ADMINS --json FILE
knife decrypt [VAULT] [ITEM] [VALUES] --mode MODE

The README and Examples document these quite well.

Mode: Solo vs Client

I’m using Chef with a Chef Server (Enterprise Chef), so I’ll specify --mode client for the knife commands.

It is important to note the MODE in the chef-vault knife plugin commands affects where the encrypted data bags will be saved. Chef supports data bags with both Solo and Client/Server use. When using chef-solo, you’ll need to configure data_bag_path in your knife.rb. That is, even if you’re using Solo, since these are knife plugins, the configuration is for knife, not chef-solo. I’m using a Chef Server though, so I’m going to use --mode client.
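
For the curious, that solo-mode configuration is a single line in knife.rb (the path here is illustrative):

# knife.rb: only needed when using --mode solo
data_bag_path '/path/to/chef-repo/data_bags'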

Create a User with a Password

The user I’m going to create is the arbitrarily named vaultuser, with the super secret password, chef-vault. I’m going to use this on a Linux system with SHA512 hashing, so first I generate a password using mkpasswd:

% mkpasswd -m sha-512
Password: chef-vault
$6$VqEIDjsp$7NtPMhA9cnxvSMTE9l7DMmydJJEymi9b4t1Vhk475vrWlfxMgVb3bDLhpk/RZt0J3X7l5H8WnqFgvq3dIa9Kt/

Note: This is the mkpasswd(1) command from the Ubuntu 10.04 mkpasswd package.

Create the Item

The command I’m going to use is knife encrypt create since this is a new secret. I’ll show two examples. First, I’ll pass in the raw JSON data as “values”. You would do this if you’re not going to store the unencrypted secret on disk or in a repository. Second, I’ll pass a JSON file. You would do this if you want to store the unencrypted secret on disk or in a repository.

% knife encrypt create secrets vaultuser \
  '{"vaultuser":"$6$VqEIDjsp$7NtPMhA9cnxvSMTE9l7DMmydJJEymi9b4t1Vhk475vrWlfxMgVb3bDLhpk/RZt0J3X7l5H8WnqFgvq3dIa9Kt/"}' \
  --search 'role:base' \
  --admins jtimberman --mode client

The [VALUES] in this command is raw JSON that will be created in the data bag item by chef-vault. The --search option tells chef-vault to use the public keys of the nodes matching the SOLR query for encrypting the value. Then during the Chef run, chef-vault uses those node’s private keys to decrypt the value. The --admins option tells chef-vault the list of users on the Chef Server who are also allowed to decrypt the secret. This is specified as a comma separated string for multiple admins. Finally, as I mentioned, I’m using a Chef Server so I need to specify --mode client, since “solo” is the default.

Here’s the equivalent, using a JSON file named secrets_vaultuser.json. It has the content:

{"vaultuser":"$6$VqEIDjsp$7NtPMhA9cnxvSMTE9l7DMmydJJEymi9b4t1Vhk475vrWlfxMgVb3bDLhpk/RZt0J3X7l5H8WnqFgvq3dIa9Kt/"}

The command is:

% knife encrypt create secrets vaultuser \
  --json secrets_vaultuser.json \
  --search 'role:base' \
  --admins jtimberman --mode client

Now, let’s see what has been created on the Chef Server. I’ll use the core Chef knife command, data bag show, for this.

% knife data bag show secrets
vaultuser
vaultuser_keys

I now have a “secrets” data bag, with two items. The first, vaultuser is the one that contains the actual secret. Let’s see:

% knife data bag show secrets vaultuser
id:        vaultuser
vaultuser:
  cipher:         aes-256-cbc
  encrypted_data: j+/fFM7ist6I7K360GNfzSgu6ix63HGyXN2ZAd99R6H4TAJ4pQKuFNpJXYnC
  SXA5n68xn9frxHAJNcLuDXCkEv+F/MnW9vMlTaiuwW/jO++vS5mIxWU170mR
  EgeB7gvPH7lfUdJFURNGQzdiTSSFua9E06kAu9dcrT83PpoQQzk=
  iv:             cu2Ugw+RpTDVRu1QaaAfug==
  version:        1

As you can see, I have encrypted data. I also told chef-vault that my user can decrypt this. I need to use the knife plugin to do so:

% knife decrypt secrets vaultuser 'vaultuser' --mode client
secrets/vaultuser
  vaultuser: $6$VqEIDjsp$7NtPMhA9cnxvSMTE9l7DMmydJJEymi9b4t1Vhk475vrWlfxMgVb3bDLhpk/RZt0J3X7l5H8WnqFgvq3dIa9Kt/

The 'vaultuser' in quotes is the key from the hash of JSON data that I specified earlier. As you can see, the password is that which was generated from the mkpasswd command earlier.

But what nodes have access to decrypt this password? That’s what chef-vault stored in the vaultuser_keys item. Let’s look:

% knife data bag show secrets vaultuser_keys
admins:              jtimberman
clients:
  os-945926465950316
  os-2790002246935003
id:                  vaultuser_keys
jtimberman:          0Q2bhw/kJl2aIVEwqY6wYhrrfdz9fdsf8tCiIrBih2ZORvV7EEIpzzKQggRX
4P4vnVQjMjfkRwIXndTzctCJONQYF50OSZi5ByXWqbich9iCWvVIbnhcLWSp
z5mQoSTNXyZz/JQZGnubkckh4wGLBFDrLJ6WKl6UNXH1dRwqDNo5sEK7/3Wn
b4ztVSRxzB01wVli0wLvFSZzGsKYJYINBcidnbIgLh/xGYGtBJVlgG2z/7TV
uN0b/qvGj8VlhbS6zPlwh39O3mexDdkLwry/+gbO1nj8qKNkKDKaix5zypwE
XdmdfMKNYGaM6kzG8cwuKZXLAgGAgblVUB1HP8+8kQ==

os-2790002246935003: kGQLsxsFmBe9uPuWxZpKiNBnqJq55hQZJLgaKdjG2Vvivv98RrFGz1y8Xbwe
uzeSgPgAURCZmxpNxpHrwvvKcvL77sBOL6TTKiNzs8n5B3ZOawy17dsuG24v
41R0cRMnYLgbLcjln9dpVe4Esr4goPxko+1XqBPik1SBapthQq/pLUJ1BIKh
Fxu1QVGj1w4HPUftLaUzeS33jKbtfvgZyZsYZBdVCVEVidOxC90WRf4wtkd6
Ueyj+0gd1QKv84Q387O1R5LtRMS6u+17PJinrcRIkVNZ6P1z6oT2Dasfvrex
rK3s5vD7v6jpkUW12Wj74Lz3Z6x3sKuIDzCtvEUnWw==

os-945926465950316:  XzTJrJ3TZZZ1u9L9p6DZledf3bo2ToH2yrLGZQKPV6/ANzElHXGcYrEdtP0q
14Nz1NzsqEftzviAebUUnc6ke91ltD8s6hNQQrPJRqkUoDlM7lNEwiUiz/dD
+sFI6CSzQptO3zPrUbAlUI1Zog5h7k/CCtiYtmFRD6wbAWnxmCqvLhO1jwqL
VNJ1vfjlFsG77BDm2HFw7jgleuxRGYEgBfCCuBuW70FAdUTvNHIAwKQVkfU/
Am75UYm7N4N0E+W76ZwojLoYtXXTV/iOGG1cw3C75SVAmCsBOuxUK/otub67
zsNDsKToKa+laxzXGylrmkTricYXIqVpIQO8OL5nnw==

As we can see, I have two nodes that are API clients with access to decrypt the data bag items. These values are all generated by chef-vault, and I’ll talk about how to update the list and rotate secrets later in this post.

Manage a User Password

Let’s manage a user resource with a password set to the value from our encrypted data bag using Chef Vault.

First, I created a cookbook named vault, and added it to the base role. It contains the following recipe:

chef_gem "chef-vault"
require "chef-vault"

vault = ChefVault::Item.load("secrets", "vaultuser")

user "vaultuser" do
  password vault['vaultuser']
  home "/home/vaultuser"
  supports :manage_home => true
  shell "/bin/bash"
  comment "Chef Vault User"
end

Let me break this down.

chef_gem "chef-vault"
require "chef-vault"

chef-vault is distributed as a RubyGem, and I want to use it in my recipe(s), so here I use the chef_gem resource. Then, I require it like any other Ruby library.

vault = ChefVault::Item.load("secrets", "vaultuser")

This is where the decryption happens. If I do this under a chef-shell, I can see:

chef:recipe > vault = ChefVault::Item.load("secrets", "vaultuser")
 => data_bag_item["secrets", "vaultuser", {"id"=>"vaultuser", "vaultuser"=>"$6$VqEIDjsp$7NtPMhA9cnxvSMTE9l7DMmydJJEymi9b4t1Vhk475vrWlfxMgVb3bDLhpk/RZt0J3X7l5H8WnqFgvq3dIa9Kt/"}]

ChefVault::Item.load takes two arguments, the “vault” or data bag, in this case secrets, and the “item”, in this case vaultuser. It returns a data bag item. Then in the user resource, I use the password:

user "vaultuser" do
  password vault['vaultuser']
  home "/home/vaultuser"
  supports :manage_home => true
  shell "/bin/bash"
  comment "Chef Vault User"
end

The important resource attribute here is password, where I’m using the local variable, vault and the vaultuser key from the item as decrypted by ChefVault::Item.load. When Chef runs, it will look like this:

Recipe: vault::default
  * chef_gem[chef-vault] action install
    - install version 2.0.1 of package chef-vault
  * chef_gem[chef-vault] action install (up to date)
  * user[vaultuser] action create
    - create user user[vaultuser]

Now, I can su to vaultuser using the password I created:

ubuntu@os-2790002246935003:~$ su - vaultuser
Password: chef-vault
vaultuser@os-2790002246935003:~$ id
uid=1001(vaultuser) gid=1001(vaultuser) groups=1001(vaultuser)
vaultuser@os-2790002246935003:~$ pwd
/home/vaultuser

Yay! To show that the user was created with the right password, here’s the DEBUG log output:

INFO: Processing user[vaultuser] action create ((irb#1) line 12)
DEBUG: user[vaultuser] user does not exist
DEBUG: user[vaultuser] setting comment to Chef Vault User
DEBUG: user[vaultuser] setting password to $6$VqEIDjsp$7NtPMhA9cnxvSMTE9l7DMmydJJEymi9b4t1Vhk475vrWlfxMgVb3bDLhpk/RZt0J3X7l5H8WnqFgvq3dIa9Kt/
DEBUG: user[vaultuser] setting shell to /bin/bash
INFO: user[vaultuser] created

Next, I’ll create a secret that is a file rendered on the system.

Create a Private SSH Key

Suppose this vaultuser is to be used for deploying code by cloning a repository. It will need a private SSH key to authenticate, so I’ll create one, with an empty passphrase in this case.

% ssh-keygen -b 4096 -t rsa -f vaultuser-ssh
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in vaultuser-ssh.
Your public key has been saved in vaultuser-ssh.pub.

Get the SHA256 checksum of the private key. I use SHA256 because that’s what Chef uses for file content. We’ll use this to verify content later.

% sha256sum vaultuser-ssh
a83221c243c9d39d20761e87db6c781ed0729b8ff4c3b330214ebca26e2ea89d  vaultuser-ssh

Assume that I also created the SSH key on GitHub for this user.

In order to have the file’s contents be a JSON value for the data bag item, the newlines (\n) need to be encoded, so I generate the JSON with Ruby:

ruby -rjson -e 'puts JSON.generate({"vaultuser-ssh-private" => File.read("vaultuser-ssh")})' \
  > secrets_vaultuser-ssh-private.json

Now, create the secret on the Chef Server:

knife encrypt create secrets vaultuser-ssh-private \
  --search 'role:base' \
  --json secrets_vaultuser-ssh-private.json \
  --admins jtimberman \
  --mode client

Let’s verify the server has what we need:

% knife data bag show secrets vaultuser-ssh-private
id:                    vaultuser-ssh-private
vaultuser-ssh-private:
  cipher:         aes-256-cbc
  encrypted_data: mRRToM2N/0F+OyJxkYlHo/cUtHSIuy69ROAKuGoHIhX9Fr5vFTCM4RyWQSTN
  trimmed for brevity even though scrollbars
% knife decrypt secrets vaultuser-ssh-private 'vaultuser-ssh-private' --mode client
secrets/vaultuser-ssh-private
  vaultuser-ssh-private: -----BEGIN RSA PRIVATE KEY-----
trimmed for brevity even though scrollbars

Manage the Key File

Now, I’ll manage the private key file with the vault cookbook.

vault_ssh = ChefVault::Item.load("secrets", "vaultuser-ssh-private")

directory "/home/vaultuser/.ssh" do
  owner "vaultuser"
  group "vaultuser"
  mode 0700
end

file "/home/vaultuser/.ssh/id_rsa" do
  content vault_ssh["vaultuser-ssh-private"]
  owner "vaultuser"
  group "vaultuser"
  mode 0600
end

Again, let’s break this up a bit. First, load the item from the encrypted data bag like we did before.

vault_ssh = ChefVault::Item.load("secrets", "vaultuser-ssh-private")

Next, make sure that the vaultuser has an .ssh directory with the correct permissions.

directory "/home/vaultuser/.ssh" do
  owner "vaultuser"
  group "vaultuser"
  mode 0700
end

Finally, manage the content of the private key file with a file resource and the content resource attribute. The value of vault_ssh["vaultuser-ssh-private"] will be a string, with \n’s embedded, but when it’s rendered on disk, it will display properly.

file "/home/vaultuser/.ssh/id_rsa" do
  content vault_ssh["vaultuser-ssh-private"]
  owner "vaultuser"
  group "vaultuser"
  mode 0600
end

And now run chef on a target node:

Recipe: vault::default
  * chef_gem[chef-vault] action install (up to date)
  * user[vaultuser] action create (up to date)
  * directory[/home/vaultuser/.ssh] action create
    - create new directory /home/vaultuser/.ssh
    - change mode from '' to '0700'
    - change owner from '' to 'vaultuser'
    - change group from '' to 'vaultuser'

  * file[/home/vaultuser/.ssh/id_rsa] action create
    - create new file /home/vaultuser/.ssh/id_rsa with content checksum a83221
        --- /tmp/chef-tempfile20130909-1918-1v5hezo   2013-09-09 22:41:21.887239999 +0000
        +++ /tmp/chef-diff20130909-1918-xwbmsn    2013-09-09 22:41:21.883240065 +0000
        @@ -0,0 +1,51 @@
        +-----BEGIN RSA PRIVATE KEY-----
        +MIIJJwIBAAKCAgEAtZmwFTlVOBbr2ZfG+cDtUGx04xCcgaa0p0ISmeyMEoGYH/CP
        output trimmed because its long even though scrollbars again

Note the content checksum, a83221. This will match the checksum of the source file from earlier (scroll up!), and the one rendered:

ubuntu@os-2790002246935003:~$ sudo sha256sum /home/vaultuser/.ssh/id_rsa
a83221c243c9d39d20761e87db6c781ed0729b8ff4c3b330214ebca26e2ea89d  /home/vaultuser/.ssh/id_rsa

Yay! Now, we can SSH to GitHub (note, this is fake GitHub for example purposes).

ubuntu@os-2790002246935003:~$ su - vaultuser
Password: chef-vault
vaultuser@os-2790002246935003:~$ ssh -i .ssh/id_rsa github@172.31.7.15
$ hostname
os-945926465950316
$ id
uid=1002(github) gid=1002(github) groups=1002(github)

Updating a Secret

What happens if we need to update a secret? For example, if an administrator leaves the organization, we will want to change the vaultuser password (and SSH private key).

% mkpasswd -m sha-512
Password: gone-user
$6$zM5STNtXdmsrOSm$svJr0tauijqqxTjnMIGJGJPv5V3ovMFCQo.ZDBleiL.yOxcngRqh9yAjpMAsMBA7RlKPv5DKFd1aPZm/wUoKs.

The encrypt create command will return an error if the target already exists:

% knife encrypt create secrets vaultuser --search 'role:base' --json secrets_vaultuser.json --admins jtimberman --mode client
ERROR: ChefVault::Exceptions::ItemAlreadyExists: secrets/vaultuser already exists, use 'knife encrypt remove' and 'knife encrypt update' to make changes.

So, I need to use encrypt update. Note: make sure that the contents of the JSON file are valid JSON.
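
A quick way to check that is a Ruby one-liner, in the same spirit as the JSON.generate trick from earlier (just an illustration; any JSON validator works):

ruby -rjson -e 'JSON.parse(File.read("secrets_vaultuser.json")); puts "valid"'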

% knife encrypt update secrets vaultuser --search 'role:base' --json secrets_vaultuser.json --admins jtimberman --mode client

encrypt update only updates the things that change, so I can also shorten this:

% knife encrypt update secrets vaultuser --json secrets_vaultuser.json --mode client

Since the search and the admins didn’t change.

Verify it:

% knife decrypt secrets vaultuser 'vaultuser' --mode client
secrets/vaultuser
  vaultuser: $6$zM5STNtXdmsrOSm$svJr0tauijqqxTjnMIGJGJPv5V3ovMFCQo.ZDBleiL.yOxcngRqh9yAjpMAsMBA7RlKPv5DKFd1aPZm/wUoKs.

Now, just run Chef on any nodes affected.

Recipe: vault::default
  * chef_gem[chef-vault] action install (up to date)
  * user[vaultuser] action create
    - alter user user[vaultuser]

  * directory[/home/vaultuser/.ssh] action create (up to date)
  * file[/home/vaultuser/.ssh/id_rsa] action create (up to date)
Chef Client finished, 1 resources updated

And su to the vault user with the gone-user password:

ubuntu@os-2790002246935003:~$ su - vaultuser
Password: gone-user
vaultuser@os-2790002246935003:~$

Managing Access to Items

There are four common scenarios which require managing the access to an item in the vault.

  1. A system needs to be taken offline, or otherwise prevented from accessing the item(s).
  2. A new system comes online that needs access.
  3. An admin user has left the organization.
  4. A new admin user has joined the organization.

Suppose we have a system that we need to take offline for some reason, so we want to disable its access to a secret. Or, perhaps an admin user has left the organization. We can do that in a few ways.

Update the Vault Item

The most straightforward way to manage access to an item is to use the update or remove sub-commands.

Remove a System

Suppose I want to remove node DEADNODE; I can qualify the search to exclude the node named DEADNODE:

% knife encrypt update secrets vaultuser \
  --search 'role:base NOT name:DEADNODE' \
  --json secrets_vaultuser.json \
  --admins jtimberman --mode client

Note, as before, admins didn’t change so I don’t need to pass that argument.

Add a New System

If the node has run Chef and is indexed on the Chef Server already, simply rerun the update command with the search:

% knife encrypt update secrets vaultuser \
  --search 'role:base' \
  --json secrets_vaultuser.json \
  --admins jtimberman --mode client

There’s a bit of a “Chicken and Egg” problem here, in that a new node might not be indexed for search if it tried to load the secret during a bootstrap beforehand. For example, if I create an OpenStack instance with the base role in its run list, the node doesn’t exist for the search yet. A solution here is to create the node with an empty run list, allowing it to register with the Chef Server, and then use knife bootstrap to rerun Chef with the proper run list. This is annoying, but no one claimed that chef-vault would solve all problems with shared secret management :–).

Remove an Admin

The admins argument takes a list. Earlier, I only had my userid as an admin. Suppose I created the item with “bofh” as an admin too:

% knife encrypt create secrets vaultuser \
  --search 'role:base' \
  --json secrets_vaultuser.json \
  --admins "jtimberman,bofh" --mode client

To remove the bofh user, use the encrypt remove subcommand. In this case, the --admins argument is the list of admins to remove, rather than add.

% knife encrypt remove secrets vaultuser --admins bofh --mode client

Add a New Admin

I want to add “mandi” as an administrator because she’s awesome and will help manage our secrets. As above, I just pass a comma-separated string, "jtimberman,mandi" to the --admins argument.

% knife encrypt update secrets vaultuser \
  --search 'role:base' \
  --json secrets_vaultuser.json \
  --admins "jtimberman,mandi" --mode client

Regenerate the Client

The heavy-handed way to remove access is to regenerate the API client on the Chef Server. For example, say I want to remove one of my nodes, os-945926465950316:

% knife client reregister os-945926465950316
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAybzwv53tDLIzW+GHRJwLthZmiGTfZVyqQX6m6RGuZjemEIdy
trim trim

If you’re familiar with Chef Server’s authentication cycle, you’ll know that until that private key is copied to the node, it will completely fail to authenticate. However, once the /etc/chef/client.pem file is updated with the content from the knife command, we’ll see that the node fails to read the Chef Vault item:

================================================================================
Recipe Compile Error in /var/chef/cache/cookbooks/vault/recipes/default.rb
================================================================================


OpenSSL::PKey::RSAError
-----------------------
padding check failed


Cookbook Trace:
---------------
  /var/chef/cache/cookbooks/vault/recipes/default.rb:4:in `from_file'


Relevant File Content:
----------------------
/var/chef/cache/cookbooks/vault/recipes/default.rb:

  1:  chef_gem "chef-vault"
  2:  require "chef-vault"
  3:
  4>> vault = ChefVault::Item.load("secrets", "vaultuser")
  5:
  6:  user "vaultuser" do
  7:    password vault["vaultuser"]
  8:    home "/home/vaultuser"
  9:    supports :manage_home => true
 10:    shell "/bin/bash"
 11:    comment "Chef Vault User"
 12:  end
 13:

Note I say this is heavy-handed because if you make a mistake, you need to re-upload every single secret that this node needs access to.

Removing Users

We can also remove user access from Enterprise Chef simply by disassociating that user from the organization on the Chef Server. I won’t show an example of that here, however, since I’m using Opscode’s hosted Enterprise Chef server and I’m the only admin :–).

Backing Up Secrets

To back up the secrets, as encrypted data from the Chef Server, use knife-essentials (comes with Chef 11+, available as a RubyGem for Chef 10).

% knife download data_bags/secrets/
Created data_bags/secrets/vaultuser_keys.json
Created data_bags/secrets/vaultuser.json
Created data_bags/secrets/vaultuser-ssh-private_keys.json
Created data_bags/secrets/vaultuser-ssh-private.json

For example, the vaultuser.json file looks like this:

{
  "id": "vaultuser",
  "vaultuser": {
    "encrypted_data": "3yREwInxdyKpf8nuTIivXAeuEzHt7o4vF4FsOwmVLHmMWol5nCBoMWF0YdaW\n3P3NpEAAAxYEYeJYdVkrdLqjjB2kTJdx0+ceh/RBHBWqmSeHOWFH9pCRGjV8\nfS5XaTueShb320b/+Ia8iqUJJWg6utnbJCDx+VMcGNggPXgPKC8=\n",
    "iv": "EI+y74Uj2uwq7EVaP+0K6Q==\n",
    "version": 1,
    "cipher": "aes-256-cbc"
  }
}

Since these are encrypted using a strong cipher (AES-256), they should be safe to store in a repository. Unless you think the NSA has access to that repository ;–).

Conclusion

Secrets management is hard! Especially when you need to store secrets that are used by multiple systems, services, and people. Chef’s encrypted data bag feature isn’t a panacea, but it certainly helps. Hopefully, this blog post was informative. While I don’t always respond, I do read all comments posted here via Disqus, so let me know if something is out of whack, or needs an update.

Getting Started With Zones on OmniOS

I’ve become enamored with IllumOS recently. Years ago, I used Solaris (2.5.1 through 8) at IBM. Unfortunately (for me), I stopped using it before Solaris 10 brought all the cool toys to the yard – zones, zfs, dtrace, SMF. Thanks to OmniTI’s excellent IllumOS distribution, OmniOS, I’m getting acclimated to the awesomeness. I plan to write more about my experiences here.

First up, I spent today playing with zones. Zones are a kernel-level container technology similar to Linux containers/cgroups, or BSD jails. They’re fast and lightweight. I have at least two plans for them:

  1. Segregating the services on my home-server.
  2. Adding support to various tools in Chef’s ecosystem.

The following is basically a compilation of several different blog posts and documentation collections I’ve been poring over. Like most technical blog writers, I’m posting this so I can find it later :–).

Hardware

I have a number of options for learning OmniOS. I have spare hardware, or VMware, or OmniTI’s Vagrant box. I’m doing all three of these, but the main use will be on physical hardware, as I’m planning to port the aforementioned server to OmniOS (#1, above).

The details of the hardware are not important, except that I have a hard disk device c3t1d0, and a physical NIC device nge1 that are devoted to zones. To adapt these instructions for your own installation, change those device names where appropriate.

You can find the name of the disk device to use in your system with the format command.

root@menthe:~# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c3t0d0 <ATA-WDCWD1500AHFD-0-7QR5 cyl 18238 alt 2 hd 255 sec 63>
          /pci@0,0/pci1043,cb84@d/disk@0,0
       1. c3t1d0 <ATA-SAMSUNG HD501LJ-0-12-465.76GB>
          /pci@0,0/pci1043,cb84@d/disk@1,0
Specify disk (enter its number): ^D

Here I wanted to use the Samsung disk.

Use dladm to find the network devices:

root@menthe:~# dladm show-phys
LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
nge0         Ethernet             up         1000   full      nge0
nge1         Ethernet             up         1000   full      nge1

Setup

The example zone here is named base. Replace base with any zone name you wish, e.g. webserver37 or noodlebarn. It’s also worth noting that I’m going to use DHCP, rather than static networking here. There are plenty of guides out there for static networking, and I had to hunt around for DHCP. Also worth noting is that this was all performed right after installing the OS.

First, create a zpool to use for zones. This is a 500G disk, so I have plenty of space.

zpool create zones c3t1d0

Next, create a VNIC on the interface which is devoted to zones (nge1). It can be named anything, but must end with a number.

dladm create-vnic -l nge1 vnicbase0

Rather than use the zonecfg REPL, I used the following configuration file, for repeatability.

create -b
set zonepath=/zones/base
set ip-type=exclusive
set autoboot=false
add net
set physical=vnicbase0
end
commit

Use this config file to configure the zone with zonecfg.

zonecfg -z base -f base.conf

Now we’re ready to install the OS in the new zone. This may take a while, as all the packages need to be downloaded.

zoneadm -z base install

The default nsswitch.conf(4) does not use DNS for hosts. This is fairly standard for Solaris/IllumOS. Also, the resolv.conf(4) is not configured automatically, which is a departure from automagic Linux distributions (and a thing I agree with).

cp /etc/nsswitch.dns /etc/resolv.conf /zones/base/root/etc

OmniOS does not use sysidcfg, so the way to make the new zone boot up with an interface configured for DHCP is to write out the ipadm.conf configuration for ipadm. The following is base.ipadm.conf that I used, with the vnicbase0 VNIC created with dladm earlier.

_ifname=vnicbase0;_family=2;
_ifname=vnicbase0;_family=26;
_ifname=vnicbase0;_aobjname=vnicbase0/v4;_dhcp=-1,no;

Copy this file to the zone.

cp base.ipadm.conf /zones/base/root/etc/ipadm/ipadm.conf

Now, boot the zone.

zoneadm -z base boot

Now you can log into the newly created zone and verify that things are working, and do any further configuration required.

zlogin -e ! base

I use ! as the escape character because I’m logging into my global zone over SSH. This means you disconnect with !. instead of ~..

Once complete, the zone can be cloned.

Clone a Zone

I’m going to clone the base zone to clonebase. Again, rename this to whatever you like.

First, a zone must be halted before it can be cloned.

zoneadm -z base halt

Now, create a new VNIC for the zone. Note that the sed substitution used below turns vnicbase0 into vnicclonebase0, so that is the VNIC name to create (remember, VNIC names must end with a number).

dladm create-vnic -l nge1 vnicclonebase0

Read the base zone’s configuration, and replace base with clonebase.

zonecfg -z base export | sed 's/base/clonebase/g' | tee clonebase.conf

Then, create the new zone configuration, and clone the base zone.

zonecfg -z clonebase -f clonebase.conf
zoneadm -z clonebase clone base

Again, copy the name service and resolver configuration so the new zone can use DNS.

cp /etc/nsswitch.dns /etc/resolv.conf /zones/clonebase/root/etc

Create the ipadm.conf config for the new zone; I named it clonebase.ipadm.conf.

sed 's/base/clonebase/g' base.ipadm.conf > clonebase.ipadm.conf

Now copy this to the zone.

cp clonebase.ipadm.conf /zones/clonebase/root/etc/ipadm/ipadm.conf

Finally, boot the new zone.

zoneadm -z clonebase boot

Login and verify the new zone.

zlogin -e ! clonebase

Cleaning Up

Use the following to clean up the zone when it’s not needed anymore.

zone=clonebase
zoneadm -z $zone halt
zoneadm -z $zone uninstall -F
zonecfg -z $zone delete -F

Sans Prose

This gist contains all the things I did above minus the prose.

What’s Next?

I have a few goals in mind for this system. First of all, I want to manage the zones with Chef, of course; a rough sketch of what that might look like follows below. Some of the functions of the zones may be:

  • IPS package repository
  • Omnibus build system for OmniOS
  • Adding OmniOS support to cookbooks

I also want to facilitate plugins and the ecosystem around Chef for IllumOS, including zone based knife, vagrant and test-kitchen plugins.

Finally, I plan to convert my Linux home-server to OmniOS. There are a couple things I’m running that will require Linux (namely Plex), but fortunately, OmniOS has KVM thanks to SmartOS.
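
Since managing the zones with Chef is the first goal, here is that rough sketch of how the manual steps above might translate into a recipe. This is my own illustration, not a published cookbook: the resource names and guard commands are assumptions, untested on real hardware.

# Hypothetical sketch: automating the zone build steps from this post.
# Assumes base.conf (the zonecfg file above) ships in the cookbook and
# that nge1 is the NIC devoted to zones.
zone = 'base'
vnic = "vnic#{zone}0"

execute "create vnic #{vnic}" do
  command "dladm create-vnic -l nge1 #{vnic}"
  not_if "dladm show-vnic #{vnic}"
end

cookbook_file "/var/tmp/#{zone}.conf" do
  source "#{zone}.conf"
end

execute "configure zone #{zone}" do
  command "zonecfg -z #{zone} -f /var/tmp/#{zone}.conf"
  not_if "zoneadm list -c | grep -qx #{zone}"  # -c lists configured zones
end

execute "install zone #{zone}" do
  command "zoneadm -z #{zone} install"
  not_if "zoneadm list -i | grep -qx #{zone}"  # -i lists installed zones
end

execute "boot zone #{zone}" do
  command "zoneadm -z #{zone} boot"
  not_if "zoneadm list | grep -qx #{zone}"     # default lists running zones
end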


Starting ChefSpec Example

This is a quick post to introduce the testing I’m starting to do with ChefSpec, using Opscode’s Java cookbook. While the recipe tested is really trivial, it actually has some nuances that require detailed testing.

First off, the whole thing is in this gist. I’m going to break it down into sections below. The file is spec/default_spec.rb in the java cookbook (not committed/pushed yet).

The chefspec gem is where all the magic comes from. You can read about ChefSpec on its home page. You’ll need to install the gem, and from there, run rspec to run the tests.
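
If you prefer Bundler for managing the gems, a minimal Gemfile along these lines is enough (my sketch; it isn’t part of the cookbook):

# Gemfile -- a minimal sketch for running these specs with Bundler.
source 'https://rubygems.org'

gem 'chefspec'  # depends on rspec, so bundle exec rspec just works

With that in place, run bundle install, then bundle exec rspec. The spec file itself starts by requiring chefspec: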

require 'chefspec'

Next, we’re going to describe the default recipe. We’re using the regular rspec “let” block to set up the runner to converge the recipe. Then, because we know/assume that the openjdk recipe is the default, we can say that this chef run should include the java::openjdk recipe.

describe 'java::default' do
  let(:chef_run) { ChefSpec::ChefRunner.new.converge('java::default') }
  it 'should include the openjdk recipe by default' do
    chef_run.should include_recipe 'java::openjdk'
  end

Next, this cookbook supports Windows. However, we have to set up the runner with the correct platform and version (this comes from fauxhai), and then set attributes that are required for it to work.

  context 'windows' do
    let(:chef_run) do
      runner = ChefSpec::ChefRunner.new(
        'platform' => 'windows',
        'version' => '2008R2'
      )
      runner.node.set['java']['install_flavor'] = 'windows'
      runner.node.set['java']['windows']['url'] = 'http://example.com/windows-java.msi'
      runner.converge('java::default')
    end
    it 'should include the windows recipe' do
      chef_run.should include_recipe 'java::windows'
    end
  end

Next are the contexts for other install flavors. The default recipe will include the right recipe based on the flavor, which is set by an attribute. So we set up an rspec context for each recipe, then set the install flavor attribute, and test that the right recipe was included.

  context 'oracle' do
    let(:chef_run) do
      runner = ChefSpec::ChefRunner.new
      runner.node.set['java']['install_flavor'] = 'oracle'
      runner.converge('java::default')
    end
    it 'should include the oracle recipe' do
      chef_run.should include_recipe 'java::oracle'
    end
  end
  context 'oracle_i386' do
    let(:chef_run) do
      runner = ChefSpec::ChefRunner.new
      runner.node.set['java']['install_flavor'] = 'oracle_i386'
      runner.converge('java::default')
    end
    it 'should include the oracle_i386 recipe' do
      chef_run.should include_recipe 'java::oracle_i386'
    end
  end

Finally, a recent addition to this cookbook is support for IBM’s Java. In addition to setting the install flavor, we must set the URL where the IBM Java package is (see the README in the commit linked in that ticket for detail), and we can see that the ibm recipe is in fact included.

  context 'ibm' do
    let(:chef_run) do
      runner = ChefSpec::ChefRunner.new
      runner.node.set['java']['install_flavor'] = 'ibm'
      runner.node.set['java']['ibm']['url'] = 'http://example.com/ibm-java.bin'
      runner.converge('java::default')
    end
    it 'should include the ibm recipe' do
      chef_run.should include_recipe 'java::ibm'
    end
  end
end

This is just the start of the testing for this cookbook. We’ll need to test each individual recipe. However, as I haven’t written that code yet, I don’t have real examples. Stay tuned!
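
To give a flavor of where this is headed, here’s a hedged sketch of what a spec for an individual recipe might look like. This is not code from the cookbook, and the package name is an assumption (it varies by platform):

require 'chefspec'

describe 'java::openjdk' do
  let(:chef_run) { ChefSpec::ChefRunner.new.converge('java::openjdk') }

  # 'openjdk-6-jdk' is the Ubuntu package name; illustrative only.
  it 'installs the openjdk package' do
    chef_run.should install_package 'openjdk-6-jdk'
  end
end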

Test Kitchen and Jenkins

I’ve been working more with test-kitchen 1.0 alpha lately. The most recent thing I’ve done is set up a Jenkins build server to run test-kitchen on cookbooks. This post will describe how I did this for my own environment, and how you can use my new test-kitchen cookbook in yours… if you’re using Jenkins, anyway.

This is all powered by a relatively simple cookbook, and some click-click-clicking in the Jenkins UI. I’ll walk through what I did to set up my Jenkins system.

First, I started with Debian 7.0 (stable, released this past weekend). I installed the OS, then bootstrapped the machine with Chef. The initial test was to make sure everything installed correctly and the commands were functioning. This was done in a VM, and is now handled by test-kitchen itself (how meta!) in the cookbook, kitchen-jenkins.

The cookbook, kitchen-jenkins, is available on the Chef Community site. I started with a recipe, but extracted it to a cookbook to make it easier to share with you all. This is essentially a site cookbook that I use to customize my Jenkins installation so I can run test-kitchen builds.

I apply the recipe with a role, because I love the roles primitive in Chef :-). Here is the role I’m using:

{
  "name": "jenkins",
  "description": "Jenkins Build Server",
  "run_list": [
    "recipe[kitchen-jenkins]"
  ],
  "default_attributes": {
    "jenkins": {
      "server": {
        "home": "/var/lib/jenkins",
        "plugins": ["git-client", "git"],
        "version": "1.511",
        "war_checksum": "7e676062231f6b80b60e53dc982eb89c36759bdd2da7f82ad8b35a002a36da9a"
      }
    }
  },
  "json_class": "Chef::Role",
  "chef_type": "role"
}

The run list here is only slightly different from my actual role: I have a few other site-specific recipes in the run list as well. Don’t worry about those now. The jenkins attributes ensure that the plugins I need are available and that the right version of Jenkins is installed.

(I’m going to leave out details such as uploading cookbooks and roles; if you’re interested in test-kitchen, I’ll assume you’ve got that covered :-).)

Once Chef completes on the Jenkins node, I can reach the Jenkins UI, conveniently enough, via “http://jenkins:8080” (because I’ve made a DNS entry, of course). The next release of the Jenkins cookbook will have a resource for managing jobs, but for now I’m just going to create them in the web UI.

For this example, I want two kinds of cookbook testing jobs. The first is to simply run foodcritic and fail on any correctness matches. The second is to actually run test-kitchen.

A foodcritic job is simple:

  1. New job -> Build a free-style software project, “foodcritic-COOKBOOK”.
  2. Source Code Management -> Git; supply the repository and the master branch.
  3. Set a build trigger to Poll SCM (every 5 minutes, once an hour, whenever you like).
  4. Add a build step to execute a shell command, “foodcritic . -f correctness”.

I created a view for foodcritic jobs, and added them all to the view for easy organizing.

Next, I create a test-kitchen job:

  1. New job -> Copy existing job “foodcritic-COOKBOOK”, and name the new job “test-COOKBOOK”.
  2. Uncheck Poll SCM, check “Build after other projects are built”, and enter “foodcritic-COOKBOOK”.
  3. Replace the foodcritic command in the build shell command with “kitchen test”.

Now, the test-kitchen job will only run if the foodcritic build succeeds. If the cookbook has any correctness lint errors, the foodcritic build fails and the kitchen build won’t run. This helps conserve resources.

Hopefully the kitchen-jenkins cookbook is helpful and this blog post will give you some ideas how to go about adding cookbook tests to your CI system, even if it’s not Jenkins.

TDD Cookbook Ticket

This post will briefly describe how I did a TDD update to Opscode’s runit cookbook to resolve an issue reported last night.

First, the issue manifests itself only on Debian systems. On Debian, the runit cookbook’s runit_service provider writes an LSB init.d script rather than symlinking to /usr/bin/sv. The problem raised in the new ticket is that if the init script already exists as a symlink to /usr/bin/sv, the template follows the link and writes the script over /usr/bin/sv itself. This is bad, as it ends in a fork bomb when runsvdir attempts to restart sv on all the things. Oops! Sorry about that. Let’s get it fixed, and practice some TDD.
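
To make the failure mode concrete, here is a small, safe demonstration of the underlying Ruby behavior, using a tempdir instead of the real paths (my illustration, not cookbook code):

require 'tmpdir'

Dir.mktmpdir do |dir|
  target = File.join(dir, 'sv')        # stand-in for /usr/bin/sv
  link   = File.join(dir, 'cook-2867') # stand-in for /etc/init.d/cook-2867

  File.write(target, 'the sv binary')
  File.symlink(target, link)

  # Writing to the symlink follows it to the target, which is exactly
  # how rendering the init script clobbered /usr/bin/sv.
  File.write(link, "#!/bin/sh\necho LSB init script\n")

  puts File.read(target)  # prints the init script, not the original contents
end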

The runit cookbook includes support for test-kitchen, though I did need to update it for this effort. Part of this change was adding a box for Debian in the .kitchen.yml. I set about resolving this with TDD in mind.

First, the runit cookbook includes a couple of “test” cookbooks to facilitate setting up the system with the runit_service resource, so the outcome can be checked to ensure the behavior is correct. I started by adding a “failing test” to the runit_test::service recipe: a link resource pointing /etc/init.d/cook-2867 at /usr/bin/sv, and a runit_service resource that would then overwrite /usr/bin/sv.

link "/etc/init.d/cook-2867" do
  to "/usr/bin/sv"
end

runit_service "cook-2867" do
  default_logger true
end

Then I ran kitchen test on the Debian box. As expected, the link was created, and then the runit service was configured. The service’s provider waits until the service is up; since we’ve destroyed the sv binary, that will never happen, so I destroyed the instance. I manually confirmed the behavior too, to make sure I wasn’t seeing something weird. Due to its very nature, this is really hard to test for automatically, but it happens consistently.

Next, I had to write the code to implement the fix for this bug. Essentially, this means checking whether the /etc/init.d/cook-2867 file is a symbolic link, and removing it if so.

initfile = ::File.join('/etc', 'init.d', new_resource.service_name)
::File.unlink(initfile) if ::File.symlink?(initfile)

Simple enough. Next I tested again by destroying the existing environment and rerunning it from scratch. This takes some time, but it verifies that everything is working properly. Here’s the output on Debian:

INFO: Processing link[/etc/init.d/cook-2867] action create (runit_test::service line 147)
INFO: link[/etc/init.d/cook-2867] created
INFO: Processing service[cook-2867] action nothing (dynamically defined)
INFO: Processing runit_service[cook-2867] action enable (runit_test::service line 151)
INFO: Processing directory[/etc/sv/cook-2867] action create (dynamically defined)
INFO: Processing template[/etc/sv/cook-2867/run] action create (dynamically defined)
INFO: Processing directory[/etc/sv/cook-2867/log] action create (dynamically defined)
INFO: Processing directory[/etc/sv/cook-2867/log/main] action create (dynamically defined)
INFO: Processing directory[/var/log/cook-2867] action create (dynamically defined)
INFO: Processing file[/etc/sv/cook-2867/log/run] action create (dynamically defined)
INFO: Processing template[/etc/init.d/cook-2867] action create (dynamically defined)
INFO: template[/etc/init.d/cook-2867] updated content
INFO: template[/etc/init.d/cook-2867] owner changed to 0
INFO: template[/etc/init.d/cook-2867] group changed to 0
INFO: template[/etc/init.d/cook-2867] mode changed to 755
INFO: runit_service[cook-2867] configured
INFO: Chef Run complete in 7.267132764 seconds
INFO: Running report handlers

I didn’t feel I needed a specific test for this in minitest-chef, because a broken run wouldn’t have finished converging (the behavior I saw earlier with the “failing” test).

If you’re contributing to cookbooks, and they have support for test-kitchen, it’s awesome if you can open a bug report with a failing test. In this case, it was fairly easy to reproduce the bug.

Anatomy of a Test Kitchen 1.0 Cookbook (Part 2)

DISCLAIMER Test Kitchen 1.0 is still in alpha at the time of this post.

Update: We’re no longer required to use Bundler; in fact, we recommend installing the required RubyGems in your global Ruby environment (#3 below).

Update: The log output from the various kitchen commands is not updated with the latest and greatest. Play along at home, it’ll be okay :-).

This is a continuation of part 1.

In order to run the tests then, we need a few things on our machine:

  1. VirtualBox and Vagrant (1.1+)
  2. A compiler toolchain with XML/XSLT development headers (for building Gem dependencies)
  3. A sane, working Ruby environment (Ruby 1.9.3 or greater)
  4. Git

It is outside the scope of this post to cover how to get all those installed.

Once those are installed:

% vagrant plugin install vagrant-berkshelf
% gem install berkshelf
% gem install test-kitchen --pre
% gem install kitchen-vagrant

Test Kitchen combines the suite (default) with the platform names (e.g., ubuntu-12.04). To run all the suites on all platforms, simply do:

% kitchen test

This will take a while, especially if you don’t already have the Vagrant boxes on your system, as it will download each one. To make this faster, we’ll just run Ubuntu 12.04:

% kitchen test default.*1204

Test Kitchen 1.0 can take a regular expression for the instances to test. This will match the box default-ubuntu-1204. I could also just say 12, as that would match the single entry in my kitchen list (above).

It will take a few minutes to run Test Kitchen. Those familiar with Chef know that if it encounters an unhandled exception, it exits with a non-zero return code. This is important, because at the end of a successful run we know Chef did the right thing, assuming our recipe is the right thing :-).

To recap the previous post, we have a run list like this:

["recipe[apt]", "recipe[minitest-handler]", "recipe[bluepill_test]"]

Let’s break down the output of our successful run. I’ll show the output first, and explain it after:

Starting Kitchen
Cleaning up any prior instances of <default-ubuntu-1204>
Destroying <default-ubuntu-1204>
Finished destroying <default-ubuntu-1204> (0m0.00s).
Testing <default-ubuntu-1204>
Creating <default-ubuntu-1204>

This is basic setup to ensure that “The Kitchen” is clean beforehand and we don’t have existing state interfering with the run.

[vagrant command] BEGIN (vagrant up default-ubuntu-1204 --no-provision)
[default-ubuntu-1204] Importing base box 'canonical-ubuntu-12.04'...
[default-ubuntu-1204] Matching MAC address for NAT networking...
[default-ubuntu-1204] Clearing any previously set forwarded ports...
[default-ubuntu-1204] Forwarding ports...
[default-ubuntu-1204] -- 22 => 2222 (adapter 1)

This will look familiar to Vagrant users, we’re just getting some basic setup from Vagrant initializing the box defined in the .kitchen.yml (passed to the Vagrantfile by the kitchen-vagrant plugin). This step does a vagrant up --no-provision.

[Berkshelf] installing cookbooks...
[Berkshelf] Using bluepill (2.2.2) at path: '/Users/jtimberman/Development/opscode/cookbooks/bluepill'
[Berkshelf] Using apt (1.8.4)
[Berkshelf] Using yum (2.0.0)
[Berkshelf] Using minitest-handler (0.1.2)
[Berkshelf] Using bluepill_test (0.0.1) at path: './test/cookbooks/bluepill_test'
[Berkshelf] Using rsyslog (1.5.0)
[Berkshelf] Using chef_handler (1.1.0)

Remember from the previous post that we’re using Berkshelf? This is the integration with Vagrant that ensures the cookbooks are available. Four of them, apt, yum, minitest-handler, and bluepill_test, are defined in the Berksfile. The next, rsyslog, is a dependency of the bluepill cookbook (for rsyslog integration), and the last, chef_handler, is a dependency of minitest-handler. Berkshelf extracts the dependencies from the cookbook metadata of each cookbook defined in the Berksfile.
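
For reference, a Berksfile that produces resolution output like the above might look something like this. This is my reconstruction from the log, not the cookbook’s actual file; in particular, the metadata line is an assumption to explain why bluepill resolves from its local path:

# Berksfile -- a sketch reconstructed from the Berkshelf output above.
site :opscode

metadata  # the bluepill cookbook itself, plus its declared dependencies

cookbook 'apt'
cookbook 'yum'
cookbook 'minitest-handler'
cookbook 'bluepill_test', path: './test/cookbooks/bluepill_test'

Back to the run output: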

[default-ubuntu-1204] Creating shared folders metadata...
[default-ubuntu-1204] Clearing any previously set network interfaces...
[default-ubuntu-1204] Running any VM customizations...
[default-ubuntu-1204] Booting VM...
[default-ubuntu-1204] Waiting for VM to boot. This can take a few minutes.
[default-ubuntu-1204] VM booted and ready for use!
[default-ubuntu-1204] Setting host name...
[default-ubuntu-1204] Mounting shared folders...
[default-ubuntu-1204] -- v-root: /vagrant
[default-ubuntu-1204] -- v-csc-1: /tmp/vagrant-chef-1/chef-solo-1/cookbooks
[vagrant command] END (0m48.76s)
Vagrant instance <default-ubuntu-1204> created.
Finished creating <default-ubuntu-1204> (0m53.12s).

Again, this is familiar output to Vagrant users, where Vagrant is making the cookbooks available to the instance.

Converging <default-ubuntu-1204>
[vagrant command] BEGIN (vagrant ssh default-ubuntu-1204 --command 'should_update_chef() {\n...')
Installing Chef Omnibus (11.4.0)
Downloading Chef 11.4.0 for ubuntu...
Installing Chef 11.4.0
Selecting previously unselected package chef.
(Reading database ... 60513 files and directories currently installed.)
Unpacking chef (from .../chef_11.4.0_amd64.deb) ...
Setting up chef (11.4.0-1.ubuntu.11.04) ...
Thank you for installing Chef!
[vagrant command] END (0m34.85s)
[vagrant command] BEGIN (vagrant provision default-ubuntu-1204)
[Berkshelf] installing cookbooks...
[Berkshelf] Using bluepill (2.2.2) at path: '/Users/jtimberman/Development/opscode/cookbooks/bluepill'
[Berkshelf] Using apt (1.8.4)
[Berkshelf] Using yum (2.0.0)
[Berkshelf] Using minitest-handler (0.1.2)
[Berkshelf] Using bluepill_test (0.0.1) at path: './test/cookbooks/bluepill_test'
[Berkshelf] Using rsyslog (1.5.0)
[Berkshelf] Using chef_handler (1.1.0)

This part is interesting, in that we’re going to install the Full Stack Chef (Omnibus) package. This means it doesn’t matter what the underlying base box has installed; we get the right version of Chef, as defined in the .kitchen.yml. This is done through vagrant ssh (second line). Then Test Kitchen does vagrant provision. The provisioning step is where Berkshelf runs, so we see the cookbook resolution happen again (perhaps a bug?).

[default-ubuntu-1204] Running provisioner: Vagrant::Provisioners::ChefSolo...
[default-ubuntu-1204] Generating chef JSON and uploading...
[default-ubuntu-1204] Running chef-solo...
INFO: *** Chef 11.4.0 ***
INFO: Setting the run_list to ["recipe[apt]", "recipe[minitest-handler]", "recipe[bluepill_test]"] from JSON
INFO: Run List is [recipe[apt], recipe[minitest-handler], recipe[bluepill_test]]
INFO: Run List expands to [apt, minitest-handler, bluepill_test]
INFO: Starting Chef Run for default-ubuntu-1204.vagrantup.com

This is the start of the actual Chef run, using Chef Solo via Vagrant’s provisioner. Note that we have our suite’s run list. I’m going to skip a lot of the Chef output because it isn’t required. Note also that a few resources in the minitest-handler run will report as failed; they can be ignored, as it just means those tests were not implemented.

INFO: Processing directory[/var/chef/minitest/bluepill_test] action create (minitest-handler::default line 50)
INFO: directory[/var/chef/minitest/bluepill_test] created directory /var/chef/minitest/bluepill_test
INFO: Processing cookbook_file[tests-bluepill_test-default] action create (minitest-handler::default line 53)
INFO: cookbook_file[tests-bluepill_test-default] created file /var/chef/minitest/bluepill_test/default_test.rb
INFO: Processing remote_directory[tests-support-bluepill_test-default] action create (minitest-handler::default line 60)
INFO: remote_directory[tests-support-bluepill_test-default] created directory /var/chef/minitest/bluepill_test/support
INFO: Processing cookbook_file[/var/chef/minitest/bluepill_test/support/helpers.rb] action create (dynamically defined)
INFO: cookbook_file[/var/chef/minitest/bluepill_test/support/helpers.rb] mode changed to 644
INFO: cookbook_file[/var/chef/minitest/bluepill_test/support/helpers.rb] created file /var/chef/minitest/bluepill_test/support/helpers.rb

These are the relevant parts of the minitest-handler recipe, where it has copied the tests from the bluepill_test cookbook into place.

INFO: Processing gem_package[i18n] action install (bluepill::default line 20)
INFO: Processing gem_package[bluepill] action install (bluepill::default line 24)
INFO: Processing directory[/etc/bluepill] action create (bluepill::default line 34)
INFO: directory[/etc/bluepill] created directory /etc/bluepill
INFO: directory[/etc/bluepill] owner changed to 0
INFO: directory[/etc/bluepill] group changed to 0
INFO: Processing directory[/var/run/bluepill] action create (bluepill::default line 34)
INFO: directory[/var/run/bluepill] created directory /var/run/bluepill
INFO: directory[/var/run/bluepill] owner changed to 0
INFO: directory[/var/run/bluepill] group changed to 0
INFO: Processing directory[/var/lib/bluepill] action create (bluepill::default line 34)
INFO: directory[/var/lib/bluepill] created directory /var/lib/bluepill
INFO: directory[/var/lib/bluepill] owner changed to 0
INFO: directory[/var/lib/bluepill] group changed to 0
INFO: Processing file[/var/log/bluepill.log] action create_if_missing (bluepill::default line 41)
INFO: entered create
INFO: file[/var/log/bluepill.log] owner changed to 0
INFO: file[/var/log/bluepill.log] group changed to 0
INFO: file[/var/log/bluepill.log] mode changed to 755
INFO: file[/var/log/bluepill.log] created file /var/log/bluepill.log

Recall from the previous post that the bluepill_test recipe includes the bluepill recipe. This is the basic setup of bluepill.

INFO: Processing package[nc] action install (bluepill_test::default line 4)
INFO: Processing template[/etc/bluepill/test_app.pill] action create (bluepill_test::default line 16)
INFO: template[/etc/bluepill/test_app.pill] updated content
INFO: Processing bluepill_service[test_app] action enable (bluepill_test::default line 18)
INFO: Processing bluepill_service[test_app] action load (bluepill_test::default line 18)
INFO: Processing bluepill_service[test_app] action start (bluepill_test::default line 18)
INFO: Processing link[/etc/init.d/test_app] action create (/tmp/vagrant-chef-1/chef-solo-1/cookbooks/bluepill/providers/service.rb line 30)
INFO: link[/etc/init.d/test_app] created
INFO: Chef Run complete in 81.099185824 seconds

And this is the rest of the bluepill_test recipe. It sets up a test service that will basically be a netcat process listening on a port. Let’s take a moment here and discuss what we have.

First, we have successfully converged the default recipe in the bluepill cookbook via its inclusion in bluepill_test. This is awesome, because we know the recipe works exactly as we defined it, since Chef resources are declarative, and Chef exits if there’s a problem.

Second, we have successfully set up a service managed by bluepill itself, using the LWRP included in the bluepill cookbook, bluepill_service. This means we know that the underlying provider configured all the resources correctly.

At this point, we could say “Ship it!” and release the cookbook, knowing it will do what we require. However, this may be disingenuous because we don’t know if the behavior of the system after all this runs is actually correct. Therefore we look to the next segment of output from Chef, from minitest:

INFO: Running report handlers
Run options: -v --seed 38794
# Running tests:
recipe::bluepill_test::default#test_0001_the_default_log_file_must_exist_cook_1295_ =
0.00 s = .
recipe::bluepill_test::default::create a bluepill configuration file#test_0001_anonymous =
0.00 s = .
recipe::bluepill_test::default::create a bluepill configuration file#test_0002_must_be_valid_ruby =
0.06 s = .
recipe::bluepill_test::default::runs the application as a service#test_0001_anonymous =
0.72 s = .
recipe::bluepill_test::default::runs the application as a service#test_0002_anonymous =
0.71 s = .
recipe::bluepill_test::default::spawn a netcat tcp client repeatedly#test_0001_should_receive_a_tcp_connection_from_netcat =
2.24 s = .
Finished tests in 3.746002s, 1.6017 tests/s, 1.8687 assertions/s.
6 tests, 7 assertions, 0 failures, 0 errors, 0 skips

This is performed by the minitest-handler, which runs the tests copied from the bluepill_test cookbook earlier. It’s outside the scope of this post to describe how to write minitest-chef tests, but we can talk about the output.

We have 6 separate tests that perform 7 assertions, and they all passed. The tests are asserting (a sketch of what such tests might look like follows this list):

  1. The log file is created; per the full name of the test, this checks for a regression of COOK-1295.
  2. The .pill config file for the service must exist and be valid Ruby.
  3. The bluepill service must actually be enabled and running, thereby testing that those actions in the LWRP work.
  4. The running service, which listens on a TCP port, must be up and available, thereby testing that bluepill started the service correctly.
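
For a flavor of what those assertions look like in code, here is a hedged sketch in the minitest-chef-handler spec style. It is illustrative only, not the actual tests shipped in bluepill_test:

describe_recipe 'bluepill_test::default' do
  it 'creates the default log file (COOK-1295 regression)' do
    file('/var/log/bluepill.log').must_exist
  end

  it 'writes a pill file that is valid Ruby' do
    assert system('ruby -c /etc/bluepill/test_app.pill > /dev/null 2>&1')
  end

  it 'runs the application as a service' do
    service('test_app').must_be_running
  end
end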

With all six tests passing, the Chef run completes and Test Kitchen moves on to cleanup:

[vagrant command] END (1m29.24s)
Finished converging <default-ubuntu-1204> (2m15.45s).
Setting up <default-ubuntu-1204>
Finished setting up <default-ubuntu-1204> (0m0.00s).
Verifying <default-ubuntu-1204>
Finished verifying <default-ubuntu-1204> (0m0.00s).
Destroying <default-ubuntu-1204>
[vagrant command] BEGIN (vagrant destroy default-ubuntu-1204 -f)
[default-ubuntu-1204] Forcing shutdown of VM...
[Berkshelf] cleaning Vagrant's shelf
[default-ubuntu-1204] Destroying VM and associated drives...
[vagrant command] END (0m3.68s)
Vagrant instance <default-ubuntu-1204> destroyed.
Finished destroying <default-ubuntu-1204> (0m4.04s).
Finished testing <default-ubuntu-1204> (3m12.62s).
Kitchen is finished. (3m12.62s)

This output shows Test Kitchen cleaning up after itself. We destroy the Vagrant instance after a successful convergence and test run, because further investigation is not required. If a test fails, Test Kitchen leaves the instance running so you can log into the machine and poke around to find out what went wrong. Then simply correct the relevant part of the cookbook (recipes, tests, etc.) and rerun Test Kitchen. For example:

% bundle exec kitchen login 1204
vagrant@ubuntu-1204$ ... run some commands
vagrant@ubuntu-1204$ ^D
% bundle exec kitchen converge 1204

My goal with these posts is to get some information out for folks to consider when examining Test Kitchen 1.0 alpha for their own projects. There’s a lot more to Test Kitchen, such as managing non-cookbook projects, or even using other kinds of tests. We’ll have more documentation and guides as we get the 1.0 release out.

Enjoy!