jtimberman's Code Blog

Chef, Ops, Ruby, Linux/Unix. Opinions are mine, not my employer's (CHEF).

Anatomy of a Test Kitchen 1.0 Cookbook (Part 1)

DISCLAIMER Test Kitchen 1.0 is still in alpha at the time of this post.

Update: Removed the Gemfile and Vagrantfile.

Let’s take a look at the anatomy of a cookbook set up with test-kitchen 1.0-alpha.

Note It is outside the scope of this post to discuss how to write minitest-chef tests or “test cookbook” recipes. Use the cookbook described below as an example to get ideas for writing your own.

This is the full directory tree of Opscode’s “bluepill” cookbook:

├── .kitchen.yml
├── Berksfile
├── CHANGELOG.md
├── CONTRIBUTING
├── LICENSE
├── README.md
├── TESTING.md
├── attributes
│   └── default.rb
├── metadata.rb
├── providers
│   └── service.rb
├── recipes
│   ├── default.rb
│   └── rsyslog.rb
├── resources
│   └── service.rb
├── templates
│   └── default
│       ├── bluepill_init.fedora.erb
│       ├── bluepill_init.freebsd.erb
│       ├── bluepill_init.rhel.erb
│       └── bluepill_rsyslog.conf.erb
└── test
    └── cookbooks
        └── bluepill_test
            ├── README.md
            ├── attributes
            │   └── default.rb
            ├── files
            │   └── default
            │       └── tests
            │           └── minitest
            │               ├── default_test.rb
            │               └── support
            │                   └── helpers.rb
            ├── metadata.rb
            ├── recipes
            │   └── default.rb
            └── templates
                └── default
                    └── test_app.pill.erb

I’ll assume the reader is familiar with basic components of cookbooks like “recipes,” “templates,” and the top-level documentation files, so let’s trim this down to just the areas of concern for Test Kitchen.

├── .kitchen.yml
├── Berksfile
└── test
    └── cookbooks
        └── bluepill_test
            ├── attributes
            │   └── default.rb
            ├── files
            │   └── default
            │       └── tests
            │           └── minitest
            │               ├── default_test.rb
            │               └── support
            │                   └── helpers.rb
            ├── recipes
            │   └── default.rb
            └── templates
                └── default
                    └── test_app.pill.erb

Note that this cookbook has a “test” cookbook. I’ll get to that in a minute.

First of all, we have the .kitchen.yml. This is the project definition that describes what is required to run test kitchen itself. This particular file tells Test Kitchen to bring up nodes of the platforms we’re testing with Vagrant, and defines the boxes with their box names and URLs to download. You can view the full .kitchen.yml in the Git repo. For now, I’m going to focus on the suite stanza in the .kitchen.yml. This defines how Chef will run when Test Kitchen brings up the Vagrant machine.

- name: default
  run_list:
  - recipe[minitest-handler]
  - recipe[bluepill_test]
  attributes: {bluepill: { bin: "/opt/chef/embedded/bin/bluepill" } }

Each platform has a recipe it will run with, in this case apt and yum. Then the suite’s run list is appended, so for example, the final run list of the Ubuntu 12.04 node will be:

["recipe[apt]", "recipe[minitest-handler]", "recipe[bluepill_test]"]

We include apt so the apt cache on the node is updated before Chef does anything else. This is pretty typical, so we put it in the default run list of each Ubuntu box.

Having the minitest-handler recipe in the run list means that the Minitest Chef Handler will run at the end of the Chef run. In this case, it will use the tests from the test cookbook, bluepill_test.

The bluepill cookbook itself does not depend on any of these cookbooks. So how does Test Kitchen know where to get them? Enter the next file in the list above, Berksfile. This informs Berkshelf which cookbooks to download. The relevant excerpt from the Berksfile is:

cookbook "apt"
cookbook "yum"
cookbook "minitest-handler"
cookbook "bluepill_test", :path => "./test/cookbooks/bluepill_test"

Based on the Berksfile, Berkshelf will download apt, yum, and minitest-handler from the Chef Community site. It will also use the bluepill_test cookbook included in the bluepill cookbook. This is transparent to the user, as I'll cover in a moment.

Test Kitchen’s Vagrant driver plugin handles all the configuration of Vagrant itself based on the entries in the .kitchen.yml. To get the Berkshelf integration in the Vagrant boxes, we need to install the vagrant-berkshelf plugin in Vagrant. Then, we automatically get Berkshelf’s Vagrant integration, meaning all the cookbooks defined in the Berksfile are going to be available on the box we bring up.

Remember the test cookbook mentioned above? It's the next component. The default suite in .kitchen.yml puts bluepill_test in the run list. This particular recipe includes the bluepill default recipe, then sets up a test service using the bluepill_service LWRP. This means that when the nodes brought up by Test Kitchen via Vagrant converge, they'll have bluepill installed and set up, plus a running service whose behavior we can test. Since Chef exits with a non-zero return code if it encounters an exception, a successful run means everything is configured as defined in the recipes, and we can run tests against the node.
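
Paraphrased as a sketch (the service name, pill path, and actions here are illustrative assumptions, not the literal recipe), it does something like this:

include_recipe "bluepill::default"

# Render a pill file for a throwaway service; the name and path here are
# assumptions based on the test_app.pill.erb template in the tree above.
template "/etc/bluepill/test_app.pill" do
  source "test_app.pill.erb"
end

# The bluepill_service LWRP from the bluepill cookbook loads and starts it.
bluepill_service "test_app" do
  action [:enable, :load, :start]
end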

The tests we’ll run are written with the Minitest Chef Handler. These are defined in the test cookbook, files/default/tests/minitest directory. The minitest-handler cookbook (also in the default suite run list) will execute the default_test tests.

In the next post, we’ll look at how to run Test Kitchen, and what all the output means.

Last Check-in Time for Nodes

This one-liner uses the knife exec sub-command to iterate over all the node objects on the Chef Server and print their ohai_time attribute in a human-readable format.

knife exec -E 'nodes.all {|n| puts "#{n.name} #{Time.at(n[:ohai_time])}"}'

Let’s break this up a little.

knife exec -E

The exec plugin for knife executes a script or the given string of Ruby code in the same context as chef-shell (or shef in Chef 10 and earlier) when started in its "main" context. Since it is knife, it also uses your .chef/knife.rb settings, so it knows about your user, key, and Chef Server.

nodes.all

The chef-shell main context has helper methods to access the corresponding endpoints in the Chef Server API. Clearly we're working with "nodes" here, and the #all method returns all the node objects from the Chef Server. This differs from search: with search there is a commit delay between the time data is saved to the server and the time it is indexed by Solr. That delay is usually a few seconds, but depending on factors like the hardware you're using and how many nodes are converging, it can take longer.
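
For comparison, roughly the same report can be done through search, using the nodes.find helper, which goes through the Solr index and is therefore subject to that delay:

nodes.find("*:*").each do |n|
  puts "#{n.name} #{Time.at(n[:ohai_time])}"
end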

Anyway, we can pass a block to nodes.all and do something with each node object. The example above is a one-liner, so let's make it more readable.

nodes.all do |n|
  puts "#{n.name} #{Time.at(n[:ohai_time])}"
end

We’re simply going to use n as the iterator for each node object, and we’ll print a string about the node. The #{}’s in the string to print with puts is Ruby string interpolation. That is, everything inside the braces is a Ruby expression. First, the Chef::Node object has a method, #name, that returns the node’s name. This is usually the FQDN, but depending on your configuration (node_name in /etc/chef/client.rb or using the -N option for chef-client), it could be something else. Then, we’re going to use the node’s ohai_time attribute. Every time Chef runs and it gathers data about the node with Ohai, it generates the ohai_time attribute, which is the Unix epoch of the timestamp when Ohai ran. When Chef saves the node data at the end of the run, we know approximately the last time the node ran Chef. In this particular string, we’re converting the Unix epoch, like 1358962351.444405 to a human readable timestamp like 2013-01-23 10:32:31 -0700.

Of course, you can get similar data from the Chef Server by using knife status:

knife status

The ohai_time attribute will be displayed as a relative time, e.g., "585 hours ago." The output also includes some more data about the nodes, like IP addresses. This uses Chef's search feature, so you can also pass in a query:

knife status "role:webserver"

The knife exec example is simple, but you can get a lot more data about the nodes than what knife status reports.

In either case, ohai_time isn’t 100% accurate, since it is generated at the beginning of the run, and depending on what you’re doing with Chef on your systems, it can take a long time before the node data is saved. However, it’s close enough for many use cases.

If more detailed or completely accurate information about the Chef run is required for your purposes, you should use a report handler, which does have more data about the run available, including whether the run was successful or not.
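
As a sketch of what that looks like (illustrative only, not a drop-in handler; see the Chef documentation on handlers for how to enable one), a report handler subclasses Chef::Handler and gets the full run status:

require "chef/handler"

class RunSummary < Chef::Handler
  # report is called at the end of the Chef run, with run_status populated
  def report
    status = run_status.success? ? "succeeded" : "failed"
    Chef::Log.info("Chef run on #{node.name} #{status} in #{run_status.elapsed_time} seconds")
  end
end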

Install Chef 11 Server on CentOS 6

A few months ago, I posted briefly on how to install Chef 10 server on CentOS. This post revisits the process for Chef 11.

These steps were performed on a default CentOS 6.3 server install.

First, navigate to the Chef install page to get the package download URL. Use the form on the “Chef Server” tab to select the appropriate drop-down items for your system.

Install the package from the given URL.

rpm -Uvh https://opscode-omnitruck-release.s3.amazonaws.com/el/6/x86_64/chef-server-11.0.4-1.el6.x86_64.rpm

The package just puts the bits on disk (in /opt/chef-server). The next step is to configure the Chef Server and start it.

% chef-server-ctl reconfigure

This runs the embedded chef-solo with the included cookbooks, and sets up everything required – Erchef, RabbitMQ, PostgreSQL, etc.

Next, run the Opscode Pedant test suite. This will verify that everything is working.

% chef-server-ctl test

Copy the default admin user's key and the validator key from the Chef Server to the local workstation where you have Chef installed, and create a new user for yourself with knife. You'll need Chef version 11.2.0 on the workstation. The key files on the Chef Server are readable only by root.

scp root@chef-server:/etc/chef-server/admin.pem .
scp root@chef-server:/etc/chef-server/chef-validator.pem .

Use knife configure -i to create an initial ~/.chef/knife.rb and a new administrative API user for yourself. Use the FQDN of your newly installed Chef Server, with HTTPS. To bootstrap nodes automatically with knife bootstrap, copy the validation key from /etc/chef-server/chef-validator.pem on the Chef Server to ~/.chef.

% knife configure -i

The .chef/knife.rb file should look something like this:

log_level                :info
log_location             STDOUT
node_name                'jtimberman'
client_key               '/home/jtimberman/.chef/jtimberman.pem'
validation_client_name   'chef-validator'
validation_key           '/home/jtimberman/.chef/chef-validator.pem'
chef_server_url          'https://chef-server.example.com'
syntax_check_cache_path  '/home/jtimberman/.chef/syntax_check_cache'

Your Chef Server is now ready to use. Test connectivity as your user with knife:

% knife client list
chef-validator
chef-webui
% knife user list
admin
jtimberman

In previous versions of Open Source Chef Server, users were API clients. In Chef 11, users are separate entities on the Server.

The chef-server-ctl command is used on the Chef Server system for management. It has built-in help (-h) that will display the various sub-commands.

Chef and Net::SSH Dependency Broken

2nd UPDATE CHEF-3835 was opened by a member of the community; Chef versions 11.2.0 and 10.20.0 have been released by Opscode to resolve the issue.

UPDATE Opscode is working on getting a new release of the Chef gem with updated version constraints.

What Happened?

Earlier today (February 6, 2013), new versions of the various net-ssh RubyGems were published. These include:

  • net-ssh 2.6.4
  • net-ssh-multi 1.1.1
  • net-ssh-gateway 1.1.1

Chef’s dependencies have a pessimistic version constraint (~>) on net-ssh 2.2.2.

What’s the Problem?

So what is the problem?

It appears to lie with net-ssh-gateway. The version of net-ssh-gateway went from 1.1.0 (released in April 2011), to 1.1.1. It depends on net-ssh. In net-ssh-gateway 1.1.0, the net-ssh version constraint was >= 1.99.1, which is fine with Chef’s constraint against ~> 2.2.2. However, in net-ssh-gateway 1.1.1, the net-ssh version constraint was changed to >= 2.6.4, which is obviously a conflict with Chef’s constraint.

What’s the Solution?

So, how can we fix it?

One solution is to use the Opscode Omnibus Package for Chef. This isn't a solution for everyone, of course, but it does bundle and contain all the dependencies. It also doesn't help if one wishes to install another gem that depends on Chef into the "Omnibus" Ruby environment alongside Chef, because the conflict will still be found. For example, installing the minitest-chef-handler gem for running minitest-chef tests:

vagrant@ubuntu-12-04:~$ /opt/chef/embedded/bin/gem install minitest-chef-handler
ERROR:  While executing gem ... (Gem::DependencyError)
    Unable to resolve dependencies: net-ssh-gateway requires net-ssh (>= 2.6.4)

Another solution is to relax or modify the constraint in Chef. This may be okay, but as of right now we don't know if it will affect anything in the way Chef uses net-ssh. There are existing tickets related to net-ssh version constraints in Chef.
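
If you install the Chef gem with Bundler, one possible stopgap in the meantime (my own workaround sketch, not an official fix) is to pin net-ssh-gateway to the older release, whose net-ssh constraint still resolves alongside Chef's:

# Gemfile -- an interim workaround sketch, not an official fix
source "https://rubygems.org"

gem "chef"
# 1.1.0 still allows net-ssh >= 1.99.1, which is compatible with Chef's ~> 2.2.2 constraint
gem "net-ssh-gateway", "1.1.0"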

Local-only Knife Configuration

In this post I want to discuss briefly an approach to setting up a shared Knife configuration file for teams using the same Chef Repository, while supporting customized configuration.

Background

Most infrastructures managed by Chef have multiple people working on them. Recently, several people in the Ruby community started working together on migrating RubyGems to Amazon EC2.

The repository has a shared .chef/knife.rb which sets some local paths where cookbooks and roles are located. In addition to this, I wanted to test building the infrastructure using a Chef Server and my own EC2 account.

The Approach

At Opscode, we believe in leveraging internal DSLs. The .chef/knife.rb (and Chef’s client.rb or solo.rb, etc) is no exception. While you can have a fairly simple configuration like this:

node_name        "jtimberman"
client_key       "/home/jtimberman/.chef/jtimberman.pem"
chef_server_url  "https://api.opscode.com/organizations/my_organization"
cookbook_path    "cookbooks"

You can also have something like this:

log_level     :info
log_location  STDOUT
node_name     ENV["NODE_NAME"] || "solo"
client_key    File.expand_path("../solo.pem", __FILE__)
cache_type    "BasicFile"
cache_options(path: File.expand_path("../checksums", __FILE__))
cookbook_path [ File.expand_path("../../chef/cookbooks", __FILE__) ]
if ::File.exist?(File.expand_path("../knife.local.rb", __FILE__))
  Chef::Config.from_file(File.expand_path("../knife.local.rb", __FILE__))
end

This is the knife.rb included in the RubyGems-AWS repo.

The main part of interest here is the last three lines.

if ::File.exist?(File.expand_path("../knife.local.rb", __FILE__))
  Chef::Config.from_file(File.expand_path("../knife.local.rb", __FILE__))
end

This says “if a file knife.local.rb exists, then load its configuration. The Chef::Config class is what Chef uses for configuration files, and the #from_file method will load the specified file.

In this case, the content of my knife.local.rb is:

node_name                "jtimberman"
client_key               "/Users/jtimberman/.chef/jtimberman.pem"
validation_client_name   "ORGNAME-validator"
validation_key           "/Users/jtimberman/.chef/ORGNAME-validator.pem"
chef_server_url          "https://api.opscode.com/organizations/ORGNAME"
cookbook_path [
  File.expand_path("../../chef/cookbooks", __FILE__),
  File.expand_path("../../chef/site-cookbooks", __FILE__)
]
knife[:aws_access_key_id]      = "Some access key I like"
knife[:aws_secret_access_key]  = "The matching secret access key"

Here I’m setting my Opscode Hosted Chef credentials and server. I also set the cookbook_path to include the site-cookbooks directory (this should probably go in the regular knife.rb). Finally, I set the knife configuration options for my AWS EC2 account.

The configuration is parsed top-down, so the options here that overlap the knife.rb will be used instead.

In the Repository

In the repository, commit only the .chef/knife.rb and not the .chef/knife.local.rb. I recommend adding the local file to the .gitignore or VCS equivalent.

% echo .chef/knife.local.rb >> .gitignore
% git add .chef/knife.rb .gitignore
% git commit -m 'keep general knife.rb, local config is ignored'

Conclusion

There are many approaches to solving the issue of having shared Knife configuration for multiple people in a single repository. The real benefit here is that the configuration file is Ruby, which provides a lot of flexibility. Of course, when using someone else’s configuration examples, one should always read the code and understand it first :–).

Local Templates for Application Configuration

Today I joined the Food Fight Show for a conversation about Application Deployment. Along the way, the question came up about where application-specific configuration files should be stored. Should they be stored in a Chef cookbook for setting up the system for the application? Or should they be stored in the application codebase itself?

The answer is either, as far as Chef is concerned. Chef’s template resource can render a template from a local file on disk, or retrieve the template from a cookbook. The latter is the most common pattern, so let’s examine the former, using a local file on disk.
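
As a minimal sketch of the two forms (the paths and names are made up for illustration):

# Common case: the template ships in the cookbook's templates/ directory.
template "/etc/myapp/myapp.conf" do
  source "myapp.conf.erb"
end

# Local case: with local true, source is an absolute path on the node itself.
template "/etc/myapp/myapp.conf" do
  source "/srv/myapp/current/config/myapp.conf.erb"
  local true
end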

For sake of discussion, let’s use a Rails application that needs a database.yml file rendered. Also, we’ll assume that information about the application (database user, password, server) we need is stored in a Chef data bag. Finally, we’re going to assume that the application is already deployed on the system somehow and we just want to render the database.yml.

The application source tree looks something like this:

myapp/
-> config/
    -> database.yml.erb

Note that there should not be a database.yml (non-.erb) here, as it will be rendered with Chef. The deployment of the app will end up in /srv, so the full path of this template is, for example, /srv/myapp/current/config/database.yml.erb. The content of the template may look like this:

<%= @rails_env %>:
  adapter: <%= @adapter %>
  host: <%= @host %>
  database: <%= @database %>
  username: <%= @username %>
  password: <%= @password %>
  encoding: 'utf8'
  reconnect: true

The Chef recipe looks like this. Note we’ll use a search to find the first node that should be the database master (there should only be one). For the adapter, we may have set an attribute in the role that selects the adapter to use.

results = search(:node, "role:myapp_database_master AND chef_environment:#{node.chef_environment}")
db_master = results[0]

template "/srv/myapp/shared/database.yml" do
  source "/srv/myapp/current/config/database.yml.erb"
  local true
  variables(
    :rails_env => node.chef_environment,
    :adapter => db_master['myapp']['db_adapter'],
    :host => db_master['fqdn'],
    :database => "myapp_#{node.chef_environment}",
    :username => "myapp",
    :password => "SUPERSECRET",
  )
end

The rendered template, /srv/myapp/shared/database.yml, will look like this:

production:
  adapter: mysql
  host: domU-12-31-39-14-F1-C3.compute-1.internal
  database: myapp_production
  username: myapp
  password: SUPERSECRET
  encoding: utf8
  reconnect: true
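
The password is hard-coded above to keep the example short. Since the application information is said to live in a data bag, a variant could pull it from there instead; the bag and item names below are assumptions for illustration:

app = data_bag_item("myapp", "database")

template "/srv/myapp/shared/database.yml" do
  source "/srv/myapp/current/config/database.yml.erb"
  local true
  variables(
    :rails_env => node.chef_environment,
    :adapter => app['adapter'],
    :host => db_master['fqdn'],
    :database => "myapp_#{node.chef_environment}",
    :username => app['username'],
    :password => app['password']
  )
end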

This post is only part of the puzzle, mainly to explain what I mentioned on the Food Fight Show today. There are a number of unanswered questions, like:

  • Should database.yml be .gitignore’d?
  • How do developers run the app locally?
  • How do I use this with Chef Solo?

As mentioned on the show, there’s currently a thread related to this topic on the Chef mailing list.

Process Supervision: Solved Problem

TL;DR: Use runit; skip to “This is a Solved Problem” and “Additional Resources” sections at the end of this post.

Recently on my twitter stream, I saw a link to a question on Stack Overflow about how to properly start Jetty. While the question is over 2.5 years old, it stems from a common problem: how do I start a long-running process and keep it running? That an accepted answer is to run it under "nohup" (or screen?!) tells me that, for some reason, this isn't perceived as a solved problem.

In my opinion, process supervision is a solved problem. However, this wheel keeps getting reinvented, or reimplemented with solutions that are not easily manageable or scalable. In this post, I will clarify some terminology, describe commonly understood goals of process supervision, explore some of the common approaches, and how they don’t meet those goals, and finally explain why I feel this is a solved problem.

Note This is a Unix/Linux centric post. Windows has its own methods for running services, but the problem is definitely solved there too; Microsoft gave developers and administrators APIs that seem to be commonly used.

Process Supervision is Not Service Management

What exactly is process supervision?

One of the reasons for running servers is to provide a service of some kind. That is, an application that provides business value. A service is made up of one or more running processes on computer systems somewhere. Those processes are typically long-lived running daemons.

Process supervision is simply the concept of starting a daemon and keeping it running.

Note that this is not the same as more general service management. That may imply multiple services, which may be running on separate physical or virtual servers in a distributed system. That is outside the scope of this post.

This is also not service monitoring, a la graphing (munin, graphite) and/or alerting (nagios, sensu). That is also outside the scope of this post.

Goals and Benefits of Process Supervision

The Wikipedia page on Process Supervision describes its benefits as follows:

  • Ability to restart services which have failed
  • The fact that it does not require the use of “pidfiles”
  • Clean process state
  • Reliable logging, because the master process can capture the stdout/stderr of the service process and route it to a log
  • Faster (concurrent) start-up and shutdown of services

To this, I add:

  • Manage processes with Unix signals
  • Simple setup that is configuration management-friendly (I’m obviously biased)
  • Dependency management between services on the same machine

For the sake of argument, these combined lists of goals and benefits are my criteria for a process supervision system in this post.

Spoiler alert: runit covers all of these, as does s6.

Common Approaches

I’m going to talk about some approaches to process supervision, and how they don’t meet the criteria above. This won’t be comprehensive. I want to illustrate the highlights.

Nohup

First, the approach mentioned in the StackOverflow answer: “nohup.” The “nohup” command will “run a command immune to hangups, with output to a non-tty.” This typically involves logging into a system and manually invoking the command, such as:

nohup java -jar start.jar

This doesn’t provide the ability to restart if it fails. The process state is contaminated with whatever the user has in their login shell profile(s). It will log to “nohup.out” by default, though it can be redirected to another file. It’s pretty safe to say that in my opinion that this fails the criteria above, and should not be used for long running processes, especially those as important as running your Java application.

Terminal Multiplexers

Next up, a common approach for running processes is to start up screen (or tmux), and let them run in the foreground. Screen and tmux are terminal multiplexers. That is, they are "full-screen window manager[s] that multiplex a physical terminal between several processes." These are great tools, and I use tmux for other reasons. However, this fails the criteria for the same reasons as nohup. Additionally, automating a process running in screen is not a simple task that can be repeated reliably.

SysV/BSD Init

Most commonly, process management (and not supervision) is handled on Unix/Linux systems by plain ol’ SysV/BSD “init.” These obviously fail to meet the criteria above, because two new(er) systems, “upstart” and “systemd” have been written to address the problems. That said, “init” fails pretty much all the criteria:

  1. No ability to restart services which have failed.
  2. One of the biggest problems is handling of “pidfiles.”
  3. Process state is theoretically clean, but then realize the average init script sources at least two different shell scripts for helper functions and environment variables, nevermind homegrown scripts that might read in user shell settings, too.
  4. The best one can hope for in logging is that the process writes to syslog, because many init scripts redirect log output in different, non-portable ways.
  5. Init is 100% sequential startup, no concurrency: “/etc/rc2.d/S*”
  6. Sure, you can send signals to the process, but most init scripts don’t support more than “reload” or “restart,” so you’re left on your own with picking up the pieces manually.
  7. Configuration management is easy, right? Just “ensure running” or “action :start” – except let’s not forget the “/etc/sysconfig” or “/etc/default” that sets more configuration. And that the package manager might have started it for you before you’re ready.
  8. Okay, I’ll give you this. As long as the sequential ordering of the init scripts is all correct to meet the dependencies.

Also, I have a personal pet peeve about init script complexity, inconsistency and non-portability between distributions of Linux, let alone Unix. I could (and probably will) write a post about that. For a taste, see CHEF-3774.

Note: I’m generalizing both SysV and BSD here. I admit I don’t have extensive experience with BSD systems, but my observation is it fails in very similar ways to SysV.

Systemd/Upstart

The newer init-replacement systems, systemd and upstart are worth their own section, though I’ll be brief. Other people have posted about these, and they’re pretty well covered on the s6 comparison.

Mainly, I see both of these as reinventing the solution that follows. However, a couple points I’d like to make:

  1. Both systems are primarily focused on desktop systems, rather than server systems. This is mostly evident in their use of D-Bus (Desktop bus), goals of faster boot time, and that their roots are in primarily desktop-oriented Linux distributions (Fedora and Ubuntu).
  2. They both completely replace init, which isn’t necessarily bad. However, they both operate differently from init, and each other, thus being a non-portable major difference between Linux distributions.

Other Process Supervision Systems

There are a lot of process supervision systems out there. In no particular order, an incomplete list:

I have varying degrees of experience with all of these. I have written significant amounts of automation code for operating some of them.

I think that with perhaps the exception of Monit(*), they are redundant and unnecessary.

(*): I don’t have as much experience with Monit as the others, and it seems to have a lot of nice additional features. I’ve also heard it goes well with my favorite solution.

This Is a Solved Problem

Earlier I mentioned runit meets all the criteria I listed above. In my opinion, it is the solution to the process supervision problem. While the runit website itself lists its benefits, it gets a nod from the s6 project, too. The underlying solution is actually the foundation both runit and s6 build on: Dan J Bernstein’s daemontools. The merits of DJB and daemontools are very well stated by the author of s6. I strongly recommend reading it, as he sums up my thoughts about DJB, too. It is worth noting that I do like s6 itself, but it isn’t currently packaged anywhere and adheres fairly strictly to the “slash package” convention, which isn’t compatible with the more popular Filesystem Hierarchy Standard.

Anyway, the real point of this post is to talk about why I like runit. I think the best way to explain it is to talk about how it meets the criteria above.

Restart Failed Services

The runsv program supervises services, and will restart them if they fail. While it doesn’t provide any notification that the service failed, other than possibly writing to the log, this means that if a configuration issue caused a service to fail, it will automatically start when the configuration file is corrected.

No PID files

Each service managed by runsv has a “service directory” where all its files are kept. Here, a “supervise” directory is managed by runsv, and a “pid” file containing the running PID is stored. However this isn’t the same as the pidfile management used in init scripts, and it means program authors don’t have to worry about managing a pidfile.

Clean Process State

Runit’s benefits page describes how it guarantees clean process state. I won’t repeat it here.

Reliable Logging

Likewise, Runit’s benefits page describes how it provides reliable logging.

Parallel Start/Stop

One of the goals and benefits lauded by systemd and upstart is that they reduce system boot time because various services can be started in parallel. Runit also starts up all the services it manages in parallel. More about this under dependency management, too.

Manage Processes (with Unix Signals)

The sv program is used to send signals to services, and for general management of the services. It is used to start, stop and restart services. It also implements a number of commands that can be used for signals like TERM, CONT, USR1. sv also includes “LSB-init” compatibility, so the binary can be linked to /etc/init.d/service-name so “init style” commands can be used:

sudo /etc/init.d/service-name status
sudo /etc/init.d/service-name restart

And so forth.

Simple Setup, Configuration Management Friendly

One of the benefits listed is that runit is packaging friendly. This is interesting because that also makes it configuration management friendly. Setting up a new service under runit is fairly simple:

  1. Create a “service directory” for the service.
  2. Write a “run” script that will start the service.
  3. Create a symbolic link from the service directory to the directory of supervised services.

As an example, suppose we want to run a git daemon. By convention, we’ll create the service directory in /etc/sv, and the supervised services are linked in /etc/service.

sudo mkdir /etc/sv/git-daemon
sudo vi /etc/sv/git-daemon/run
sudo chmod 0755 /etc/sv/git-daemon/run
sudo ln -s /etc/sv/git-daemon /etc/service

The run script may look like this (chpst is a program that comes with runit that changes the process state, such as the user it runs as):

#!/bin/sh
exec 2>&1
exec chpst -ugitdaemon \
  "$(git --exec-path)"/git-daemon --verbose --reuseaddr \
    --base-path=/var/cache /var/cache/git

Within a few seconds, the git daemon will be running:

root      6236  0.0  0.0    164     4 ?        Ss   19:03   0:00 runsv git-daemon
119      12093  0.0  0.0  11460   812 ?        S    23:46   0:00 /usr/lib/git-core/git-daemon --verbose --reuseaddr --base-path=/var/cache /var/cache/git

The documentation contains a lot more information and usage examples.

Note: As evidence that this is packaging friendly, this is provided by the very simple git-daemon-run package on Debian and Ubuntu.

Dependency Management

Many services require that other services are available before they can start. A common example is that the database filesystem must be mounted before the database can be started.

Depending on the services, this can be addressed simply by runsv restarting services that fail. For example, if the startup of the database fails because its file system isn’t mounted and the process exits with a return code greater than 0, then perhaps restarting will eventually work once the filesystem is mounted. Of course, this is an oversimplified naive example.

The runit FAQ addresses this issue by use of the program sv, mentioned earlier. Simply put, use the sv start command on the required service.

A Few Notes

I’ve used runit for a few years now. We used it at HJK Solutions to manage all system and application services that weren’t packaged with an init script. We use it at Opscode to manage all the services that run Opscode Private Chef.

  1. Manage services that run in the foreground. If a service doesn’t support running in the foreground, you’ll have a bad time with it in runit, as runsv cannot supervise it.
  2. Use svlogd to capture log output. It automatically rotates the log files, and can capture both STDOUT and STDERR. It can also be configured (see the man page).
  3. The author of runit is also the package maintainer for Debian/Ubuntu. This means runit works extremely well on these distributions.
  4. I don’t replace init with runit, so I can’t speak to that.
  5. Ian Meyer maintains an RPM spec for runit packages that work well. It will be included in Opscode’s runit cookbook soon.
  6. If you use Chef, use Opscode’s runit cookbook. It will soon have a resource/provider for managing runit services, instead of the definition.

Conclusion

Use runit.

But not just because I said so. Use it because it meets the criteria for a process supervision system, and it builds on the foundation pioneered by an excellent software engineer.

After all, I’m not the only one who thinks so.

Additional Resources

Cookbook Integration Testing With Real Examples

This blog post starts with a gist, and a tweet. However, that isn’t the whole story. Read on…

Today I released version 1.6.0 of Opscode's apt cookbook. The cookbook itself needed better test coverage in Test Kitchen. This post describes these additions to the cookbook, including how one of the test recipes can be put to real production use. My goal is to explain a bit about how we go about testing with Test Kitchen, and to provide some real-world examples.

TL;DR – This commit has all the code.

Kitchenfile

First, the Kitchenfile for the project looked like this:

cookbook "apt" do
  runtimes []
end

This is outdated as far as Kitchenfiles go. It still has the empty runtimes array, which prevents Test Kitchen from attempting to run additional tests under RVM. We'll remove this line and update the file to support the configurations of the recipes and features we want to test. The cookbook itself has three recipes:

  • default.rb
  • cacher-client.rb
  • cacher-ng.rb

By default, with no configurations defined in a Kitchenfile, test-kitchen will run the default recipe (using Chef Solo under Vagrant). This is useful in the common case, but we also want to actually test other functionality in the cookbook. In addition to the recipes, we want to verify that the LWRPs will do what we intend.

I updated the Kitchenfile with the following content:

cookbook "apt" do
  configuration "default"
  configuration "cacher-ng"
  configuration "lwrps"
end

A configuration can correspond to a recipe (default, cacher-ng), but it can also be arbitrarily named. This is a name used by kitchen test. The cacher-client recipe isn’t present because recipe[apt::cacher-ng] includes it, and getting the test to work, where the single node is a cacher client to itself, was prone to error. “I assure you, it works” :–). We’ll look at this later anyway.

With the above Kitchenfile, kitchen test will start up the Vagrant VMs and attempt to run Chef Solo with the recipes named by the configuration. This is a good start, but we want to actually run some minitest-chef tests. These will be created inside a “test” cookbook included with this cookbook. I created a cookbook named apt_test under ./test/kitchen/cookbooks using:

knife cookbook create apt_test -o ./test/kitchen/cookbooks

This creates the cookbook scaffolding like normal. I cleaned up the contents of the directory to contain what I needed to start:

test/kitchen/cookbooks//apt_test/metadata.rb
test/kitchen/cookbooks//apt_test/README.md
test/kitchen/cookbooks//apt_test/recipes
test/kitchen/cookbooks//apt_test/recipes/cacher-ng.rb
test/kitchen/cookbooks//apt_test/recipes/default.rb
test/kitchen/cookbooks//apt_test/recipes/lwrps.rb

The metadata.rb is as you'd expect: it contains the name, a version, maintainer information, and a description. The README simply mentions that this is a test cookbook for the parent project. The recipes are the interesting part. Let's address them in the order of the configurations in the Kitchenfile.

Configuration: default

First, the default recipe in the test cookbook. This is simply going to perform an include_recipe "apt::default". The way test kitchen runs, it will actually have the following run list for Chef Solo:

[test-kitchen::default, minitest-handler, apt_test::default, apt::default]

test-kitchen sets up some essential things for Test Kitchen itself. minitest-handler is the recipe that sets up minitest-chef-handler to run post-convergence tests. apt_test::default is the “test” recipe for this configuration, and finally apt::default is the cookbook’s recipe for this configuration named “default”.

Had we not done anything else here, the results would be the same as simply running Test Kitchen with the original Kitchenfile (with runtimes defined instead of configurations).

Minitest: default recipe

There are now minitest-chef tests for each configuration. The default recipe provides some "apt-get update" execute resources, and also creates a directory that can be used for preseeding packages. We'll simply test that the preseeding directory exists. We could probably check that the cache is updated, but since this cookbook has handled apt-get update without issue for almost 4 years, we'll trust that it keeps working :–). Here's the test (ignoring the boilerplate):

  it 'creates the preseeding directory' do
    directory('/var/cache/local/preseeding').must_exist
  end

When Chef runs, it will run this test:

apt_test::default#test_0001_creates_the_preseeding_directory = 0.00 s = .
Finished tests in 0.007988s, 125.1808 tests/s, 125.1808 assertions/s.
1 tests, 1 assertions, 0 failures, 0 errors, 0 skips

Configuration: cacher-ng

Next, Test Kitchen runs the cacher-ng configuration. The recipe in the apt_test cookbook simply includes the apt::cacher-ng recipe. The run list in Chef Solo looks like this:

[test-kitchen::default, minitest-handler, apt_test::cacher-ng, apt::cacher-ng]

The apt::cacher-ng recipe also includes the client recipe, but basically does nothing unless the cacher_ipaddress attribute is set, or if we can search using a Chef Server (which Solo can’t, of course).

Minitest: cacher-ng recipe

The meat of the matter for the cacher-ng recipe is running the apt-cacher-ng service, so we’ve written a minitest test for this:

  it 'runs the cacher service' do
    service("apt-cacher-ng").must_be_running
  end

And when Chef runs the test:

apt_test::default#test_0001_runs_the_cacher_service = 0.06 s = .
Finished tests in 0.067250s, 14.8698 tests/s, 14.8698 assertions/s.
1 tests, 1 assertions, 0 failures, 0 errors, 0 skips

Configuration: lwrps

Finally, we have our custom configuration that doesn’t correspond to a recipe in the apt cookbook, lwrps. This configuration instead is to do a real-world integration test that the LWRPs actually do what they’re supposed to.

Recipe: apt_test::lwrps

The recipe itself looks like this:

include_recipe "apt"

apt_repository "opscode" do
  uri "http://apt.opscode.com"
  components ["main"]
  distribution "#{node['lsb']['codename']}-0.10"
  key "2940ABA983EF826A"
  keyserver "pgpkeys.mit.edu"
  action :add
end

apt_preference "chef" do
  pin "version 10.16.2-1"
  pin_priority "700"
end

The apt recipe is included because otherwise, we may not be able to notify the apt-get update resource to execute when the new sources.list is dropped off.

Next, we use Opscode’s very own apt repository as an example because we can rely on that existing. When Test Kitchen runs, it will actually write out the apt repository configuration file to /etc/apt/sources.list.d/opscode.list, but more on that in a minute.

Finally, we’re going to write out an apt preferences file for pinning the Chef package. Currently, Chef is actually packaged at various versions in Ubuntu releases:

% rmadison chef
      chef | 0.7.10-0ubuntu1.1 | lucid/universe | source, all
      chef | 0.8.16-4.2 | oneiric/universe | source, all
      chef |  10.12.0-2 | quantal/universe | source, all
      chef |  10.12.0-2 | raring/universe | source, all

So by adding the Opscode APT repository, and pinning Chef, we can ensure that we’re going to have the correct version of Chef installed as a package, if we were installing Chef as a package from APT :).

When Chef Solo runs, here is the run list:

[test-kitchen::default, minitest-handler, apt_test::lwrps]

Notice it doesn’t have “apt::lwrps”, since that isn’t a recipe in the apt cookbook.

Minitest: lwrps recipe

The minitest tests for the lwrps configuration and recipe look like this:

  it 'creates the Opscode sources.list' do
    file("/etc/apt/sources.list.d/opscode.list").must_exist
  end

  it 'adds the Opscode package signing key' do
    opscode_key = shell_out("apt-key list")
    assert opscode_key.stdout.include?("Opscode Packages <packages@opscode.com>")
  end

  it 'creates the correct pinning preferences for chef' do
    chef_policy = shell_out("apt-cache policy chef")
    assert chef_policy.stdout.include?("Package pin: 10.16.2-1")
  end

The first test simply asserts that the Opscode APT sources.list is present. We could elaborate on this by verifying that its content is correct, but for now we’re going to trust that the declarative resource in the recipe is, ahem, declared properly.
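
If we did want to verify the content, a sketch of such an assertion (checking only the repository URI used in the recipe) might look like this:

  it 'points the Opscode sources.list at the right repository' do
    file("/etc/apt/sources.list.d/opscode.list").must_include "http://apt.opscode.com"
  end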

Next, we run the apt-key command to show the available GPG keys in the APT trusted keyring. This will have the correct Opscode Packages key if it was added correctly.

Finally, we test that the package pinning for the Chef package is correct. Successful output of the tests looks like this:

apt_test::default#test_0003_creates_the_correct_pinning_preferences_for_chef = 0.05 s = .
apt_test::default#test_0002_adds_the_opscode_package_signing_key = 0.05 s = .
apt_test::default#test_0001_creates_the_opscode_sources_list = 0.00 s = .
Finished tests in 0.112725s, 26.6133 tests/s, 26.6133 assertions/s.
3 tests, 3 assertions, 0 failures, 0 errors, 0 skips

The Real World Bits

The tests hide some of the detail. What does this actually look like on a real system? Glad you asked!

Here’s the sources.list for Opscode’s APT repository.

vagrant@ubuntu-12-04:~$ cat /etc/apt/sources.list.d/opscode.list
deb     http://apt.opscode.com precise-0.10 main

Next, the apt-key content:

vagrant@ubuntu-12-04:~$ sudo apt-key list
(snip, ubuntu's keys)
pub   1024D/83EF826A 2009-07-24
uid                  Opscode Packages <packages@opscode.com>
sub   2048g/3B6F42A0 2009-07-24

And the grand finale, the pinning preferences:

vagrant@ubuntu-12-04:~$ apt-cache policy chef
chef:
  Installed: 10.14.4-2.ubuntu.11.04
  Candidate: 10.16.2-1
  Package pin: 10.16.2-1
  Version table:
     10.16.2-1 700
        500 http://apt.opscode.com/ precise-0.10/main amd64 Packages
 *** 10.14.4-2.ubuntu.11.04 700
        100 /var/lib/dpkg/status

I used Opscode’s bento box for Ubuntu 12.04, which comes with the ‘omnibus’ Chef package version 10.14.4(-2.ubuntu.11.04). In order to install the newer Chef package and demonstrate the pinning, I’ll first remove it:

vagrant@ubuntu-12-04:~$ sudo dpkg --purge chef

Then, I install from the Opscode APT repository:

vagrant@ubuntu-12-04:~$ sudo apt-get install chef
...
Setting up chef (10.16.2-1) ...
...

And the package is installed:

vagrant@ubuntu-12-04:~$ apt-cache policy chef
chef:
  Installed: 10.16.2-1
  Candidate: 10.16.2-1
  Package pin: 10.16.2-1
  Version table:
 *** 10.16.2-1 700
        500 http://apt.opscode.com/ precise-0.10/main amd64 Packages
        100 /var/lib/dpkg/status

Currently the omnibus packages are NOT in the APT repository; since they have no additional dependencies, they are installed simply with dpkg. Don't use this particular recipe if you're using the Omnibus packages. Instead, just marvel at the utility of this. Perhaps instead, use the LWRPs in the apt cookbook to set up your own local APT repository and pinning preferences.

Conclusion

Test Kitchen is a framework for isolated integration testing in individual projects. As such, it has a lot of features, capabilities and also moving parts. Hopefully this post helps you understand some of them, and see how it works, and how you may be able to use it for yourself. Or, if you want, simply grab the apt_test::lwrps recipe’s contents and stick them in your own cookbook that manages Chef package installation and move along. :–)

All the code used in this post is available in the "apt" repository in the Opscode Cookbooks organization.

Further Reading

Some Knife Plugins

I’ve shared my ~/.chef/plugins/knife directory as a Git repository on GitHub. There’s only a few, but I hope you find them useful. They are licensed under the Apache 2.0 software license, but please only use them for awesome.

gem

This plugin will install a gem into the Ruby environment that knife is executing in. This is handy if you want to install knife plugins that are gems.

If you have Ruby and Chef/Knife installed in an area where your user can write:

knife gem install knife-config

If you’re using an Omnibus package install of Chef, or otherwise require root access to install:

sudo knife gem install knife-config

Note If you’re trying to install a gem for Chef to use, you should put it in a chef_gem resource in a recipe.

metadata

This plugin prints out information from a cookbook’s metadata. It currently only works with metadata.rb files, and not metadata.json files.

In a cookbook’s directory, display the cookbook’s dependencies:

knife metadata dependencies

Show the dependencies and supported platforms:

knife metadata dependencies platforms

Use the -P option to pass a path to a cookbook.

knife metadata name dependencies -P ~/.berkshelf/cookbooks/rabbitmq-1.6.4

nukular

I wrote on this blog about this plugin awhile ago.

This plugin cleans up after running chef-client on a VMware Fusion machine.

knife nukular guineapig base guineapig.int.example.com

plugin_create

This creates a plugin scaffolding in ~/.chef/plugins/knife. It will join underscored words as CamelCaseClasses.

For example,

knife plugin create awesometown

Creates a plugin that is class Awesometown that can be executed with:

knife awesometown

Whereas this,

knife plugin create awesome_town

Creates a plugin that is class AwesomeTown that can be executed with:

knife awesome town

Chef Repository Berkshelf Conversion

I’ve been managing my personal systems with Chef since Chef was created, though I didn’t always use the same chef-repo for them. For about two years though, I’ve used pretty much the same repository, which has grown and accumulated cruft over time. Fortunately since it’s only me working on it, and I only have a few systems, it is really easy to make drastic changes.

I have a number of to do items that I’ve put off, so this weekend I decided to spend some time cleaning house, and convert the repository to have cookbooks managed by Berkshelf.

Rationale

There are other cookbook management tools, including the built in “knife cookbook site install”, librarian-chef, and whisk. I have used the knife command as long as it has existed, and it worked well for awhile. The buzz in the community since the Chef Summit has been around “library” vs “application” cookbooks, especially in conjunction with Berkshelf so I thought I’d give it a go.

Before Berkshelf

Before I started on this migration, here are some numbers about cookbooks in my chef-repo.

  • 113 total cookbooks
  • 33 “chef-vendor” branches (knife cookbook site install creates a branch for each cookbook)
  • 50 “cookbook site” tags

Overall, I had about a half dozen cookbooks that I actually modified from their “upstream” versions on the community site. Most of those customizations were adding munin plugins, changing a couple minor settings in a template, or long term workarounds that are actually fixed in the current released versions.

The Conversion

The conversion was fairly straight-forward. It required some preparation:

  • Determine the cookbooks that would be managed by Berkshelf.
  • Refactor customizations into “application” cookbooks or otherwise.
  • Remove all those cookbooks, and the completely unused cookbooks.

Cookbooks in Berkshelf

Determining the cookbooks that would be managed by Berkshelf was simple. I started with all the cookbooks that had been installed via knife cookbook site install. Since the command creates a branch for each one, I had a nice list already. I did review that for cookbooks I know I wasn’t using anymore, or didn’t plan to use for long, to simplify matters.

git branch | grep 'chef-vendor' | awk -F- '{print $3}'

I also looked at the cookbooks that are applied to nodes' expanded run lists. This knife exec one-liner will return such a list.

knife exec -E "nodes.find('recipes:*').map {|n| n[:recipes]}.flatten.map {|r| r.gsub(/::.*/, '')}.sort.uniq"

Refactoring Customization

My repository has a fair amount of customization to the cookbooks from the community site. Rather than go through all the changes, I’ll summarize with the more interesting parts.

First, I use Samba for filesharing from an Ubuntu server. I originally changed the samba::server recipe so the services used upstart as the provider and set a start_command on Ubuntu, which looked like this (s is smbd or nmbd):

service s do
  pattern "smbd|nmbd" if node["platform"] =~ /^arch$/
  provider Chef::Provider::Service::Upstart if platform?("ubuntu")
  start_command "/usr/bin/service #{s} start" if platform?("ubuntu")
  action [:enable, :start]
end

The upstream cookbook doesn’t have this change, so I added an “application” cookbook, housepub-samba, which has this as the default recipe:

["smbd", "nmbd"].each do |s|
  srv = resource("service[#{s}]")
  srv.provider Chef::Provider::Service::Upstart
  srv.start_command "/usr/bin/service #{s} start"
end if platform?("ubuntu")

For each of the Samba services, we look up the resource in the resource collection, then change the provider to upstart, and set the start_command to use upstart’s service command.

Next, I use OpenVPN. I also want to modify the templates used for the /etc/openvpn/server.conf and /etc/openvpn/server.up.sh resources. Again, I create an "application" cookbook, housepub-openvpn, and the default recipe looks like this:

resources("template[/etc/openvpn/server.conf]").cookbook "housepub-openvpn"
resources("template[/etc/openvpn/server.up.sh]").cookbook "housepub-openvpn"

This is a shorter form of what was done for Samba’s services above. The #resources method does the lookup and returns the resource, and any of the resource parameter attributes can be used as a method, so I send the cookbook method to both template resources, setting this cookbook, housepub-openvpn as the cookbook that contains the template to use. Then, I copy my customized templates into cookbooks/housepub-openvpn/templates/default, and Chef will do the right thing.

Other cookbook changes I made were:

  • Change the data bag name used in djbdns::internal_server, which I changed back so I could use the upstream recipe.
  • Add munin plugins to various cookbooks. As I’m planning to move things to Graphite, this is unnecessary and removed.
  • A few of my OS X cookbooks have the plist file for use with mac_os_x_plist LWRP. These are simply moved to my workstation data bag.

Finally, one special case is Fletcher Nichol’s rbenv cookbook. The rbenv::user_install recipe manages /etc/profile.d/rbenv.sh, which requires root privileges. However, on my workstations where I use this particular cookbook, I run Chef as my user, so I had to comment this resource out. To allow for a non-privileged user running Chef, the better approach is to determine whether to manage that file by using an attribute, so I opened a pull request, which is now merged. Now I just have the attribute set to false in my workstation role, and can use the cookbook unmodified.

Remove Unused and Berkshelf Cookbooks

Removing the unused cookbooks, and the cookbooks managed by Berkshelf was simple. First, each cookbook gained an entry in the Berksfile. For example, apache2.

cookbook "apache2"

Next, the cookbook was deleted from the Chef Server. I did this, purging all versions, because I planned to upload all the cookbooks as resolved by Berkshelf.

knife cookbook delete -yap apache2

Finally, I removed the cookbook from the git repository.

git rm -r cookbooks/apache2
git add Berksfile
git commit -m 'apache2 is managed by Berkshelf'

The cookbooks that I didn’t plan to use, I simply didn’t add to Berkshelf, and removed them all in one commit.

After Berkshelf

The net effect of this change is a simpler, easier-to-manage repository. I now have only 23 cookbooks in my cookbooks directory. Some of those are candidates for refactoring and updating to the upstream versions; I just haven't gotten to that yet. Most of them are "internal" cookbooks that aren't published, since they're specific to my internal network, such as my housepub-samba or housepub-openvpn cookbooks.

On my Chef Server, I have 90 total cookbooks, which means 67 are managed by Berkshelf. I have 62 entries in my Berksfile, and some of those are dependencies of others, which means that can be refactored some as well.

The workflow is simpler, and there are fewer moving parts to worry about changing. I think this is a net positive, since I do this in my free time. However, there are a couple of issues, which should be addressed in Berkshelf soon.

First, Berkshelf issue #190, which would have berks update take a single cookbook to update. Currently, it has to update all the cookbooks, and this takes time for impatient people.

Second, issue #191, which would allow berks upload to take a single cookbook to upload. Normally, one could just use knife cookbook upload, but the directory where Berkshelf stores the cookbooks it manages is not in the cookbook_path, and the knife command uses the directory name as the cookbook name. Berkshelf creates directories like ~/.berkshelf/cookbooks/apache2-1.3.0, so the way to upload Berkshelf-managed cookbooks is all together with the berks upload command. This isn't a huge deal for me, as I already uploaded all the cookbooks I've been using once.

All in all, I am happy with this workflow, though. It is simple and hassle-free for me. Plus, I have more flexibility for maintaining my additional non-Opscode cookbooks.