Using Bundler and Capistrano Behind a Proxy

When you’re developing applications for a Fortune 50 company, you’re bound to run into problems with their security-focused IT department. Recently, I was having trouble deploying a Rails application to a server that was sitting behind a proxy. I tried to solve the problem by changing Bundler, then by reconfiguring Capistrano, but both were dead ends.

Set Up Your $http_proxy

I thought that I could edit my .gemrc and that Bundler would use those settings, but that didn’t work. I had been installing gems with the --http-proxy flag, so it seemed like that would work. I had also been reaching the outside world (for git and other things) by setting the http_proxy environment variable, but I was doing so manually each time I needed it rather than setting it in my shell’s rc file. Here’s how I fixed it.

This assumes that the shell configuration you’re editing belongs to the user that deploys the application with Capistrano.

1. Add the http_proxy to your shell’s rc file

# bash
export http_proxy=http://proxy.example.com:8080

# tcsh
setenv http_proxy http://proxy.example.com:8080

2. Ensure that when you create a new session, the http_proxy environment variable is actually set

$> echo $http_proxy

http://proxy.example.com:8080

Now, when you deploy with Capistrano, the bundle command will use the proxy that’s set up, and communication with your gem sources won’t time out.
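
If you’d rather not rely on the deploy user’s shell rc file, Capistrano 2 can also export the variable itself for every remote command it runs. Here’s a minimal sketch, assuming Capistrano 2.x and the same example proxy host:

    # config/deploy.rb
    set :default_environment, {
      'http_proxy' => 'http://proxy.example.com:8080'
    }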

Printing PDF Highcharts with wkhtmltopdf and PDFKit

I have a Rails 3 application that’s using Highcharts, the fantastic JavaScript-powered charting library. I’m also generating PDF versions of some pages using wkhtmltopdf and PDFKit for things like “download this report” and so on.

Everything was working great, until I noticed that my line charts were being rendered very poorly when I was viewing a PDF. As HTML they looked great, but in the PDF output they were badly degraded.

I dug around a bit and found a helpful forum post explaining the problem: it has something to do with how wkhtmltopdf renders opacity. The important part is the plotOptions line:

<script type="text/javascript" charset="utf-8">
 
  var chart = new Highcharts.Chart({
      chart: { renderTo: 'container' },
 
      // These make it work nicely with wkhtmltopdf
      plotOptions: { series: { enableMouseTracking: false, shadow: false, animation: false } },
 
      xAxis: { categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] },
      series: [{ data: [29.9, 71.5, 106.4, 129.2, 144.0, 176.0, 135.6, 148.5, 216.4, 194.1, 95.6, 54.4] }]
 
  });
 
</script>

Turning off animation, mouse tracking and shadows apparently gives wkhtmltopdf enough to render the chart. When I added those options to my charts, the PDF output rendered correctly.
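
For completeness, here’s roughly what the PDFKit side of a “download this report” action can look like. This is a hedged sketch rather than the app’s actual code; report_url and the filename are stand-ins:

    # app/controllers/reports_controller.rb: a minimal sketch
    require 'pdfkit'

    def download
      kit = PDFKit.new(report_url)   # report_url is a stand-in for the page being converted
      send_data kit.to_pdf, :filename => "report.pdf", :type => "application/pdf"
    end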

Five Essential Performance Optimization Steps

Performance optimization of any software project is a mix of art, science, stubbornness and inexplicable improvement. Most developers will spend incredible amounts of time tweaking their application to squeeze out performance gains, regardless of the value they provide. As the old saying goes, given enough optimization rope, developers will quickly hang themselves.

Using the following five steps, you can ensure that the time you spend optimizing your application provides maximum value.

1. Benchmarking

Optimizing your application is worthless without a benchmark. You absolutely must have a measurement of performance before you begin so that your changes can be quantified when you are done. I have seen many developers start with a complaint from a client about a “slow” page, spend a few hours making changes, and conclude only with “it seems a lot faster now”. If you’re optimizing a Rails application, something like New Relic’s RPM is a great choice. Another tool I’ve had success with is brynary’s rack-bug.
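
Even without a profiling tool you can get a crude before-and-after number from Ruby’s standard library. A minimal sketch, where the block stands in for whatever code path is under suspicion:

    require 'benchmark'

    elapsed = Benchmark.realtime do
      # the code path you suspect is slow goes here
    end
    puts "took %.3f seconds" % elapsed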

2. Set a Timebox

Once you’ve recorded benchmarks for the problem areas of your application you need to determine what specifically is wrong and how to fix the problem. Since it’s possible to spend hours upon hours of time under the directive of “performance improvement” you need to set a hard limit on the time you spend optimizing. Two timeboxes should be used, one to locate the specifics of the problem and another to fix them.

3. Similarity of Environments

The vast majority of performance issues will be discovered by real users performing real tasks in a production environment. In a Rails application, the behavior of the development environment is far different from that of the production environment, so why would you optimize in anything but the production environment? Of course you can’t optimize on the live production application, but you can set up a similar environment that behaves like it. Just recently I spent an hour optimizing a slow page and gained an 80% improvement; the only problem was that in production the page was slow for a completely different reason. Talk about a tough conversation: I had to go back to the client and explain that my previous detailed explanation, and my assurance that the problem was solved, were all wrong.

4. Accurate Dataset

How many times have you implemented a feature, run it through some tests, demoed it to the client and deployed it to staging, only to find that when it makes it to production it is impossibly slow to the point of being useless? If your production environment has a huge database with a hundred thousand rows and a few hundred concurrent users, testing on your development machine with 10 rows in the database is a waste of time. Make sure that the dataset on which you are testing your improvements accurately reflects your production dataset.
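
One cheap way to get there is a rake task that bulk-loads a production-sized dataset into your test environment. A hedged sketch; the task name, the Owner model and the row count are assumptions to adapt to your own schema:

    # lib/tasks/seed_bulk.rake
    namespace :db do
      desc "Populate the database with a production-sized dataset"
      task :seed_bulk => :environment do
        # Owner is a stand-in for one of your own models
        100_000.times do |i|
          Owner.create!(:name => "Owner #{i}", :email => "owner#{i}@example.org")
        end
      end
    end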

5. Low-hanging Fruit and Small Victories

Since you will have a fixed amount of time in which to optimize, you need to take small wins and get the low-hanging fruit when possible. Maybe the best solution to your problem is to drastically change the application’s design garnering a big boost to performance. However, if that route takes you beyond your timebox maybe a few database indexes and an optimized query are all you can afford. Always remember that a small performance improvement is better than no improvement and that the small stuff always adds up.
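
As a concrete example of the small stuff, a single migration adding an index to a frequently filtered column is often the biggest win an afternoon can buy. A hedged sketch; the table and column names are made up:

    class AddIndexToOrdersOnCustomerId < ActiveRecord::Migration
      def self.up
        add_index :orders, :customer_id
      end

      def self.down
        remove_index :orders, :customer_id
      end
    end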

BONUS: Stakeholders Care About Value

When a report of a slow feature comes in, make sure that you evaluate the time and effort it will take to implement the improvement. Communicate this with the client, product owner and/or stakeholders so that they can evaluate the value they’ll receive from the time you spend. If the feature in question is used infrequently it might not be worth the time or cost. Conversely, performance optimization should not always be viewed as something done for free. If an application has grown in size beyond what was originally envisioned, it is perfectly reasonable to charge your client for performance enhancements.

Don’t be like most developers: keep these steps in mind and make your next performance optimization session a success.

Common ThinkingSphinx Configuration Problems

I have recently added full-text search to two Rails projects using Sphinx and the ThinkingSphinx gem. While I have been extremely impressed with both Sphinx and ThinkingSphinx, I did stumble along the way a few times trying to get everything set up and working consistently. On both projects I set up delta indexing so that my very large search indexes would only need to be rebuilt once per day. On one of the projects I also added Monit to keep searchd running, checking the process every few minutes.

Updated: 9/23/2010

Installing Sphinx and the initial setup of ThinkingSphinx were straightforward and relatively simple; however, I spent about two weeks debugging what turned out to be a collection of small problems that, together, made me think I had gone terribly wrong in choosing Sphinx for my full-text search needs.

searchd Binary Path

Problem
You try and run some of the ThinkingSphinx rake tasks, but they fail because ThinkingSphinx can’t find the required Sphinx binaries.

Solution
ThinkingSphinx needs to be able to start and stop Sphinx when deploying. If you ssh into your server as the user who runs your Rails application (deploy in my case) and type which searchd at the command prompt, you should see something similar to /usr/bin/searchd, although it will vary depending on how you installed Sphinx.

In your production version of the config/sphinx.yml file, set the bin_path configuration option. Let’s say that which searchd returns /usr/local/bin/sphinx/bin/searchd; you’d then want your sphinx.yml to contain the following:

production:
  bin_path: "/usr/local/bin/sphinx/bin"

searchd pid File

Problem
You’ve set up Monit to monitor searchd, but Monit is unable to monitor or restart it. In my case, the location of the pid file was not what I was expecting, so Monit could not monitor the searchd process.

Solution
In order to have Monit monitor the searchd process, it’s necessary to specify the location of the searchd pid file in your Monit configuration. When you use ThinkingSphinx to build your Sphinx configuration file, the location of the pid file is specified in the resulting production.sphinx.conf. I decided to specify the location of the searchd pid file myself so that others wouldn’t have to go digging through the auto-generated configuration file to find it.

production:
  bin_path: "/usr/local/bin/sphinx/bin"
  pid_file: "/home/deploy/apps/my_rails_app/shared/log/searchd.pid"

My /etc/monit.d/sphinx configuration file:1

  check process searchd with pidfile /home/deploy/apps/my_rails_app/shared/log/searchd.pid
  start program = "/usr/local/bin/start_sphinx" as uid deploy 
  stop program = "/usr/local/bin/stop_sphinx" as uid deploy

The /usr/local/bin/start_sphinx file used to start searchd:

  #!/bin/bash
  export PATH="$PATH:/usr/local/bin"
 
  cd /home/deploy/apps/my_rails_app/current && RAILS_ENV=production /usr/bin/rake thinking_sphinx:index
  cd /home/deploy/apps/my_rails_app/current && RAILS_ENV=production /usr/bin/rake thinking_sphinx:start > log/sphinx.log 2>&1

The /usr/local/bin/stop_sphinx file used to stop searchd:

  #!/bin/bash
  export PATH="$PATH:/usr/local/bin"
 
  cd /home/deploy/apps/my_rails_app/current && RAILS_ENV=production /usr/bin/rake thinking_sphinx:stop > log/sphinx.log 2>&1

Index File Location

Problem
You’ve deployed your application and built the index. A cron job has been set up to rebuild the index nightly. You notice that when you deploy your application, your indexed results seem to be missing.

Solution
The index files that Sphinx builds and uses should be kept in a shared directory that is available across multiple deploys. Using the typical Capistrano setup, a good place would be /home/deploy/apps/my_rails_app/shared. By default ThinkingSphinx will store these in RAILS_ROOT/db/sphinx/ENVIRONMENT, which is fine in development but not in production.

First, create a directory in your shared folder on production:

$> mkdir -p /home/deploy/apps/my_rails_app/shared/db/sphinx/production

Then, in your Capistrano deployment recipe for production, symlink the shared db path to the current release:

  run "ln -nsf  #{shared_path}/db/sphinx/production  #{release_path}/db/sphinx/production"

Finally, tell ThinkingSphinx to use the shared path for Sphinx’s index files.

production:
  bin_path: "/usr/local/bin/sphinx/bin"
  pid_file: "/home/deploy/apps/my_rails_app/shared/log/searchd.pid"
  searchd_file_path: "/home/deploy/apps/my_rails_app/shared/db/sphinx/production"

Permissions

Problem
The files being created by Sphinx are owned by root and cannot be modified by the user running the ThinkingSphinx rake tasks, usually deploy. This often comes up when delta indexing is being used and the delta indexes are being modified or merged back into the full index.

Solution
You should start and stop searchd using the ThinkingSphinx rake tasks. This will ensure that searchd is started by a user who can later modify the index files if needed. If you are using Monit, make sure your Monit configuration starts and restarts the searchd process as the same user who runs the ThinkingSphinx rake tasks.

This is accomplished in my Monit configuration file by using as uid deploy:

  start program = "/usr/local/bin/start_sphinx" as uid deploy 
  stop program = "/usr/local/bin/stop_sphinx" as uid deploy

Monit Restarts searchd Before Rebuilding is Complete

Problem
When rebuilding your index, Monit restarts searchd before the index is rebuilt.

Solution
While there are ways to pause Monit for certain services, I found the easiest way to solve this problem was to lower the frequency at which Monit checks my searchd process. Given the traffic of your site and the required uptime of the search index, this solution may not be for you. For me the magic interval was every three minutes.

Missing Configuration File

Problem
You deploy, build your index and configuration file, and Sphinx appears to be working; the next time you deploy, your log file fills up with errors about a missing configuration file.

Solution
This one is an easy fix. In the ThinkingSphinx documentation, the deployment strategy is simple.

Essentially, the following steps need to be performed for a deployment:
  • stop Sphinx searchd (if it’s running)
  • generate Sphinx configuration
  • start Sphinx searchd
  • ensure index is regularly rebuilt

Make sure that part of your deploy process makes a call to the thinking_sphinx:configure rake task. This will regenerate the sphinx configuration file each time you deploy.
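
In a Capistrano 2 recipe that can look something like the following. This is a hedged sketch; the task name and hook point are my own choices, not the official ThinkingSphinx recipe:

    # config/deploy.rb
    after 'deploy:update_code', 'sphinx:configure'

    namespace :sphinx do
      task :configure, :roles => :app do
        run "cd #{release_path} && RAILS_ENV=production rake thinking_sphinx:configure"
      end
    end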

Rebuilding My Index is Too Slow!

Problem
Your index has many thousands of records. Running rake thinking_sphinx:rebuild works great, but it’s very slow.

Solution
I recently found out about the thinking_sphinx:reindex rake task. On my Sphinx installation with ~115,000 indexed records, reindex is significantly faster than rebuild; so much so that it can be run on an hourly basis to keep my delta indexes from becoming too large.

1 Hat tip to Chris Irish for the Monit configuration and start/stop scripts.

Zero To Tested With Cucumber and Factory Girl

Get your project from zero to tested leveraging the power of Cucumber and Factory Girl

Testing can be a daunting task, especially for developers who are new to Rails. It sounds like a great idea, but where do you start? What do you test? What don’t you test? For many developers, the hardest part of testing is jumping in and getting your feet wet. Luckily, Cucumber makes getting started with testing easy; you’ll be testing your code from top to bottom in no time.

What is Cucumber?

Cucumber, in simple terms, is a framework for writing human-readable, meaningful tests that allow you to test the full functionality of your Rails project from database access to business logic to what’s displayed in your views.

From the Cucumber Wiki:

Cucumber lets software development teams describe how software should behave in plain text. The text is written in a business-readable domain-specific language and serves as documentation, automated tests and development-aid – all rolled into one format.

When you write your tests with Cucumber you get full-stack automated tests, documentation and a tool to help you communicate with your product owner.

Who’s the Factory Girl?

Factory Girl is a fixture replacement for Rails that allows you to create meaningful objects for use in your tests.

From the Factory Girl Github Page:

factory_girl is a fixtures replacement with a straightforward definition syntax, support for multiple build strategies (saved instances, unsaved instances, attribute hashes, and stubbed objects), and support for multiple factories for the same class (user, admin_user, and so on), including factory inheritance.

When you use Factory Girl in conjunction with Cucumber you can simplify complex test setup and use real objects from the database, not brittle mocks.

Getting Set Up

Getting your project set up with Cucumber and Factory Girl is pretty straightforward. First, you’ll need a few gems installed:


    sudo gem install cucumber-rails
    sudo gem install webrat
    sudo gem install thoughtbot-factory_girl --source=http://gemcutter.org

Note: As of 0.5.0, the Rails-specific functionality was extracted from the cucumber gem into the cucumber-rails gem.

Cucumber comes pre-packaged with some generators for getting your project ready for testing. From the root of your Rails project, run the generator:


    ruby script/generate cucumber --webrat --rspec

You’ll notice that we’re passing a couple of flags to the generator: --webrat specifies that we want to use Webrat as our automated browser and --rspec specifies that we want to use RSpec as our test framework. If you don’t pass these options, Cucumber will guess which options you want based on the gems you have installed.

After running the generator, you will see some output about the files that were created:


    create  config/cucumber.yml
    create  config/environments/cucumber.rb
    create  script/cucumber
    create  features/step_definitions
    create  features/step_definitions/web_steps.rb
    create  features/support
    create  features/support/env.rb
    create  features/support/paths.rb
    create  lib/tasks
    create  lib/tasks/cucumber.rake

Here’s a brief description of what some of the important files do:

*config/environments/cucumber.rb*: This is your cucumber environment, similar to production.rb or development.rb. Typically you’ll add Cucumber-specific config.gem directives and similar “cucumber only” items in here.

*features/step_definitions/web_steps.rb*: These are the webrat steps that you get for free with Cucumber and Webrat.

*features/support/env.rb*: This file has Cucumber-specific setup and configuration options. It changes often, so it’s recommended not to alter this file directly, but to create your own custom env.rb (e.g. custom_env.rb).

*features/support/paths.rb*: Cucumber needs to be aware of the custom routes in your application that you plan on using when testing, this is where they are specified.

Cucumber from 10,000ft

Now that you’ve got your project configured to use Cucumber, let’s go over a high level view of the general concept of testing with Cucumber.

Cucumber uses “Features” to describe the functionality of your application. Features, stored in the features directory, are written in a simple human-readable format called Gherkin. The “Steps” in your feature files that describe the functionality of your application are backed up by step files in the features/step_definitions directory. The step files act like translations between the human-readable feature files and the underlying Ruby code of your Rails application.

Typically you will be using Cucumber in conjunction with Webrat to test your application. Webrat acts as the person behind the keyboard clicking around the site, filling out forms and viewing resulting pages. If you’ve ever tested your project by manually filling out forms and verifying output, that’s exactly what Webrat does for you, except it’s automated.

Let’s Get Testing Already

Rather than go into detail about TDD or BDD and why you should use them, I’m going to assume that you’re like most Rails developers looking to learn more about testing. You probably have an application or two out there in the wild, seemingly working just fine. However, you know that you should get some test coverage in there so that you can confidently make changes and debug problems as they come up. With that in mind, I am going to use a sample application that already has some functionality and we will add our Cucumber tests where needed.

A Veterinary Patient Management System

Our sample application is a simple patient management system for a Veterinarian that deals with Owners, Pets and Visitations. Let’s take a look at some of the models.

    class Owner < ActiveRecord::Base
 
      has_many :pets
      validates_presence_of :name
      validates_presence_of :email
 
    end
 
    class Pet < ActiveRecord::Base
      belongs_to :owner
 
      validates_presence_of  :name
      validates_presence_of  :species
      validates_inclusion_of :species, :in => %w( dog cat bird snake ),
      :message => "Species: %s is not included in the list of accepted species"
 
    end
 
 
    class Visitation < ActiveRecord::Base
 
      belongs_to :pet
 
    end

The first thing for which we’ll be adding a Feature is the process for creating a new Owner record. When a new client calls the Veterinarian’s office, an employee needs to enter them into the system.

Writing a Feature

Cucumber features are very straightforward. The feature file explains a set of functionality by describing different cases through scenarios.

    Feature: Manage Owners
      In order to value
      As a role
      I want feature
 
      Scenario: title
        Given context
        When event
        Then outcome

Each Cucumber feature represents a desired software feature that a certain user of the system wants in order to achieve some end goal. The scenarios are formatted so that a context is setup, an event occurs and an outcome is evaluated.

Let’s examine our real Manage Owners feature.

    Feature: Managing Owners
      In order to manage our client list
      As an employee
      I want to be able to CRUD owners
 
      Scenario: Creating a new Owner
        Given I am on the homepage
        And I follow "Owners"
        Then I should be on the owners index page
        Given I follow "New owner"
        And I fill in "Name" with "Clayton"
        And I fill in "E-Mail" with "clayton@example.org"
        And I fill in "Address" with "100 Cactus Rd"
        And I fill in "City" with "Scottsdale"
        And I fill in "State" with "Arizona"
        And I fill in "Postal Code" with "85000"
        And I fill in "Phone" with "480-555-1212"
        When I press "Create"
        Then I should see "Owner was successfully created."
        And I should be on the owners index page

Note: When you specify “And” in a Cucumber feature, the step inherits the Given/When/Then from the previous step.

Our scenario covers the act of creating a new Owner record. We first set up a context (I am on the homepage). Next we click a link in the navigation (I follow “Owners”). Once we’re on the index view for the Owners controller (I should be on) we can click another link to get to the new view for the Owners controller. From there we are simply instructing Webrat to fill out our form (I fill in) and click the submit button at the bottom (I press “Create”). Finally we are asking Webrat to read the resulting page and check that some text is present (I should see).

Our feature is looking pretty good, but we still have a few more steps before we can get this scenario passing. Running this feature from the command line will give us some output and more direction. Once you’ve run rake db:migrate and rake db:test:prepare, run the following from your application’s root directory:


  cucumber features/manage_owners.feature

Looks like we’re failing, and Cucumber gave us some information about what’s wrong.


  Can't find mapping from "the owners index page" to a path.
  Now, go and add a mapping in patient-management/features/support/paths.rb

If we open up our features/support/paths.rb file we can add the correct path right below the default path for the “home page”. The paths file is really just a list of regular expressions that Cucumber uses to match named routes to words in scenarios.

  when /the homes?page/
    '/'
  when /the owners index page/
    owners_path

With that little change we have our first passing Cucumber scenario! When running Cucumber features from the command line, Cucumber will print out a little summary message.


    1 scenario (1 passed)
    14 steps (14 passed)
    0m0.290s

Complex Scenario Setup

In our above example the setup, or Given steps, were pretty simple. However, sometimes you need more complex setups for your scenarios, like editing a record for example. Cucumber makes it easy to create objects in your scenarios using tables in multiline step arguments.

Here is a feature that goes through the process of editing an owner’s address. You’ll see the Cucumber table that is used to set up our existing owner record. The first row in the Cucumber table corresponds to some of the attribute names for our Owner model.

    Scenario: Editing an existing Owner
      Given the following owners:
       | name    | email               | address       |
       | Clayton | clayton@example.org | 100 Cactus Rd |
      Given I am on the homepage
      And I follow "Owners"
      Then I should be on the owners index page
      When I follow "Edit"
      And I fill in "Address" with "567 N Scottsdale Rd"
      When I press "Update"
      Then I should see "Owner was successfully updated."
      And I should be on the owners index page

When we try to run this, Cucumber will give us the “shell” for our custom step definition “Given the following owners”.

    Given /^the following owners:$/ do |table|
      # table is a Cucumber::Ast::Table
      pending # express the regexp above with the code you wish you had
    end

Since we never created a step definition for this Cucumber feature, we can go ahead and add a new file, features/step_definitions/manage_owners_steps.rb. It is important to note that Cucumber will load all of the files in features/step_definitions and that steps can be used across scenarios and features. If you think you’re going to re-use a step definition, it might be a good idea to place it in something like features/step_definitions/shared_steps.rb.

Note: You should put some thought into your step organization as your project grows in size.

Cucumber passes the table argument from your scenario as a Cucumber::Ast::Table object; calling its hashes method gives you an array of hashes, one per row. By iterating through each hash, we can build up an object that we can use later in our tests. Since we want this object to live in the database and we don’t want to describe every attribute of the owner in our scenario, we can use Factory Girl to simplify the process.

The factory for our owner is simple. I won’t go into the specifics of how to create factories, as that’s a whole other article. You can read more about using factories on the GitHub project page.

Create a file to store your factories in features/support/factories.rb and copy the following.

    require 'factory_girl'
 
    Factory.define(:owner) do |o|
      o.name "John Doe"
      o.email "john@example.org"
      o.address "123 Elm Street"
      o.city "Phoenix"
      o.state "Arizona"
      o.zip "85000"
    end

When we use the factory to create our object, Factory Girl will take the attribute hash that we’ve passed in and use those values for the object. Factory Girl will also fill in any blank attributes with the defaults we’ve specified in the factory definition. The record that is created will actually exist in the database; it is not a mock and it does not have any stubs. As far as our Rails app is concerned, it is just a regular record in the database.

    Given /^the following owners:$/ do |table|
      table.hashes.each do |attributes|
        # e.g. {:name => "Clayton", :email => "clayton@example.org",
        #       :address => "100 Cactus Rd"}
        Factory.create(:owner, attributes)
      end
    end
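
To make that merging concrete, here’s a hedged example of what Factory.create does with a row from the table above (the values simply echo the factory and scenario):

    owner = Factory.create(:owner, :name => "Clayton", :email => "clayton@example.org")

    owner.name # => "Clayton" (overridden by the scenario table)
    owner.city # => "Phoenix" (filled in from the factory default)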

Another place we could use our Given the following owners step would be when creating a feature that deals with creating Pet records. An owner has_many pets while a pet belongs_to an owner. When we create a pet record, we need an existing owner record to which we’ll link the pet record. Start by creating a new feature file, features/manage_pets.feature.

    Scenario: Creating a pet
      Given the following owners:
       | name    | email               |
       | Clayton | clayton@example.org |
      Given I am on the homepage
      And I follow "Pets"
      And I follow "New pet"
      When I fill in "Name" with "Bruno"
      And I select "Dog" from "Species"
      And I select "March 19th, 2005" as the "DOB" date
      And I select "Clayton (clayton@example.org)" from "Owner"
      When I press "Create"
      Then I should see "Bruno was successfully created."
      And I should be on the pets index page
      And a pet named "Bruno" should be owned by an owner named "Clayton"

In this example we have re-used our Given the following owners table. We have also made use of some new Webrat steps like I select for interacting with select lists and I select [DATE] as the [DATE LABEL] date for selecting a date from Rails’ date_select helper. Finally, there is a custom step that we are going to use to make sure that the connection between our newly created pet and the existing owner is in place.

Let’s take a look at the step definition for our custom step above.

    Then /^a pet named "([^"]*)" should be owned by an owner named "([^"]*)"$/ do |pet_name, owner_name|
      Pet.find_by_name(pet_name).owner.name.should == owner_name
    end

When you place something in double quotes in your scenario steps, like “Bruno” and “Clayton” in our example, the values are captured using regular expressions in the step definition. Cucumber then passes the matched values along so that they can be used in your assertion. We can find the pet by pet_name and make sure that the owner linked to this pet via the belongs_to association has the owner_name we specified in our scenario. This is an example of how Cucumber can be used, with RSpec matchers, to make an assertion that has nothing to do with inspecting the DOM of a resulting webpage.

Note: Beware of the Conjunction Steps Anti-pattern

Scenario Outlines

Cucumber provides a very simple way to test multiple different situations with a single scenario. These might be edge cases or just repetitive examples that don’t require their own scenario.

We have some business logic in our application that determines when an appointment, or Visitation, can be made with the Veterinarian:

  • Visitations less than three days from today require approval before being booked
  • No visitations can be booked more than six months in advance
  • No visitations can be booked in the past

We can easily represent this using a Cucumber Scenario Outline. Start by creating a new feature, features/visitation_logic.feature. Also, because we’re going to be working with time- and date-specific business logic, this is a good time to point out that we can stub methods when using Cucumber; however, this practice is frowned upon and should only be used for things like dates and connections to external APIs.

To add stubbing support, add the following to your features/support/custom_env.rb file.

  require 'spec/stubs/cucumber'

    Scenario Outline: visitations
      Given the following owners:
        | name    |
        | Clayton |
      And the following pets:
        | name  |
        | Bruno |
      Given today is "<today>"
      Given I am on the home page
      And I follow "Visitations"
      And I follow "New visitation"
      And I select "Bruno (Owned by Clayton)" from "Pet"
      And I select "<date>" as the "Appointment Date" date
      When I press "Create"
      Then I should see "<message>"
 
    Examples:
      | today      | date               | message                                           |
      | 2010-01-01 | January 2nd, 2010  | Visitation requires short notice approval.        |
      | 2010-01-01 | January 31st, 2010 | Visitation successfully created.                  |
      | 2010-01-01 | November 2nd, 2010 | Visitations cannot be booked that far in advance. |
      | 2010-04-01 | January 2nd, 2010  | Visitations cannot be booked in the past.         |

Cucumber will go through the Scenario Outline you have created and substitute the values in <>’s with their corresponding value in the Examples table at the bottom. Cucumber goes through each line in the Examples table and runs the scenario using your specified values.

First, let’s implement the step for stubbing today’s date.

    Given /^today is "([^"]*)"$/ do |date|
      Date.stub(:today).and_return(Date.parse(date))
    end

In features/step_definitions/shared_steps.rb we can add the implementation for Given the following pets. We are cheating here because we’re not explicitly linking the Pet to the Owner in our scenario outline. This example isn’t so bad, but keep in mind that this association is not obvious when reading the scenario. This type of “behind the scenes” scenario setup is generally a bad idea as it obscures what’s really happening. Since we’ve already created the owner we can just find the first one in the database and create the pet records using the appropriate Rails’ association method.


    Given /^the following pets:$/ do |table|
      table.hashes.each do |attributes|
        Owner.first.pets.create(attributes)
      end
    end

The other steps in our scenario already exist, either because we created them or because they come for free with the built-in steps. In this one scenario we were able to test four different business logic outcomes. This test could be expanded to go beyond what is shown on the resulting page, perhaps to ensure that an Approval record was created for the short-notice appointment.

Advanced Cucumber Goodies

While you should now have an understanding of the basics of Cucumber, there are a number of other powerful features.

Tags

Cucumber allows you to “tag” your scenarios and features so that they can be run selectively, skipped, or have other tasks run before and after them. Out of the box you get the @wip tag (Work In Progress), which isn’t run as part of the normal rake cucumber:ok process.

Hooks

Using tags, you can define Before and After hooks that run some code before or after a given feature or scenario. You can even run these hooks before or after all of your features. You might need to clean up generated files in an After hook, or run a CPU-expensive operation before a single scenario that you don’t want running before all of your scenarios.
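
A minimal sketch of both cases; the @expensive tag, the helper and the file path are assumptions:

    # features/support/hooks.rb
    require 'fileutils'

    Before('@expensive') do
      # costly setup that only @expensive-tagged scenarios need
      @dataset = build_giant_dataset # hypothetical helper
    end

    After do |scenario|
      # clean up generated files after every scenario
      FileUtils.rm_rf('tmp/exports')
    end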

Table Transformations

The tables that we have been creating use the exact model attribute names for their header rows. This is convenient for the developer, but makes the scenarios harder to understand. This is especially true if you have very strange or legacy attribute names like record_quote_ext_type__c. Using table transformations you can use “Record Extension Type” in your scenario table and map that to record_quote_ext_type__c in your step definition.

Calling Steps from Steps

It’s possible to take a multi-step process, expressed in Cucumber steps, and call them all at once from another step. You might have a few steps that describe logging in to a system (filling out login credentials, pressing a button, etc.). You can easily put them all into one step, Given I am logged in, and then reference that elsewhere, as in the sketch below. This helps to reduce unnecessary repetition of steps and keep your scenarios to a manageable size.
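
A hedged sketch using Cucumber’s steps helper; the page name, field labels and credentials are all made up:

    Given /^I am logged in$/ do
      steps %Q{
        Given I am on the login page
        And I fill in "Email" with "user@example.org"
        And I fill in "Password" with "secret"
        When I press "Log in"
      }
    end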

Profiles

Cucumber allows you to specify a profile to use when running your features. By default Cucumber will not run any @wip tagged scenarios and the output will contain any failing files and the line number of the failure. However, if you wanted to run the @production tagged stories and output a nice HTML report, you could setup a new profile for that in config/cucumber.yml and use it when running your features.

Testing Javascript

You can even use Cucumber to test complex JavaScript. Tools like Celerity with Culerity, or Selenium, give you the ability to go beyond the interactions provided by Webrat. A new addition to Cucumber is Capybara support, which aims to unify the language used in steps so that you can use Webrat, Selenium and Celerity side-by-side.

Use Cucumber Table Transformations To Build Objects

Using Cucumber’s Table Transformations, you can easily build complex objects in a way that’s easy to read and understand for clients and developers alike.

My latest favorite feature of Cucumber is Table Transformations. I frequently use tables to build up complex objects, and I’ve found that regular old tables can be a little ugly, especially when your attribute names don’t make much sense on their own. I’ve also noticed that building up associations can be a little wonky, usually requiring more steps than seem necessary.

Conventional Table Usage

Let’s look at an example of how we could use a table, without transformations, to build up some objects for our scenario.

Scenario: Editing a Spirit
  Given I have a spirit with the following attributes:
    | spirit_type    | country_of_origin | age | brand        | lgcy_prod_sku | name       |
    | Scotch Whisky  | Scotland          | 12  | The Balvenie | SC38181       | DoubleWood |
    | Scotch Whiskey | Scotland          | 12  | The Macallan | SC38245       |            |

Given /^I have a spirit with the following attributes:$/ do |table|
  table.hashes.each do |attributes|
    Spirit.create!(attributes)
  end
end

Be sure to use create! in your tests to prevent false positives (Thanks Aslak!)

Now this is all fairly simple, and it looks pretty easy to implement, but I see some problems. First, what if you were actually friends with your DBA (bear with me) and you knew better than to have attributes in your model like country_of_origin or spirit_type? Chances are those are going to be used by many other records and should be pulled out and made into their own models, Country and SpiritType respectively.1

So what does our scenario look like with those two new models?

Scenario: Editing a Spirit
  Given I have a country with the following attributes:
    | name     | continent |
    | Scotland | Europe    |
  And I have a spirit type with the following attributes:
    | name           |
    | Scotch Whiskey |
  And I have a spirit with the following attributes:
    | age | brand        | lgcy_prod_sku | name       |
    | 12  | The Balvenie | SC38181       | DoubleWood |
    | 12  | The Macallan | SC38245       |            |

It’s a little more complex, for sure, but it’s not totally unmanageable. However, the key part that’s missing is how to link the two spirits with their spirit types and countries of origin.

You could add some more steps, but then you’ve got a conjunction step which is inflexible and brittle.

  And the spirit named "The Balvenie" is from "Scotland" and is a "Scotch Whiskey"

You could go back to your original step, and try to do some behind-the-scenes stuff to map country_of_origin to the correct country_id, but that gets messy too.

Transform Your Tables

The first step to making good use of table transformations is to make your tables more readable. Start by changing the header row of your table to use meaningful representations of the real attribute names.

Given I have a spirit with the following attributes:
  | Spirit Type    | Country  | Age | Brand        | Legacy Product Code | Name       |
  | Scotch Whisky  | Scotland | 12  | The Balvenie | SC38181             | DoubleWood |
  | Scotch Whiskey | Scotland | 12  | The Macallan | SC38245             |            |

We’ve turned that weird lgcy_prod_sku attribute into something that your Product Owner can make sense of, and we’ve been able to add Spirit Type and Country back to the table. Now let’s look at the transformation that makes this all work.

Transform /^table:Spirit Type,Country,Age,Brand,Legacy Product Code,Name$/ do |table|
  table.hashes.map do |hash|
    spirit_type = SpiritType.create!({:name => hash["Spirit Type"]})
    country = Country.create!({:name => hash["Country"]})
    spirit = Spirit.create!({:age => hash["Age"],
                            :brand => hash["Brand"],
                            :lgcy_prod_sku => hash["Legacy Product Code"],
                            :name => hash["Name"]})
 
    {:spirit_type => spirit_type, :country => country, :spirit => spirit}
  end
end

The transformation step definition looks a lot like a regular table step definition. There is a regular expression, like anything else in Cucumber, that has the same values as the header row in the table from our scenario. Just like the table step definition we have a table object which is just an array of hashes. We can go through each hash, do the actual transformation, and then return something to our table step definition. We are using map (same as collect) to return an array of hashes, which is just what the table step definition is expecting.

You will also see that we’re creating three different records, which we are returning in the hash we create at the end. Let’s go through those step-by-step:

  1. Create a spirit type object from the hash["Spirit Type"] value
  2. Create a country object from the hash["Country"] value
  3. Create a spirit object from the several related hash values
  4. Put all of our created objects into a hash

While we’ve created the objects, we still need to create the associations. I like to leave this for the table step definition rather than the transformation since I think it’s more obvious what’s going on with the values when you’re viewing the table step.

Given /^I have a spirit with the following attributes:$/ do |table|
  table.each do |group|
    spirit = group[:spirit]
    associations = {:country => group[:country], :spirit_type => group[:spirit_type]}
    spirit.update_attributes(associations)
  end
end

There are at least a dozen ways to get the spirit associated with the country and spirit type, so don’t feel like you have to follow this pattern every time. Since our transformation handed the step an array of hashes, we can iterate over each hash, group, and work with the individual rows. Here’s how:

  1. Extract the spirit object from the hash
  2. Create another hash with the country and spirit type that Rails can make sense of
  3. Use update_attributes to update the spirit object with the new associations

Transformation Tradeoffs

We’ve been able to take our original multi-step scenario and simplify it to a single step. We are using the proper place, the step definitions, to do the associations and we have made our scenario much easier to read for non-developers working on the project. But what did we give up?

The biggest issue I’ve found with using table transformations is that they can be inflexible when you need to add more attributes to your dynamically created object. If you are writing features, using your table to setup objects and then realize that you need to add another attribute, you’re going to have to edit your table transformation step and how you create objects from the hash. When you take this a step further and try to have two different table definitions, you’ll be looking at having two nearly identical table transformations.2

If you’re not already using regular old Cucumber tables to create objects, use this guide to get started. If you are using Cucumber tables to create objects, try to re-factor one of your scenarios and use the table transformation strategy. Once you start using Cucumber tables and table transformations you’ll instantly improve the readability, portability and efficiency of your steps.

1 Ignore for now the issues with spirit_type and Rails Single Table Inheritance

2 I’m guessing that there is some way you can get around this with regular expressions and to have more flexible transformation table steps, but I haven’t tried it yet.

Do You Make This Common Mistake When Estimating?

There’s a common mistake that many software developers make when estimating projects. Here’s how you can avoid falling into this trap.

When estimating a project using the Planning Poker method, many developers like to use a baseline estimate for a given task. For example, many developers use CRUD, the creating, displaying, editing and deleting of a Model as their baseline estimate. Once they’ve got their baseline in mind, it makes it easier to estimate other stories that are more domain specific, or so it seems.

Baseline Estimates are Broken

When you’re using a task like CRUD as a baseline for your estimations, you can easily skew the estimations of the other stories in the project. Let’s say we’re using a 3 point baseline for CRUD stories.

  1. As a user I should be able to upload a profile photo – 2
  2. As a user I should be able to CRUD movies I’ve seen – 3
  3. As a user I should be able to send a private message to another user – 5
  4. As a user I should be able to create a trivia quiz for a movie that I’ve seen – 8

In this first example, the stories are probably estimated fairly well, and compared to each other their complexity is nicely relative. What happens if we add a few more easy stories or a few more difficult stories?

  1. As a user I should be able to see a contact e-mail on the home page – 1
  2. As a user I should be able to download the menu as a PDF – 1
  3. As a user I should be able to read a privacy policy – 2
  4. As a user I should be able to CRUD restaurant reviews – 3

With easier stories added, the CRUD story is definitely the most complex, but it is far more complex, relative to seeing a link on a page or viewing some text, than the point spread suggests.

  1. As a user I should be able to CRUD portfolio photos – 3
  2. As a user I should be able to signup for a paid recurring account – 8
  3. As a user I should be able to make connections with other users via Facebook – 13
  4. As a user I should be able to send a rocket to the moon – !

When the other stories become much more complex, the CRUD task is again significantly different in complexity, this time in the other direction. In this group of stories it’s hard to imagine that managing portfolio photos is only a couple of steps down the scale from a recurring payment e-commerce system.

Relative Complexity Works

It’s important to estimate a group of stories so that the complexity of each story is relative to the next. In Mike Cohn’s Agile Estimating and Planning, he describes a method of estimating stories called “Analogy”.

When estimating by analogy, the estimator compares the story being estimated with one or more other stories. If the story is twice the size, it is given an estimate twice as large.

When you apply this technique to your estimation process, you will have a more coherent set of estimates. From this you will be more likely to determine an accurate estimated velocity and you will have a better overall sense for the scope of the project.

Hurdles to Estimating by Analogy

  • Estimating all of your stories one by one makes it difficult to estimate a relative complexity as you only have the previously estimated stories with which to compare your current story. A group of especially simple or complex stories could be waiting towards the end of the sessions.
  • Estimators who are new to the agile estimation process might have difficulty estimating a story without a point of reference.
  • Estimating with a baseline can be a difficult habit to break since it provides a convenience and familiarity to the estimator.
  • Two seemingly similar projects may have a greater than expected difference in total number of story points, even though their velocities are relatively the same.

The Secret to Awesome Agile Development

With a little hard work and my secret development ingredient, you can be a better Agile Developer

Recently my fellow developers at Integrum and I took a survey that helped us assess our team with regard to our Agile practices. When taking the survey, and now reviewing it later on, I was struck by how many of the questions were related to a single concept. Many of the problem areas that can be uncovered by the survey, along with examples of one’s successes, come back to this one theme.

Consider the following questions:
  • Is there more than one bug per month in the business logic of completed stories?
  • Can any programmer on the team currently build and test the software, and get unambiguous success / fail result, using a single command?
  • When a programmer gets the latest code, is he nearly always confident that it will build successfully and pass all tests?
  • Are fewer than five bugs per month discovered in the teamʼs finished work?
  • After a line item is marked “complete” do team members later perform unexpected additional work, such as bug fixes or release polish, to finish it?
  • Are programmers nearly always confident that the code they’ve written recently does what it’s intended to do?
  • Are all programmers comfortable making changes to the code?
  • Do programmers have more than one debug session per week that exceeds 10 minutes?
  • Do unexpected design changes require difficult or costly changes to existing code?
  • Do any programmers optimize code without conducting performance tests first?
  • Are any team members unsure about the quality of the software the team is producing?

What’s the common theme among these questions, and the secret to better agile development? Testing, testing and more testing.

The negative outcomes implied by some of these questions can be solved by testing. Spending time fixing “completed” stories? Probably something you could have tested. Conversely, the positive benefits implied by other questions can be had via testing. Want to make your code more inviting and easier to deal with for new team members or people unfamiliar with the project? Give them robust and well-written tests.

The 7 Bullshit Agile Estimation Problems

Estimating stories for an upcoming project is one of the more difficult tasks that agile teams have to perform. It’s never easy to determine how difficult it will be to implement a particular feature, especially when you’ve got different personalities, goals, and levels of experience in the same room. Unfortunately, this all leads to people coming up with excuses and roadblocks, which lead to inaccurate estimates.

I’ve identified seven bullshit pitfalls of the agile estimation process that I’m sure many other teams have experienced:

1) Estimating Time Rather Than Complexity

The point of estimating stories with planning poker cards is that you estimate based on the story’s complexity, not on how long it will take you to complete the actual feature. A story that’s an 8 is more complex than one that’s a 5, but that doesn’t mean the 8 will take two days. It could take an hour or a week; it’s all relative.

Why it’s Bullshit
Estimates based on time make planning commitments difficult and velocities unreliable.

2) Not Estimating For “Simplest Possible Solution”

The “Simplest Possible Solution” is just that: the simplest way the feature described in the story can be implemented. When you get away from this, you start going down all sorts of “what if” roads that always end up bumping up your estimate.

Why it’s Bullshit
Trying to guess what the product owner will want or what the completed feature entails is a waste of time.

3) I’ve Never Done That Before

There aren’t many software problems that haven’t been solved. Most of them are things that have been solved a thousand times in a hundred different ways. Adding complexity to a story because you’ve personally never solved that problem is shortsighted.

Why it’s Bullshit
Just because you’ve never done something doesn’t mean it’s complex. Lean on your team or network of developer peers to help solve these problems.

4) Estimating Stories In Excessive or No Isolation

Stories should be estimated in isolation, just as they should be written so as not to depend heavily on one another. However, developers will often try to assign too much complexity to a story because of assumed tasks or features that they think would accompany the story in a completed state. For example:

As a user I should be able to login
As a user I should be able to upload a profile photo
As a user I should be able to change my address

Some developers will see the first story and immediately think of the complexity of creating a user model, the controllers and views that go along with the entire registration process.

Alternatively some developers will see the last story and think “Oh, I just need a form field for the address when the user is editing their profile.”

Why it’s Bullshit
The first extreme gives you chunks of related stories with too much padding that are never as complex individually as they are as a whole. The latter only (sometimes) works when the actual stories are extracted out into a bunch of very small stories, which has its own set of problems.

5) Gaming Velocities

If you’re looking to “guesstimate” how long a project will take to complete, you could grab some story cards and pick out what you think might be a week of work. If you add up the points assigned to those stories, you’d have an estimated velocity. However, if you’ve padded your stories, or purposefully picked out a small number of stories, your project is going to appear much lengthier than it really is.

Why it’s Bullshit
You don’t look any better completing 50 points per iteration when you padded the hell out of your estimates than you do when you do 20 points per iteration with accurate estimates.

6) Always Assume The Worst!

There seems to be this mantra with some developers, “Always assume the worst!” When you come across a slightly vague story, let your imagination run wild and assume that the product owner is going to want the most complex solution possible.

Why it’s Bullshit
Remember, every story is a negotiation. You’re not going to know the exact details of the story until you have your planning meeting with the product owner. Oftentimes the product owner would never have dreamed up the solution on which you based your estimate.

7) Padding Padding Padding

Padding is all Bullshit
The problem of padding estimates creeps into nearly all of the above six issues. It introduces bad data early in the life of the project and makes every other step of the process unreliable. It’s almost always done in an effort to cover one’s ass, but it’s painfully transparent and reeks of amateurism.

Don’t pad your estimates.

Intel Developer Ignite #2

I had a blast presenting at the recent Intel Developer Ignite. In my quick five-minute presentation I mapped ten of Aesop’s fables to modern-day software engineering challenges and principles. If you missed the event, or just want to check out my presentation again, here it is!

Age-Old Solutions to Everyday Problems

Big thanks to Intel and everyone involved for putting on a great event; I really enjoyed it and can’t wait to do it again soon.