Smarter Unit Testing with nose-knows

No one likes to break unit tests. You get all stressed about it, feel like you’ve let your peers down, and sometimes even have to get everyone donuts the next day. Our production Python codebase is complex, and the smallest changes can have an unexpectedly large impact; this is only made worse by the fact that Python is a dynamic language, which makes it hard to figure out what code touches what.

Enter nose-knows, a plugin for the nose unit test runner (and py.test, experimentally). It traces your code while unit tests are running, and figures out which files have been touched by which tests. Running your full test suite with code tracing turned on is expensive, so we have a daily Jenkins job that does it and writes the resulting map to an output file. The plugin also works in the other direction: given that output file and the path of a source file, it runs only the tests that touched that file.
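
To make the tracing idea concrete, here is a minimal sketch of how a file-to-tests map could be built with Python's standard sys.settrace hook. It illustrates the general technique only and is not nose-knows's actual code; the function name trace_files and its parameters are hypothetical.

```python
import os
import sys
from collections import defaultdict


def trace_files(test_name, test_func, base_dir, file_to_tests):
    """Run test_func and record every file under base_dir that it calls into.

    Hypothetical helper for illustration; not part of nose-knows.
    """
    base_dir = os.path.realpath(base_dir)

    def tracer(frame, event, arg):
        # The global trace function receives a "call" event for each new frame.
        if event == "call":
            filename = os.path.realpath(frame.f_code.co_filename)
            if filename.startswith(base_dir + os.sep):
                rel_path = os.path.relpath(filename, base_dir)
                file_to_tests[rel_path].add(test_name)
        return None  # skip per-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        test_func()
    finally:
        sys.settrace(previous)  # always restore the earlier trace function


# Usage: file_to_tests = defaultdict(set), then call trace_files once per test.
```

Even this stripped-down tracer runs a Python callback on every function call, which hints at why tracing a whole suite is slow enough to push into a scheduled job.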

Setting it up is a breeze:

(risk)eyal-01575:risk eyal$ pip install nose-knows
Downloading/unpacking nose-knows
Downloading nose-knows-0.1.tar.gz
Running setup.py egg_info for package nose-knows

Installing collected packages: nose-knows
Running setup.py install for nose-knows

Successfully installed nose-knows
Cleaning up...

Now, to create the nose-knows output file, .knows, you need to run the following command:

(risk)eyal-01575:src eyal$ nosetests --with-knows --knows-dir=eventbrite/risk --knows-out
..............................................................
----------------------------------------------------------------------
Ran 62 tests in 0.311s

OK

The --knows-dir option specifies the name of your base directory, so that paths are recorded relative to it. This makes the .knows file more portable.

After running this command, the .knows output file contains the following:

warehouse/src/process_data/mapping.py:
src.tests.test_process_data_mapping:TestProcessDataMapping.test_mapping_works
src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_load_function
src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_load_function_delete_from_config
src.tests.test_process_data_mapping:TestProcessDataMapping.test_mapping_creation
src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_load_function_delete_from_kazoo
src.tests.test_process_data_mapping:TestProcessDataMapping.test_single_mapping
src.tests.test_process_data_mapping:TestProcessDataMapping.test_cannot_map
src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test__load_with_no_checkin
src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_table_watcher_delete
src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test__load_with_checkin
src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_table_watcher
src.tests.test_process_data_mapping:TestProcessDataMapping.test_single_mapping_cannot_map
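
If you want to inspect a .knows file yourself, a small parser is enough. The sketch below assumes the layout shown above: a line ending in a colon names a source file, and the lines that follow name the tests that touched it. The function parse_knows is hypothetical and not part of the plugin, and the real file format may differ in details such as indentation.

```python
from collections import defaultdict


def parse_knows(text):
    """Map each source file to the set of test names listed under it.

    Assumes "path:" header lines followed by one test name per line.
    """
    file_to_tests = defaultdict(set)
    current_file = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Header line: strip the trailing colon to get the file path.
            current_file = line[:-1]
            file_to_tests[current_file]  # register the file even with no tests
        elif current_file is not None:
            file_to_tests[current_file].add(line)
    return dict(file_to_tests)
```

Test names contain a colon in the middle (module:Class.method), so the parser looks only at the end of the line to tell headers from tests.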

And, indeed, you can run those 12 tests by supplying the file's path relative to either the base directory or the current directory:

(risk)eyal-01575:src eyal$ nosetests --with-knows --knows-dir=eventbrite/risk warehouse/src/process_data/mapping.py
............
----------------------------------------------------------------------
Ran 12 tests in 0.011s

OK
(risk)eyal-01575:src eyal$ pwd
/Users/eyal/eventbrite/risk/warehouse/src
(risk)eyal-01575:src eyal$ nosetests --with-knows --knows-dir=eventbrite/risk process_data/mapping.py
............
----------------------------------------------------------------------
Ran 12 tests in 0.011s

OK

This can then be hooked up to something like git diff --name-only to run unit tests for all recently modified files:

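# KNOWS_FILE_URL, KNOWS_FILE_TMP, and KNOWS_FLAGS are expected to be set elsewhere.

# Download the .knows file if there is no local copy or the copy is over a day old.
function grab_latest_knows_output() {
    NOW=$(date +%s)
    if [ ! -f "$KNOWS_FILE_TMP" ] ; then
        curl --compressed "$KNOWS_FILE_URL" > "$KNOWS_FILE_TMP"
    else
        # Modification time in epoch seconds (GNU stat; BSD/macOS stat uses `stat -f %m`).
        KNOWS_FILE_MTIME=$(stat -c %Y "$KNOWS_FILE_TMP")
        if [ $((NOW - KNOWS_FILE_MTIME)) -gt 86400 ] ; then
            curl --compressed "$KNOWS_FILE_URL" > "$KNOWS_FILE_TMP"
        else
            echo "Using latest knows output file."
        fi
    fi
}

# Run the tests that touch files with staged changes relative to origin.
function test_changed() {
    grab_latest_knows_output
    nosetests $KNOWS_FLAGS $(git diff --name-only --cached origin)
}

# Run the tests that touch the files given as arguments.
function run_tests_for() {
    grab_latest_knows_output
    nosetests $KNOWS_FLAGS "$@"
}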

The great strength of this approach lies in generating the .knows file in Jenkins. Because the file is already prepared, engineers can simply download it and use it when needed. This cuts the time unit tests take to run by an order of magnitude or more, which means that relevant unit tests get run more often.

In the future, the hope is to integrate nose-knows with webdriver to create a similar map for our system tests.

The nose-knows plugin is available for you to install via pip, and you can browse the source on GitHub: https://github.com/eventbrite/nose-knows.