No one likes to break unit tests. You get all stressed about it, feel like you’ve let your peers down, and sometimes even have to get everyone donuts the next day. Our production Python codebase is complex, and the smallest changes can have an unexpectedly large impact; this is made worse by the fact that Python is a dynamic language, which makes it hard to figure out what code touches what.
Enter nose-knows, a plugin for the nose unit test runner (and, experimentally, py.test). It traces your code while unit tests are running and figures out which files have been touched by which tests. Running your full test suite with code tracing turned on is expensive, so we have a daily Jenkins job that does it and creates an output file. The plugin also works in the other direction: given that output file and the path of a source file, it runs only the tests that touch that file.
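To make the tracing idea concrete, here is a minimal sketch of how a file-to-tests map could be built with Python's standard `sys.settrace` hook. It is not taken from the nose-knows source; the function names `trace_files` and `build_map` are hypothetical, and the real plugin may record its data differently.

```python
import os
import sys
from collections import defaultdict


def trace_files(test_func, base_dir):
    """Run test_func and return the files under base_dir that it executed."""
    touched = set()
    base_dir = os.path.abspath(base_dir)

    def tracer(frame, event, arg):
        # "call" fires each time a new Python function frame is entered.
        if event == "call":
            raw = frame.f_code.co_filename
            # Skip pseudo-files such as "<string>" or "<frozen importlib>".
            if not raw.startswith("<"):
                filename = os.path.abspath(raw)
                if filename.startswith(base_dir + os.sep):
                    touched.add(os.path.relpath(filename, base_dir))
        return None  # per-line tracing is not needed for a file-level map

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        test_func()
    finally:
        # Always restore whatever tracer (if any) was active before.
        sys.settrace(previous)
    return touched


def build_map(tests, base_dir):
    """Invert per-test traces into a {source file: set of test names} map."""
    file_to_tests = defaultdict(set)
    for name, func in tests.items():
        for path in trace_files(func, base_dir):
            file_to_tests[path].add(name)
    return file_to_tests
```

Installing a trace function slows down every function call, which is consistent with why a full traced run is left to a scheduled job rather than done on every developer machine.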
Setting it up is a breeze:
    (risk)eyal-01575:risk eyal$ pip install nose-knows
    Downloading/unpacking nose-knows
      Downloading nose-knows-0.1.tar.gz
      Running setup.py egg_info for package nose-knows
    Installing collected packages: nose-knows
      Running setup.py install for nose-knows
    Successfully installed nose-knows
    Cleaning up...
Now, to create the nose-knows output file, .knows, you need to run the following command:
    (risk)eyal-01575:src eyal$ nosetests --with-knows --knows-dir=eventbrite/risk --knows-out
    ..............................................................
    ----------------------------------------------------------------------
    Ran 62 tests in 0.311s
    OK
The --knows-dir option lets you specify the name of your base directory, so that paths are recorded relative to that spot. This makes a .knows file more portable.
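As an illustration of why base-relative paths help portability, the sketch below strips everything up to and including the base directory name from an absolute path. The helper `relative_to_base` and the Jenkins workspace path are hypothetical and only stand in for whatever the plugin does internally.

```python
def relative_to_base(abs_path, knows_dir):
    """Return the part of abs_path after the knows_dir component, or None."""
    marker = "/" + knows_dir.strip("/") + "/"
    index = abs_path.find(marker)
    if index == -1:
        # The file does not live under the base directory.
        return None
    return abs_path[index + len(marker):]


# The same source file checked out on two machines (second path is made up).
laptop = "/Users/eyal/eventbrite/risk/warehouse/src/process_data/mapping.py"
jenkins = "/home/jenkins/workspace/eventbrite/risk/warehouse/src/process_data/mapping.py"

# Both reduce to the same key, so a map built on one machine works on the other.
print(relative_to_base(laptop, "eventbrite/risk"))
print(relative_to_base(jenkins, "eventbrite/risk"))
```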
After running the nosetests command above, the .knows output file contains the following:
    warehouse/src/process_data/mapping.py:
        src.tests.test_process_data_mapping:TestProcessDataMapping.test_mapping_works
        src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_load_function
        src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_load_function_delete_from_config
        src.tests.test_process_data_mapping:TestProcessDataMapping.test_mapping_creation
        src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_load_function_delete_from_kazoo
        src.tests.test_process_data_mapping:TestProcessDataMapping.test_single_mapping
        src.tests.test_process_data_mapping:TestProcessDataMapping.test_cannot_map
        src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test__load_with_no_checkin
        src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_table_watcher_delete
        src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test__load_with_checkin
        src.tests.test_process_data_denormalize:TestProcessDataDenormalize.test_table_watcher
        src.tests.test_process_data_mapping:TestProcessDataMapping.test_single_mapping_cannot_map
And, indeed, you can run those 12 tests by supplying the file's path relative to either the base directory or the current directory:
    (risk)eyal-01575:src eyal$ nosetests --with-knows --knows-dir=eventbrite/risk warehouse/src/process_data/mapping.py
    ............
    ----------------------------------------------------------------------
    Ran 12 tests in 0.011s
    OK
    (risk)eyal-01575:src eyal$ pwd
    /Users/eyal/eventbrite/risk/warehouse/src
    (risk)eyal-01575:src eyal$ nosetests --with-knows --knows-dir=eventbrite/risk process_data/mapping.py
    ............
    ----------------------------------------------------------------------
    Ran 12 tests in 0.011s
    OK
Running tests by file path can then be hooked up to something like git diff --name-only to run unit tests for all recently modified files:
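    # KNOWS_FILE_TMP, KNOWS_FILE_URL, and KNOWS_FLAGS are expected to be set elsewhere.
    function grab_latest_knows_output() {
        NOW=$(date +%s)
        if [ ! -f "$KNOWS_FILE_TMP" ] ; then
            # No cached copy yet: download the file built by Jenkins.
            curl --compressed "$KNOWS_FILE_URL" > "$KNOWS_FILE_TMP"
        else
            # Modification time in epoch seconds (GNU stat; on macOS/BSD use: stat -f %m).
            KNOWS_FILE_AGE=$(stat -c %Y "$KNOWS_FILE_TMP")
            if [ $((NOW - KNOWS_FILE_AGE)) -gt 86400 ] ; then
                # Cached copy is more than a day old: refresh it.
                curl --compressed "$KNOWS_FILE_URL" > "$KNOWS_FILE_TMP"
            else
                echo "Using latest knows output file."
            fi
        fi
    }

    function test_changed() {
        grab_latest_knows_output
        # Left unquoted on purpose so each changed file becomes its own argument.
        nosetests $KNOWS_FLAGS $(git diff --name-only --cached origin)
    }

    function run_tests_for() {
        grab_latest_knows_output
        nosetests $KNOWS_FLAGS "$@"
    }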
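The once-a-day refresh rule in grab_latest_knows_output can also be expressed in Python, which avoids the difference between GNU and BSD `stat` flags. This is only a sketch of the freshness check; `needs_refresh` is a hypothetical helper, and downloading the file is left out.

```python
import os
import time

ONE_DAY = 86400  # seconds, the same threshold the shell function uses


def needs_refresh(path, max_age=ONE_DAY, now=None):
    """Return True if path is missing or was modified more than max_age seconds ago."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file counts as stale.
        return True
    return now - mtime > max_age
```

The optional `now` argument exists only to make the check easy to exercise without waiting a day.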
The great strength of this approach lies in generating the .knows file in Jenkins. Because the file is already prepared, engineers can simply download it and use it when needed. This cuts the time a unit test run takes by an order of magnitude or more, which means that relevant unit tests get run more often.
In the future, the hope is to integrate nose-knows with webdriver to create a similar map for our system tests.
The nose-knows plugin is available for you to install via pip, and you can browse the source on GitHub: https://github.com/eventbrite/nose-knows.