Tag Archives: python

Test coverage analysis

I enjoy writing tests for my code for the obvious reasons: they build my confidence in the correct functioning of the software and also tend to drive the design in a more readable and well-structured direction. However, until recently I had been limited to statement coverage analysis of my tests, which is why I got very excited when I was able to start tracking both branch coverage and condition coverage in our recent projects.

Below is a short introduction to the different types of coverage analysis you can perform with currently available tools.

Set up

We’ll start with a virtual environment to contain the example package and run the tests. We’ll also install Nose, coverage and instrumental in the environment.

$ virtualenv-2.6 analysis
New python executable in analysis/bin/python2.6
Also creating executable in analysis/bin/python
Installing distribute......................done.

$ cd analysis
$ ./bin/pip install nose coverage instrumental

Inside the virtualenv we have a Python package called “example” with two modules: lib.py and tests.py. The lib.py module contains the following function that we will test, and tests.py will contain the test cases.

def func(a, b):
    value = 0
    if a or b:
        value = value + 1
    return value

Although the function is very simple it will allow us to demonstrate the different coverage analysis tools.

Statement coverage

Statement coverage is probably the simplest of the three and its goal is to keep track of the source code lines that get executed during the test run. This will allow us to spot obvious holes in our test suite. We’ll add the following test function in the tests.py module.

def test1():
    from example.lib import func
    assert func(True, False) == 1

With both nose and coverage installed in the virtualenv we can run the tests with statement coverage analysis with

$ ./bin/nosetests -v --with-coverage example
example.tests.test1 ... ok

Name          Stmts   Miss  Cover   Missing
-------------------------------------------
example           0      0   100%
example.lib       5      0   100%
-------------------------------------------
TOTAL             5      0   100%
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

As we can see above this single test managed to achieve 100% statement coverage in our example package. Next, let’s add branch analysis in the mix.
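As an aside, the idea behind statement coverage can be sketched in a few lines with the standard library’s sys.settrace hook. This is only an illustration of the principle and not how coverage is implemented; the helper name executed_lines is made up for this example, and func is copied from lib.py above.

```python
import sys


def func(a, b):
    value = 0
    if a or b:
        value = value + 1
    return value


def executed_lines(function, *args):
    """Return the line offsets (relative to the def line) run by one call."""
    code = function.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Only record "line" events that belong to the traced function.
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        function(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return sorted(seen)


print(executed_lines(func, True, False))
print(executed_lines(func, False, False))
```

The first call reaches the offset of the “value = value + 1” line while the second call skips it, which is why test1 alone is enough to execute every statement.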

Branch coverage

The purpose of branch coverage analysis is to keep track of the logical branches taken while the code executes and to indicate whether some logical paths are not executed during the test run. Even with 100% statement coverage it is rather easy to have less than 100% branch coverage.

With nose there is unfortunately no command-line switch we can use to activate branch coverage tracking so we will create a .coveragerc file in the current directory to enable it. The .coveragerc file contains the following

[run]
branch = True

In our function we have a logical branch (the if-statement) and currently our tests only exercise the True-path as can be seen when we run the tests with branch coverage enabled.

$ ./bin/nosetests -v --with-coverage example
example.tests.test1 ... ok

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
example           0      0      0      0   100%
example.lib       5      0      2      1    86%
---------------------------------------------------------
TOTAL             5      0      2      1    86%
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

The output tells us that in example.lib we have one partial branch (BrPart), which reduces the coverage in that module to 86%: statements and branches are counted together, so 6 of the 7 items are covered. We’ll now add another test case in tests.py which exercises the False-path of the if-statement.

def test1():
    from example.lib import func
    assert func(True, False) == 1

def test2():
    from example.lib import func
    assert func(False, False) == 0

Rerunning the tests with branch coverage tracking will show that we’ve now covered all logical branches.

$ ./bin/nosetests -v --with-coverage example
example.tests.test1 ... ok
example.tests.test2 ... ok

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
example           0      0      0      0   100%
example.lib       5      0      2      0   100%
---------------------------------------------------------
TOTAL             5      0      2      0   100%
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK
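To make the notion of a branch concrete, the sketch below records the line-to-line transitions taken inside the function, again using sys.settrace. It is a simplified illustration under my own assumptions, not the mechanism coverage uses; executed_arcs is a hypothetical helper name and func is copied from lib.py.

```python
import sys


def func(a, b):
    value = 0
    if a or b:
        value = value + 1
    return value


def executed_arcs(function, *args):
    """Return the (from, to) line-offset transitions taken by one call."""
    code = function.__code__
    arcs = set()
    last = [None]  # offset of the previously executed line

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            if last[0] is not None:
                arcs.add((last[0], offset))
            last[0] = offset
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        function(*args)
    finally:
        sys.settrace(previous)
    return arcs


# Offset 2 is the if-statement; it has two possible exits.
IF_LINE = 2
for args in [(True, False), (False, False)]:
    exits = sorted(arc for arc in executed_arcs(func, *args) if arc[0] == IF_LINE)
    print(args, exits)
```

Each call leaves the if-statement through a different exit, so only the two tests together take both branches.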

At this point things are looking much better. We have 100% statement and 100% branch coverage in our tests. There is still one part of our function which our tests do not fully cover: the compound boolean expression in the if-statement. For this we need condition coverage analysis.

Condition coverage

The purpose of condition coverage analysis is to track the execution paths taken while evaluating (compound) boolean expressions.

At the logical branch level our if-statement can take one of two logical paths which we already have tests for. However, this decision on the branch is only taken once the compound boolean expression has been evaluated. Within a boolean expression with n conditions the computation may take up to 2^n possible paths (because of Python’s short-circuiting semantics the actual number is smaller). These possible paths are probably easiest to think about using truth tables which show all the possible combinations. For our two-part expression, “a or b”, we can write the following truth table

a      b      a or b
False  False  False
False  True   True
True   False  True
True   True   True

Because the “and” and “or” operators in Python short-circuit (their arguments are evaluated from left to right, and evaluation stops as soon as the outcome is determined), the (True, False) and (True, True) lines in our truth table are equivalent, which reduces the truth table to three possible logical paths. Looking at the current test code we can see that even with 100% statement and 100% branch coverage we are missing an execution path in our function. We can verify this by running our tests under instrumental, which keeps track of conditions and shows the missing lines in our truth table.

$ ./bin/instrumental -rs -t example ./bin/nosetests example -v --with-coverage
example.tests.test1 ... ok
example.tests.test2 ... ok

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
example           0      0      0      0   100%
example.lib       5      0      2      0   100%
---------------------------------------------------------
TOTAL             5      0      2      0   100%
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK
example.lib: 4/5 hit (80%)

-----------------------------
Instrumental Coverage Summary
-----------------------------

example.lib:3 < (a or b) >

T * ==> True
F T ==> False
F F ==> True

We can see the output of instrumental at the bottom. For each boolean expression instrumental prints the location and the expression followed by the corresponding truth table. The truth table contains the possible values for the expression followed by “==> True” if the corresponding logical path was executed and “==> False” if not. In the above we can see that our current tests exercise the (True, *) and (False, False) combinations but the (False, True) case is missing. instrumental denotes the short-circuited case with an asterisk (T *) meaning that the second condition was not executed at all.
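The three rows of the reduced truth table can be reproduced with a small probe object that records whether its truth value was ever asked for. This is a sketch of the idea only and says nothing about how instrumental works internally; Probe and path are names invented for this example.

```python
import itertools


class Probe(object):
    """Wraps a truth value and remembers whether it was evaluated."""

    def __init__(self, value):
        self.value = value
        self.evaluated = False

    def __bool__(self):
        self.evaluated = True
        return self.value

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


def path(a, b):
    """Return the evaluation path of "a or b", with '*' for a skipped operand."""
    pa, pb = Probe(a), Probe(b)
    if pa or pb:
        pass
    return tuple(
        ("T" if p.value else "F") if p.evaluated else "*"
        for p in (pa, pb)
    )


# Four input combinations collapse into three distinct paths.
all_paths = sorted(set(
    path(a, b) for a, b in itertools.product([False, True], repeat=2)
))
covered = set([path(True, False), path(False, False)])  # test1 and test2
missing = [p for p in all_paths if p not in covered]
print(all_paths)
print(missing)
```

With only test1 and test2 in place, the one path left over is the (False, True) case that instrumental reports as not executed.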

We now add a third test case to exercise the missing path.

def test1():
    from example.lib import func
    assert func(True, False) == 1

def test2():
    from example.lib import func
    assert func(False, False) == 0

def test3():
    from example.lib import func
    assert func(False, True) == 1

and rerun the tests

$ ./bin/instrumental -rs -t example ./bin/nosetests example -v --with-coverage
example.tests.test1 ... ok
example.tests.test2 ... ok
example.tests.test3 ... ok

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
example           0      0      0      0   100%
example.lib       5      0      2      0   100%
---------------------------------------------------------
TOTAL             5      0      2      0   100%
----------------------------------------------------------------------
Ran 3 tests in 0.002s

OK
example.lib: 5/5 hit (100%)

-----------------------------
Instrumental Coverage Summary
-----------------------------

Now we’ve finally managed full statement, branch and condition coverage on our function!

Conclusions

Having good tests and even 100% statement coverage is valuable, but it should be considered the beginning rather than the final goal in any project. With existing tools it is possible to analyze and improve test coverage with minimal effort.

Neither coverage nor instrumental is dependent on nose or any particular test runner so you should be able to use them in a variety of environments. For Zope/Plone development I can particularly recommend coverage over z3c.coverage. With coverage you can also generate statistics in XML format (for both statement and branch coverage) which can be monitored and tracked in systems such as Jenkins.

For me condition coverage analysis was the most interesting technique of the three, mostly because I was already familiar with the other two. Even before using coverage to automatically track branch coverage it was part of my test writing process to manually review the code in terms of the logical branches to make sure they were covered by tests. However, having an automated tool to do that is a big benefit. The instrumental package is still in development, but in the cases where I’ve used it, it has done its job well and revealed interesting holes in our tests. If you’re aware of other tools that provide condition coverage analysis I’d be interested in learning about them.


Posted by on May 7, 2011 in python, software engineering

 


BaseHTTPServer.BaseHTTPRequestHandler wastes TCP packets

While working on our first customer project using Pyramid I stumbled on a curious problem when setting up HAProxy to load balance requests among the backends. I had configured HAProxy to use layer 7 health checks to make sure that the applications were correctly responding to HTTP requests. For some reason I was getting a lot of false negatives indicating that the backend servers were unavailable when in fact they were functioning properly. This led me to inspect the network traffic between HAProxy and the application servers.

I had the following simple view in my application to respond to the HAProxy health checks

def ping(request):
  return Response('pong', content_type='text/plain')

which simply returns the string “pong” with a default set of HTTP headers. While inspecting the network traffic using Wireshark I noticed that this simple response was split into multiple TCP packets even though it could have easily fit in a single one. Additionally, it seemed that each HTTP header was sent in a separate TCP packet. Splitting the health check response into multiple packets was the reason behind the HAProxy problem because it sometimes caused HAProxy to truncate the response (I also found similar reports). After learning about the cause of the failing health checks I set out to find why exactly the HTTP headers were split into separate TCP packets.

Starting from paste.httpserver (which I was using to run the application) I was able to track the problem down to BaseHTTPServer.BaseHTTPRequestHandler. The reason why the HTTP response is split into so many TCP packets originates from SocketServer.StreamRequestHandler, which BaseHTTPRequestHandler inherits from. This is one of the convenience classes that provides a file-like API on top of a socket connection. More specifically, it provides two instance variables, self.rfile and self.wfile, which are file-like objects for reading from and writing to the connected socket, respectively. The comments in the StreamRequestHandler class contain the following


# Default buffer sizes for rfile, wfile.
# We default rfile to buffered because otherwise it could be
# really slow for large data (a getc() call per byte); we make
# wfile unbuffered because (a) often after a write() we want to
# read and we need to flush the line; (b) big writes to unbuffered
# files are typically optimized by stdio even when big reads
# aren't.
rbufsize = -1
wbufsize = 0

The important part here is the buffering mode for the wfile object, which is set to unbuffered. This causes each call to self.wfile.write() to send the data immediately. For a “chatty” application where the connected parties exchange messages frequently in alternating fashion this makes sense. However, for HTTP this default is suboptimal because in the common case the data transfer consists of a single exchange of information: the client sends a request and the application writes the response. After changing wfile to use buffered I/O by setting

  wbufsize = -1

I can see in Wireshark that the HTTP response is contained in a single TCP packet.
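The effect of the buffering mode can be illustrated without a network by counting how often data is handed to the underlying object. In the sketch below CountingRaw is a hypothetical stand-in for the socket and send_response imitates a handler that writes each header separately; the assumption, matching what I saw in Wireshark, is that every hand-over to the socket tends to end up as its own TCP packet.

```python
import io


class CountingRaw(io.RawIOBase):
    """Stand-in for a socket: counts how many times it is given data."""

    def __init__(self):
        super(CountingRaw, self).__init__()
        self.calls = 0
        self.data = b""

    def writable(self):
        return True

    def write(self, chunk):
        self.calls += 1
        self.data += bytes(chunk)
        return len(chunk)


HEADERS = [
    (b"Content-type", b"text/plain"),
    (b"Content-length", b"4"),
    (b"Server", b"example"),
]


def send_response(wfile, headers, body):
    """Write a response the way a header-by-header handler would."""
    wfile.write(b"HTTP/1.0 200 OK\r\n")
    for name, value in headers:
        wfile.write(name + b": " + value + b"\r\n")
    wfile.write(b"\r\n")
    wfile.write(body)
    wfile.flush()


def count_writes(buffered):
    raw = CountingRaw()
    # Buffered mode collects the small writes and passes them on at flush().
    wfile = io.BufferedWriter(raw) if buffered else raw
    send_response(wfile, HEADERS, b"pong")
    return raw.calls, raw.data


unbuffered_calls, unbuffered_data = count_writes(False)
buffered_calls, buffered_data = count_writes(True)
print(unbuffered_calls, buffered_calls)
```

Both modes deliver identical bytes; only the number of hand-overs differs, and it grows with the number of headers in the unbuffered case.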

When the body of the HTTP response is small there can be considerable overhead in sending the response in multiple TCP packets compared to a single packet. I wanted to benchmark this to see what the difference is between the two buffering modes. I set up the following environment

$ virtualenv-2.6 tcptest
$ cd tcptest
$ ./bin/easy_install Paste

and used the following script to run a simple WSGI app that returns 15 HTTP headers and a trivial body.

$ cat tcptest.py
def simple_app(environ, start_response):
    status = '200 OK'
    headers = [
        ('Content-type', 'text/plain'),
        ('Content-length', '4'),
        ('Server', 'paste.httpserver'),
        ('Date', 'Wed, 23 Feb 2011 15:17:48 GMT'),
        ('Last-Modified', 'Wed, 23 Feb 2011 11:15:06 GMT'),
        ('Etag', '"13cc73a-13591-49cf135880280"'),
        ('X-Foo1', 'bar1'),
        ('X-Foo2', 'bar2'),
        ('X-Foo3', 'bar3'),
        ('X-Foo4', 'bar4'),
        ('X-Foo5', 'bar5'),
        ('X-Foo6', 'bar6'),
        ('X-Foo7', 'bar7'),
        ('X-Foo8', 'bar8'),
        ('X-Foo9', 'bar9'),
        ]
    start_response(status, headers)
    return ['pong']

if __name__ == '__main__':
    import sys
    from paste import httpserver
    if sys.argv[1].strip() == 'buffered':
        print "Using buffered I/O for writing."
        httpserver.WSGIHandler.wbufsize = -1
    else:
        print "Using unbuffered I/O for writing (default)"
    httpserver.serve(simple_app, host=sys.argv[2], port=sys.argv[3])

To benchmark the difference I started the script using both unbuffered and buffered I/O and ran Apache benchmark (ab) against it. I used a single thread to run 5000 requests against the script and measured the requests per second the server achieved.

Unbuffered I/O

$ ./bin/python tcptest.py unbuffered 192.168.0.1 8000
Using unbuffered I/O for writing (default)
serving on http://192.168.0.1:8000

$ ab -c1 -n 5000 http://192.168.0.1:8000/ping
...
Requests per second:    1036.11 [#/sec] (mean)

Buffered I/O

$ ./bin/python tcptest.py buffered 192.168.0.1 8000
Using buffered I/O for writing.
serving on http://192.168.0.1:8000

$ ab -c1 -n 5000 http://192.168.0.1:8000/ping
...
Requests per second:    1893.12 [#/sec] (mean)

The absolute numbers are specific to my setup (a MacBook Pro) and not very interesting, but the relative difference in requests per second is significant: the buffered server handled roughly 1.8 times as many requests. This is especially the case for small responses where the HTTP headers dominate the response body in size.

All implementations that inherit from BaseHTTPServer.BaseHTTPRequestHandler without modifying the write buffering will suffer from this issue. These include at least paste.httpserver and SimpleHTTPServer in the standard library. The wsgiref implementation in the standard library has the same underlying issue but does not suffer from it to the same degree due to the way it handles writing of the HTTP headers. paste.httpserver iterates over the HTTP headers and calls .write() on each header whereas wsgiref (actually wsgiref.headers.Headers) builds a string containing (most of) the headers that is sent using a single .write().

Recent HAProxy releases should work better with backends that split the response in multiple packets but considering the increase in performance it may still be useful to change the buffering mode in Python HTTP servers that have this issue.

 

Posted by on April 1, 2011 in python

 


SWF metadata parser

Recently I needed to be able to determine the dimensions of SWF (Flash animation) files so I could embed them properly on a web page but I couldn’t immediately find something useful with Google that would perform the task. I am aware of the Hachoir project, but it seemed a bit overkill for my simple use case and a quick try with hachoir-metadata failed to parse my particular SWF file.

Luckily the container section of the SWF file format (which contains the metadata) is rather simple and writing a parser for it turned out to be a nice distraction from my normal duties. The result is hexagonit.swfheader which is a minimal package (no dependencies outside the standard library) that provides a single function that parses SWF files and returns the metadata.

The package also comes with a console script that you can use on the command line to quickly introspect local SWF files. In a buildout you’ll need to use the zc.recipe.egg:scripts recipe to get the script installed.
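To give an idea of how little is involved in reading the SWF header, here is a rough sketch of such a parser. It is not the code or the API of hexagonit.swfheader: the function name parse_swf_header and the dictionary keys are made up for this example, it handles only the “FWS” (uncompressed) and “CWS” (zlib-compressed) signatures, and it decompresses the whole body for simplicity.

```python
import struct
import zlib


def parse_swf_header(data):
    """Parse the header of an SWF file given as a byte string."""
    signature = bytes(data[:3])
    if signature not in (b"FWS", b"CWS"):
        raise ValueError("unsupported signature: %r" % (signature,))
    version = bytearray(data[3:4])[0]
    (file_length,) = struct.unpack("<I", data[4:8])
    body = data[8:]
    if signature == b"CWS":
        # Everything after the first eight bytes is zlib-compressed.
        body = zlib.decompress(body)

    # The frame size is a bit-packed RECT: a 5-bit field width followed by
    # Xmin, Xmax, Ymin and Ymax, each that many bits wide and signed.
    bits = "".join(format(byte, "08b") for byte in bytearray(body[:17]))
    nbits = int(bits[:5], 2)

    def field(index):
        if nbits == 0:
            return 0
        start = 5 + index * nbits
        value = int(bits[start:start + nbits], 2)
        if value >= 1 << (nbits - 1):  # two's complement sign bit set
            value -= 1 << nbits
        return value

    xmin, xmax, ymin, ymax = [field(i) for i in range(4)]
    rect_size = (5 + 4 * nbits + 7) // 8  # RECT is padded to a whole byte

    # Frame rate is 8.8 fixed point (fraction byte first), then frame count.
    fraction, whole, frames = struct.unpack(
        "<BBH", body[rect_size:rect_size + 4])
    return {
        "compressed": signature == b"CWS",
        "version": version,
        "file_length": file_length,
        "width": (xmax - xmin) / 20.0,   # twips to pixels
        "height": (ymax - ymin) / 20.0,
        "fps": whole + fraction / 256.0,
        "frames": frames,
    }
```

Calling it with the contents of a file opened in binary mode returns the dimensions in pixels, which is all that is needed for embedding the animation on a page.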

 

Posted by on April 16, 2008 in software engineering

 
