RSS

3D virtual gallery for Plone

Today we released version 1.0 of hexagonit.virtualgallery which provides a Flash based 3D gallery view for images. The meat of the package is a Flash viewer which renders images as framed paintings on the walls of a gallery. The viewer is configured using a JSON configuration and such has no direct Plone dependencies, you can easily use it in a Pyramid, Rails, or whatever web project.

The original use case for the gallery was part of a larger art related project we are working on but due to its generic nature we decided to open source it and release it separately. By default the Plone integration in the package provides views for Folders and Collections allowing you to easily control which images are placed in the gallery. It is also very easy to integrate the gallery in custom use cases where more control over the images is needed.

Additionally, you can configure the viewer on a gallery (context) basis by choosing which image scale should be used within the gallery. This is particularly useful if you are storing larger images (for example digital originals) but wish to make the gallery load up faster using a smaller scale for it.

We’d be interested in hearing out any feedback or ideas about the package!

 
4 Comments

Posted by on August 26, 2011 in plone, python

 

Test coverage analysis

I enjoy writing tests for my code for the obvious reasons: they build my confidence on the correct functionality of the software and also tend to drive the design in a more readable and well-structured direction. However, until recently I had been limited to performing only statement coverage analysis on my tests which is why I got very excited when I was able to start tracking both branch coverage and condition coverage in our recent projects.

Below is a short introduction to the different types of coverage analysis you can perform with currently available tools.

Set up

We’ll start with a virtual environment to contain the example package and run the tests. We’ll also install Nose, coverage and instrumental in the environment.

$ virtualenv-2.6 analysis
New python executable in analysis/bin/python2.6
Also creating executable in analysis/bin/python
Installing distribute......................done.

$ cd analysis
$ ./bin/pip install nose coverage instrumental

Inside the virtualenv we have a Python package called “example” with two modules: lib.py and tests.py. The lib.py module contains the following function that we will test and the tests.py will contain the test cases.

def func(a, b):
    value = 0
    if a or b:
        value = value + 1
    return value

Although the function is very simple it will allow us to demonstrate the different coverage analysis tools.

Statement coverage

Statement coverage is probably the simplest of the three and its goal is to keep track of the source code lines that get executed during the test run. This will allow us to spot obvious holes in our test suite. We’ll add the following test function in the tests.py module.

def test1():
    from example.lib import func
    assert func(True, False) == 1

With both nose and coverage installed in the virtualenv we can run the tests with statement coverage analysis with

$ ./bin/nosetests -v --with-coverage example
example.tests.test1 ... ok

Name          Stmts   Miss  Cover   Missing
-------------------------------------------
example           0      0   100%
example.lib       5      0   100%
-------------------------------------------
TOTAL             5      0   100%
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

As we can see above this single test managed to achieve 100% statement coverage in our example package. Next, let’s add branch analysis in the mix.

Branch coverage

The purpose of branch coverage analysis is to keep track of the logical branches in the executing of the code and to indicate whether some logical paths are not executed during the test run. Even with 100% statement coverage is rather easy to have less than 100% branch coverage.

With nose there is unfortunately no command-line switch we can use to activate branch coverage tracking so we will create a .coveragerc file in the current directory to enable it. The .coveragerc file contains the following

[run]
branch = True

In our function we have a logical branch (the if-statement) and currently our tests only exercise the True-path as can be seen when we run the tests with branch coverage enabled.

$ ./bin/nosetests -v --with-coverage example
example.tests.test1 ... ok

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
example           0      0      0      0   100%
example.lib       5      0      2      1    86%
---------------------------------------------------------
TOTAL             5      0      2      1    86%
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

The output tells us that in example.lib we have one partial branch (BrPart) which reduces the coverage in that module to 86% in this case. We’ll now add another test cases in tests.py which exercises the False-path of the if-statement.

def test1():
    from example.lib import func
    assert func(True, False) == 1

def test2():
    from example.lib import func
    assert func(False, False) == 0

Rerunning the tests with branch coverage tracking will show that we’ve now covered all logical branches.

$ ./bin/nosetests -v --with-coverage example
example.tests.test1 ... ok
example.tests.test2 ... ok

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
example           0      0      0      0   100%
example.lib       5      0      2      0   100%
---------------------------------------------------------
TOTAL             5      0      2      0   100%
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK

At this point things are looking much better. We have 100% statement and 100% branch coverage in our tests. There is still one part of our function which is not fully covered by our tests which is the compound boolean expression in the if-statement. For this we need condition coverage analysis.

Condition coverage

The purpose of condition coverage analysis is to track the execution paths taken while evaluating (compound) boolean expressions.

At the logical branch level our if-statement can take one of two logical paths which we already have tests for. However, this decision on the branch is only taken once the compound boolean expression has been evaluated. Within a boolean expression the computation may take up to 2^n possible paths (because of Python’s short circuiting semantics the number of possible paths is actually less). These possible paths are probably easiest to think about using truth tables which show all the possible combinations. For our two part, “a or b“, expression we can write the following truth table

a b a or b
False False False
False True True
True False True
True True True

Because in Python and and or are short-circuit operators (meaning their arguments are evaluated from left to right, and evaluation stops as soon as the outcome is determined) the (True, False) and (True, True) lines in our truth table are equivalent which reduces the truth table to three possible logical paths. Looking at the current test code we can see that even with 100% statement and 100% branch coverage we are missing an execution path in our function. We can verify this by using instrumental to run our tests which keeps track of conditions and shows the missing lines in our truth table.

$ ./bin/instrumental -rs -t example ./bin/nosetests example -v --with-coverage
example.tests.test1 ... ok
example.tests.test2 ... ok

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
example           0      0      0      0   100%
example.lib       5      0      2      0   100%
---------------------------------------------------------
TOTAL             5      0      2      0   100%
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK
example.lib: 4/5 hit (80%)

-----------------------------
Instrumental Coverage Summary
-----------------------------

example.lib:3 < (a or b) >

T * ==> True
F T ==> False
F F ==> True

We can see the output of instrumental at the bottom. For each boolean expression instrumental prints the location and the expression followed by the corresponding truth table. The truth table contains the possible values for the expression followed by “==> True” if the corresponding logical path was executed and “==> False” if not. In the above we can see that our current tests exercise the (True, *) and (False, False) combinations but the (False, True) case is missing. instrumental denotes the short-circuited case with an asterisk (T *) meaning that the second condition was not executed at all.

We now add a third test case to exercise the missing path.

def test1():
    from example.lib import func
    assert func(True, False) == 1

def test2():
    from example.lib import func
    assert func(False, False) == 0

def test3():
    from example.lib import func
    assert func(False, True) == 0

and rerun the tests

$ ./bin/instrumental -rs -t example ./bin/nosetests example -v --with-coverage
example.tests.test1 ... ok
example.tests.test2 ... ok
example.tests.test3 ... ok

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
example           0      0      0      0   100%
example.lib       5      0      2      0   100%
---------------------------------------------------------
TOTAL             5      0      2      0   100%
----------------------------------------------------------------------
Ran 3 tests in 0.002s

OK
example.lib: 5/5 hit (100%)

-----------------------------
Instrumental Coverage Summary
-----------------------------

Now we’ve finally managed full statement, branch and condition coverage on our function!

Conclusions

Having good tests and even 100% statement coverage is very good but it should only be considered the beginning and not the final goal in any project.  With existing tools it is possible to analyze and improve test coverage with minimal effort.

Neither coverage nor instrumental is dependent on nose or any particular test runner so you should be able to use them in a variety of environments. For Zope/Plone development I can particularly recommend coverage over z3c.coverage. With coverage you can also generate statistics in XML format (for both statement and branch coverage) which can be monitored and tracked in systems such as Jenkins.

For me condition coverage analysis was the most interesting technique of the three mostly because I was already familiar with the other two. Even before using coverage to automatically track branch coverage it was part of my test writing process to manually review the code in terms of the logical branches to make sure they were covered by tests. However, having an automated tool to do that is a big benefit. The instrumental package is still in development but in the cases I’ve used it it has done its job well and revealed interesting holes in our tests. If you’re aware of other tools that provide condition coverage analysis I’d be interested in learning about them.

 
7 Comments

Posted by on May 7, 2011 in python, software engineering

 

Tags: , ,

BaseHTTPServer.BaseHTTPRequestHandler wastes TCP packets

While working on our first customer project using Pyramid I stumbled on a curious problem when setting up HAProxy to load balance requests among the backends. I had configured HAProxy  to use layer 7 health checks to make sure that the applications were correctly responding to HTTP requests. For some reason I was getting a lot of false negatives indicating that the backend servers were unavailable when in fact they were functioning properly. This lead me to inspect the network traffic between HAProxy and the application servers.

I had the following simple view in my application to respond to the HAProxy health checks

def ping(request):
  return Response('pong', content_type='text/plain')

which simply returns the string “pong” with a default set of HTTP headers. While inspecting the network traffic using Wireshark I noticed that this simple response was split into multiple TCP packets even though it could have easily fit in a single one. Additionally, it seemed that each HTTP header was sent in a separate TCP packet. Splitting the health check response into multiple packets was the reason behind the HAProxy problem because it caused HAProxy sometimes to truncate the response (I also found similar reports). After learning about the cause of the failing health checks I set out to find why exactly the HTTP headers were split into separate TCP packets.

Starting from paste.httpserver (which I was using to run the application) I was able track to problem down to BaseHTTPServer.BaseHTTPRequestHandler. The reason why the HTTP response is split into so many TCP packets originates from SocketServer.StreamRequestHandler which BaseHTTPRequestHandler inherits from. This is one of the convenience classes that provides a file-like API on top of a socket connection. More specifically, it provides two instance variables self.rfile and self.wfile which are file-like objects for reading from and writing to the connected socket, respectively. The comments in the StreamRequestHandler class contain the following


# Default buffer sizes for rfile, wfile.
# We default rfile to buffered because otherwise it could be
# really slow for large data (a getc() call per byte); we make
# wfile unbuffered because (a) often after a write() we want to
# read and we need to flush the line; (b) big writes to unbuffered
# files are typically optimized by stdio even when big reads
# aren't.
rbufsize = -1
wbufsize = 0

The important part here is the buffering mode for the wfile object which is set to unbuffered. This results in each call to self.wfile.write() to send the data immediately. For a “chatty” application where the connected parties exchange messages frequently in alternating fashion this makes sense. However, for HTTP this assumption is suboptimal because in the common case the data transfer consists of a single exhange of information: the client sends a request and the application writes the response. Changing the wfile to use buffered I/O by setting

  wbufsize = -1

I can see in Wireshark that the HTTP response is contained in a single TCP packet.

In case the body of the HTTP response is small there can be considerable overhead in sending the response in multiple TCP packets compared to a single packet. I wanted to benchmark this to see what the difference is between the two buffering modes. I set up the following environment

$ virtualenv-2.6 tcptest
$ cd tcptest
$ ./bin/easy_install Paste

and used the following script to run a simple WSGI app that returns 15 HTTP headers and a trivial body.

$ cat tcptest.py
def simple_app(environ, start_response):
    status = '200 OK'
    headers = [
        ('Content-type', 'text/plain'),
        ('Content-length', '4'),
        ('Server', 'paste.httpserver'),
        ('Date', 'Wed, 23 Feb 2011 15:17:48 GMT'),
        ('Last-Modified', 'Wed, 23 Feb 2011 11:15:06 GMT'),
        ('Etag', '"13cc73a-13591-49cf135880280"'),
        ('X-Foo1', 'bar1'),
        ('X-Foo2', 'bar2'),
        ('X-Foo3', 'bar3'),
        ('X-Foo4', 'bar4'),
        ('X-Foo5', 'bar5'),
        ('X-Foo6', 'bar6'),
        ('X-Foo7', 'bar7'),
        ('X-Foo8', 'bar8'),
        ('X-Foo9', 'bar9'),
        ]
    start_response(status, headers)
    return ['pong']

if __name__ == '__main__':
    import sys
    from paste import httpserver
    if sys.argv[1].strip() == 'buffered':
        print "Using buffered I/O for writing."
        httpserver.WSGIHandler.wbufsize = -1
    else:
        print "Using unbuffered I/O for writing (default)"
    httpserver.serve(simple_app, host=sys.argv[2], port=sys.argv[3])

To benchmark the difference I started the script using both unbuffered and buffered I/O and ran Apache benchmark (ab) against it. I used a single thread to run 5000 requests against the script and measured the requests per second the server achieved.

Unbuffered I/O

$ ./bin/python tcptest.py unbuffered 192.168.0.1 8000
Using unbuffered I/O for writing (default)
serving on http://192.168.0.1:8000

$ ab -c1 -n 5000 http://192.168.0.1:8000/ping
...
Requests per second:    1036.11 [#/sec] (mean)

Buffered I/O

$ ./bin/python tcptest.py buffered 192.168.11.76 8009
Using buffered I/O for writing.
serving on http://192.168.0.1:8000
$ ab -c 1 -n 5000 http://192.168.0.1:8000/ping
...
Requests per second:    1893.12 [#/sec] (mean)

The absolute numbers are specific to my setup (MacbookPro) and not very interesting but the relative difference in the number of requests per second is quite significant. This is especially the case for small requests where the number of HTTP headers dominate over the response body size.

All implementations that inherit from BaseHTTPServer.BaseHTTPRequestHandler without modifying the write buffering will suffer from this issue. These include at least paste.httpserver and SimpleHTTPServer in the standard library. The wsgiref implementation in the standard library has the same underlying issue but does not suffer from it to the same degree due to the way it handles writing of the HTTP headers. paste.httpserver iterates over the HTTP headers and calls .write() on each header whereas wsgiref (actually wsgiref.headers.Headers) builds a string containing (most of) the headers that is sent using a single .write().

Recent HAProxy releases should work better with backends that split the response in multiple packets but considering the increase in performance it may still be useful to change the buffering mode in Python HTTP servers that have this issue.

 
Leave a comment

Posted by on April 1, 2011 in python

 

Tags: , , ,

Pro Git: Professional version control

Scott Chacon has written a new book on git  called “Pro Git: profession version control” which is freely available at http://progit.org/ and licensed under Creative Commons Attribution-Non Commercial-Share Alike 3.0 license.

Make sure to check it out if you’re in need for some additional git-fu.

 
Leave a comment

Posted by on July 30, 2009 in git, software engineering

 

Using Araxis Merge with Git

I wanted to use the great Araxis Merge tool as a helper to solve merge conflicts with Git but currently it is not supported out-of-the-box. Luckily new commands can be configured by hand but a quick Google search didn’t turn up anything I could have simply copy-pasted to get it working. So here goes..

I assume that you’ve got Araxis Merge installed including the binaries that are located in the “Utilities” directory in the distribution. It doesn’t matter where you place the binaries as long as they are available. I put them under /usr/local/bin on my Mac.

In case of a merge conflict there are two possible scenarios: one, in which a common base version exists and second, where it does not exist. These scenarios require the use of a three-way-diff or a two-way-diff operation, respectively. The command line options for Araxis Merge require that we know in advance which scenario we are facing so I had to resolve to using a simple shell script wrapper that would make the appropriate call to the compare binary. The shell script I used is below.

#!/usr/bin/env bash

LOCAL=$1
REMOTE=$2
MERGED=$3
BASE=$4

MERGE=/usr/local/bin/compare

if [ -e "$BASE"  ]; then
    $MERGE -wait -merge -3 -a1 \
    -title1:"$MERGED (Base)" \
    -title2:"$MERGED (Local)" \
    -title3:"$MERGED (Remote)" \
    "$BASE" "$LOCAL" "$REMOTE" "$MERGED"
else
    $MERGE -wait -2 \
    -title1:"$MERGED (Local)" \
    -title2:"$MERGED (Remote)" \
    "$LOCAL" "$REMOTE" "$MERGED"
fi

To get it working I saved the shell script in /usr/local/bin/araxis-mergetool, made it executable and configured Git as follows

git config --global mergetool.araxis.cmd \
  'araxis-mergetool "$LOCAL" "$REMOTE" "$MERGED" "$BASE"'
git config --global merge.tool araxis

Now, when I get merge conflicts I can run git mergetool and Araxis Merge will be opened up in the proper mode with the conflicting files.

Unfortunately Araxis Merge and the compare binary do not appear to set the exit code of the process in a manner that Git would understand so after fixing up the conflict I may still need to tell Git whether the merge was successful or not.

 
2 Comments

Posted by on February 11, 2009 in git

 

Using a forward-proxy for direct access to production sites

At our company we use a fairly common production environment setup for our Plone sites. We’ve got Apache for virtual hosting, logging and SSL followed by Varnish for caching and Perlbal for  load-balancing finally backed up a farm of  Zope instances using a ZEO storage.

Each of our client sites are served by at least two different Zope instances so that we can perform rolling updates by bringing down, updating and restarting the instances one at a time without resulting in any downtime for our customers. While a Zope instance is down for maintenance we remove it from the load balancer pool so it won’t receive any requests during the maintenance period. Before returning the updated Zope instance back to the load balancer’s pool we need to be able to access it directly, mainly for two reasons:

  • to verify that the instance functions correctly and there are no regressions
  • to warm up various Zope level caches to avoid the first requests to pay the penalty of the cold start

It is also important that we can access the Zope instance without going through Varnish and the load balancer to make sure we are indeed seeing a response from the particular Zope instance instead of a cached copy.

Background

Originally we had set up a custom <VirtualHost> section in Apache that allowed us access to all sites from within a single domain name, something like https://zmi.mycompany.com/customer1 and https://zmi.mycompany.com/customer2. This worked out great in the past and allowed us to both verify easily that the instance was working properly and also to warm up the Zope memory caches. However, after warming up the customer sites and putting the instances back into the load balancer pool we started experimenting problems with inconsistent behaviour between the instances in the Zope farm with regard to links and page elements such as images. It seemed that links and images were pointing to our custom https://zmi.mycompany.com/ domain instead of the customer specific one they should have. Somehow the cache warm up process had broken up the live site.

It didn’t take long to find out that CacheFu and the in-memory Page Cache in particular were the culprit. The Page Cache caches the results of rendering a page thus persisting the links with the incorrect domain name. We could have simply purged the Page Cache after the warm up but since the goal was to warm up the caches (including the Page Cache) that did not feel right. To get around the issue we would need to access the particular Zope instance using the real customer domain name instead of our custom one. We chose to use a HTTP forward-proxy for this.

The main idea is that normal users accessing http://www.customer.com/ would be served through the normal production pipeline including Varnish and Perlbal and from the Zope instances active in the load balancer. However, when using the custom HTTP proxy we could use the same http://www.customer.com/ address but be served directly from the particular Zope instance bypassing the caching and load balancing. Since Apache provides all that we needed out of the box it was a simple choice to use it.

Configuring Apache to forward-proxy requests

We decided to implement the forward proxy configuration within a <VirtualHost> configuration. Requests coming to the customer domain would need to routed to a particular Zope instance but other requests would be proxied though to the outside world.  The configuration consisted mostly of common mod_rewrite rules but we still needed a mechanism to target a particular Zope instance from our farm.

A simple solution would have been to create a separate <VirtualHost> section for each Zope instance in the farm. However, a linear correlation on the number of Zope instances was not desirable as it would results in a large number of virtual host sections that would  be roughly 90% the same. Instead, we chose to do the following. Each physical backend machine running a number of  Zope instances would be handled by a single <VirtualHost> section and the virtual host’s ports would be mapped 1:1 to the Zope instances’ ports. In other words, we would use the port numbers to separate Zope instances within a single physical backend machine and then virtual hosts to separate the physical machines. This seemed like a good compromise.

The <VirtualHost> proxy configuration for a single backend machine running multiple Zope instances  would then look like this. We’ve called the proxy vhost backend1.proxy.mycompany.com and using 192.168.0.100 as the backend machine address.

<VirtualHost 1.2.3.4:*>
    ServerName backend1.proxy.mycompany.com
    ProxyRequests On
    <Proxy *>
        Order deny,allow
        Deny from all
        Allow from 4.5.6.7
    </Proxy>

    # Tell Apache to preserve the physical TCP port information so we
    # can map it directly to the Zope backends by reading
    # the %{SERVER_PORT} environment variable.
    UseCanonicalName Off
    UseCanonicalPhysicalPort On

    RewriteEngine On
    # Read the rewrite map from an external file. The file
    # provides a mapping from public host names to ZODB paths leading
    # to the corresponding Plone site roots.
    RewriteMap zope txt:/var/apache/proxy-rewrite-map.txt
    RewriteMap tolower int:tolower

    # Make sure we have a Host: header
    RewriteCond %{HTTP_HOST} !^$
    # Normalize the Host: header
    RewriteCond ${tolower:%{HTTP_HOST}|NONE} ^(.+)$
    # Lookup the hostname in our rewrite map
    RewriteCond ${zope:%1} ^(/.*)$
    # Finally, rewrite and proxy to the Zope backend. We map the proxy
    # ports directly to the Zope backend ports which allows a mechanism for
    # selecting a specific backend by choosing a matching port.
    RewriteRule ^proxy:http://[^/]*/?(.*)$

http://192.168.0.100:%{SERVER_PORT}/VirtualHostBase/

          http/${tolower:%{HTTP_HOST}}:80/%1/VirtualHostRoot/$1 [P,L]

</VirtualHost>

I have removed parts not relevant to this post, such as log file configuration, from the example above. Also the last RewriteRule is split on multiple lines but it should all be on a single line. The main points in the config are

  • use of “*” as the port in the <VirtualHost> node. This allows us to use a single section configuration for multiport access
  • use of <Proxy> section to limit access to the proxy. This is very important so that do not expose the proxy to the whole world. Only allow yourself access to it.
  • setting UseCanonicalName Off and UseCanonicalPhysicalPort On to make sure we get the actual port used when reading the %{SERVER_PORT} environment variable later in the rewrite rule
  • setting up an external rewrite map file that can be shared among <VirtualHost> sections.
  • performing conditional rewrites that proxy requests targeted to the customer domain to the particular Zope instance and let others through.

The external rewrite map file is a simple text file in which each line contains two values: first the customer domain name (which will matched against the normalized contents of the Host: HTTP header) followed by the ZODB path to the root of the corresponding Plone site. For example:

www.customer1.com  /customers/customer1
www.customer2.com  /customers/customer2

Now, assuming that we have two Zope instances running on machine 192.168.0.100 (the address used in the rewrite rule above) on ports 10001 and 10002 we can access them through the proxy by configuring the browser to use a HTTP proxy at backend1.proxy.mycompany.com on port 10001 or 10002 respectively.

We now have a situation where we can access a customer site, e.g. http://www.customer1.com/, and choose the particular Zope instance by configuring the HTTP proxy in our browser. This is all well and good and does achieve what we started out to do. However, manually configuring the proxy setup everytime is not fun and there is nothing to differentiate our use of the site in “normal” mode from the “backdoor” proxied mode. It is too easy to forget the proxy configuration on once the update has been finished. Luckily, there is a solution available that will make this a breeze.

Using FoxyProxy to manage proxy configurations

FoxyProxy is a Firefox extension that helps with managing multiple proxy configurations and makes switching between them quick and easy. It also shows the current proxy configuration in the status bar which makes it easy to see which backend we’re currently talking to (if you name you proxy configuration accordingly).

To continue our example, we would make two separate proxy configurations for accessing each one of our two backend Zope instances. The first one we could call “Backend #1 — Zope instance #1″ and use backend1.proxy.mycompany.com:10001 as the address and the other one “Backend #1 — Zope instance #2″ using backend1.proxy.mycompany.com:10002 as the address. Having the name of both the physical machine and the Zope instance in the proxy configuration helps to identify the particular Zope instance quickly.

With FoxyProxy configured it is now very easy to switch between accessing a site through the full production pipeline with caching or bypassing that and talking directly to a given Zope instance. Because the solution is generic we can take advantage of it with any HTTP client that is capable of using a proxy. It would now be very easy, for example, to do benchmarking with and without Varnish by simply switching between proxy configurations when using ab or another benchmarking tool.

 
Leave a comment

Posted by on January 1, 2009 in zope

 
 
Follow

Get every new post delivered to your Inbox.