Scott Chacon has written a new book on Git called “Pro Git: professional version control”, which is freely available at http://progit.org/ and licensed under the Creative Commons Attribution-Non Commercial-Share Alike 3.0 license.

Make sure to check it out if you’re in need of some additional git-fu.

I wanted to use the great Araxis Merge tool as a helper for resolving merge conflicts with Git, but currently it is not supported out of the box. Luckily, new commands can be configured by hand, but a quick Google search didn’t turn up anything I could simply copy and paste to get it working. So here goes.

I assume that you’ve got Araxis Merge installed including the binaries that are located in the “Utilities” directory in the distribution. It doesn’t matter where you place the binaries as long as they are available. I put them under /usr/local/bin on my Mac.

In case of a merge conflict there are two possible scenarios: one in which a common base version exists and one in which it does not. These scenarios require a three-way or a two-way diff operation, respectively. The command line options for Araxis Merge require that we know in advance which scenario we are facing, so I had to resort to a simple shell script wrapper that makes the appropriate call to the compare binary. The shell script I used is below.

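#!/usr/bin/env bash
# Launch Araxis Merge in three-way or two-way mode for git mergetool.

# File names passed in by Git (see the mergetool configuration).
LOCAL=$1
REMOTE=$2
MERGED=$3
BASE=$4

# Location of the Araxis "compare" command line utility.
MERGE=/usr/local/bin/compare

if [ -e "$BASE" ]; then
    # A common base version exists: three-way merge.
    "$MERGE" -wait -merge -3 -a1 \
    -title1:"$MERGED (Base)" \
    -title2:"$MERGED (Local)" \
    -title3:"$MERGED (Remote)" \
    "$BASE" "$LOCAL" "$REMOTE" "$MERGED"
else
    # No common base version: two-way comparison.
    "$MERGE" -wait -2 \
    -title1:"$MERGED (Local)" \
    -title2:"$MERGED (Remote)" \
    "$LOCAL" "$REMOTE" "$MERGED"
fi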

To get it working, I saved the shell script as /usr/local/bin/araxis-mergetool, made it executable and configured Git as follows:

git config --global mergetool.araxis.cmd \
  'araxis-mergetool "$LOCAL" "$REMOTE" "$MERGED" "$BASE"'
git config --global merge.tool araxis

Now, when I get merge conflicts I can run git mergetool and Araxis Merge will be opened up in the proper mode with the conflicting files.

Unfortunately, Araxis Merge and the compare binary do not appear to set the exit code of the process in a way that Git understands, so after fixing up a conflict I may still need to tell Git whether the merge was successful or not.

At our company we use a fairly common production environment setup for our Plone sites. We’ve got Apache for virtual hosting, logging and SSL, followed by Varnish for caching and Perlbal for load balancing, finally backed by a farm of Zope instances using a ZEO storage.

Each of our client sites is served by at least two different Zope instances so that we can perform rolling updates by bringing down, updating and restarting the instances one at a time without causing any downtime for our customers. While a Zope instance is down for maintenance we remove it from the load balancer pool so it won’t receive any requests during the maintenance period. Before returning the updated Zope instance to the load balancer’s pool we need to be able to access it directly, mainly for two reasons:

  • to verify that the instance functions correctly and there are no regressions
  • to warm up various Zope level caches so that the first requests do not pay the penalty of a cold start

It is also important that we can access the Zope instance without going through Varnish and the load balancer to make sure we are indeed seeing a response from the particular Zope instance instead of a cached copy.

Background

Originally we had set up a custom <VirtualHost> section in Apache that allowed us access to all sites from within a single domain name, something like https://zmi.mycompany.com/customer1 and https://zmi.mycompany.com/customer2. This worked out great in the past and allowed us both to verify easily that the instance was working properly and to warm up the Zope memory caches. However, after warming up the customer sites and putting the instances back into the load balancer pool we started experiencing problems with inconsistent behaviour between the instances in the Zope farm with regard to links and page elements such as images. It seemed that links and images were pointing to our custom https://zmi.mycompany.com/ domain instead of the customer specific one they should have used. Somehow the cache warm up process had broken the live site.

It didn’t take long to find out that CacheFu and the in-memory Page Cache in particular were the culprit. The Page Cache caches the results of rendering a page, thus persisting the links with the incorrect domain name. We could have simply purged the Page Cache after the warm up, but since the goal was to warm up the caches (including the Page Cache) that did not feel right. To get around the issue we would need to access the particular Zope instance using the real customer domain name instead of our custom one. We chose to use an HTTP forward proxy for this.

The main idea is that normal users accessing http://www.customer.com/ would be served through the normal production pipeline including Varnish and Perlbal and from the Zope instances active in the load balancer. However, when using the custom HTTP proxy we could use the same http://www.customer.com/ address but be served directly from the particular Zope instance bypassing the caching and load balancing. Since Apache provides all that we needed out of the box it was a simple choice to use it.

Configuring Apache to forward-proxy requests

We decided to implement the forward proxy configuration within a <VirtualHost> configuration. Requests coming to the customer domain would need to be routed to a particular Zope instance, but other requests would be proxied through to the outside world. The configuration consisted mostly of common mod_rewrite rules, but we still needed a mechanism to target a particular Zope instance from our farm.

A simple solution would have been to create a separate <VirtualHost> section for each Zope instance in the farm. However, having the configuration grow linearly with the number of Zope instances was not desirable, as it would result in a large number of virtual host sections that would be roughly 90% the same. Instead, we chose to do the following. Each physical backend machine running a number of Zope instances would be handled by a single <VirtualHost> section and the virtual host’s ports would be mapped 1:1 to the Zope instances’ ports. In other words, we would use the port numbers to separate Zope instances within a single physical backend machine and then virtual hosts to separate the physical machines. This seemed like a good compromise.

The <VirtualHost> proxy configuration for a single backend machine running multiple Zope instances would then look like this. We’ve called the proxy vhost backend1.proxy.mycompany.com and are using 192.168.0.100 as the backend machine address.

<VirtualHost 1.2.3.4:*>
    ServerName backend1.proxy.mycompany.com
    ProxyRequests On
    <Proxy *>
        Order deny,allow
        Deny from all
        Allow from 4.5.6.7
    </Proxy>

    # Tell Apache to preserve the physical TCP port information so we
    # can map it directly to the Zope backends by reading
    # the %{SERVER_PORT} environment variable.
    UseCanonicalName Off
    UseCanonicalPhysicalPort On

    RewriteEngine On
    # Read the rewrite map from an external file. The file
    # provides a mapping from public host names to ZODB paths leading
    # to the corresponding Plone site roots.
    RewriteMap zope txt:/var/apache/proxy-rewrite-map.txt
    RewriteMap tolower int:tolower

    # Make sure we have a Host: header
    RewriteCond %{HTTP_HOST} !^$
    # Normalize the Host: header
    RewriteCond ${tolower:%{HTTP_HOST}|NONE} ^(.+)$
    # Lookup the hostname in our rewrite map
    RewriteCond ${zope:%1} ^(/.*)$
    # Finally, rewrite and proxy to the Zope backend. We map the proxy
    # ports directly to the Zope backend ports which allows a mechanism for
    # selecting a specific backend by choosing a matching port.
    RewriteRule ^proxy:http://[^/]*/?(.*)$
        http://192.168.0.100:%{SERVER_PORT}/VirtualHostBase/
          http/${tolower:%{HTTP_HOST}}:80/%1/VirtualHostRoot/$1 [P,L]

</VirtualHost>

I have removed parts not relevant to this post, such as log file configuration, from the example above. Also, the last RewriteRule is split across multiple lines here but it should all be on a single line. The main points in the config are:

  • use of “*” as the port in the <VirtualHost> node. This allows us to use a single section configuration for multiport access
  • use of a <Proxy> section to limit access to the proxy. This is very important so that you do not expose the proxy to the whole world. Only allow yourself access to it.
  • setting UseCanonicalName Off and UseCanonicalPhysicalPort On to make sure we get the actual port used when reading the %{SERVER_PORT} environment variable later in the rewrite rule
  • setting up an external rewrite map file that can be shared among <VirtualHost> sections.
  • performing conditional rewrites that proxy requests targeted to the customer domain to the particular Zope instance and let others through.

The external rewrite map file is a simple text file in which each line contains two values: first the customer domain name (which will be matched against the normalized contents of the Host: HTTP header) followed by the ZODB path to the root of the corresponding Plone site. For example:

www.customer1.com  /customers/customer1
www.customer2.com  /customers/customer2
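
To make the effect of the rewrite conditions and the final RewriteRule more concrete, here is a minimal Python sketch that models what they compute for a request. It is only an illustration and not how Apache evaluates the rules: the function names parse_rewrite_map and backend_url are hypothetical, the Host: header is assumed to carry no port, and the sketch joins the path segments with single slashes.

```python
from typing import Dict, Optional

# Backend machine address used in the example configuration above.
BACKEND_ADDRESS = "192.168.0.100"


def parse_rewrite_map(text: str) -> Dict[str, str]:
    """Parse 'hostname  zodb-path' lines into a dictionary."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        host, zodb_path = line.split()
        mapping[host.lower()] = zodb_path
    return mapping


def backend_url(host: str, proxy_port: int, path: str,
                mapping: Dict[str, str]) -> Optional[str]:
    """Return the Zope URL a request would be proxied to, or None."""
    zodb_path = mapping.get(host.lower())
    if zodb_path is None:
        # Not a customer domain: the request is proxied through unchanged.
        return None
    return (
        f"http://{BACKEND_ADDRESS}:{proxy_port}/VirtualHostBase/http/"
        f"{host.lower()}:80{zodb_path}/VirtualHostRoot/{path.lstrip('/')}"
    )


REWRITE_MAP = parse_rewrite_map("""
www.customer1.com  /customers/customer1
www.customer2.com  /customers/customer2
""")

print(backend_url("WWW.Customer1.com", 10001, "/news", REWRITE_MAP))
# http://192.168.0.100:10001/VirtualHostBase/http/www.customer1.com:80/customers/customer1/VirtualHostRoot/news
```

The proxy port chosen by the client ends up as the Zope backend port, which is the whole trick behind selecting an instance.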

Now, assuming that we have two Zope instances running on machine 192.168.0.100 (the address used in the rewrite rule above) on ports 10001 and 10002, we can access them through the proxy by configuring the browser to use an HTTP proxy at backend1.proxy.mycompany.com on port 10001 or 10002, respectively.
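
The same proxy can be used from a script, for example to warm up the caches of one instance. The following is a rough sketch using only the Python standard library. The helper names (proxy_address, opener_for_instance, warm_up) are made up for this example, and the host name and ports are the ones from the example setup above.

```python
import urllib.request

# Proxy vhost name from the example configuration.
PROXY_HOST = "backend1.proxy.mycompany.com"


def proxy_address(port: int) -> str:
    """Proxy URL that selects the Zope instance listening on `port`."""
    return f"http://{PROXY_HOST}:{port}"


def opener_for_instance(port: int) -> urllib.request.OpenerDirector:
    """Build an opener that sends plain HTTP requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_address(port)})
    return urllib.request.build_opener(handler)


def warm_up(port: int, urls):
    """Fetch each URL through the chosen instance and collect status codes."""
    opener = opener_for_instance(port)
    statuses = {}
    for url in urls:
        # Error responses (4xx/5xx) raise urllib.error.HTTPError.
        with opener.open(url, timeout=30) as response:
            statuses[url] = response.status
    return statuses
```

Calling warm_up(10001, ["http://www.customer1.com/"]) would then request the real customer URL, but have it served by the instance on port 10001.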

We now have a situation where we can access a customer site, e.g. http://www.customer1.com/, and choose the particular Zope instance by configuring the HTTP proxy in our browser. This is all well and good and does achieve what we started out to do. However, manually configuring the proxy setup every time is not fun and there is nothing to differentiate our use of the site in “normal” mode from the “backdoor” proxied mode. It is too easy to leave the proxy configuration switched on once the update has been finished. Luckily, there is a solution available that will make this a breeze.

Using FoxyProxy to manage proxy configurations

FoxyProxy is a Firefox extension that helps with managing multiple proxy configurations and makes switching between them quick and easy. It also shows the current proxy configuration in the status bar, which makes it easy to see which backend we’re currently talking to (if you name your proxy configurations accordingly).

To continue our example, we would make two separate proxy configurations for accessing each one of our two backend Zope instances. The first one we could call “Backend #1 - Zope instance #1” and use backend1.proxy.mycompany.com:10001 as the address and the other one “Backend #1 - Zope instance #2” using backend1.proxy.mycompany.com:10002 as the address. Having the name of both the physical machine and the Zope instance in the proxy configuration helps to identify the particular Zope instance quickly.

With FoxyProxy configured it is now very easy to switch between accessing a site through the full production pipeline with caching or bypassing that and talking directly to a given Zope instance. Because the solution is generic we can take advantage of it with any HTTP client that is capable of using a proxy. It would now be very easy, for example, to do benchmarking with and without Varnish by simply switching between proxy configurations when using ab or another benchmarking tool.

If you’re using Git and haven’t heard about or used GitHub, which is a Git hosting service with a social networking twist, then here’s your chance! Scott Chacon is doing a screencast series called “Insider guide to GitHub” for the Pragmatic Programmers. The first episode is free of charge and a great way to get introduced to the features provided by GitHub.

I just pushed a new version of collective.buildbot to PyPI. Some highlights of the new release are:

  • Support for PyFlakes checks
  • Refactored project and poller recipes supporting multiple repositories (previously supported by the projects and pollers variants which are now gone)
  • SVN pollers work again
  • Cygwin fixes

If you were using an earlier version you will need to update your buildout configuration to accommodate the changes in the recipe configuration options.

Some time ago Tarek Ziade started a project to make it easier to configure and set up a Buildbot environment using zc.buildout. During the Paris Plone sprint I helped Jean-Francois Roche and Gael Pasgrimaud to further improve upon this work and after the sprint the collective.buildbot project was released.

I recently took some time to polish up the package with proper documentation and examples that should make it easier to deploy it for your own projects and released the changes as version 0.2.0.

Setting up a buildbot environment is pretty easy: you create a buildout for the build master that is responsible for configuring all the projects and one or more buildouts for the build slaves. The Putting it all together section in the documentation gives you an overall picture of how to accomplish this.

Hopefully this will encourage people to use buildbot to improve the quality of their software. There are already some public buildbots available; check out buildbot.ingeniweb.com or buildbot.infrae.com for example. Is your buildbot next?

UPDATE: There was a bug in the “Putting it all together” example, which is fixed in 0.2.1.

Recently I needed to be able to determine the dimensions of SWF (Flash animation) files so I could embed them properly on a web page but I couldn’t immediately find something useful with Google that would perform the task. I am aware of the Hachoir project, but it seemed a bit overkill for my simple use case and a quick try with hachoir-metadata failed to parse my particular SWF file.

Luckily the container section of the SWF file format (which contains the metadata) is rather simple and writing a parser for it turned out to be a nice distraction from my normal duties. The result is hexagonit.swfheader which is a minimal package (no dependencies outside the standard library) that provides a single function that parses SWF files and returns the metadata.

The package also comes with a console script that you can use on the command line to quickly introspect local SWF files. In a buildout you’ll need to use the zc.recipe.egg:scripts recipe to get the script installed.
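
To give an idea of how little is needed to read the header, here is a minimal sketch of such a parser written from the published SWF header layout. It is not the code or the API of hexagonit.swfheader. The function name parse_swf_header and the returned keys are chosen for this example only, and it expects the bytes of the whole file so that compressed (CWS) files can be inflated.

```python
import struct
import zlib


def parse_swf_header(data: bytes) -> dict:
    """Return basic metadata parsed from the raw bytes of an SWF file."""
    signature = data[:3]
    if signature not in (b"FWS", b"CWS"):
        raise ValueError("expected an FWS or CWS signature")
    version = data[3]
    (file_length,) = struct.unpack("<I", data[4:8])

    body = data[8:]
    if signature == b"CWS":
        # In CWS files everything after the first 8 bytes is zlib-compressed.
        body = zlib.decompress(body)

    # The frame size is a RECT: a 5-bit field width (nbits) followed by four
    # signed nbits-wide values (xmin, xmax, ymin, ymax) measured in twips.
    nbits = body[0] >> 3
    rect_size = (5 + 4 * nbits + 7) // 8  # in bytes, padded to a byte boundary
    bits = int.from_bytes(body[:rect_size], "big")

    def field(index: int) -> int:
        shift = rect_size * 8 - 5 - (index + 1) * nbits
        value = (bits >> shift) & ((1 << nbits) - 1)
        if nbits and value >> (nbits - 1):  # sign bit is set
            value -= 1 << nbits
        return value

    xmin, xmax, ymin, ymax = (field(i) for i in range(4))

    # Frame rate is an 8.8 fixed-point number, frame count an unsigned short.
    rate, frames = struct.unpack("<HH", body[rect_size:rect_size + 4])
    return {
        "compressed": signature == b"CWS",
        "version": version,
        "file_length": file_length,
        "width": (xmax - xmin) // 20,   # 20 twips per pixel
        "height": (ymax - ymin) // 20,
        "fps": rate / 256,
        "frames": frames,
    }


# Build a tiny synthetic header (550x400 pixels, 12 fps, 1 frame) to try it.
rect_bits = "{:05b}".format(15) + "".join(
    "{:015b}".format(v) for v in (0, 11000, 0, 8000)
)
rect = int(rect_bits.ljust(72, "0"), 2).to_bytes(9, "big")
body = rect + struct.pack("<HH", 12 << 8, 1)
sample = b"FWS" + bytes([9]) + struct.pack("<I", 8 + len(body)) + body

info = parse_swf_header(sample)
print(info["width"], info["height"], info["fps"], info["frames"])
# 550 400 12.0 1
```

For a real file you would pass in the result of open(path, "rb").read() instead of the synthetic sample.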

Once again the annual Snowsprint hosted by Lovely Systems was a great experience. This was my second time attending the sprint and I enjoyed it very much. The scenery in the Austrian Alps is just amazing. I even managed to hold off catching a cold until after the sprint this time :)

Alternative indexing for Plone

This year I wanted to work on subjects that I’m not the most familiar with. On the first night I expressed interest in the alternative indexing topic proposed by Tarek Ziadé, which led us to work on an external indexing solution for Plone based on the Solr project. Enfold Systems had already started working with Solr on a customer project and Tarek had arranged with Alan Runyan to collaborate on their work. Tom Groß joined us in our work and our first task was to produce a buildout that would give us a working Solr instance. We ended up creating two recipes to implement the buildout: collective.recipe.ant, which is a general purpose recipe for building ant based projects (kind of like hexagonit.recipe.cmmi for Java based projects, although you can use ant for non-Java projects just like make), and the Solr specific collective.recipe.solrinstance, which will create and configure a working Solr instance for instant use.

Enfold Systems already had a working implementation of a concept where the Plone search template (search.pt) was replaced by their own, which implemented the search using only an external Solr indexing service. However, everything was still indexed in the portal_catalog as usual, so there was no gain in terms of ZODB size or indexing speed compared to a vanilla Plone site. Querying the Solr instance was of course extremely efficient, which we verified using a JMeter based benchmark later on. We wanted to experiment with replacing some indexes from portal_catalog with Solr and see whether we could gain any benefits in ZODB size or indexing speed.

As anyone who is at least a bit familiar with portal_catalog will know, replacing the whole of it can be a bit difficult because of special purpose indexes such as ExtendedPathIndex, which Plone heavily relies upon. So we decided to see whether we could replace the “easier” indexes with Solr and have the rest stay in portal_catalog. This would mean that we would need to merge results from both catalogs before returning them to the user. We did this by replacing the searchResults method in ZCatalog.Catalog.
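
The merging step can be pictured with a small sketch. This is not the code we wrote at the sprint. It assumes that each backend answers its own part of the query and returns a list of object identifiers, so that the combined result is the set of objects found by both. The function name merge_results and the sample identifiers are invented for the illustration.

```python
from typing import Iterable, List


def merge_results(catalog_ids: Iterable[str],
                  solr_ids: Iterable[str]) -> List[str]:
    """Keep the ids returned by both sources, preserving Solr's ordering."""
    allowed = set(catalog_ids)  # set membership keeps the merge linear
    return [uid for uid in solr_ids if uid in allowed]


# portal_catalog handled e.g. the path part of the query, Solr the text part.
from_catalog = ["doc-3", "doc-1", "doc-7"]
from_solr = ["doc-7", "doc-2", "doc-3"]

print(merge_results(from_catalog, from_solr))
# ['doc-7', 'doc-3']
```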

To test our implementation we generated 20,000 Document objects in each of two Plone instances, filled them with random content (more on this later) and compared the ZODB size, indexing time and query speed. The generated objects resulted in roughly 100 MB worth of data and the size difference was about 8% in favor of using Solr. Since we didn’t test this further with different data sets, I wouldn’t draw any conclusions based on this except to note the (obvious) fact that externalizing the portal_catalog makes it possible to reduce the size of the ZODB to some degree. I know that some people use a separate ZODB mount for their catalogs, so using an external catalog may be a good solution in some cases. The indexing times didn’t differ much, but they were slightly in favor of Solr. Querying our hybrid ZCatalog/Solr index turned out to be much slower than either ZCatalog or Solr by themselves :) I’m sure this was because of the non-optimized merging code we wrote in searchResults.

In the end, I think the approach Enfold Systems originally had is the correct one for near-term projects. Querying Solr is very fast and indexing objects in both the portal_catalog and an external Solr instance doesn’t produce much overhead. If you need a customized search interface for your project with better than portal_catalog performance you should check Solr out. The guys at Enfold Systems promised to put their code in the Collective for everybody to use, including our buildout.

zc.buildout improvement

Godefroid Chapelle had a proposal to improve zc.buildout so that you can use buildout to get information about the recipes it uses. After discussing the matter with Godefroid and Tarek and a quick IRC consultation with Jim Fulton, we decided to prototype a new buildout command, describe, that would return information about a given recipe. Jim Fulton expressed his desire to keep recipes as simple as possible, so the describe command simply inspects all the entry points in a recipe egg and prints the docstrings of the recipe classes. If the functionality is merged into mainline buildout, recipe authors should consider putting a description of the recipe and the available options in the docstrings (something that we currently see in the PyPI pages of well disciplined recipes).

The code is in an svn branch available at http://svn.zope.org/zc.buildout/branches/help-api/. The following examples are shamelessly ripped from Tarek’s blog:


$ bin/buildout describe my.recipes
my.recipes
    The coolest recipe on Earth.
    Ever.

Multiple entry point support


$ bin/buildout describe my.recipes:default my.recipes:second
my.recipes:default
    The coolest recipe on Earth.
    Ever.
my.recipes:second
    No description available
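
The docstring printing itself is simple enough to sketch in a few lines of Python. This is not the code from the branch: the describe function below takes a ready-made mapping from entry point names to recipe classes (looking the classes up from the egg’s zc.buildout entry points is left out), and the Default and Second classes are stand-ins for real recipes.

```python
import inspect
import textwrap


def describe(recipes: dict) -> str:
    """Format the docstrings of recipe classes keyed by entry point name."""
    lines = []
    for name, recipe_class in recipes.items():
        raw = recipe_class.__doc__
        doc = inspect.cleandoc(raw) if raw else "No description available"
        lines.append(name)
        lines.append(textwrap.indent(doc, "    "))
    return "\n".join(lines)


class Default:
    """The coolest recipe on Earth.
    Ever.
    """

    def __init__(self, buildout, name, options):
        self.options = options


class Second:
    def __init__(self, buildout, name, options):
        self.options = options


print(describe({"my.recipes:default": Default, "my.recipes:second": Second}))
```

Run as is, this prints the same text as the multiple entry point example above.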

Random text generation with context-free grammars

The alternative indexing topic required us to generate some random content in our test sites and both Tarek and I found doing this quite interesting on its own. After the other work was finished we started playing with the idea of creating a library for generating random text based on context-free grammars. You can read Tarek’s post on the library for more information. The end result was that we created a project on http://repo.or.cz/w/gibberis.ch.git called Gibberisch, which currently contains some random text modules and a Grok interface called Bullschit :)
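
For readers who haven’t seen the technique before, here is a tiny self-contained sketch of random text generation from a context-free grammar. It is not taken from Gibberisch. The grammar, the vocabulary and the function names are all made up for the illustration.

```python
import random

# Each non-terminal maps to a list of alternative productions. Any symbol
# that is not a key of the dictionary is a terminal, i.e. a literal word.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["the", "noun"], ["the", "adjective", "noun"]],
    "verb_phrase": [["verb", "noun_phrase"], ["verb"]],
    "noun": [["sprint"], ["buildout"], ["catalog"]],
    "adjective": [["random"], ["external"]],
    "verb": [["indexes"], ["generates"], ["merges"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(rng=None):
    """Generate one random sentence from the grammar."""
    rng = rng or random.Random()
    return " ".join(expand("sentence", rng)).capitalize() + "."


for _ in range(3):
    print(generate_sentence())
```

Passing a seeded random.Random instance makes the output repeatable, which is handy when generating test content.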

I worked with Ethan Jucovy on the Grok interface, which was great fun. Since this was our last day project there were really no serious goals. We just wanted to play with Grok and ended up building a RESTful interface for building up a grammar and then generating random content out of it. If you’re working on a RESTful implementation I can recommend using the RestTest add-on for Firefox; it’s a real time saver!

Basically, Bullschit models the grammar using Zope containers so that you can have multiple different grammars in one application. Each grammar consists of sections that contain parts of sentences (in the context-free grammar) called Schnippets. You can use the basic HTTP verbs POST, PUT, GET and DELETE to maintain the grammar and generate the random text.

For our presentation we hooked in the S5 slide show template to produce endless slides of total gibberisch. You can have even more fun by using the OSX speech synthesizer (or any other for that matter) to read aloud your presentation! Here’s an example of a slide generated with Bullschit and S5.

Presentation with Bullschit & S5

If you’re interested in giving it a go, you can get the code using git.


$ git clone git://repo.or.cz/gibberis.ch.git

For those interested in Git, don’t miss the recent 1.5.4 release!

Today I worked with Tarek Ziadé on ZopeSkel. Tarek concentrated on refactoring the ZopeSkel layout to put each template in its own module and wrote doctests for all available templates. Go Tarek! The test runner actually runs tests in two layers: first testing the output of the generated items and then, if the items contain tests themselves, running those as well.

I concentrated on improving the template for creating new zc.buildout recipes. Many useful recipes suffer from lacking documentation and an unappealing front page on PyPI. I refactored the template to include a common set of documentation files, such as CHANGES.txt, README.txt, CONTRIBUTORS.txt etc., and added code that puts all those documents nicely together to produce a serious looking ReST document that looks good on PyPI. So now it’s up to the recipe author to just fill in those files accordingly.
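
The idea of stitching the documentation files together can be sketched as follows. This is an illustration only and not the code the template generates: the function name build_long_description is hypothetical, and the section contents are passed in as strings instead of being read from README.txt, CHANGES.txt and CONTRIBUTORS.txt.

```python
def build_long_description(sections):
    """Join (title, body) pairs into a single ReST document."""
    parts = []
    for title, body in sections:
        parts.append(title)
        parts.append("=" * len(title))  # ReST section underline
        parts.append("")
        parts.append(body.strip())
        parts.append("")
    return "\n".join(parts)


long_description = build_long_description([
    ("Change history", "0.1 - initial release"),
    ("Contributors", "Jane Doe"),
])
print(long_description)
```

The resulting string is what would be handed to setup() as long_description, which is the text PyPI renders on the package page.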

To help recipe authors and especially people new to zc.buildout I also added comments in both the documentation files and the code to help with implementing the recipe and especially with documenting it so that other people are able to use the recipe in their own buildouts. To me, one of the most important parts of a recipe’s documentation is the list of available options and their semantics. Looking at the PyPI pages for zc.buildout and zc.recipe.egg you can easily get information about the component. I’ve also tried to do the same with my own recipes (hexagonit.recipe.cmmi, hexagonit.recipe.download). The template provides a stub for documenting the options in the README.txt file that authors can fill in.

I also created a minimal doctest for the buildout. While being only a skeleton the test actually runs a buildout using the recipe so you can run the test case for the recipe right after ZopeSkel is finished generating it. This should help recipe authors to get started with testing the recipe while they implement it.

In addition I updated the trove classifiers to appropriate values for a buildout recipe and added support for getting the trove classifier for the license to be added automatically in the setup.py file. So now when paster asks for a license for the recipe and you answer, for example, ZPL you get ‘License :: OSI Approved :: Zope Public License’ in your setup.py automatically. This code is actually in zopeskel.base and you can easily re-use it in the other ZopeSkel templates. Just take a look at how the recipe template uses it.
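
In essence the license support is a lookup from the short license name to a trove classifier. The sketch below shows the idea only and is not the actual code in zopeskel.base. The table contents other than the ZPL entry and the function name license_classifier are assumptions made for the example.

```python
# Short license names mapped to their PyPI trove classifiers.
LICENSE_CLASSIFIERS = {
    "ZPL": "License :: OSI Approved :: Zope Public License",
    "GPL": "License :: OSI Approved :: GNU General Public License (GPL)",
    "BSD": "License :: OSI Approved :: BSD License",
    "MIT": "License :: OSI Approved :: MIT License",
}


def license_classifier(answer):
    """Return the trove classifier for a license name, or None if unknown."""
    return LICENSE_CLASSIFIERS.get(answer.strip().upper())


print(license_classifier("zpl"))
# License :: OSI Approved :: Zope Public License
```

An unknown license simply yields None, in which case no license classifier would be added to setup.py.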

If you haven’t used ZopeSkel before, give it a try!

$ easy_install ZopeSkel
$ paster create --list-templates
$ paster create -t recipe collective.recipe.foobar

If you want to try the recent changes, you need to get ZopeSkel from the collective.


http://svn.plone.org/svn/collective/ZopeSkel/trunk/

There’s been lots of interest in ZopeSkel here at the Snowsprint so expect to have cool new templates there soon!

Update: 25.01.2007

ZopeSkel 1.5 was released which contains the latest changes.