Once again the annual Snowsprint hosted by Lovely Systems was a great experience. This was my second time attending the sprint and I enjoyed it very much. The scenery in the Austrian Alps is just amazing. This time I even managed to hold off catching a cold until after the sprint 🙂
Alternative indexing for Plone
This year I wanted to work on subjects I'm less familiar with. On the first night I expressed interest in the alternative indexing topic proposed by Tarek Ziadé, which led us to work on an external indexing solution for Plone based on the Solr project. Enfold Systems had already started working with Solr on a customer project, and Tarek had arranged with Alan Runyan to collaborate on their work. Tom Groß joined us, and our first task was to produce a buildout that would give us a working Solr instance. We ended up creating two recipes to implement the buildout: collective.recipe.ant, a general-purpose recipe for building ant-based projects (similar in spirit to hexagonit.recipe.cmmi, but for Java projects; although, just like make, you can use ant for non-Java projects too), and the Solr-specific collective.recipe.solrinstance, which creates and configures a working Solr instance for immediate use.
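As a rough illustration, a buildout using the Solr recipe might look like the following. The option names here are taken from later public releases of collective.recipe.solrinstance and may well differ from what we had working at the sprint, so treat this as a sketch rather than a copy-paste configuration:

```ini
[buildout]
parts = solr-instance

[solr-instance]
recipe = collective.recipe.solrinstance
# Where the ant build put the Solr distribution (assumed path).
solr-location = ${buildout:directory}/parts/solr
host = localhost
port = 8983
unique-key = UID
index =
    name:UID type:string stored:true required:true
    name:Title type:text
    name:SearchableText type:text
```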
Enfold Systems already had a working implementation of a concept where the Plone search template (search.pt) was replaced with their own, which implemented the search using only an external Solr indexing service. However, everything was still indexed in the portal_catalog as usual, so there was no gain in ZODB size or indexing speed compared to a vanilla Plone site. Querying the Solr instance was of course extremely efficient, which we later verified using a JMeter based benchmark. We wanted to experiment with replacing some portal_catalog indexes with Solr to see whether we could gain anything in ZODB size or indexing speed.
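For the curious, querying Solr from the outside is just HTTP against its standard select handler. A minimal sketch of the idea (the base URL, port and query field are assumptions on my part, not Enfold's actual code):

```python
import json
import urllib.parse
import urllib.request

SOLR_BASE = 'http://localhost:8983/solr'  # assumed default Solr location

def build_select_url(query, base=SOLR_BASE):
    """Build a URL for Solr's standard /select search handler."""
    params = urllib.parse.urlencode({'q': query, 'wt': 'json'})
    return '%s/select?%s' % (base, params)

def solr_search(query, base=SOLR_BASE):
    """Run a query against a Solr instance and return the matching docs."""
    with urllib.request.urlopen(build_select_url(query, base)) as response:
        return json.load(response)['response']['docs']
```

A custom search template backend would then render whatever `solr_search('SearchableText:plone')` returns instead of hitting the portal_catalog.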
As anyone even a bit familiar with portal_catalog will know, replacing the whole of it can be difficult because of special-purpose indexes such as ExtendedPathIndex, which Plone relies on heavily. So we decided to try replacing the "easier" indexes with Solr and keeping the rest in portal_catalog. This meant we needed to merge results from both catalogs before returning them to the user, which we did by replacing the searchResults method of the portal_catalog with our own merging implementation.
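A toy sketch of the kind of merging involved (deliberately simplified, not our sprint code): when a query spans indexes living in both catalogs, only objects matched by both sides should come back, so you intersect the two result sets on a unique id:

```python
# Toy sketch: intersect hits from two catalogs on a unique key.
# Assumes each hit is a mapping with a "UID" key; real ZCatalog brains
# and Solr response documents would need a bit of adapting.
def merge_results(catalog_hits, solr_hits):
    """Return catalog hits whose UID also appears among the Solr hits."""
    solr_uids = set(hit['UID'] for hit in solr_hits)
    return [hit for hit in catalog_hits if hit['UID'] in solr_uids]
```

Building the UID set and filtering is linear, but doing this in Python on every query (plus the extra round trip to Solr) is exactly the kind of overhead that hurt our hybrid setup.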
To test our implementation we generated 20,000 Document objects in each of two Plone instances, filled them with random content (more on this later), and compared ZODB size, indexing time and query speed. The generated objects amounted to roughly 100 MB of data, and the size difference was about 8% in favor of using Solr. Since we didn't test further with different data sets, I wouldn't draw any conclusions from this except to note the (obvious) fact that externalizing the portal_catalog makes it possible to reduce the size of the ZODB to some degree. I know some people use a separate ZODB mount for their catalogs, so an external catalog may be a good solution in some cases. The indexing times didn't differ much, though they were slightly in favor of Solr. Querying our hybrid ZCatalog/Solr index turned out to be much slower than either ZCatalog or Solr by themselves 🙂 I'm sure this was because of our quick, non-optimized merging code.
In the end, I think the approach Enfold Systems originally had is the correct one for near-term projects. Querying Solr is very fast and indexing objects in both the portal_catalog and an external Solr instance doesn’t produce much overhead. If you need a customized search interface for your project with better than portal_catalog performance you should check Solr out. The guys at Enfold Systems promised to put their code in the Collective for everybody to use, including our buildout.
Godefroid Chapelle had a proposal to improve zc.buildout so that you can use buildout itself to get information about the recipes it uses. After discussing the matter with Godefroid and Tarek, and a quick IRC consultation with Jim Fulton, we decided to prototype a new buildout command, describe, that returns information about a given recipe. Jim Fulton expressed his desire to keep recipes as simple as possible, so the describe command simply inspects all the entry points in a recipe egg and prints the docstrings of the recipe classes. If the functionality is merged into mainline buildout, recipe authors should consider documenting the recipe and its available options in those docstrings (something we currently see on the PyPI pages of well-disciplined recipes).
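In spirit (this is my own sketch, not the actual branch code), the describe command boils down to loading each zc.buildout entry point from the egg and printing the recipe class docstring, with a fallback when there is none:

```python
def class_description(recipe_class):
    """Return the recipe class docstring, or a fallback when missing."""
    return (recipe_class.__doc__ or 'No description available').strip()

def describe(egg_name):
    """Print a description for every zc.buildout entry point in an egg."""
    import pkg_resources  # shipped with setuptools
    dist = pkg_resources.get_distribution(egg_name)
    for name, entry in dist.get_entry_map('zc.buildout').items():
        print('%s:%s\n    %s'
              % (egg_name, name, class_description(entry.load())))
```

The docstring-or-fallback behavior is what produces the "No description available" lines in the example output below.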
The code is in an svn branch at http://svn.zope.org/zc.buildout/branches/help-api/. The following examples are shamelessly ripped from Tarek's blog:
$ bin/buildout describe my.recipes
my.recipes
    The coolest recipe on Earth. Ever.
Multiple entry point support
$ bin/buildout describe my.recipes:default my.recipes:second
my.recipes:default
    The coolest recipe on Earth. Ever.
my.recipes:second
    No description available
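For recipe authors, the flip side is how those entry points get declared in the first place. A recipe egg's setup.py might register several of them like this (the egg name and module paths are made up for the example):

```python
# Entry points a hypothetical my.recipes egg would pass to setup() in
# its setup.py, e.g. setup(name='my.recipes', ..., entry_points=ENTRY_POINTS).
ENTRY_POINTS = {
    'zc.buildout': [
        'default = my.recipes.main:Recipe',   # addressed as my.recipes or my.recipes:default
        'second = my.recipes.extra:Recipe',   # addressed as my.recipes:second
    ],
}
```

Each `name = module:Class` line is one entry point, which is exactly what describe iterates over.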
Random text generation with context-free grammars
The alternative indexing topic required us to generate random content in our test sites, and both Tarek and I found doing this quite interesting in its own right. After the other work was finished we started playing with the idea of a library for generating random text based on context-free grammars. You can read Tarek's post on the library for more information. The end result is a project called Gibberisch at http://repo.or.cz/w/gibberis.ch.git, which currently contains some random text modules and a Grok interface called Bullschit 🙂
I worked with Ethan Jucovy on the Grok interface, which was great fun. Since this was our last-day project there were no serious goals; we just wanted to play with Grok, and ended up building a RESTful interface for building up a grammar and then generating random content out of it. If you're working on a RESTful implementation I can recommend the RestTest add-on for Firefox, it's a real time saver!
Basically, Bullschit models the grammar using Zope containers, so you can have multiple different grammars in one application. Each grammar consists of sections that contain parts of sentences (in the context-free grammar sense) called Schnippets. You use the basic HTTP verbs POST, PUT, GET and DELETE to maintain the grammar and generate the random text.
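To give an idea of the underlying trick, here's a toy version in plain Python (nothing to do with the actual Zope containers, and the sample grammar content is invented): pick one random Schnippet from each section and join them into a sentence.

```python
import random

# A grammar as an ordered mapping of sections to candidate fragments
# ("Schnippets" in Bullschit terms).
GRAMMAR = {
    'opener': ['Going forward,', 'At the end of the day,'],
    'subject': ['our synergy', 'the paradigm shift'],
    'verb': ['leverages', 'empowers'],
    'object': ['core competencies.', 'best practices.'],
}

def generate(grammar, rng=random):
    """Build one sentence by picking a random Schnippet per section."""
    return ' '.join(rng.choice(fragments) for fragments in grammar.values())
```

Every call to `generate(GRAMMAR)` yields another grammatically plausible, completely meaningless sentence, which is all you need for endless test content.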
For our presentation we hooked in the S5 slide show template to produce endless slides of total gibberisch. You can have even more fun by using the OS X speech synthesizer (or any other, for that matter) to read your presentation aloud! Here's an example of a slide generated with Bullschit and S5.
If you’re interested in giving it a go, you can get the code using git.
$ git clone git://repo.or.cz/gibberis.ch.git
For those interested in Git, don’t miss the recent 1.5.4 release!