
Using a forward-proxy for direct access to production sites

01 Jan

At our company we use a fairly common production environment setup for our Plone sites. We’ve got Apache for virtual hosting, logging and SSL, followed by Varnish for caching and Perlbal for load balancing, all finally backed by a farm of Zope instances using a ZEO storage.

Each of our client sites is served by at least two different Zope instances so that we can perform rolling updates by bringing down, updating and restarting the instances one at a time without any downtime for our customers. While a Zope instance is down for maintenance we remove it from the load balancer pool so it won’t receive any requests during the maintenance period. Before returning the updated Zope instance to the load balancer’s pool we need to be able to access it directly, mainly for two reasons:

  • to verify that the instance functions correctly and there are no regressions
  • to warm up various Zope level caches so that the first requests do not pay the penalty of a cold start

It is also important that we can access the Zope instance without going through Varnish and the load balancer to make sure we are indeed seeing a response from the particular Zope instance instead of a cached copy.

Background

Originally we had set up a custom <VirtualHost> section in Apache that allowed us access to all sites from within a single domain name, something like https://zmi.mycompany.com/customer1 and https://zmi.mycompany.com/customer2. This worked out great in the past and allowed us both to verify easily that the instance was working properly and to warm up the Zope memory caches. However, after warming up the customer sites and putting the instances back into the load balancer pool we started experiencing inconsistent behaviour between the instances in the Zope farm with regard to links and page elements such as images. It seemed that links and images were pointing to our custom https://zmi.mycompany.com/ domain instead of the customer specific one they should have used. Somehow the cache warm up process had broken the live site.

It didn’t take long to find out that CacheFu, and the in-memory Page Cache in particular, was the culprit. The Page Cache caches the results of rendering a page, thus persisting the links with the incorrect domain name. We could have simply purged the Page Cache after the warm up but since the goal was to warm up the caches (including the Page Cache) that did not feel right. To get around the issue we would need to access the particular Zope instance using the real customer domain name instead of our custom one. We chose to use an HTTP forward proxy for this.
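To make the failure mode concrete, here is a toy model of what we believe was going on. It is not CacheFu's actual implementation; the class name `ToyPageCache`, the `page_id` key and the rendered markup are all made up for illustration. The only point is that a render cache whose key ignores the requested domain will hand out whatever domain the first (warm-up) request happened to use.

```python
# Toy model only: NOT CacheFu's implementation. It illustrates a render
# cache whose key does not include the domain the request came in on.

class ToyPageCache:
    def __init__(self):
        self._rendered = {}

    def render(self, page_id, base_url):
        # The key is the page identity alone, so the first rendering wins,
        # including any absolute URLs baked into the markup.
        if page_id not in self._rendered:
            self._rendered[page_id] = '<a href="%s/about">About</a>' % base_url
        return self._rendered[page_id]


cache = ToyPageCache()

# Warm-up request made through the maintenance domain populates the cache.
warm = cache.render("front-page", "https://zmi.mycompany.com/customer1")

# A later visitor on the real customer domain gets the cached markup,
# which still points at the maintenance domain.
live = cache.render("front-page", "http://www.customer1.com")
```

Warming the cache through the real customer domain avoids the problem without having to purge anything afterwards.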

The main idea is that normal users accessing http://www.customer.com/ would be served through the normal production pipeline including Varnish and Perlbal and from the Zope instances active in the load balancer. However, when using the custom HTTP proxy we could use the same http://www.customer.com/ address but be served directly from the particular Zope instance bypassing the caching and load balancing. Since Apache provides all that we needed out of the box it was a simple choice to use it.

Configuring Apache to forward-proxy requests

We decided to implement the forward proxy configuration within a <VirtualHost> configuration. Requests coming to the customer domain would need to be routed to a particular Zope instance but other requests would be proxied through to the outside world. The configuration consisted mostly of common mod_rewrite rules but we still needed a mechanism to target a particular Zope instance from our farm.

A simple solution would have been to create a separate <VirtualHost> section for each Zope instance in the farm. However, having the configuration grow linearly with the number of Zope instances was not desirable as it would result in a large number of virtual host sections that would be roughly 90% the same. Instead, we chose to do the following. Each physical backend machine running a number of Zope instances would be handled by a single <VirtualHost> section and the virtual host’s ports would be mapped 1:1 to the Zope instances’ ports. In other words, we would use the port numbers to separate Zope instances within a single physical backend machine and then virtual hosts to separate the physical machines. This seemed like a good compromise.

The <VirtualHost> proxy configuration for a single backend machine running multiple Zope instances would then look like this. We’ve called the proxy vhost backend1.proxy.mycompany.com and are using 192.168.0.100 as the backend machine address.

<VirtualHost 1.2.3.4:*>
    ServerName backend1.proxy.mycompany.com
    ProxyRequests On
    <Proxy *>
        Order deny,allow
        Deny from all
        Allow from 4.5.6.7
    </Proxy>

    # Tell Apache to preserve the physical TCP port information so we
    # can map it directly to the Zope backends by reading
    # the %{SERVER_PORT} environment variable.
    UseCanonicalName Off
    UseCanonicalPhysicalPort On

    RewriteEngine On
    # Read the rewrite map from an external file. The file
    # provides a mapping from public host names to ZODB paths leading
    # to the corresponding Plone site roots.
    RewriteMap zope txt:/var/apache/proxy-rewrite-map.txt
    RewriteMap tolower int:tolower

    # Make sure we have a Host: header
    RewriteCond %{HTTP_HOST} !^$
    # Normalize the Host: header
    RewriteCond ${tolower:%{HTTP_HOST}|NONE} ^(.+)$
    # Lookup the hostname in our rewrite map
    RewriteCond ${zope:%1} ^(/.*)$
    # Finally, rewrite and proxy to the Zope backend. We map the proxy
    # ports directly to the Zope backend ports which allows a mechanism for
    # selecting a specific backend by choosing a matching port.
    RewriteRule ^proxy:http://[^/]*/?(.*)$
        http://192.168.0.100:%{SERVER_PORT}/VirtualHostBase/
          http/${tolower:%{HTTP_HOST}}:80/%1/VirtualHostRoot/$1 [P,L]

</VirtualHost>

I have removed parts not relevant to this post, such as log file configuration, from the example above. Also, the last RewriteRule is split across multiple lines here for readability but it should all be on a single line. The main points in the config are:

  • use of “*” as the port in the <VirtualHost> node. This allows us to use a single section configuration for multiport access
  • use of a <Proxy> section to limit access to the proxy. This is very important so that you do not expose the proxy to the whole world. Only allow yourself access to it.
  • setting UseCanonicalName Off and UseCanonicalPhysicalPort On to make sure we get the actual port used when reading the %{SERVER_PORT} environment variable later in the rewrite rule
  • setting up an external rewrite map file that can be shared among <VirtualHost> sections.
  • performing conditional rewrites that proxy requests targeted to the customer domain to the particular Zope instance and let others through.

The external rewrite map file is a simple text file in which each line contains two values: first the customer domain name (which will be matched against the normalized contents of the Host: HTTP header) followed by the ZODB path to the root of the corresponding Plone site. For example:

www.customer1.com  /customers/customer1
www.customer2.com  /customers/customer2
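To see what the rewrite conditions and the final rule add up to, the following sketch mimics the lookup in plain Python. It is only an approximation of what mod_rewrite does: the function names `parse_rewrite_map` and `backend_url` are hypothetical, and the sketch normalises slashes, so the resulting string may differ slightly from the URL Apache actually produces.

```python
MAP_TEXT = """\
www.customer1.com  /customers/customer1
www.customer2.com  /customers/customer2
"""


def parse_rewrite_map(text):
    """Parse 'hostname zodb_path' lines into a dict, skipping blanks and comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, path = line.split()[:2]
        mapping[host.lower()] = path
    return mapping


def backend_url(host_header, proxy_port, request_path, mapping,
                backend_ip="192.168.0.100"):
    """Return the Zope URL a mapped request would be proxied to.

    Returns None when the host is not in the map, which corresponds to the
    rewrite conditions failing and the request passing through untouched.
    """
    host = host_header.lower()          # same idea as the tolower map
    zodb_path = mapping.get(host)       # same idea as the zope map lookup
    if zodb_path is None:
        return None
    # The port the client connected to on the proxy is reused as the
    # Zope backend port, which is what selects the instance.
    return "http://%s:%d/VirtualHostBase/http/%s:80/%s/VirtualHostRoot/%s" % (
        backend_ip, proxy_port, host,
        zodb_path.strip("/"), request_path.lstrip("/"))


mapping = parse_rewrite_map(MAP_TEXT)
url = backend_url("WWW.Customer1.com", 10001, "/news", mapping)
other = backend_url("www.example.org", 10001, "/", mapping)
```

Here `url` points at the Zope instance listening on port 10001, while `other` is `None` because www.example.org is not a customer domain.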

Now, assuming that we have two Zope instances running on machine 192.168.0.100 (the address used in the rewrite rule above) on ports 10001 and 10002, we can access them through the proxy by configuring the browser to use an HTTP proxy at backend1.proxy.mycompany.com on port 10001 or 10002 respectively.
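The same setup works from a script, which is handy for the cache warm up. Below is a minimal sketch using only the Python standard library. The helper names `build_instance_opener` and `warm_up` and the list of paths are assumptions made for this example; the proxy host and ports are the ones used above.

```python
import urllib.request


def build_instance_opener(proxy_host, proxy_port):
    """Return an opener that sends plain HTTP requests via the forward proxy."""
    proxy = "http://%s:%d" % (proxy_host, proxy_port)
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)


def warm_up(opener, base_url, paths, timeout=30):
    """Request each path once and return a {path: HTTP status} dict.

    Error responses (4xx/5xx) raise urllib.error.HTTPError, which is left
    to propagate so that a broken instance is noticed immediately.
    """
    statuses = {}
    for path in paths:
        with opener.open(base_url.rstrip("/") + path, timeout=timeout) as response:
            statuses[path] = response.status
    return statuses


# Example (not executed here, since it needs access to the proxy):
#   opener = build_instance_opener("backend1.proxy.mycompany.com", 10001)
#   warm_up(opener, "http://www.customer1.com", ["/", "/news"])
```

Because the requests carry the real customer domain in the Host: header, whatever ends up in the Zope level caches contains the correct links.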

We now have a situation where we can access a customer site, e.g. http://www.customer1.com/, and choose the particular Zope instance by configuring the HTTP proxy in our browser. This is all well and good and does achieve what we set out to do. However, manually configuring the proxy setup every time is not fun and there is nothing to differentiate our use of the site in “normal” mode from the “backdoor” proxied mode. It is too easy to leave the proxy configuration on once the update has been finished. Luckily, there is a solution available that will make this a breeze.

Using FoxyProxy to manage proxy configurations

FoxyProxy is a Firefox extension that helps with managing multiple proxy configurations and makes switching between them quick and easy. It also shows the current proxy configuration in the status bar, which makes it easy to see which backend we’re currently talking to (if you name your proxy configurations accordingly).

To continue our example, we would make two separate proxy configurations for accessing each one of our two backend Zope instances. The first one we could call “Backend #1 — Zope instance #1” and use backend1.proxy.mycompany.com:10001 as the address and the other one “Backend #1 — Zope instance #2” using backend1.proxy.mycompany.com:10002 as the address. Having the name of both the physical machine and the Zope instance in the proxy configuration helps to identify the particular Zope instance quickly.

With FoxyProxy configured it is now very easy to switch between accessing a site through the full production pipeline with caching or bypassing that and talking directly to a given Zope instance. Because the solution is generic we can take advantage of it with any HTTP client that is capable of using a proxy. It would now be very easy, for example, to do benchmarking with and without Varnish by simply switching between proxy configurations when using ab or another benchmarking tool.
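As a rough illustration of the benchmarking idea, here is a small timing sketch in Python. It is no replacement for ab, and the helper names `make_openers` and `time_requests` are made up for this example. One opener uses no proxy at all and therefore goes through the full production pipeline; the other goes through the forward proxy straight to one Zope instance.

```python
import time
import urllib.request


def make_openers(proxy_host, proxy_port):
    """Return (production, direct) openers.

    An empty ProxyHandler dict disables any auto-detected proxy, so the
    first opener talks to the public site the way a normal visitor would.
    """
    production = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    direct = urllib.request.build_opener(urllib.request.ProxyHandler(
        {"http": "http://%s:%d" % (proxy_host, proxy_port)}))
    return production, direct


def time_requests(opener, url, count=10, timeout=30):
    """Fetch url `count` times and return the elapsed seconds of each request."""
    timings = []
    for _ in range(count):
        start = time.perf_counter()
        with opener.open(url, timeout=timeout) as response:
            response.read()  # include the body transfer in the measurement
        timings.append(time.perf_counter() - start)
    return timings


# Example (not executed here, since it needs network access):
#   production, direct = make_openers("backend1.proxy.mycompany.com", 10001)
#   cached = time_requests(production, "http://www.customer1.com/")
#   uncached = time_requests(direct, "http://www.customer1.com/")
```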

Posted on January 1, 2009 in zope

 
