Category Archives: Development

HTTP response time monitoring with Munin

There’s nothing better than graphs, and for a web site there are few better things to graph than the response times for your web pages. While there are plenty of external services out there that will probe your web site and graph the results, it’s a good idea to do this on your own too.

Munin is a monitoring tool that can provide plenty of graphs for your servers. Out of the box, or at least out of its Ubuntu box, it monitors a variety of system metrics and applications, but there is no bundled support for response-time monitoring.

Luckily, it’s really easy to extend Munin with new plugins, so I decided to write my own plugin for monitoring response times, which you can download from my bitbucket repository. It will produce graphs like the one below:

[Image: http_response_time_ example graph]

Here the plugin has been configured to monitor three URLs in the same graph. Unlike normal Munin probes, these URLs are external to the server running the plugin, but you could just as well monitor localhost URLs.

To get the plugin up and running you first need to install Munin, if you haven’t already. For a one-server setup under Ubuntu, with master and client both running on the same machine, run:

sudo apt-get install munin munin-node

This will start the munin-node service in the background, and also add the master-node cron jobs to /etc/cron.d/munin. To install the plugin itself you add the Python script to a folder of your choice, make it executable, and then symlink it from the Munin plugin folder, like so:

sudo cp ./http_response_time_ /usr/local/bin/
sudo chmod 755 /usr/local/bin/http_response_time_
sudo ln -s /usr/local/bin/http_response_time_ /etc/munin/plugins/http_response_time_example

If you want to monitor more than one set of URLs, and thus have more than one graph, you can accomplish that by creating one symlink for each graph that you need. The names of the symlinks are used as section titles when configuring the plugin in /etc/munin/plugin-conf.d/munin-node. For the graph shown above, the configuration would look something like this:

[http_response_time_example]
env.url1_url http://www.djangoproject.com/
env.url1_name django
env.url1_label djangoproject.com
env.url2_url http://rubyonrails.org/
env.url2_name rails
env.url2_label rubyonrails.org
env.url3_url http://php.net/
env.url3_name php
env.url3_label php.net

The plugin requires that the following environment variables are specified for each URL to be monitored:

  • urlX_url — the URL that should be monitored
  • urlX_name — Munin field name for the probe
  • urlX_label — legend description of the URL

The ‘X’ in the variable names above should be replaced with an incremental index for each URL, e.g. url1 and url2. In addition, the following environment variables are also supported:

  • graph_title — the title of the graph (default is “Response time”)
  • graph_category — the category Munin should show the graph in
  • request_timeout — the socket request timeout (same for all URLs)
  • urlX_warning — warning level for the probe (for Nagios)
  • urlX_critical — critical level for the probe (for Nagios)
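
To make the variable naming concrete, here is a minimal sketch of how a plugin could collect these settings and answer Munin's "config" call. It is not the code of the actual plugin; the function names read_probes and config_lines and the "seconds" axis label are assumptions made for illustration. Munin passes each env.* setting to the plugin as an environment variable with the env. prefix removed.

```python
def read_probes(env):
    """Collect urlX_* settings from an environment-like mapping, in index order."""
    probes = []
    index = 1
    while f"url{index}_url" in env:
        prefix = f"url{index}_"
        probes.append({
            "url": env[prefix + "url"],
            "name": env[prefix + "name"],
            "label": env[prefix + "label"],
            # Optional thresholds; None when not configured.
            "warning": env.get(prefix + "warning"),
            "critical": env.get(prefix + "critical"),
        })
        index += 1
    return probes


def config_lines(env):
    """Build the lines a plugin would print when Munin calls it with 'config'."""
    lines = [
        "graph_title " + env.get("graph_title", "Response time"),
        "graph_vlabel seconds",  # assumed axis label, not taken from the plugin
    ]
    if "graph_category" in env:
        lines.append("graph_category " + env["graph_category"])
    for probe in read_probes(env):
        lines.append(f"{probe['name']}.label {probe['label']}")
        for level in ("warning", "critical"):
            if probe[level] is not None:
                lines.append(f"{probe['name']}.{level} {probe[level]}")
    return lines


# Example: one URL from the configuration above, plus a warning level.
example_env = {
    "url1_url": "http://php.net/",
    "url1_name": "php",
    "url1_label": "php.net",
    "url1_warning": "2",
}
print("\n".join(config_lines(example_env)))
```

A real plugin would pass os.environ instead of the example dictionary, and print one "name.value" line per probe when called without arguments.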

Note that Munin uses its own timeout when fetching plugin data. The default value is 10 seconds, which is also the default value for the URL request timeout. Because of this it might be appropriate to increase the Munin fetch timeout so that it is at least the number of URLs being monitored times the request timeout, to make sure all probes have time to run.
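
As a quick worked example of that rule of thumb, assuming the probes run one after another and each may use its full timeout (the function name is made up for this sketch):

```python
def worst_case_fetch_time(url_count, request_timeout=10):
    """Seconds needed if every probe runs in sequence and hits its timeout."""
    return url_count * request_timeout


# Three URLs at the default 10-second request timeout.
print(worst_case_fetch_time(3))  # 30
```

So for the three-URL graph above, a fetch timeout of 30 seconds or more would cover the worst case.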

Once you have configured the plugin to your satisfaction you need to restart the Munin node to make it discover the new plugin:

sudo /etc/init.d/munin-node restart

Happy graphing!

Django performance tip: select_related()

I was optimizing a Django application I’m working on the other day using Simon Willison’s excellent DebugFooter middleware, which adds a footer to each page showing which SQL queries were executed by Django when generating the page.

I’m a bit of a caching addict so I had already added a caching layer on top of my models, and thus I was quite surprised to find that the most important page on the site still generated 5-15 SQL queries on every access, even though the objects it was accessing supposedly were cached.

The objects were indeed cached, but every time I was accessing one of the ForeignKey fields on the model objects Django generated a SQL query to find the data for the related object. This could quickly turn nasty if you follow such relationships in a loop on a high-traffic web site.

The solution was the select_related() QuerySet method. Borrowing from the Django documentation, a normal ORM lookup would look like this:

e = Entry.objects.get(id=5)
b = e.blog

This would generate two SQL queries, one to fetch the entry object and one to fetch the blog object once it’s referred to from the entry object. The same example with select_related() becomes:

e = Entry.objects.select_related().get(id=5)
b = e.blog

This example only generates one SQL query, albeit bigger and slower than each of the individual queries in the first example because of the necessary join between the model tables to find all the data in one go. However, this doesn’t matter if the fetched object will go directly into a cache anyway and stay there for a possibly rather long time, which was the case for me.
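
To see the difference without a Django project at hand, here is a small sketch using Python’s sqlite3 module directly. The table layout and the run() helper are assumptions made for the illustration, and the SQL is only roughly what an ORM would generate, not Django’s actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (
        id INTEGER PRIMARY KEY,
        headline TEXT,
        blog_id INTEGER REFERENCES blog(id)
    );
    INSERT INTO blog VALUES (1, 'Example blog');
    INSERT INTO entry VALUES (5, 'Example entry', 1);
""")

executed = []  # every SQL statement sent through run()


def run(sql, params=()):
    """Execute one query, record it, and return the first row."""
    executed.append(sql)
    return conn.execute(sql, params).fetchone()


# Lazy lookup: one query for the entry, a second when the blog is needed.
entry = run("SELECT id, headline, blog_id FROM entry WHERE id = ?", (5,))
blog = run("SELECT id, name FROM blog WHERE id = ?", (entry[2],))
lazy_queries = len(executed)

# Eager lookup: a single JOIN returns the entry and its blog together.
executed.clear()
joined = run(
    "SELECT e.id, e.headline, b.id, b.name "
    "FROM entry e JOIN blog b ON b.id = e.blog_id WHERE e.id = ?",
    (5,),
)
eager_queries = len(executed)

print(lazy_queries, eager_queries)  # 2 1
```

Inside a loop over many entries the lazy variant costs one extra query per entry, which is where the 5-15 queries per page came from.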

Some notes on Amazon, Tarzan, and Yii

Today Amazon announced yet another new name for its excellent product feed API: Product Advertising API. It was formerly called the Amazon Associates Web Service, and it provides access to Amazon’s vast database of product offers and related content such as user reviews. It has been around since 2004 so it’s certainly not a new service, but I recently decided it was time I tried it out.

Choosing an API implementation
Implementing the API manually wouldn’t be too hard, but I figured someone would have done it for me already. Surprisingly, I did not find anything for Python that seemed mature or up-to-date. PyAWS was last touched in 2007, and pyecs only implements a subset of the API operations.

Although I’m sure something could be built on pyecs or PyAWS, I found that both PHP and Ruby had more mature packages available, in the shape of Tarzan and ruby-aws. Having also wanted to look into PHP web frameworks for a while, I decided this was a good opportunity so I went with Tarzan.

Choosing a PHP web framework
There are a gazillion PHP frameworks out there, and most of them seem to have their fair share of outspoken supporters. Coming from Django, naturally my first thought was that I wanted the PHP equivalent of it. After some fruitless googling I decided there was no such thing, and instead settled on three selection criteria: fast, lightweight, and PHP5-based.

With the help of posts like this, this, and this, I somewhat arbitrarily narrowed down the contenders to Kohana, Zend, and Yii, and ultimately picked Yii since it was the new kid on the block.

A few snags…
I ran into a few snags during my brief foray into PHP land with Tarzan and Yii so I thought I’d write them down here, just in case it might help someone facing the same issues as I did, or in case I run into them again myself. 🙂

  • There’s a bug in the stable version of Tarzan that makes it ignore any Amazon Associate IDs you supply in the configuration or the class constructors. Because of this, product links returned by the Amazon API will not be tracked, which means you won’t get any revenue share from Amazon to your affiliate account.
  • Tarzan returns SimpleXML objects, but apparently it’s not possible to serialize PHP built-in objects. I learned this when I put the data I got back from Tarzan into memcached without casting it first, and got this perplexing error message on retrieval:

    unserialize() [function.unserialize]: Node no longer exists

  • I first had multiple class definitions per PHP file, and this worked fine with Yii and its autoload support. However, when I tried to put these objects into memcached I again got confusing errors when they were unserialized, e.g.:

    YiiBase::include(BrowseNode.php) [yiibase.include]: failed to open stream: No such file or directory

    The BrowseNode class was not defined in a file of its own, and I suppose that’s why it couldn’t be found. When I moved it into a separate BrowseNode.php file, things started to work.

  • Update: this note is not quite correct, please see the comments below this post! Something really weird happens when you try to use the CHtml class in Yii to output an image with an empty or missing URL. This will make the controller action execute twice! I have no idea why this happens but it took me a good while to track down the cause. To reproduce, add the line below to a view in your Yii application and add a log statement to the corresponding controller action:
    
    

Final words
Amazon’s product API really is a solid, fast, and comprehensive service that deserves all the praise it has received. With the new API name, Amazon today also announced that API requests in the future will have to be authenticated through a hash calculated for every request based on your AWS identifiers and the parameters of the request. This requirement is phased in over a period ending on August 15, 2009.
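
For a feel of what such a per-request hash involves, here is a rough sketch of the general technique: an HMAC computed over a canonical form of the request. The choice of HMAC-SHA256, the canonical string layout, and all names and values below are assumptions of this sketch; the exact rules are defined by Amazon’s documentation and should be checked there.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(secret_key, host, path, params):
    """Return a base64 HMAC-SHA256 signature over a canonicalized GET request."""
    # Sort parameters by name and percent-encode them, so that client and
    # server build exactly the same string regardless of parameter order.
    query = "&".join(
        f"{quote(str(key), safe='-_.~')}={quote(str(value), safe='-_.~')}"
        for key, value in sorted(params.items())
    )
    string_to_sign = "\n".join(["GET", host, path, query])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder values, for illustration only.
signature = sign_request(
    "example-secret-key",
    "api.example.com",
    "/products",
    {"Operation": "ItemLookup", "ItemId": "12345", "Timestamp": "2009-05-01T12:00:00Z"},
)
print(signature)
```

The signature is sent along with the request, and the server recomputes it with its own copy of the secret key to verify that the parameters were not tampered with.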

Tarzan obviously lacks support for this, but at least the author is aware of the change. Apart from this and the annoying Associate ID bug I mentioned previously, Tarzan worked great for me and I wouldn’t hesitate to use it again, seeing that it’s actively maintained and tries to stay on top of Amazon’s evolution of services.

As for Yii, I did not use it enough to give a proper verdict — I barely tested the ORM support for instance — but it was easy getting started and its MVC structure seemed logical enough, although the relative youth of the framework is visible in some rough edges here and there. Yii markets itself as a high-performance framework and although I don’t have that many reference points, the execution speed was more than satisfactory. Would I use it again? Probably, but I’ll check out Kohana too at some point.

Django shortcomings and Facebook architecture

I’ve watched two presentations lately that I enjoyed, so I thought I’d link to them here.

The first one is by Cal Henderson at DjangoCon 2008. Cal is an engineering manager at Flickr, which not surprisingly is written in PHP, and he delivered a keynote address on why he hates Django.

Although made tongue-in-cheek, it contains a bunch of very valid points about Django. One of the main ones is Django’s monolithic database approach. This is probably also my own biggest concern with Django. I have first-hand experience of making this design mistake for a web site that grew rather big, and it can easily turn into a major and prolonged headache.

The other presentation is by Aditya Agarwal, an engineering director at Facebook, at QCon SF 2008. Aditya talks about the Facebook software stack, which somewhat crudely described is a normal LAMP stack, albeit heavily tuned, backed by memcached and a number of backend services. Facebook is obviously a very extreme environment but many of the design choices and observations in this presentation are valid for smaller sites too.

Multi-language content spots with django-chunks and friends

Update March 3: django-better-chunks has now been patched, so there is no need to use my project fork anymore.

In my previous post I wrote about how to use flatpages in Django for serving static content pages. The flatpages module only deals with full pages however, so if you would like to include static content on a more fine-grained level, or have multiple content spots per template, you need to look elsewhere.

This is where django-chunks, or one of the many projects forked from it (e.g. django-better-chunks and django-flatblocks), comes to the rescue. django-chunks lets you, for example, create a content spot called “home_page_right” in admin, and then include it in your template like this:

<div id="right">
    {% chunk "home_page_right" %}
</div>

The chunk tag also accepts an optional second parameter that specifies a cache timeout in seconds, e.g. 3600 for an hour’s caching.

So far so good, but a theme of some of my previous posts has been multi-language support, and unfortunately django-chunks is lacking this. Luckily, django-better-chunks was created to remedy just that. Unluckily though, while adding language support they also broke the cache support.

That being said, django-better-chunks is the only module I’ve found that does support multiple languages, so I felt it would be the best project for me to build on. To fix the broken caching I’ve created yet another fork of this project: django-better-chunks-devdoodles. Hopefully the patch can be merged into the original project at some point. Here’s how to get django-better-chunks (or my fork) up and running:

Step 1 — download and install
Download django-better-chunks or, perhaps preferably until the caching is fixed, django-better-chunks-devdoodles and install it on your Python path.

Step 2 — edit settings.py
As usual, you need to add LocaleMiddleware to your middleware classes if you want Django to automatically choose which language to use:

MIDDLEWARE_CLASSES = (
    ...
    'django.middleware.locale.LocaleMiddleware',
)

Then add contrib.sites (there by default), admin, and chunks to your installed applications:

INSTALLED_APPS = (
    ...
    'django.contrib.sites',
    'django.contrib.admin',
    'chunks',
)

contrib.sites is needed since django-better-chunks connects the content chunks to a site through a ForeignKey relationship. Admin is needed to add and edit content spots.

Optionally, if you want to use another caching backend for chunks than the default local-memory cache (locmem://), then you add it here too. To use a local memcached cache on the default port instead, add:

CACHE_BACKEND = 'memcached://127.0.0.1:11211/'

Step 3 — activate admin in urls.py
Uncomment the three lines needed to activate admin in urls.py.

Step 4 — sync database
Create database tables:

python manage.py syncdb

Step 5 — create spots in admin
At this point all that remains is to create the content spots in the Chunks section in admin (/admin/chunks/chunk/) and start using them from your templates. Make sure to load the chunk tag library first using:

{% load chunks %}

Finally, beware that the help text in the admin UI gives ‘sv-se’ and ‘de-de’ as language examples, which often won’t work well with Django’s automatic language detection. This is because Django by default almost always resolves to base languages (e.g. ‘en’) and not sublanguages (e.g. ‘en-us’), since most of the languages defined in the LANGUAGES setting in global_settings.py are defined only as base languages.
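
The mismatch is easy to demonstrate with a simplified sketch of the fallback. This is not Django’s actual language-detection code; resolve_language and the dictionary of chunks are hypothetical stand-ins.

```python
def resolve_language(requested, supported):
    """Pick the supported code for a requested one, falling back to its base language."""
    requested = requested.lower()
    if requested in supported:
        return requested
    base = requested.split("-")[0]
    return base if base in supported else None


supported = ["en", "sv", "de"]  # base languages only
active = resolve_language("sv-se", supported)  # the browser asks for sv-se

# A chunk saved in admin with a sublanguage code, as the help text suggests.
chunks = {("home_page_right", "sv-se"): "Hej!"}

print(active, chunks.get(("home_page_right", active)))  # sv None
```

The request resolves to ‘sv’, but the chunk was stored under ‘sv-se’, so the lookup comes back empty. Saving the chunk with the base language code avoids the problem.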

Static content with django-multilingual flatpages

While working on a multilingual Django project I encountered the need to have pseudo-static pages, e.g. an about page or FAQ page, translated into multiple languages. In my earlier post I wrote about how to use the i18n tag library in templates to handle translations, and although this approach would work for static pages too it would not be a perfect fit.

Django comes bundled with the flatpages application, which rather cleverly hooks into the 404 errors generated by Django when it cannot find a page and looks up the requested URL in a database table. If there’s an entry for the requested URL, it shows the page stored in the database for that URL instead of the 404 page.
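
The fallback mechanism can be illustrated with a small stand-alone sketch. It mimics the idea only; the Response class, the FLATPAGES dictionary, and the flatpage_fallback function are made up here and are not Django’s middleware API.

```python
class Response:
    """Tiny stand-in for an HTTP response object."""

    def __init__(self, status, body):
        self.status = status
        self.body = body


FLATPAGES = {"/about/": "About us"}  # stand-in for the database table


def flatpage_fallback(path, response):
    """Swap a 404 response for a stored page when one exists for the path."""
    if response.status != 404:
        return response  # normal responses pass through untouched
    content = FLATPAGES.get(path)
    if content is None:
        return response  # no stored page either, so keep the 404
    return Response(200, content)


not_found = Response(404, "Not found")
print(flatpage_fallback("/about/", not_found).status)    # 200
print(flatpage_fallback("/missing/", not_found).status)  # 404
```

Because the lookup only happens for responses that are already 404s, regular views are never slowed down by it.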

The bundled flatpages application has no inherent multi-language support, and I was pretty close to adapting it for my needs before Google came to the rescue. Obviously, this had already been done by someone, and it’s distributed as a part of the django-multilingual module, which is a generic module for having translated fields in Django models. Here’s what I did to get it up and running, based on the steps described on the project wiki.

Step 1 — install django-multilingual
Check out the Subversion trunk for the project as described in the wiki, and make the checked out module available for Python somehow. I just copied the multilingual sub-folder to my project folder as if it were my own application.

Step 2 — edit settings.py
Add the list of languages you want to support to settings.py, and mark English as the default language through its position in the tuple (counted from 1, so DEFAULT_LANGUAGE = 1 selects the first entry). The LANGUAGES setting is actually already defined in global_settings.py, but I don’t want to support all those languages so I override it.

LANGUAGES = (
    ('en', 'English'),
    ('sv', 'Swedish'),
)
DEFAULT_LANGUAGE = 1

Add the multilingual context processor to TEMPLATE_CONTEXT_PROCESSORS. This setting is not included by default in your settings.py file, but the first four core processors below are set as default in the global settings (for reference, see here and here):

TEMPLATE_CONTEXT_PROCESSORS = (
    'django.core.context_processors.auth',
    'django.core.context_processors.debug',
    'django.core.context_processors.i18n',
    'django.core.context_processors.media',
    'multilingual.context_processors.multilingual',
)

Add the middleware classes in the order listed below to support language detection and to trigger the actual mapping of 404s to flatpages. Curiously, the FlatpageFallbackMiddleware is not mentioned in the official installation instructions, but you can deduce that it’s needed from its mention in the original flatpage documentation and, of course, from the fact that nothing happens without it.

MIDDLEWARE_CLASSES = (
    ...
    'django.middleware.locale.LocaleMiddleware',
    'multilingual.middleware.DefaultLanguageMiddleware',
    'multilingual.flatpages.middleware.FlatpageFallbackMiddleware',
)

Add admin and the multilingual apps to INSTALLED_APPS:

INSTALLED_APPS = (
    ...
    'django.contrib.admin',
    'multilingual',
    'multilingual.flatpages',
)

Step 3 — activate admin in urls.py
Uncomment the three lines needed to activate admin in urls.py.

Step 4 — sync database
Create all database tables needed for admin and flatpages:

python manage.py syncdb

Step 5 — create template
Create a ./flatpages/ sub-folder in your template-root directory, and create a default.html template in it. This is the default template used for displaying the flatpages, but it can be overridden in admin (see the next step). Its context is populated by a flatpage variable with two fields: title and content. An example template is available here.

Step 6 — create pages in admin
Go to the flatpages section in your admin application (/admin/flatpages/multilingualflatpage/) and create all the pages and translations you want to serve using the flatpages application.

Step 7 — done!
We’re done! As before, you can test your work by modifying LANGUAGE_CODE in settings.py or changing the preferred-languages setting in your web browser.

Ubuntu, cmemcache, and python-memcached (once again)

I wrote earlier about my not quite satisfactory attempt to install cmemcache on Ubuntu. After stumbling upon this exhaustive guide on how to set up a full-blown Django environment under Ubuntu I thought I’d try the apt-get route for libmemcache too. All you need to do to get libmemcache and the necessary header files installed is:

sudo apt-get install libmemcache-dev libmemcache0

After that, the steps for installing cmemcache itself are the same as before. The good news is that libmemcache and cmemcache both get installed without fuss using this approach, without any need for manual patching. The bad news is that the test.py suite still fails with the same error message as before.

I cannot say that I fully understand the significance of this test failing. It might be harmless, but I sure like my tests to pass successfully so cmemcache will remain on my naughty list for now. Left with only python-memcached as a choice, I installed it instead, which was a breeze compared to cmemcache:

wget ftp://ftp.tummy.com/pub/python-memcached/python-memcached-1.43.tar.gz
tar xvfz python-memcached-1.43.tar.gz
cd python-memcached-1.43/
sudo apt-get install python-setuptools
sudo python setup.py install

python-setuptools is used by the setup.py script, so it needs to be installed before running the setup. Once done, you can test python-memcached almost identically to cmemcache earlier (memcached needs to run in the background on localhost for this to work):

>>> import memcache
>>> c = memcache.Client(['127.0.0.1:11211'])
>>> c.set('testkey', 'testval')
True
>>> c.get('testkey')
'testval'
>>>