Tue, Oct. 14th, 2008, 05:02 pm
If you're gonna be dumb then you gotta be tough

My constant prattling on about message queues has been going on for a while now.

Since then I've been looking long and hard at RabbitMQ and Qpid, both of which look hella sexy and very, very promising.

However, at the moment neither quite passes my "as easy as memcached to set up" test.

Given that I've been drinking with both the Dopplr and the Flickr teams for a while (and borrowed desk space off Dopplr in return for tea every morning and Pho once a week) it's not surprising that I've bent their collective ears about this more than once.

They're both big advocates of message queues - witness recent blog posts from both of them. However something else that I've ranted at (sorry, talked to) them about is the concept of premature scaling.

There are certain semi-best practices that you can do to make web sites scale (both technically and socially). Stuff like sharding for example. Later, if you get really big you'll maybe want to internationalise. But you obviously don't want to do all that from the get go when you're only 1 person working on a VM somewhere on some shared box in a colo.

So there's a line somewhere between what you should do upfront and what you should leave till later. It's a grey area and it's different for every site but the general consensus was that making sure you're not using auto-incrementing primary keys on tables you might want to shard is probably the right thing, but leave your i18n till later (although you always want to keep that sort of stuff in mind and generally structure your code to make it as easy as possible later).
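To make the primary key point concrete, here's a rough sketch of one common alternative: generate a random key in the application and derive the shard from it. Nobody in those conversations said which alternative they'd pick, so UUIDs are my assumption here, and `new_key` and `shard_for` are made-up names for illustration.

```python
import uuid


def new_key():
    """Return a random 128-bit key as 32 hex characters.

    No central counter is involved, so any web node can mint keys
    without asking a single database for the next ID.
    """
    return uuid.uuid4().hex


def shard_for(key, shard_count):
    """Map a hex key onto one of `shard_count` shards (hypothetical scheme)."""
    return int(key, 16) % shard_count


key = new_key()
print(key, "lives on shard", shard_for(key, 4))
```

The point is only that the key doesn't depend on which box inserted the row. A plain modulo like this makes adding shards later painful, so treat it as the simplest thing that shows the idea.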

Obviously, being me, messaging came up. Now there are really two types of messaging - one where you want to broadcast an event to an unknown number of listeners (let's call that PubSub) and one where really you're just using it to do asynchronous tasks (let's call that Job Queues).

PubSub very much falls into the realm of "You don't need this now" I think. Job Queues on the other hand - they can be very useful. From generating thumbnails to sending notification emails or SMSs - none of this needs to be done to render the page so you might as well shunt it off to be done later when you're not desperately scrambling to return some HTML to the user as fast as possible.

LJ uses a system called TheSchwartz (the name is actually not one but two semi-elaborate in-jokes) for this stuff. Dopplr uses ActiveMQ I believe and Flickr use their own internal system.

TheSchwartz is fantastic. It scales for a start. And it runs off a database so it's easy to set up and admin (since you already presumably have a database running your site).
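To show how little a database-backed job queue needs, here's a minimal sketch using SQLite. This is not TheSchwartz's schema or API (that's Perl and does a lot more, like retries and failure tracking). The table layout and the `open_queue`, `enqueue` and `claim_next` names are all invented for illustration.

```python
import json
import sqlite3


def open_queue(path=":memory:"):
    # isolation_level=None gives autocommit, so we control transactions ourselves.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS job ("
        " id INTEGER PRIMARY KEY,"
        " task TEXT NOT NULL,"
        " args TEXT NOT NULL,"
        " claimed INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def enqueue(db, task, **args):
    """Called from the web request: record the work and return straight away."""
    db.execute(
        "INSERT INTO job (task, args) VALUES (?, ?)", (task, json.dumps(args))
    )


def claim_next(db):
    """Called from a worker: atomically take the oldest unclaimed job, or None."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before looking
    try:
        row = db.execute(
            "SELECT id, task, args FROM job WHERE claimed = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            db.execute("UPDATE job SET claimed = 1 WHERE id = ?", (row[0],))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    if row is None:
        return None
    return row[1], json.loads(row[2])


db = open_queue()
enqueue(db, "thumbnail", photo_id=42)
enqueue(db, "email", to="a@example.com")
print(claim_next(db))  # ('thumbnail', {'photo_id': 42})
```

The page handler only ever calls `enqueue`. A separate worker process loops on `claim_next` and does the slow stuff (thumbnails, emails, SMSs) in its own time.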

Tom Insam (who works at Dopplr and is somewhat of a mad genius even if he does have girly long hair) recently pondered about having a really simple, database backed message queue that used STOMP as its interface. Then, when you got really big, you could switch over transparently to using one of the big boy queues.

"Hmm," thinks I, "I wonder how hard it would be to make a STOMP interface to TheSchwartz"

So I started hacking on a generic STOMP server with every intention of just doing a very simple layer over the top of TheSchwartz.

The generic layer took all of, ooh, about 30 minutes to write.
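Part of why a generic layer is so quick is that a STOMP frame is tiny: a command line, some `key:value` headers, a blank line, a body and a terminating NUL byte. Here's a rough sketch of reading and writing one. It is not the code from my server (that's not Python), `parse_frame` and `build_frame` are hypothetical names, and it ignores the `content-length` header, so it assumes the body contains no NUL bytes.

```python
def parse_frame(raw):
    """Split one complete STOMP frame into (command, headers, body)."""
    if not raw.endswith(b"\x00"):
        raise ValueError("frame must end with a NUL byte")
    # Everything before the first blank line is the command plus headers.
    head, _, body = raw[:-1].partition(b"\n\n")
    lines = head.decode("utf-8").split("\n")
    command, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key] = value
    return command, headers, body


def build_frame(command, headers, body=b""):
    """Serialise a frame; the inverse of parse_frame."""
    head = "\n".join([command] + [f"{k}:{v}" for k, v in headers.items()])
    return head.encode("utf-8") + b"\n\n" + body + b"\x00"


raw = build_frame("SEND", {"destination": "/queue/thumbs"}, b"photo 42")
print(parse_frame(raw))
```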

What I should have done then is write the TheSchwartz-specific layer, which would probably have only taken another half an hour or so.

Instead I started sketching out in my head how you'd write a real proper PubSub message broker.

A couple of hours later I'd done most of it - only stopping when the dev machine I was testing on suffered a prolonged power outage.

So here we go.

This contains both the generic layer (the simple bit) and the actual broker (the tricky bit).

How it works is that a Parent server starts up and starts listening. When a new STOMP client connects it fork()s off a child which listens for new STOMP frames and reacts accordingly.

The clever bit is when a message comes in. The Child serialises the frame and sends it via a socket pair to the Parent. The Parent then sends it back down via another socket pair to all the Children. The Children then decide whether or not to send it to their Client depending on what SUBSCRIBE frames they've received.
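The relay described above can be sketched in a few lines. This is a toy, not the broker itself: there are no real STOMP clients, each child is handed its subscriptions up front instead of receiving SUBSCRIBE frames, and I've used one bidirectional socket pair per child rather than separate up and down pairs. `relay_demo` and its arguments are made up for the example, and it needs a Unix-like OS for `os.fork()`.

```python
import os
import socket


def relay_demo(subscriptions, sender, destination, body):
    """Fork one child per subscription set, relay one message through the
    parent, and report which children would pass it on to their client."""
    links = []  # (pid, parent's end of the pair), one entry per child
    for index, subscribed in enumerate(subscriptions):
        # Datagram pairs keep message boundaries, so no framing is needed.
        parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                # Child: drop every socket end that belongs to the parent.
                parent_end.close()
                for _, inherited in links:
                    inherited.close()
                if index == sender:
                    # Pretend our client just sent us a SEND frame.
                    child_end.send(f"{destination}\n{body}".encode())
                # Every child, sender included, gets the parent's broadcast.
                dest, _, _payload = child_end.recv(4096).decode().partition("\n")
                # Only children subscribed to the destination would deliver it.
                child_end.send(b"1" if dest in subscribed else b"0")
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        child_end.close()
        parent_end.settimeout(5.0)  # fail loudly rather than hang on a dead child
        links.append((pid, parent_end))

    message = links[sender][1].recv(4096)  # frame arrives from one child...
    for _, sock in links:
        sock.send(message)                 # ...and goes back down to all of them
    delivered = [sock.recv(1) == b"1" for _, sock in links]
    for pid, sock in links:
        sock.close()
        os.waitpid(pid, 0)
    return delivered


print(relay_demo([{"/topic/a"}, {"/topic/b"}, {"/topic/a", "/topic/b"}],
                 sender=1, destination="/topic/a", body="hi"))
```

The parent never looks at subscriptions at all. It just fans every frame out, and each child does its own filtering, which is what keeps the parent so simple (and also why this approach isn't exactly efficient).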

Queues (first come, first served) are a bit more complicated than Topic (broadcast) messages though.

In that case the Parent also creates a semaphore and each Child then tests to see if they're the first to decrement it. If they are then they send the message. The last Child to check removes the semaphore.
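The "first to decrement wins" trick looks roughly like this. To keep the sketch short and portable I've used threads and `threading.Semaphore` as stand-ins for the forked Children and the semaphore the Parent creates, so this shows the idea rather than the actual inter-process mechanism. `deliver_once` is a hypothetical name.

```python
import threading


def deliver_once(num_children):
    """Let `num_children` workers race for one message; return the winners."""
    token = threading.Semaphore(1)  # one permit per queued message
    winners = []
    winners_lock = threading.Lock()

    def child(index):
        # Non-blocking decrement: True only for the first caller.
        if token.acquire(blocking=False):
            with winners_lock:
                winners.append(index)  # this child "sends the message"
        # Everyone else sees False and silently drops the message.

    threads = [threading.Thread(target=child, args=(i,)) for i in range(num_children)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


print(len(deliver_once(5)))  # exactly one child wins
```

Which child wins is down to scheduling, so this gives "delivered exactly once" but says nothing about fairness between consumers.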

Now, don't get me wrong - this is very, very alpha quality at the moment - I'm pretty sure the queue implementation is duff and also that it's liable to fork() bomb your machine at a moment's notice but it shouldn't be too hard to clean up.

However I doubt it will ever be Enterprise class. For a start it's all memory backed at the moment, but even when I write a planned subclass that uses a DB and TheSchwartz for persistence it's just not careful enough or efficient enough to properly scale. It might still be useful for small sites, but really it's an intellectual exercise. It is incredibly simple to use though. In most cases just running
    stomp-broker --daemonize 

should be enough.

Tomorrow though, I'll finish off the far more useful STOMP interface to TheSchwartz during our weekly Hack Days and make it as easy to install and run.

The title of this post refers both to the fact that the Job Queue might be useful despite being so simple and also to the fact that I'm a dumbass for not doing the useful thing first and fiddling round with the intellectual gewgaw instead.

Oh, and it's a catchy Roger Alan Wade song used, appropriately, for Jackass.

Wed, Oct. 15th, 2008 09:53 am (UTC)

Tom has recently had a haircut, so it is now neither long nor girly.

Secondly, job queues are indeed very useful. I'm waiting to see if a web application framework comes along with one built in, since it seems like I get about a hundred lines of code into any given Flickr web application I want to write and then, blam, I need a job queue.

I suppose what I'm saying is "I r dum n lazy, can haz simpl job q pls?"

Wed, Oct. 15th, 2008 12:57 pm (UTC)
(Anonymous): have you tried rabbit recently simon

Hey Simon, cool post. If you have not looked at RabbitMQ recently please do because I'm not sure if it could get much easier to set up. The current packaging gets a lot of praise. Yes we want to make it better and better but we need to be told if things are missing.

Example (using Ruby on OSX) - http://playtype.net/past/2008/10/10/kickass_queuing_over_ruby_using_amqp/

Hope to see you next week!

Cheers, alexis

Wed, Oct. 15th, 2008 05:40 pm (UTC)
deflatermouse: Re: have you tried rabbit recently simon

It's actually not the installation that's really the problem per se - and don't get me wrong, RabbitMQ has made that list I have with my non-existent girlfriend which says that I'm allowed to cheat on her with either Alyson Hannigan or RabbitMQ with no recriminations as long as she's allowed to do the same with Johnny Depp or LLVM.

It's actually more that - when you're starting out you tend to have a web framework and a database and that's it and installing Erlang and admining another daemon might be a step too far for the hobbyist. As you're well aware my thinkings on this are vague and hand-waving and mostly beer fuelled so if I'm less than clear then I promise to buy you a beer next week :)

Also Tom's updated his request and, having chatted with him, I've got another post brewing.

Mon, Oct. 20th, 2008 04:05 pm (UTC)

It worries me that while we both have broadly the same job of "programmer", this post may as well have been written in Hindi for all I understood of it.

Mon, Oct. 20th, 2008 05:27 pm (UTC)

Dude, have you ever read GDA?

"Attach the rain-volume to the camera, but align it to the ground"

"Render front faces and then back faces into two separate depth
buffers you can compute a thickness value per texel along the view
direction. If you then combined this with a captured environment cubemap
perhaps you could emulate subsurface scattering"

"Place rain sheets statically in the world and cull them with an
extremely close far-plane"

"Perhaps a simple incremental method for generating a bounding cone of
the animation keys might be worth trying. Using the initial bind
orientation of the joint as the starting "cone", for each animation key
determine if the joint lies within the current bounding cone."

"Potentially viable for lower-frequency gross shadowing with some

I mean hell, I've done this sort of stuff and occasionally that list sounds like someone ran a list of graphics terms through that Post Modern Essay generator.

Edited at 2008-10-20 05:27 pm (UTC)