Tue, Mar. 18th, 2008, 03:23 pm
Standing in Line

For years the default website stack has been something similar to the classic LAMP stack - originally "Linux, Apache, MySQL, Perl" but now really meaning "Free Unix, Free Web server, Free Database and Free Web Language" or just "OS, Web server, Database, Language" to be even more general and friendly to the Windows people out there.

Relatively recently we've started to see that we should add another layer - a Cache. To be honest, from what I can see, Memcached has pretty much got this sewn up by virtue of being awesome, although there are other technologies like APC. So I'd like to coin a new phrase - the CLAMP stack for Caching LAMP. I can't find a reference to it so I'm going to claim it as mine. MINE. MUHAHAHAHAHAAH. Maybe in the future it will make me famous. Maybe. *cough*

Even more recently I think there's been a need for some sort of new layer.

At 6A we use TheSchwartz as a reliable asynchronous job manager - you sling a job onto the queue and at some point in the future it will get done and you don't have to think about it. Twitter has recently released Starling which does something similar and, in a nice bit of circularity, speaks the Memcached protocol.

I also know of at least 4 or 5 other big places with similar bits of software and at least one of those is built on a PubSub messaging system.

When I was at Yahoo! we had a really successful TibCo Rendezvous a-like for doing both asynchronous jobs (in this case publishing news pages and baking hundreds of run time templates from language agnostic seed templates) and for more traditional PubSub tasks such as broadcasting messages without really caring who's listening.

I like PubSub messaging systems. No, I love them. Or, rather, I love the idea of them. In practice I've had mixed results, especially with ActiveMQ which, in my experience anyway, had far too many rough edges and too many cool and quirky features when all I really wanted was bullet proof basic functionality. And by basic I mean - high availability, fault tolerance, reliable messaging, queues, topics and some sort of message selectors (I have to admit, I did like ActiveMQ's ability to have SQL like message selectors as well as glob/path based ones even if I did get bitten by some quirks).

The cool thing about PubSub systems is that when you start using them you end up finding uses for them everywhere.

But the problem is that all the open source ones kind of suck.

I don't mean that to diss the implementations out there - it's just that they're trying to do too much for what I actually want them to do.

One of the reasons that Memcached has been so successful is that it does one thing and does it well. It's incredibly easy to set up so that you can integrate it into your app from a very early stage. Then, as your app gets bigger, you just add more Memcached servers and update your config. With Consistent Hashing you can have a virtually downtime free system in a way that moistens the gussets of the kind of people who habitually use phrases like Enterprise Class Turnkey Systems without sniggering and/or dying a little inside.
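The "add more servers and update your config" trick only works because Consistent Hashing keeps most keys on the server they were already on. Here's a minimal sketch of the idea, not Memcached's or any client's actual code: the HashRing class, the CRC32 hash and the number of points per server are all assumptions picked for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical consistent hash ring, for illustration only.
class HashRing {
    // Ring position -> server name. Each server owns several positions.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int pointsPerServer;

    HashRing(List<String> servers, int pointsPerServer) {
        this.pointsPerServer = pointsPerServer;
        for (String server : servers) {
            addServer(server);
        }
    }

    // Place the server at several pseudo-random positions on the ring.
    void addServer(String server) {
        for (int i = 0; i < pointsPerServer; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    // A key belongs to the first server position at or after its own hash,
    // wrapping around to the start of the ring if there is none.
    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers configured");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // CRC32 is an assumption made to keep the sketch short.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

When a new server is added, a key either stays where it was or moves to the new server; it never shuffles between two of the old servers, which is why the cache stays mostly warm.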

So what I want from a basic queue is
  • Reliable / Guaranteed Delivery
  • High Availability / Failover
  • Optional Ordered Delivery
  • Incredibly easy to set up
  • Language independent

from the "would be nice" pile
  • Selectors of some kind
  • Ability to tune reliability vs performance
  • Retroactive consumers
  • Topics / Broadcast messages

What I actually want out of this is a kind of de facto standard like Memcached or Apache or whatever that's so simple to understand and use that it changes the way people develop their apps such that they just start with the assumption that some of their processing is going to be done asynchronously and build it in right from the start.

This means that, in the future, when their site starts to get much bigger the async stuff makes it really easy to scale. Then, at a later stage, they can decide that they want to start broadcasting events - so that their Atom Update Stream, Logging server and notification Server just learn about a new post without their web app backend having to know about any of them. Later, when they add a Search Index, they don't have to change anything else. All these things should be almost trivial.
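To make the "web app backend doesn't have to know about any of them" point concrete, here's a toy in-process sketch. It is not a real message broker (no network, no persistence, no reliability), and TinyBroker and the topic name are made up for the example.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process broker: topic name -> interested listeners.
class TinyBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    // Anyone can register interest in a topic without the publisher knowing.
    void subscribe(String topic, Consumer<String> listener) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // The publisher only names the topic; it never sees who is listening.
    void publish(String topic, String message) {
        List<Consumer<String>> listeners =
                subscribers.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> listener : listeners) {
            listener.accept(message);
        }
    }
}

class TinyBrokerDemo {
    public static void main(String[] args) {
        TinyBroker broker = new TinyBroker();
        broker.subscribe("posts/new", m -> System.out.println("atom stream saw: " + m));
        broker.subscribe("posts/new", m -> System.out.println("logger saw: " + m));
        // Added later; the publishing code below does not change.
        broker.subscribe("posts/new", m -> System.out.println("search index saw: " + m));
        broker.publish("posts/new", "new post created");
    }
}
```

The only line the web app owns is the publish call. Adding the search index is one more subscribe somewhere else.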

I've been thinking about this on and off for, oooooh, about 6 years but really only thinking seriously about it for the last few months. Basically I've been trying to mull stuff over in my head to get the shape of the problem. It's still somewhat indistinct - there's a fine line between sufficiently featureful and over engineered, between reuse of existing technology and not adding any extra value.

I think the requirements I specified above are the right feature list - for me at least. And, after bending the ears of several much smarter people than I, they seem to be roughly in the right ball park.

I'm also leaning towards having multiple pluggable back ends with a sensible default. This seems like a good compromise between simplicity, flexibility and 'future proofing'.
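Roughly what I have in mind for pluggable back ends, as a sketch only: every name here (QueueBackend, InMemoryBackend, BackendRegistry, the "memory" default) is hypothetical, and a real back end would wrap something like RabbitMQ or Spread rather than a map of in-memory queues.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical contract every back end would have to satisfy.
interface QueueBackend {
    void enqueue(String queue, String message);
    Optional<String> dequeue(String queue);
}

// The sensible default: nothing to install, nothing to configure.
class InMemoryBackend implements QueueBackend {
    private final Map<String, Deque<String>> queues = new HashMap<>();

    public synchronized void enqueue(String queue, String message) {
        queues.computeIfAbsent(queue, q -> new ArrayDeque<>()).addLast(message);
    }

    public synchronized Optional<String> dequeue(String queue) {
        Deque<String> q = queues.get(queue);
        return q == null ? Optional.empty() : Optional.ofNullable(q.pollFirst());
    }
}

// Maps the back end name from the config file to a factory.
class BackendRegistry {
    private static final String DEFAULT = "memory";
    private final Map<String, Supplier<QueueBackend>> factories = new HashMap<>();

    BackendRegistry() {
        register(DEFAULT, InMemoryBackend::new);
    }

    void register(String name, Supplier<QueueBackend> factory) {
        factories.put(name, factory);
    }

    // A missing or empty name falls back to the default back end.
    QueueBackend create(String name) {
        String key = (name == null || name.isEmpty()) ? DEFAULT : name;
        Supplier<QueueBackend> factory = factories.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("unknown back end: " + key);
        }
        return factory.get();
    }
}
```

The app only ever talks to QueueBackend, so swapping the default for something heavier later should be a config change rather than a code change.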

I'm umming and ahhing about having pluggable protocol adaptors. No two protocols are exactly the same so you're almost always going to have a lossy conversion which seems like a bad thing to encourage. On the other hand it helps with adoption and fitting in with existing Ecosystems. So it ought to support one, some or all of AMQP, STOMP, the Memcached protocol or its own home-brewed affair.

I'm not the first person to think about this by a long shot - I really like what Audrey and Jesse did with IPC::PubSub for example - and I really need to spend some serious quality time with it as well as with RabbitMQ, Spread, QPid and probably ActiveMQ as well, if only to see if they'd be good candidates for being back ends for a start.

It's weird - I'm kind of desperate for someone to come up to me and say that this is a stupid idea. Or that it's not needed. Or not doable. Or that someone else has done it. I think that's why I've been telling people about it so that they'll talk me out of it because, I think deep down, I don't want to get drawn into it - probably out of a combination of laziness, fear of failure and fear of the kind of success wherein suddenly something kind of consumes your life.

It's also possible I'm just over thinking things and that I should shut the hell up and JFDI.

Which is probably more likely.

Tue, Mar. 18th, 2008 10:36 pm (UTC)

Please stop the hurting by misappropriating the memcache protocol! It's really an incredibly shitty protocol. It sucks. It's AWFUL. The only reason people are using it is because there are very, very fast clients that know how to handle multiple async sockets. Ok, that's a pretty good reason.

So let's fix the protocol situation. Take a look at the latest binary protocol document spec and start listing off the requirements we need for special-purpose protocol extensions. These beasts are directly supported by the binary protocol, which allows new commands to be defined fairly readily. Let's not go crazy, but let's separate things out that are actually separate!


For example, I started writing out a Get-Range extension (which I proposed as a replacement for the UDP protocol fuckage, but ultimately I think we're going to end up with the extant dumb UDP wrapper around the text protocol applied to the binary protocol, and a separate set of ranged get/set requests -- C'est la vie).


Tue, Mar. 18th, 2008 10:57 pm (UTC)

I definitely agree - I think it's one of the bad design decisions of Starling and it's kind of misleading to say that it 'speaks' Memcached since although that's technically true, really, it has nothing to do with it. It conflates the two when they shouldn't be.

It's one of the reasons why I said in the meeting on Monday that we should abstract out the fast parallel async stuff so that it can be used in Memcached, Net::Stomp and the other stuff we were talking about.

Wed, Mar. 19th, 2008 02:39 am (UTC)

So the job servers are fine, queue servers are fine, but message (job multiplexers?) suck? I'm not all too clear on what you mean is wrong with all of the tools you listed.

I haven't worked with them very much. The only goal I have is to roll gearman-style functionality into a memcached storage engine (as those start to exist soon). Mostly because it's hard to convince people to actually use gearman. Instead you get starling, which is a decent but minimal _queue_ lacking a lot of the basics gearman doesn't suck at.

Yes. Fix things.

Wed, Mar. 19th, 2008 05:15 am (UTC)

It's more a matter of the way these things are designed. I like the idea of all these things - Gearman, TheSchwartz, ActiveMQ etc etc but none of them have what I want. They're either too simple (or, more precisely, too low level), don't have the features I want (for example, TheSchwartz isn't *really* language independent and doesn't have ordering) or are too complicated or, in some cases, too buggy.

Not to say that some of them wouldn't be good back ends for the system I've described. In much the same way that Lucene gives you a tool kit for building a search engine without actually giving you a search engine. It's just that none of them can provide a simple, easy to use system that works pretty much out of the box.

Wed, Mar. 19th, 2008 08:02 pm (UTC)
ext_90542: +1

Agree on almost every front. A dead-simple, fast, language agnostic implementation of queue + pub/sub would be fantastic.

Wed, Apr. 2nd, 2008 07:36 pm (UTC)

Have you looked at MemcacheQ at all?

Fri, May. 2nd, 2008 11:21 am (UTC)

have you had a look at mantaray? and i hear sapo's broker is very good, too.

Fri, May. 2nd, 2008 07:07 pm (UTC)

It's not just the technology though - it's the ease of use. I think there may be a bunch of technologies that might be useful as back ends but they're a bit of a sledgehammer to crack a nut.

Basically what I want is something that has a config file that looks like this

        # or RabbitMQ or Spread or Jabber or whatever

And then you just do

    % simplemq
    Started SimpleMQ on port 6464

And from your app you can do

        class MyClass implements SimpleMQListener {

          SimpleMQ mq;

          public MyClass() {
                mq = new SimpleMQ("localhost", 6464);
                mq.setListener("/sixapart/posts/*/public", this);
                mq.sendMessage("queue:///sixapart/posts/vox/public", "Hello!");
          }

          public void receiveMessage(SimpleMQMessage m) {
                System.out.println("Got a message on "+m.path);
          }
        }

or in Perl

        my $mq = SimpleMQ->new("localhost", 6464);
        $mq->set_listener("/sixapart/posts/*/public", \&receive_message);
        $mq->send_message("queue:///sixapart/posts/vox/public", "Hello!");

        sub receive_message {
          my $m = shift;
          print "Got a message on ".$m->path."\n";
          print $m->body."\n";
        }

And have it be exactly that simple.
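For what it's worth, the glob style listener path above could be matched with something like the following. This is a sketch under assumptions: PathSelector is a made-up name, "*" is taken to mean exactly one path segment, and the "queue://" prefix is assumed to have been stripped before matching.

```java
import java.util.regex.Pattern;

// Hypothetical selector: "*" matches exactly one path segment.
class PathSelector {
    private final Pattern pattern;

    PathSelector(String glob) {
        // Quote the literal parts and join them with "one or more non-slash characters".
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append("[^/]+");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        this.pattern = Pattern.compile(regex.toString());
    }

    // True only if the whole path matches the glob.
    boolean matches(String path) {
        return pattern.matcher(path).matches();
    }
}
```

With that, a listener on "/sixapart/posts/*/public" would see messages for "/sixapart/posts/vox/public" but not for a private path or one with extra segments.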

Thu, May. 15th, 2008 09:23 am (UTC)

Hi, I'm a totally biased Qpid developer, but I reckon we should fit your bill. We've just (as in, yesterday) released M2.1 which is the Best Qpid Ever.

Our out of the box experience is pretty much "download, run broker, attach client" with no configuration required, we even ship "reliable, slow" and "unreliable, fast" configs for your tweaking needs.

- Aidan (who found this while googling something else, hi)