Relatively recently we've started to see that we should add another layer - a Cache. To be honest, from what I can see, Memcached has pretty much got this sewn up by virtue of being awesome, although there are other technologies like APC. So I'd like to coin a new phrase - the CLAMP stack, for Caching LAMP. I can't find a reference to it so I'm going to claim it as mine. MINE. MUHAHAHAHAHAAH. Maybe in the future it will make me famous. Maybe. *cough*
Even more recently I think there's been a need for some sort of new layer.
At 6A we use TheSchwartz as a reliable asynchronous job manager - you sling a job onto the queue and at some point in the future it will get done and you don't have to think about it. Twitter has recently released Starling, which does something similar and, in a nice bit of circularity, speaks the Memcached protocol.
I also know of at least 4 or 5 other big places with similar bits of software and at least one of those is built on a PubSub messaging system.
When I was at Yahoo! we had a really successful TibCo Rendezvous a-like for doing both asynchronous jobs (in this case publishing news pages and baking hundreds of run time templates from language agnostic seed templates) and for more traditional PubSub tasks such as broadcasting messages without really caring who's listening.
I like PubSub messaging systems. No, I love them. Or, rather, I love the idea of them. In practice I've had mixed results, especially with ActiveMQ which, in my experience anyway, had far too many rough edges and too many cool and quirky features when all I really wanted was bulletproof basic functionality. And by basic I mean high availability, fault tolerance, reliable messaging, queues, topics and some sort of message selectors (I have to admit, I did like ActiveMQ's ability to have SQL-like message selectors as well as glob/path based ones, even if I did get bitten by some quirks).
The cool thing about PubSub systems is that when you start using them you end up finding uses for them everywhere.
But the problem is that all the open source ones kind of suck.
I don't mean that to diss the implementations out there - it's just that they're trying to do too much for what I actually want them to do.
One of the reasons that Memcached has been so successful is that it does one thing and does it well. It's incredibly easy to set up so that you can integrate it into your app from a very early stage. Then, as your app gets bigger, you just add more Memcached servers and update your config. With Consistent Hashing you can have a virtually downtime free system in a way that moistens the gussets of the kind of people who habitually use phrases like Enterprise Class Turnkey Systems without sniggering and/or dying a little inside.
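To make the Consistent Hashing bit a little more concrete, here's a minimal sketch in Python. It isn't how any particular Memcached client does it; the `HashRing` class, the server names and the choice of MD5 with 100 points per server are all assumptions I've made for illustration. The idea is that each server owns lots of points on a ring, and a key belongs to the first point after its own hash.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any stable hash would do; MD5 is just an illustrative choice.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent hash ring, not a real client's API."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # points per server on the ring
        self._points = []         # sorted list of hash points
        self._owner = {}          # hash point -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def server_for(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
keys = [f"user:{n}" for n in range(1000)]
before = {key: ring.server_for(key) for key in keys}

# Add a fourth server and count how many keys changed owner.
ring.add("cache4:11211")
moved = sum(1 for key in keys if ring.server_for(key) != before[key])
print(f"{moved} of {len(keys)} keys moved")
```

With a naive `hash(key) % number_of_servers` scheme, adding a server would remap most keys. Here only the keys that now fall to the new server move, which is what lets you grow the pool without throwing away most of your cache.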
So what I want from a basic queue is:
- Reliable / Guaranteed Delivery
- High Availability / Failover
- Optional Ordered Delivery
- Incredibly easy to set up
- Language independent
And from the "would be nice" pile:
- Selectors of some kind
- Ability to tune reliability vs performance
- Retroactive consumers
- Topics / Broadcast messages
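To pin down what I mean by reliable and ordered delivery, here's a rough in-memory sketch in Python. Everything in it (`ReliableQueue`, `reserve`, `ack`, `nack`, `work_once`) is a name I've invented for illustration, and a real system would need persistence and timeouts on top. The point is only that a job isn't forgotten until a worker says it's done, and a failed job goes back to the front so ordering holds.

```python
import collections
import itertools


class ReliableQueue:
    """Hypothetical at-least-once queue; in-memory, single process only."""

    def __init__(self):
        self._ready = collections.deque()  # jobs waiting for a worker
        self._in_flight = {}               # job_id -> payload, handed out but not acked
        self._ids = itertools.count(1)

    def put(self, payload):
        job_id = next(self._ids)
        self._ready.append((job_id, payload))
        return job_id

    def reserve(self):
        """Hand out the oldest job, keeping a copy until it is acked."""
        if not self._ready:
            return None
        job_id, payload = self._ready.popleft()
        self._in_flight[job_id] = payload
        return job_id, payload

    def ack(self, job_id):
        # The worker finished; only now is the job really gone.
        del self._in_flight[job_id]

    def nack(self, job_id):
        # The worker failed; put the job back at the front to keep ordering.
        payload = self._in_flight.pop(job_id)
        self._ready.appendleft((job_id, payload))


def work_once(queue, handler):
    """Try one job. Returns True if a job was completed."""
    job = queue.reserve()
    if job is None:
        return False
    job_id, payload = job
    try:
        handler(payload)
    except Exception:
        queue.nack(job_id)
        return False
    queue.ack(job_id)
    return True


attempts = collections.Counter()
done = []


def flaky_resize(payload):
    # Pretend the worker falls over the first time it sees photo-2.
    attempts[payload] += 1
    if payload == "photo-2" and attempts[payload] == 1:
        raise RuntimeError("worker crashed")
    done.append(payload)


queue = ReliableQueue()
for name in ("photo-1", "photo-2", "photo-3"):
    queue.put(name)

for _ in range(10):  # bounded so the sketch always terminates
    work_once(queue, flaky_resize)

print(done)
```

The trade-off is the usual one: at-least-once delivery means a handler might see the same job twice, so handlers need to cope with that.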
What I actually want out of this is a kind of de facto standard like Memcached or Apache or whatever that's so simple to understand and use that it changes the way people develop their apps such that they just start with the assumption that some of their processing is going to be done asynchronously and build it in right from the start.
This means that, in the future, when their site starts to get much bigger the async stuff makes it really easy to scale. Then, at a later stage, they can decide that they want to start broadcasting events - so that their Atom Update Stream, Logging server and notification Server just learn about a new post without their web app backend having to know about any of them. Later, when they add a Search Index, they don't have to change anything else. All these things should be almost trivial.
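Here's a toy version of that in Python, just to show the shape of it. The `EventBus` class, the `post.created` topic name and the consumers are all made up, and it runs synchronously in one process, which a real broker obviously wouldn't. What matters is that `create_post` never changes when a new listener turns up.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process stand-in for a PubSub broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # The publisher has no idea who, if anyone, is listening.
        for callback in list(self._subscribers[topic]):
            callback(event)


bus = EventBus()
log = []

# Three consumers that each care about new posts.
bus.subscribe("post.created", lambda e: log.append(("atom", e["id"])))
bus.subscribe("post.created", lambda e: log.append(("logging", e["id"])))
bus.subscribe("post.created", lambda e: log.append(("notify", e["id"])))


def create_post(post_id, title):
    # The web app backend only announces what happened.
    bus.publish("post.created", {"id": post_id, "title": title})


create_post(1, "Hello")

# Later on, a search indexer is added. create_post() is untouched.
bus.subscribe("post.created", lambda e: log.append(("search", e["id"])))
create_post(2, "CLAMP")

print(log)
```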
I've been thinking about this on and off for, oooooh, about 6 years but really only thinking seriously about it for the last few months. Basically I've been trying to mull stuff over in my head to get the shape of the problem. It's still somewhat indistinct - there's a fine line between sufficiently featureful and over-engineered, between reuse of existing technology and not adding any extra value.
I think the requirements I specified above are the right feature list - for me at least. And, after I bent the ears of several people much smarter than me, they seem to be roughly in the right ballpark.
I'm also leaning towards having multiple pluggable back ends with a sensible default. This seems like a good compromise between simplicity, flexibility and 'future proofing'.
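Roughly what I have in mind, sketched in Python. None of this is a real API: `Backend`, `MemoryBackend`, `SqliteBackend` and `Broker` are names I've invented, and the SQLite schema is just the simplest thing that works. The broker talks to a tiny interface and falls back to an in-memory default if you don't give it anything else.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import deque


class Backend(ABC):
    """Hypothetical storage interface the broker depends on."""

    @abstractmethod
    def push(self, queue_name, message): ...

    @abstractmethod
    def pop(self, queue_name): ...


class MemoryBackend(Backend):
    """The sensible default: zero setup, nothing survives a restart."""

    def __init__(self):
        self._queues = {}

    def push(self, queue_name, message):
        self._queues.setdefault(queue_name, deque()).append(message)

    def pop(self, queue_name):
        pending = self._queues.get(queue_name)
        return pending.popleft() if pending else None


class SqliteBackend(Backend):
    """A swapped-in backend that stores messages in SQLite."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "queue TEXT NOT NULL, body TEXT NOT NULL)"
        )

    def push(self, queue_name, message):
        with self._db:  # commits on success
            self._db.execute(
                "INSERT INTO messages (queue, body) VALUES (?, ?)",
                (queue_name, message),
            )

    def pop(self, queue_name):
        with self._db:
            row = self._db.execute(
                "SELECT id, body FROM messages WHERE queue = ? ORDER BY id LIMIT 1",
                (queue_name,),
            ).fetchone()
            if row is None:
                return None
            self._db.execute("DELETE FROM messages WHERE id = ?", (row[0],))
            return row[1]


class Broker:
    def __init__(self, backend=None):
        # Fall back to the in-memory default when nothing is plugged in.
        self.backend = backend if backend is not None else MemoryBackend()

    def send(self, queue_name, message):
        self.backend.push(queue_name, message)

    def receive(self, queue_name):
        return self.backend.pop(queue_name)


results = {}
for name, broker in (("memory", Broker()), ("sqlite", Broker(SqliteBackend()))):
    broker.send("thumbnails", "photo-1")
    broker.send("thumbnails", "photo-2")
    results[name] = [broker.receive("thumbnails") for _ in range(3)]

print(results)
```

The calling code is identical for both backends, which is the whole appeal: start with the default, swap in something sturdier when you need it.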
I'm umming and ahhing about having pluggable protocol adaptors. No two protocols are exactly the same, so you're almost always going to have a lossy conversion, which seems like a bad thing to encourage. On the other hand it helps with adoption and fitting in with existing ecosystems. So it ought to support one, some or all of AMQP, STOMP, the Memcached protocol or its own home-brewed affair.
I'm not the first person to think about this by a long shot - I really like what Audrey and Jesse did with IPC::PubSub for example - and I really need to spend some serious quality time with it as well as with RabbitMQ, Spread, Qpid and probably ActiveMQ as well, if only to see if they'd be good candidates for being back ends for a start.
It's weird - I'm kind of desperate for someone to come up to me and say that this is a stupid idea. Or that it's not needed. Or not doable. Or that someone else has done it. I think that's why I've been telling people about it, so that they'll talk me out of it because, I think, deep down I don't want to get drawn into it - probably out of a combination of laziness, fear of failure and fear of the kind of success wherein suddenly something kind of consumes your life.
It's also possible I'm just overthinking things and that I should shut the hell up and JFDI.
Which is probably more likely.