I just got back after 4 days away and had to deal with the morass of email, feeds, todo items and Tweets that had built up in my absence, and it occurred to me that the way I deal with this data is very much like how various Garbage Collection schemes work. In general I use a Ref Counting system similar to how Perl does GC - I mentally keep a tally of how much external information is needed for an email and when that reaches zero I reply. This has the benefit of timeliness in much the same way a Ref Counted GC guarantees timely destruction on scope exit. The disadvantage is that, like Ref Counting, it's a lot of mental accounting and constant work to keep it under control. It tends to leak slightly, so the number of unhandled items slowly creeps up, or some drastic event happens and it shoots up. In those scenarios I tend to do roughly the same as jwz: "When my main Inbox folder hits 2000 messages, I sit bolt upright for 16 hours and cause it to contain less than 200 messages."
This is the equivalent of a stop-the-world mark-and-sweep GC a la older versions of Java and its main disadvantage is that it brings everything else in your world to a grinding, screeching halt - this is why Java apps occasionally lock solid: a GC run has started and nothing can interrupt it.
I should probably start using a more modern scheme such as a Generational (sometimes known as Ephemeral) Concurrent Garbage Collector. This would mean that certain emails of different ages or priorities would be dealt with sooner and the rest when I have idle time. This sounds ideal except that it still requires an initial triage run and would also require much more sophisticated scaffolding for accounting and record keeping.
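For the terminally literal, here's a rough Python sketch of the two schemes I actually use. Everything in it (the `Inbox` class, its method names, the `high` and `low` thresholds) is made up for illustration; the only real numbers are jwz's 2000 and 200.

```python
class Inbox:
    """Reply to an email once nothing external is still needed for it."""

    def __init__(self):
        self.pending = {}   # subject -> count of outstanding bits of info
        self.replied = []   # subjects answered, in the order they were answered

    def receive(self, subject, needs=0):
        self.pending[subject] = needs
        self._reply_if_ready(subject)

    def info_arrived(self, subject):
        # One dependency resolved: decrement the tally, like dropping a reference.
        self.pending[subject] -= 1
        self._reply_if_ready(subject)

    def _reply_if_ready(self, subject):
        # Count hit zero, so the email gets dealt with straight away.
        if self.pending[subject] <= 0:
            del self.pending[subject]
            self.replied.append(subject)

    def stop_the_world(self, high=2000, low=200):
        """The jwz sweep: once `high` is hit, answer oldest-first until under `low`."""
        if len(self.pending) < high:
            return 0
        swept = 0
        while len(self.pending) >= low:
            subject = next(iter(self.pending))  # dicts keep insertion order
            del self.pending[subject]
            self.replied.append(subject)
            swept += 1
        return swept
```

Note that nothing else can happen while `stop_the_world` is running, which is rather the point.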
I actually wrote this about 2 years ago and, for some reason, never posted it but I was having a conversation about it recently and figured I might as well publish. And be damned.
Searching isn't a one step malarkey. Ignoring tokenising and stemming, thesauruses and normalisation, you have to select a working set of documents and then rank them. How you rank them isn't a simple thing either - it's generally a multi faceted score based on a whole host of factors such as some sort of document score like Lucene's, recentness or some sort of trust metric like Google's PageRank, amongst others. That's enough to get you a basic result set but it's very passive. Shouldn't a search system react to its users and learn what they like, like a particularly skilled lover who is handy with a bottle of baby oil and a lightly blunted cheese grater?
So, you can build a click tracking network which is a form of neural network. You train it using the raw tokens in the query (note, the raw tokens, not the normalised, stemmed and otherwise munged ones), what the results shown were and what the user clicked on. Chapter 4 of "Programming Collective Intelligence" has some great stuff on this but, in brief - you build a multilayer perceptron (MLP) network consisting of, err, multiple layers of nodes. The first layer will be the tokens entered in the query, each token representing one node. The last layer will be the weightings for the different matches that were returned (each match being one node). The middle layer (MLPs can have multiple hidden layers but in this case we'll only use one) is never contacted by the outside world but it's still very important. All nodes on a layer are conceptually connected to all the nodes on its neighbour layer(s).
First off we train the network - when someone searches for 'polish sausage' and then clicks on Document A we use an algorithm called back propagation to strengthen the connection between the Document A node and the nodes for 'polish' and 'sausage' via the hidden, middle layer. Then the next time someone searches for 'polish sausage' we turn on the nodes representing the tokens and they in turn switch on nodes in the hidden layer, and when those get a strong enough input they'll turn on nodes in the output layer. In the end the output layer's nodes will have been turned on a certain amount and we can use this amount to calculate how strongly certain words are associated with certain Documents. We then adjust the search ranking accordingly and voila! Bob is your Mother's Brother. This turns out to be useful because, if someone was to search for 'furniture polish' instead, we can learn from our users that Document A is useless and instead rank Document B higher.
Observant readers will have noticed that this mechanism seems startlingly close to Contextual Network Graphs and they should be applauded. There are differences however and, unfortunately, the margin of this email is too narrow to contain them.
The neat thing is that we can do this at more than one level. We can track the clicks of everyone and we can also track all your clicks, and those will give us different weightings which we can combine somehow (such as, in the simplest case, using your weighting unless you haven't rated this query, in which case fall back to the crowd's rating) to tweak the scores. Or, if we know who your friends are, we can combine their weightings too, perhaps factoring how much we think you like them into the equation. Now, at first, unless you were dealing with a closed social network, that may have been tricky but now we can utilise the Open Social Graph for maximum funitude.
The clouds will part, birds will sing, rainbows will, err, bow and all will be a shiny and happy future with jet packs and hovercraft for all. Apart from the pesky privacy thing raising its head again. For example - say you only have one friend on this social network. When you're logged out a search for "Donkey" gives you jolly pictures of the aforementioned beasts, perhaps wearing jaunty hats or chilling at the beach. Then you log in and do the same search and suddenly it brings back images of a more, shall we say, interspecies erotica nature. At which point you have some handy blackmail material for your single friend. Ah, you might say, can't we guard against that by, say, only using your friends' data if you have more than a certain number of friends? See, if there's one thing we've learnt it's that patterns can be extracted from even carefully anonymised data such as the NetFlix recommendation data or, more pertinently, the AOL search results leak. So, if we implemented this it's possible, nay likely even (since the universe seems to tend towards maximum lulz) that someone would figure out a way to extract the data and work out who had been clicking on what and suddenly your clownsuit-wearing muskrat fetish is public knowledge. Which, you know, is nothing to be ashamed of per se but might lead to awkward small talk at the next party you go to.
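For anyone who wants to poke at the click tracking idea, here is a minimal pure-Python sketch of the three layer network described earlier in this post. It is a toy under stated assumptions, not the code from the book: the `ClickNet` class, the hidden layer size, the learning rate and the use of tanh as the activation function are all my own choices for illustration, and a real one would add hidden nodes as new queries turned up rather than fixing the vocabulary up front.

```python
import math
import random


class ClickNet:
    """Toy perceptron: query tokens -> one hidden layer -> documents."""

    def __init__(self, tokens, docs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.tokens, self.docs = list(tokens), list(docs)
        # Every node is connected to every node on the neighbouring layer.
        self.w_in = [[rng.uniform(-0.2, 0.2) for _ in range(n_hidden)]
                     for _ in self.tokens]
        self.w_out = [[rng.uniform(-0.2, 0.2) for _ in self.docs]
                      for _ in range(n_hidden)]

    def _forward(self, query):
        # Turn on the input nodes for the raw tokens in the query.
        x = [1.0 if t in query else 0.0 for t in self.tokens]
        h = [math.tanh(sum(x[i] * self.w_in[i][j] for i in range(len(x))))
             for j in range(len(self.w_out))]
        o = [math.tanh(sum(h[j] * self.w_out[j][k] for j in range(len(h))))
             for k in range(len(self.docs))]
        return x, h, o

    def score(self, query):
        """How strongly each document node lights up for this query."""
        return dict(zip(self.docs, self._forward(query)[2]))

    def train(self, query, clicked, rate=0.3):
        """One round of back propagation towards the clicked document."""
        x, h, o = self._forward(query)
        target = [1.0 if d == clicked else 0.0 for d in self.docs]
        # The derivative of tanh at output y is 1 - y**2.
        d_out = [(target[k] - o[k]) * (1 - o[k] ** 2) for k in range(len(o))]
        d_hid = [(1 - h[j] ** 2) *
                 sum(d_out[k] * self.w_out[j][k] for k in range(len(o)))
                 for j in range(len(h))]
        for j in range(len(h)):
            for k in range(len(o)):
                self.w_out[j][k] += rate * d_out[k] * h[j]
        for i in range(len(x)):
            for j in range(len(h)):
                self.w_in[i][j] += rate * d_hid[j] * x[i]


net = ClickNet(["polish", "sausage", "furniture"], ["Document A", "Document B"])
for _ in range(300):
    net.train({"polish", "sausage"}, "Document A")
    net.train({"furniture", "polish"}, "Document B")

sausage_scores = net.score({"polish", "sausage"})
furniture_scores = net.score({"furniture", "polish"})
```

After training, `sausage_scores` should favour Document A and `furniture_scores` Document B, even though both queries share the token 'polish'. The per-user and per-crowd idea is then just two of these networks with a fallback from one to the other.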
I have to admit that I haven't played around with Git much beyond familiarising myself with the commands enough to get by with the various projects I know that are using it. That's not to say I don't appreciate aspects of its design - it has a certain inherent elegance that seems to invite experimentation and it certainly follows the Unix philosophy of small tools designed to be chained together which also facilitates that. But to be honest I've just not needed to use it. I use SVN which, for all its flaws (I've never really had a complaint about speed to be fair but every time I need to roll back I have to go looking for a blog post I read one time that explains how to do it) does me just fine. When I need to do more distributed work I just fire up the awesome SVK and Bob, is as they say, your Mother's Brother. One of the questions that niggled at me though, especially in the wake of the democratising GitHub, was whether it would encourage siloing. Back in the dark days when modem speeds were measured in baud and people still thought Digital Watches were a pretty neat idea there was the NCSA http daemon and Lo! many people maintained various patch sets against it and, when you wanted to run it you went and downloaded the main package and then you downloaded all the patches you wanted and then you tried to apply them, massaging bits here and there where the patch set hadn't tracked the main app. It was, to be frank, a giant pain in the huevos. And then came Apache. Literally "A Patchy Webserver" which rekickstarted (is that even a word?) the stalled development of httpd and collected together the various patches. In my opinion this was a good thing - it's possible some people disagree I suppose - and I fear a return to those early, bad times. And what's this got to do with Git? 
See, Git was designed to help with Linux and Linux doesn't really have a master code base - it has various trees representing different flavours and patches flow between them like, I dunno, bad ideas at a conspiracy theorists' convention. Or something. That model works well for Linux. Maybe. I'll presume it does. Git eases the pain points expressed by Andrew Morton, the maintainer of the -mm tree, in this message entitled "This Just Isn't Working Any More". In short, it makes siloing easier. And that's awesome for them. But it's not what I want for 99.99% of open source projects I use. No matter how good the tools are I don't want to be spending time tracking fixes and feature patches round various Git repositories and assembling my own custom version. And as a developer I live in fear of someone saying "I know everyone else is running your code perfectly but when *I* run it under FooBar v1.6 with patches from here, here and here then it fails mysteriously". I did console myself with the fact that this was a worst case scenario and that it was unlikely to happen for small libraries ... except this morning I was chatting with someone and
17:12 * jerakeen finds that someone has actually fixed the ruby-oauth code to actually
_work_
17:12 <@jerakeen> assuming you pick the right one of the _27_ forks of the codebase on
github.
then later
17:17 <@jerakeen> I forked pelle-oauth a while ago to make my local code actually
_work_, because that was important to me
17:17 <@jerakeen> so it was going to get siloed _anyway_
17:17 <@jerakeen> thanks to github, I can tell who else has done what, and where things
have gone
17:17 <@muttley> did you push the changes back?
17:18 <@jerakeen> no, not at all. My changes were the equivalent of duct-tape round
things. I ripped half of them off from the mailing list, which was 30
messages deep in a discussion over what was the Right Way to do it
Now to be fair to jerakeen - he's one of the smartest and most pragmatic programmers I know so I'm pretty sure he doesn't enjoy the situation but he was forced into it by the prevailing development methodology of the library he wanted to use which ended up that way because the version control tool it uses explicitly encourages it. To use an old and overdone meme 
I'm hoping this is a one off case and that people are just relearning good development manners after forgetting them when presented with a new shiny, sparkly toy. But the cynic and the pessimist in me died a little inside.
My constant prattling on about message queues has been going on for a while now. Since then I've been looking long and hard at RabbitMQ and QPid, both of which look hella sexy and very, very promising. However, at the moment both still don't quite pass my "as easy as memcached to set up" test. Given that I've been drinking with both the Dopplr and the Flickr teams for a while (and borrowed desk space off Dopplr in return for tea every morning and Pho once a week) it's not surprising that I've bent their collective ears about this more than once. They're both big advocates of message queues - witness recent blog posts from both of them.
However something else that I've ranted at, sorry, talked to them about is the concept of premature scaling. There are certain semi-best practices that you can follow to make web sites scale (both technically and socially). Stuff like sharding for example. Later, if you get really big you'll maybe want to internationalise. But you obviously don't want to do all that from the get go when you're only 1 person working on a VM somewhere on some shared box in a colo. So there's a line somewhere about what you should do upfront and what you should leave till later. It's a grey area and it's different for every site but the general consensus was that making sure you're not using auto incrementing primary keys on tables you might want to shard is probably the right thing but leave your i18n till later (although you always want to keep that sort of stuff in mind and generally structure your code to make it as easy as possible later).
Obviously, being me, messaging came up. Now there's really two types of messaging - one when you want to broadcast an event to an unknown number of listeners (let's call that PubSub) and one when really you're just using it to do asynchronous tasks (let's call that Job Queues). PubSub very much falls into the realm of "You don't need this now" I think. Job Queues on the other hand - they can be very useful.
From generating thumbnails to sending notification emails or SMSs - none of this needs to be done to render the page so you might as well shunt it off to be done later when you're not desperately scrambling to return some HTML to the user as fast as possible. LJ uses a system called TheSchwartz (the name is actually not one but two semi-elaborate in-jokes) for this stuff. Dopplr uses ActiveMQ I believe and Flickr use their own internal system. TheSchwartz is fantastic. It scales for a start. And it runs off a database so it's easy to set up and admin (since you already presumably have a database running your site).
Tom Insam (who works at Dopplr and is somewhat of a mad genius even if he does have girly long hair) recently pondered having a really simple, database backed message queue that used STOMP as its interface. Then, when you got really big, you could switch over transparently to using one of the big boy queues. "Hmm," thinks I, "I wonder how hard it would be to make a STOMP interface to TheSchwartz". So I started hacking on a generic STOMP server with every intention of just doing a very simple layer over the top of TheSchwartz. The generic layer took all of, ooh, about 30 minutes to write. What I should have done then is write the TheSchwartz specific layer, which would probably have only taken another half an hour or so. Instead I started sketching out in my head how you'd write a real proper PubSub message broker. A couple of hours later I'd done most of it - only stopping when the dev machine I was testing on suffered a prolonged power outage.
So here we go. This contains both the generic layer (the simple bit) and the actual broker (the tricky bit). How it works is that a Parent server starts up and starts listening. When a new STOMP client connects it fork()s off a Child which listens for new STOMP frames and reacts accordingly. The clever bit is when a message comes in. The Child serialises the frame and sends it via a socket pair to the Parent.
The Parent then sends it back down via another socket pair to all the Children. The Children then decide whether or not to send it to their Client depending on what SUBSCRIBE frames they've received. Queues (first come, first served) are a bit more complicated than Topic (broadcast) messages though. In that case the Parent also creates a semaphore and each Child then tests to see if they're the first to decrement it. If they are then they send the message. The last Child to check removes the semaphore. Now, don't get me wrong - this is very, very alpha quality at the moment - I'm pretty sure the queue implementation is duff and also that it's liable to fork() bomb your machine at a moment's notice but it shouldn't be too hard to clean up. However I doubt it will ever be Enterprise class. For a start it's all memory backed at the moment but even when I write a planned subclass that uses a DB and TheSchwartz for persistence it's just not careful enough or efficient enough to properly scale. It might still be useful for small sites but really it's an intellectual exercise. It is incredibly simple to use though. In most cases just running
stomp-broker --daemonize
should be enough. Tomorrow though, I'll finish off the far more useful STOMP interface to TheSchwartz during our weekly Hack Days and make it just as easy to install and run. The title of this post refers both to the fact that the Job Queue might be useful despite being so simple and also to the fact that I'm a dumbass for not doing the useful thing first and fiddling round with the intellectual gewgaw instead. Oh, and it's a catchy Roger Alan Wade song used, appropriately, for Jackass.
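If the Parent and Child dance described above is hard to picture, here's a single process Python sketch of just the routing decision. It is not the broker itself: there's no fork(), no socket pairs and no STOMP parsing, the `Parent` and `Child` classes are invented for illustration, and treating destinations that start with `/queue/` as queues and everything else as topics is an assumption borrowed from common STOMP broker convention rather than anything mandated.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Child:
    """Stands in for one forked child looking after one STOMP client."""
    name: str
    subscriptions: set = field(default_factory=set)
    delivered: list = field(default_factory=list)

    def handle(self, destination, body, token=None):
        # Ignore anything the client never sent a SUBSCRIBE frame for.
        if destination not in self.subscriptions:
            return
        # Queue messages carry a one-shot semaphore: first to decrement wins.
        if token is not None and not token.acquire(blocking=False):
            return
        self.delivered.append((destination, body))


class Parent:
    """Relays every incoming message to every child."""

    def __init__(self):
        self.children = []

    def fork_child(self, name):
        child = Child(name)
        self.children.append(child)
        return child

    def publish(self, destination, body):
        is_queue = destination.startswith("/queue/")  # assumed naming convention
        token = threading.Semaphore(1) if is_queue else None
        for child in self.children:
            child.handle(destination, body, token)


parent = Parent()
alice, bob = parent.fork_child("alice"), parent.fork_child("bob")
for child in (alice, bob):
    child.subscriptions.update({"/topic/news", "/queue/thumbnails"})

parent.publish("/topic/news", "server going down")      # both clients see this
parent.publish("/queue/thumbnails", "resize photo 42")  # exactly one client does
```

In this sketch the first subscribed Child in the list always wins the queue message because everything runs in order in one process. With real forked children the winner would depend on who got to the semaphore first.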
Someone sent me a link to this post entitled "Desktop Linux suckage: where's our Steve Jobs?" and it tickled something in my booze shattered brain. Actually, it tickled three thoughts. Sometimes I wonder if my brain works entirely in parallel because it tends to operate as either off or a screaming cascade of constantly collapsing and expanding thoughtwaveforms. It occurs to me that this is very much a fork bomb. And that if my brain does run in parallel then I can guarantee it has race conditions. So, the three somewhat related thoughts were:
- When the Gnu project started up people said that it couldn't do more than replace Unix utilities. Then there was emacs and people said that that was the very limit of the ability of free software teams to work on apps. Then the Linux kernel came along and people said "Well, fine but you can't produce enterprise class Desktop apps" and then OpenOffice got made and now people have conceded that particular point but have pointed out that it's all well and good but the fundamental model of Free and Open Source Software (i.e. that there's an itch a programmer wants to scratch) pretty much precludes having a usability person in the mix from the start and that this means that free software will always look fucking gash. History tells us not to underestimate the ability of open projects to evolve. Note: this history is truncated, inaccurate and intended for illustrative purposes only.
- Despite the horrifically broken app building model utilising the unholy trinity of AJAX, HTML and HTTP we seem to have hit a fairly happy center ground between UI designers and programmers. If someone had proposed a programming language in which, in order to create a GUI app, you had to use one scripting language on the front end, another language on the backend, do everything using largely stateless function calls and then create your GUI, which has no standard widget set, by dynamically generating a verbose, textual markup language by concatenating strings together ... you'd have quite rightly bitch slapped them like they'd talked dirty about your daughter. It's like people took the worst features of X, Swing and a whole host of other GUI ideas, threw away the good bits and stuck it all together using peanut butter.
Yet that's the state we're in at the moment and weirdly it's working out quite well - we're getting pretty looking web apps and the trolls doing the backend and the eloi doing the pretty bits need the minimum of contact. Hell, with mashups et al they need never even meet. Frabulous joy!
Interestingly we seem to have wended our way into a situation quite close to the original MVC model (i.e not what Web Framework people call MVC) and also, rather fascinatingly, sort of what Microsoft wanted with VB and COM.
- In making Linux and the other free Unixen ubiquitous on the Server we may just have killed apps on the Desktop. Which I guess means that Linux on the Desktop becomes obsolete. Which I suppose means we win after all. Maybe.
My building, like many here in the fair City O' Fog, has a door entry system which calls a phone number - I can then buzz the door open using the hash pound octothorpe key. This is handy because even when I'm not home I can let people in. Or if I forget my keys then at least I don't have to wait around outside. The disadvantage is that I have to have reception. Ironically the worst place for reception is in the apartment. Also, quite often either I or my flat mate are abroad so if you phone one of us then it might well be silly-o-clock where we are. So what I want is use something like Grand Central or maybe an install of Asterisk or Adhearsion set up with a phone number that calls multiple mobile phones when we're buzzed. However, there could also be a motion detector (or something that detected our phones connecting to the wireless) in the apartment so that it would also add the land line. Furthermore it could check our Dopplr feeds and work out if either of us were out of the country and then remove them from the "to call" list. Because, you know, having a door entry system you have to debug rather than just fricking work sounds like a wonderful idea.
The latest buzz on the hyper insular echo-chamber-nets is Fake Following. Shiny! Fresh! A ground breaking new paradigm leveraging social synergies in a cross mind share mashup or something! Hauntingly similar to being able to do the same thing with "Default View" on LJ though. I have to admit, I do like the idea of a "pause" button, even if it's just a nicer UI for the same thing.
So a while back I mentioned using Contextual Network Graphs as translation tools then said nothing more about it. Then at Google I/O I saw a talk by Jeff Dean that covered sort of similar ideas to do with a statistical approach to machine translation (the relevant stuff starts at 40:53). So my idea was this. You take a corpus of text in a given language, the larger the better, and every time two words appear next to each other you either create a link between them or make the link between them stronger (depending on whether you've seen that juxtaposition before). Then you do it again with another language. Then you analyse the two graphs to look for similar patterns. An easy way to do this would be to take a word that you know is the same in both languages and energise it in both graphs and then list the resulting energised nodes in both. So, for example, you might have a corpus in both English and French with the phrases "dog sled", "dog house", "dog food", "dog catcher", "dog end" in the respective languages and you happen to know that "dog" in French is "chien". So you activate the "dog" and "chien" nodes with the same amount of energy in both graphs, wait for the energy spread to settle and then list the energised nodes and compare them, noting that "house" has a residual energy of (completely arbitrarily) 0.0034 and that "maison" has a residual energy of 0.0032. Which gives you a probability that the two words are the same. You do that with a few other words and then you end up with a bunch of statistics about which words are the same. Do that with a large enough corpus and you have a list of words you know, each with a list of words in the foreign language that it might be (with the probabilities that they're the same) - hell you can even find out the probabilities of what they mean in different contexts (by activating the energy of the surrounding words) so that you can know that "free" can mean "gratuit" or "libre" depending on the context.
Now, this obviously has several flaws in it but when I did a rough and ready test a few years back I was pleasantly surprised by the results. And I'm kind of gratified that Google's approach doesn't appear to be completely dissimilar.
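Here's roughly what the energising trick looks like as a Python sketch. It is a reconstruction of the idea rather than my old test code: the `decay` and `threshold` values are arbitrary, the function names are made up, and the "French" corpus is a deliberately artificial word for word stand-in (no accents, no real grammar) so that the two graphs have the same shape.

```python
from collections import defaultdict


def build_graph(phrases):
    """Link words that appear next to each other; repeats strengthen the link."""
    graph = defaultdict(lambda: defaultdict(int))
    for phrase in phrases:
        words = phrase.split()
        for a, b in zip(words, words[1:]):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def energise(graph, start, energy=1.0, decay=0.5, threshold=0.01):
    """Spread energy out from `start` until the packets get too small to matter."""
    residual = defaultdict(float)
    frontier = [(start, energy)]
    while frontier:
        node, e = frontier.pop()
        residual[node] += e
        total = sum(graph[node].values())
        for neighbour, weight in graph[node].items():
            # Stronger links carry a bigger share; every hop loses energy.
            passed = e * decay * weight / total
            if passed >= threshold:
                frontier.append((neighbour, passed))
    return dict(residual)


def guess_translations(energy_a, energy_b, seed_a, seed_b):
    """Pair each word with the foreign word whose residual energy is closest."""
    candidates = {w: e for w, e in energy_b.items() if w != seed_b}
    return {word: min(candidates, key=lambda w: abs(candidates[w] - e))
            for word, e in energy_a.items() if word != seed_a}


english = build_graph(["dog house"] * 3 + ["dog food"] * 2 + ["dog sled"])
french = build_graph(["chien maison"] * 3 + ["chien nourriture"] * 2
                     + ["chien traineau"])

pairs = guess_translations(energise(english, "dog"), energise(french, "chien"),
                           "dog", "chien")
```

With corpora this tiny and this conveniently identical in shape, `pairs` lines "house" up with "maison" and so on. Real text won't be anything like as tidy, which is where the probabilities come in.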
Thu, May. 22nd, 2008, 01:51 pm Pheltup
It would appear that Pheltup is running off the same technology that the stillborn Flume did. I believe it was originally called "Smoke and Mirrors" but has been renamed "Bullshitr" for marketing purposes. As NTK used to say ... injokes for outcasts. Completely unrelated observation: it's not attractive when grown men giggle.
People use Excel a lot. Abuse is possibly a better way of putting it. In fact they abuse the whole of the office suite - you've all received the three word reply wrapped up in a Word document or the PowerPoint document that just has a bunch of images in it. The difference is that some of the Excel abuses are actually pretty clever. There's those quizzes that get emailed around - guess the lyric, solve the puzzle. But even beyond that you get the Finance and Ad booking departments that run off a single shared Excel spreadsheet of such breathtaking utility that you kind of have to step back and gasp. It's so horrible! Yet so beautiful! It does everything for them - it tracks bookings and various sales stats and shows past performance and projects future trends. Sure, the programmer in you is recoiling in absolute horror - this isn't programming, this is a crime against compilers. Forget spaghetti code or lasagne code. This is macaroni baked with Velveeta. Yet like Mac'n'Cheez[tm] it works. It's delicious. It's a remarkably sophisticated app that does exactly what they want and does it fast. And they wrote it. The sales people wrote it. These people who seem to look down on programmers and who won't update the HTML for the copy on the web page because "I was never very good at maths at school". I'm being flippant of course. And despite some occasional minor differences, I've always got on with the biz dev teams in the companies I've worked for which is why it's sometimes baffled me that they won't even touch, say, HTML. The problem, I think, is familiarity. We know that the amount of damage they could do with HTML is minimal. Except, well, if they leave out a single character then maybe the whole page is blank. It's not a big deal to us - we know it's not hard and that anything is easy to fix but that's beside the point. We're familiar with HTML, with the Web stack. They're not. 
In much the same way we're not familiar with (plucks bogus example completely out of thin air) filling in form 64-8 for authorisation of a VAT agent. Yet they're familiar with Excel. And not just as a spreadsheet - as an environment. They know what makes it tick, what its capabilities and quirks are and how to make it dance in a way that you and I may never. I'm kind of reminded of this quote about welders "One of the unexpected things about watching the steel guys work is how the solidity of metal means nothing to them. Most people think of metal as something hard and inflexible, but welders don't. Which should be obvious in hindsight, I guess. But, for example, they have these saw-horses that are made of tube steel. And I can see how that came about: they needed some saw-horses; they had some steel. It took them 30 seconds to make them. And, an example with the stairs: the legs of the stairs' landing platform have big threaded bolts for feet, to fine-tune the height of the legs for levelling. And there are these steel tube sleeves that go around the legs, that drop down and cover the bolts. So when they were moving this platform in, they had to flip it over, and they didn't want the sleeves to fall off while they did this. Now to me, that job calls for duct tape. To them: they welded the sleeves in place, then de-welded them when they needed them to move again."
When you step back and look at it - Excel is a programming language. But not in the way that you and I think about it - with source code and a compiler and an execute/debug cycle. It's a cell based programming language where data and code are intermixed but never mistaken for each other. It's self modifying but easily observable. Changes are implemented trivially and immediately. And in parallel. Whilst most programmers have been struggling to get to grips with functional programming and Monads our brethren at the other end of the office have been doing it for years. And their IDE is better than any of ours. Hell it even has visual data and code dependency tracking. Think I'm joking? Take a look at this article about building a 3D Engine in Excel. A 3D Engine with multiple rendering backends. And, if we try and shoe horn conventional programming metrics in, it does it in 30 lines (and 24 columns) of 'code'. Including comments.
To be honest I'm being a little light hearted here but actually my point is kind of serious. There's a new wave of programming paradigms - Map/Reduce, asynchronous execution, grid computing, sharded and column orientated databases and others. These aren't new ideas, especially in the academic world, but they're gaining more widespread acceptance. A cell, or to look at it slightly differently, a node based approach makes a lot of sense for a bunch of them. At my previous job we were used to dealing with huge quantities of data a day. Our rendering farm sat on the list of the world's top supercomputers. We dealt with parallelism all the time - from Renderman to pixel and vertex shaders. We did our compositing using a program called Shake which is entirely node based. Shake's kind of interesting (apart from Apple's slightly comical attempts to dissuade people from using it on Linux) - it's a very different way of doing image creation than what most people are used to.
I watched with amusement the blogosphere cooing over the price drop for OS X and then giggled when they fired it up for the first time and didn't know what the hell to do with it. But my co-workers who really knew how to drive it used it all the time - need to put a mountain behind the crusaders? Use Shake. Need to create a facsimile of a bustling medieval London Bridge using nothing but a background plate of somewhere in Prague, some smoke elements and some video of a pigeon from outside? Use Shake. Need to create an icon for that new Shake node you just wrote ... use Shake. It's the same familiarity as the Excel users and the welders.
We ended up writing a Node based programming language called Ripple that automatically went and deployed itself over the farm. It self balanced, passed variables and sorted out the DAG. You just strung nodes together and ran the script and tied it into Alfred if you wanted visual feedback and/or to reprioritise or delete tasks. If you wanted new functionality then you just wrote a new node type - we had nodes that did everything from skinning and compositing to generating dailies and emailing people. It was, to be frank, pretty cool. I've been led to believe that several of the major banks have node based languages which do things like complex price matching or constantly take input from the market, ripple changes down through the DAG and give pretty much immediate access to the level of exposure that the bank has. Want more capacity? Just chuck more servers in - it should just all work.
There's no real conclusion to this other than - parallel, grid based computing looks like it should be hard but it's coming, it has significant advantages once you can get your head round it and as long as the tools are good it might actually turn out to be a better way to program.
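To give a flavour of what "just string nodes together" means, here's a tiny Python sketch of a node graph evaluator. It has nothing to do with how Ripple actually worked (that code stayed at my old job); `run_graph` and the node names are invented, and it leans on the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) to sort out the DAG.

```python
from graphlib import TopologicalSorter


def run_graph(nodes):
    """Evaluate a node graph.

    `nodes` maps a node name to (function, [names of the nodes it reads from]).
    Each function is called with the results of its inputs, in order.
    """
    dependencies = {name: deps for name, (_, deps) in nodes.items()}
    results = {}
    # static_order() yields every node after all of the nodes it depends on.
    for name in TopologicalSorter(dependencies).static_order():
        func, deps = nodes[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


# A pretend compositing job: two source elements feed a comp, which feeds a daily.
graph = {
    "plate": (lambda: 10, []),
    "smoke": (lambda: 5, []),
    "comp": (lambda plate, smoke: plate + smoke, ["plate", "smoke"]),
    "daily": (lambda comp: f"frame value {comp}", ["comp"]),
}

results = run_graph(graph)
```

This version runs everything in one process, one node at a time. The interesting part of a farm based system is that nodes with no dependency on each other ("plate" and "smoke" here) can go to different machines; `TopologicalSorter` also has `get_ready()` and `done()` methods which would be one way to hand out work like that.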
I was trying to think of something I could do to play around with FireEagle and came up with something which both tickles my development fancy and also is so incredibly insular San Francisco navelgazing Wanking2.0 that I kind of feel compelled to do it. So, and excuse the hand waving here, the way it would work is this:
- You'd purchase a cheapo RFID reader from somewhere - the ones from ThinkGeek, Phidget and Parallax all look good.
- Hook it up to a computer and run TheSoftware which, as yet, exists only in my brain. You will tell TheSoftware what the physical location of your card reader is.
- Swipe your brand new card which will prompt you to register yourself with a remote, centralised service.
- This service will prompt you to give it FireEagle access.
- From this point on whenever you swipe your card over the reader TheSoftware will inform the centralised service which will, in turn, tell FireEagle.
In and of itself this is not very useful but if you had a reader at work then you could swipe in there in the morning and then swipe in at home at the end of the day (or if you work somewhere suitably large then put multiple readers around the place). And then if your friends started getting RFID readers and installing them in their homes then when you went round there you'd be able to easily let FireEagle know where you were. Hell if you could persuade your favourite bars and clubs to do it then you could do it there. Hook it up to your Social Graph and then you can easily work out where all your friends are. Then of course the data can be subpoenaed by the Government to prove that you're a terr'ist or something.
Yahoo! recently released FireEagle and jolly nice it was too - I've hooked it up to my Dopplr account and I have an idea of what to do with it of which more later. However things were a bit confusing - there was a Net::FireEagle on CPAN by Aaron Straup Cope yet also a Net::FireEagle::Client linked to on the FireEagle page itself and they weren't really that much alike. Because all of SF is a seething cabal I asked around and found that the CPAN version was an early version based on the old version of the API. And the ::Client version was somewhat lacking in things like, well, documentation. Or comments. Also it wasn't on CPAN which makes it somewhat of a second class citizen in the Perl world. So after a bit, I ended up taking over both of them. I renamed ::Client to just Net::FireEagle, added CPAN scaffolding, refactored the hell out of it, wrote a load of docs and some (very basic) tests and a nifty little command line script which also serves as an example of how to do the Auth Dance[tm] (which reminds me - the OAuth Auth Dance is much nicer than the Google, Flickr and especially Facebook ones). And lo the updated version now resides on CPAN. It even has a user.
Since I don't have, and refuse to ever get, a Twitter account I have decided to summarise this weekend in the style of LoudTwitter because, yes goddammit I AM that geeky.
- Friday night French Laundry
- Saturday afternoon A Luau complete with a pit baked pig
- Saturday night Awesome ice cream with two completely random new friends on the way to ...
- Saturday night (again) ... a house warming party where it turns out that the two random new friends who were just giving me a lift knew someone at the party.
- Saturday night (even later, more Sunday morning) Watch second half of Malaysian Grand Prix at Overtime on 7th and Harrison
- Sunday morning Tool hire, DIY, ladders, drilling
- Sunday afternoon the Big Wheel race down Vermont. Hilarity and Panda Bears ensued.
- Sunday evening post BYOBW pizza and beers and Jesus Camp at ydnar's.
Automatically shipped by My Hands
 Sixers (and Jesus) on Vermont
Living in San Francisco really doesn't suck.
Originally posted on deflatermouse.vox.com
WRT my last post I think I managed to spectacularly avoid saying the most important thing in my head which was this ... I don't think there's anything fundamentally wrong with any of the PubSub systems in existence at the moment. However most of them seem to have escaped from or are inspired by the kind of messaging you need at banks and other financial type institutions. This is great and many of the design goals are the same but they're designed to be complicated and complete from the get go. And this works for them. However I want something more like Memcached or Rails or similar - you install it out of the box and it Just Works[tm] and for 80% of people that'll probably suffice modulo some trivial tweaking. Then there will be another 10-15% of people who can take that base and after some simple to moderate modifications make it do what they want. There may even be a further 1% who can make it go even further but, at this point, it's diminishing returns and really if things were changed to make things easier for them it would compromise how simple things are for the 80%. And to be frank, the 1% would probably be better off with something designed from the start to do what they want. Not everyone wants Oracle - some people are just happy with SQLite and MySQL. Hell some people are more than happy with BerkeleyDB. And that's a good thing.
For years the default website stack has been something similar to the classic LAMP stack - originally "Linux, Apache, MySQL, Perl" but now really meaning "Free Unix, Free Web server, Free Database and Free Web Language" or just "OS, Web server, Database, Language" to be even more general and friendly to the Windows people out there. Relatively recently we've started to see that we should add another layer - a Cache. To be honest, from what I can see, Memcached has pretty much got this sewn up by virtue of being awesome although there are other technologies like APC. So I'd like to coin a new phrase - the CLAMP stack for Caching LAMP. I can't find a reference to it so I'm going to claim it as mine. MINE. MUHAHAHAHAHAAH. Maybe in the future it will make me famous. Maybe. *cough* Even more recently I think there's been a need for some sort of new layer. ( Snip addled musings ... )
I'm a little discombobulated at the moment for various reasons but this idea popped in to my head and I have no idea how stupid it is. What better way to test it than to fling it at the internet like so much poop and see if it sticks. Imagine if sites had a <meta name="searchurl" value="http://example.com/search?query=%q" /> [1] tag in their headers. This would allow agents to autodiscover and utilise a site's search engine if one was available simply by substituting %q for a url encoded query. There could even be a type="..." attribute that gave the mime type of the results - Atom would be good. Although that could just as easily be done with Accept headers and the other standard mechanisms for negotiating types. Search engines could even use it to get better results from stuff like shopping and review sites. Of course there's a possibility (nay, a probability even) that it'd be co-opted by spammers and also you have to ask yourself - why would sites provide this as a service and who would want to use it anyway so it's probably one for the "WTF were you thinking" file but hey ho. I need more tea. [1] Although it occurs to me that <link rel="alternative" name="search" href="..." /> might work even better.
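To show how little an agent would need to do, here's a minimal Python sketch of the discovery step. The `searchurl` meta tag is just the idea floated above, not any real standard, and `SearchUrlFinder` and `build_search_url` are names invented for this example. It uses only the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import quote


class SearchUrlFinder(HTMLParser):
    """Remembers the template from a <meta name="searchurl" value="..."> tag."""

    def __init__(self):
        super().__init__()
        self.template = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "searchurl":
            self.template = attrs.get("value")


def build_search_url(html, query):
    """Return the site's search URL for `query`, or None if none is advertised."""
    finder = SearchUrlFinder()
    finder.feed(html)
    if finder.template is None:
        return None
    # Substitute %q with the URL-encoded query, as described above.
    return finder.template.replace("%q", quote(query, safe=""))


page = (
    '<html><head>'
    '<meta name="searchurl" value="http://example.com/search?query=%q" />'
    '</head><body></body></html>'
)
url = build_search_url(page, "green tea")
```

Pages without the tag simply give back `None`, so an agent can fall back to whatever it was doing before.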
There are tonnes of JSON modules on CPAN. Why do it one way right when you can do it a hundred ways wrong? JSON::Any mitigates some of these problems by abstracting away the interface so that you can use JSON, JSON::XS, JSON::Syck, JSON::DWIM ... Annoyingly JSON::XS completely changed its API between versions 1 and 2. JSON::Any dropped support for JSON::XS 1.x and now only supports 2.x. Until now. This patch feels somewhat dirty but, meh, what the hell, it works.
Yesterday got me thinking - I think it was the combination of the impromptu burlesque show at the flower shop, the gig, the pint, the synchronicity and the conversation but for whatever reason it got me thinking about Shelf again. Shelf is people orientated - it makes heavy use of the address book and finds connections between what you're doing now and people you know. Which is fine. But it could be cooler. Instead of just having a person as an initial seed for the clues how about other things? Starting simply - how about urls? There's already information out there about urls - for a start there's whether it's owned by someone you know. Or its stats from Alexa. Maybe its PageRank value. Then there's when you last visited it and how often you've visited it and what's changed since then. And whether you del.icio.us-ed it or Duggit or whatever. And whether it was mentioned in any of your RSS or Twitter feeds or emails. You could add notes to annotate it. The next natural step is your friends - what have they said? Have they added notes? When did they last visit it (ignoring the glaring privacy concerns for the moment)? Where did they go next? Hell, throw it open to everyone. What has the rest of the world got to say about this? Suddenly every page has comments whether they like it or not. And notes and errata. It's a Web! It's a Wiki! It's a Dessert Wax and a Floor Topping! And then there's places. You're looking at a museum or a gallery and it tells you what pubs and restaurants are nearby. And if any of your friends will be close by. Show you photos from the location. Throw in a map. Maybe some historical information or local trivia. Great for when you're sitting at your desk but even better when you're actually out on the street and you look down at your iBlackickreo95 and it's using Cell location or GPS to work out where you are. Listening to music? Album covers, lyrics, other albums, recommendations. Films?
Stuff from IMDB - the actors, what else they've been in, awards, trivia, more recommendations from my friends. Nurse! Come quick! I think the restraints are coming loose.
I've been thinking about latent meta data for a long time. A long time. Partly that's because it's such a large and vague topic - the amount of data is large and meta covers everything that you can infer about it. In this case I've been thinking about how we can write tools to help us understand all the personal data we have knocking around. We have mountains of emails and contacts and web surfing history and conversations and other miscellanea and the more we get the harder it is to organise yet perversely there are more rich informational pickings to sift over. I've written apps that listen on IRC and try and build a view of the world based on what's said. I've written stuff that indexes email corpuses and helps you rotate the data about any point. I've written secretarial bots that act as stenographers and note takers and who do calculations and lookups and go fetch things without you having to context switch, without you having to even ask in some cases, just like a really good PA should. I've written things that crawl Wikipedia and infer and answer questions. I've talked (ranted, really) about this sort of software a lot to my friends, sometimes to the long suffering Tom Insam who seems to bear the brunt of more than his fair share of my insane ravings and half baked ideas. One of the things I got excited about was Beagle (née Dashboard), the Gnome program that allows you to search all your information from a single interface. I liked its novel use of Clue Packets but in some ways it felt stale - unlike Dashboard you had to go type something in whereas Dashboard would infer from what you were doing. Something about that bothered me - it wasn't new enough I suppose. It was just an evolution of Windows Search and Sherlock and Spotlight. I want a PA, not a reticent know-it-all in the corner I have to coax answers out of.
Because he's not a whiny bitch like me and because he seems to have more JFDI than is humanly possible, Tom completely ignored all my frothing and has since come up with his own system - Shelf - which has been getting some heavy weight coverage recently. It harkens back to the Dashboard model and uses polling to work out what you're doing in what apps and heads from there. It's, to be frank, pretty sweet. But I'm still not totally satisfied and it boils down to this. I get distracted enough - I have IMs and IRC and emails and feeds and phones going all the time and I have to be careful because suddenly it's 3 hours later and the cursor is still blinking accusatorily on the blank editor. What I actually want is something more Exposé like - I want to hit a key combination and everything that Shelf knows about the current context it can gracefully swoosh up in an achingly translucent overlay. The app can then either continuously scan what I'm doing or, for lower spec machines, it can just do its mojo on demand. This also solves the problem that, if you're running the Social Graph plugin, it doesn't need to send every url you're looking at to Google (which, in and of itself, is part of a wider Separation of Personas theme of which more sometime in the future). But not everyone's like me so separate the frontend and the backend. That way I can have my Exposé mechanism and info junkies can mainline clues and we'll all be happy. Apps could subsume the functionality giving richer integration through a common broker. Hell for those who just need to know what's going on in the same way Britney needs the limelight, you could make a meta app that streams their Shelf clues, and Twitter streams and RSS feeds and access logs and email notifications and Calendar updates in some sort of context firehose that you get to drink from.
Throw in some special self-referential sauce so that it understands itself and Oh look Dave Winer's accessing my site and now he's written a Twitter about it and someone else has written a blog post about it and 3 of my friends have commented defending me and ... It's the sort of Intertwingularity that gives Ted Nelson a full on chubby.