Sun, Dec. 9th, 2007, 10:21 pm
Everybody knows what you've been through

A repost of a review I did on the London.pm Website of the new O'Reilly Book "Programming Collective Intelligence".

The field of data mining is a tricky one to write about. For a start, what you're mining depends on the nature of your business and the shape of the data - there is no one-size-fits-all technique, no off-the-shelf, drag-and-drop solution.

Secondly, some of the techniques require some pretty tricksy maths, and even if you do understand them, once they're applied you still have to interpret the results and tweak the multitude of input variables. Building a data mining tool - from a search engine to a collaborative filter to a genetic algorithm - is an art as much as a science or engineering problem.

So all that said, you should buy this book.

Reading it will help you understand why I just said all that. But it will also give you a bunch more techniques in your mental toolbox, so that when you're looking at a problem you can think "Ooooh! I remember reading about some problem like that" and then go pick up the book again and use it as a reference manual rather than reading it from cover to cover.

And there's a goodly number of techniques to pick up - there are chapters on collaborative filtering and recommendation systems, clustering and group discovery, search and ranking techniques, document filtering, Bayesian classification, kernel methods and support-vector machines, and genetic algorithms, amongst others.

Each chapter gives an overview of the problem domain, gives an example problem and then walks the reader through a simple solution. The problems with the solution are then highlighted and various enhancements are shown.

The techniques are demonstrated in Python, but the examples are all clear, understandable and perfectly legible to any competent programmer, especially a scripting language programmer. Just enough detail is covered to give you a solid grounding without getting you bogged down.

In summary - this is well worth your 20 quid, even more so if you can get your company to pay for it. If you're working with existing data, this may spark off an inspiration that will let you add some new features or up your accuracy. Or, if you're presented with a problem, this book may give you techniques that will help you solve it without having to work everything out from first principles. It's a well-written manual that'll handily expand your repertoire.

Mon, Dec. 10th, 2007 06:49 pm (UTC)

I just started reading this book - it's quite good.

Fri, Dec. 14th, 2007 10:33 pm (UTC)

That does look interesting! Have you seen the Information Foraging Theory book? It makes my head hurt a little but the principles are fascinating. Jared Spool's gone a long way with this.

BTW, are you still working on LJ search?