Simon Wistow (deflatermouse) wrote,

trimming bloglines

After a brief discussion with someone about so-called 'folksonomies' (euugh), and whether duplicated tags were leading to bad data, someone else mentioned that you could download the current 10,716 Bloglines categories.

So I threw together various Lingua::EN::* modules and got this, which takes a few seconds to run over the whole list and trims it by a healthy 2000-odd items (removing the stemming stage means you only trim 1520).
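For anyone wondering what that sort of trimming looks like, here's a rough sketch of the general idea in Python. It is not the script above: the function names are made up for illustration, and crude_stem is a toy suffix stripper standing in for a real stemmer, so treat it as a guess at the shape of the thing rather than what the Lingua::EN::* modules actually do. Each tag is lowercased, punctuation is turned into spaces, each word is optionally stemmed, and tags that end up with the same key are grouped together.

```python
import re
from collections import defaultdict


def normalise(tag):
    """Lowercase the tag and turn runs of non-alphanumerics into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", tag.lower()).strip()


def crude_stem(word):
    """Toy suffix stripper (an assumption for illustration, not a real stemmer)."""
    for suffix in ("ies", "ing", "es", "s"):
        # Only strip when at least three characters of the word would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return word


def tag_key(tag, stem=True):
    """Build the key under which equivalent tags collide."""
    words = normalise(tag).split()
    if stem:
        words = [crude_stem(w) for w in words]
    return " ".join(words)


def trim(tags, stem=True):
    """Group the original tags by key; one group per surviving category."""
    groups = defaultdict(list)
    for tag in tags:
        groups[tag_key(tag, stem)].append(tag)
    return dict(groups)


if __name__ == "__main__":
    sample = ["Blogs", "blog", "Web-Design", "web design",
              "Politics", "politic", "Technology", "technologies"]
    print(len(sample), "tags in")
    print(len(trim(sample, stem=False)), "groups without stemming")
    print(len(trim(sample)), "groups with stemming")
```

On that made-up sample, normalising alone only merges "Web-Design" with "web design", while adding the stemming step also folds the plurals together, which is the same pattern as the 1520 versus 2000-odd difference above, just on a much smaller scale.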

If I could get hold of an ISO-format English thesaurus, I reckon I could probably trim an extra 1000-odd off.

Tags: bloglines, english thesaurus, extra, iso format, odd, reckon, stage, stemming, trims

