Simon Wistow (deflatermouse) wrote,

trimming bloglines

After a brief discussion with someone about so-called 'folksonomies' (euugh), and whether duplication of tags was leading to bad data, someone else mentioned that you could download the current 10,716 Bloglines categories.

So I threw together various Lingua::EN::* modules and got this, which takes a few seconds to run over the whole thing and trims the list by a healthy 2000-odd items (removing the stemming stage means you only trim 1520).
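
For a rough idea of what this kind of trimming involves, here is a minimal Python sketch. It is not the Perl script itself, and it doesn't use the Lingua::EN::* modules: the function names (`normalise`, `crude_stem`, `trim_tags`) are made up for illustration, and the "stemmer" is a deliberately crude plural-stripper standing in for a real one.

```python
import re
from collections import defaultdict


def normalise(tag):
    """Lowercase a tag and collapse punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", tag.lower()).strip()


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: only strips simple plurals."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"          # technologies -> technology
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # blogs -> blog
    return word


def trim_tags(tags, stem=True):
    """Group tags that collapse to the same key; one key per surviving category."""
    groups = defaultdict(list)
    for tag in tags:
        key = normalise(tag)
        if stem:
            key = " ".join(crude_stem(w) for w in key.split())
        groups[key].append(tag)
    return groups


sample = ["Blogs", "blog", "blogs ", "Technology", "technologies", "Music", "music"]
print(len(trim_tags(sample)))              # 3 categories with stemming
print(len(trim_tags(sample, stem=False)))  # 5 categories without it
```

As with the real list, dropping the stemming stage leaves more near-duplicates behind: case and whitespace variants still merge, but singular/plural pairs don't.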

If I could get hold of an ISO-format English thesaurus, then I reckon I could probably trim an extra 1000-odd off.

Tags: bloglines, english thesaurus, extra, iso format, odd, reckon, stage, stemming, trims