It's been a constant bugbear that Mariachi is so slow with large numbers of files because of its refusal to split stuff up into months.
There are two reasons it's so slow. The first is that every time you add a new index page, every other index page has to be regenerated to accommodate the new pager.
My solution is for each index page to 'know' its number and then include, via an IFRAME, a pager.html, passing it that number.
Best case scenario is that the item gets added at the end and only one index page will need to be regenerated.
Worst case scenario is, of course, that an item gets added right at the start (not that common), in which case we'll have to regenerate every index page. Hopefully that won't be more than a few hundred pages, which isn't so bad.
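To make the best and worst cases concrete, here's a rough sketch in Python (not Mariachi's actual code). It assumes fixed-size index pages numbered from 1, and the helper name `pages_to_regenerate` is made up for illustration.

```python
import math

def pages_to_regenerate(insert_pos, total_after, per_page):
    """Return the 1-based index page numbers whose contents change.

    insert_pos  -- 0-based position of the new item in the full listing
    total_after -- number of items once the new one has been added
    per_page    -- fixed number of items per index page (an assumption)
    """
    # Everything after the insertion point shifts down by one, so every
    # page from the one holding the new item to the last page changes.
    first = insert_pos // per_page + 1
    last = math.ceil(total_after / per_page)
    return list(range(first, last + 1))

# Item appended at the end: only the last page is touched.
print(pages_to_regenerate(250, 251, 100))
# Item inserted right at the start: every page is touched.
print(pages_to_regenerate(0, 251, 100))
```

Because the pager lives in its own pager.html, the untouched pages don't need rewriting even when the total page count changes.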
The second reason it's so slow is that, in this worst case scenario, you have to regenerate every mail page as well (to update their pointer back to their index page). That is incredibly slow since you have to parse and then format potentially thousands of emails.
The solution to this is to have each email page include, in an IFRAME, $item_id.index.html which would contain its index page number. That way you wouldn't have to rewrite every item page (which would be expensive) just every $item_id.index.html which would be much, much cheaper.
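A minimal sketch of what regenerating those pointer files might look like, again in Python rather than anything Mariachi actually ships. The `index_N.html` naming, the link markup and the `write_index_pointers` helper are all assumptions for the sake of the example.

```python
from pathlib import Path

def write_index_pointers(item_ids, per_page, out_dir):
    """Write one tiny <item_id>.index.html per item; return the ids rewritten.

    item_ids -- item ids in listing order
    per_page -- fixed number of items per index page (an assumption)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rewritten = []
    for pos, item_id in enumerate(item_ids):
        page = pos // per_page + 1
        target = out / f"{item_id}.index.html"
        # target="_parent" so the link navigates the enclosing mail page,
        # not just the IFRAME it is displayed in.
        html = f'<a href="index_{page}.html" target="_parent">Index page {page}</a>\n'
        # Skip files whose contents would not change.
        if not target.exists() or target.read_text() != html:
            target.write_text(html)
            rewritten.append(item_id)
    return rewritten
```

No email is parsed or formatted here; each file is a single line, and files that would come out identical are left alone.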
Some other suggestions, from Tom Insam and Ian Malpass amongst others, include building some flex space into the index pages (so they can have between 100 and 120 items per page, for example), which should dramatically reduce the frequency of having to do a full rebuild. In fact you could even have some annealing where the overspill is spread between $n subsequent (and prior if possible) pages, which would probably prevent you from ever having to do a full rebuild.
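Here's one way the flex space idea could work, as a hedged Python sketch. It only spills forwards (not into prior pages), the data structure is a plain list of lists, and `insert_with_flex` is a hypothetical name.

```python
def insert_with_flex(pages, page_no, offset, item, max_items=120):
    """Insert item into pages[page_no] at offset; return the changed pages.

    pages     -- list of lists, one inner list of items per index page
    page_no   -- 0-based page to insert into
    max_items -- upper bound of the flex range (120 in the example above)
    """
    pages[page_no].insert(offset, item)
    changed = [page_no]
    current = page_no
    # Push the overspill forwards one item at a time until we reach a
    # page that still has slack.
    while len(pages[current]) > max_items:
        spill = pages[current].pop()
        if current + 1 == len(pages):
            pages.append([])  # ran off the end, so start a new page
        pages[current + 1].insert(0, spill)
        current += 1
        changed.append(current)
    return changed
```

If pages are normally filled to 100 but allowed to grow to 120, most inserts touch a single page, and an overflow only cascades as far as the next page with room.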
If that wasn't available then one could fall back to having a CGI script which 301s to the correct index to provide all the links. This is, apparently, what LiveJournal does. I'm less enamoured with this idea since the whole reason for having static pages is to make it easy to create an archive. Having a working CGI environment is just an order of magnitude more complicated than I really want.
Ian felt that the pager could be generated on the fly as long as each index knew its number (which it will) and knew the total number of mails (which is eminently doable). And, because he hates iframes, he thought that instead of embedding iframes in the email pages, one could generate scripts and have the pages include those instead.
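The arithmetic for an on-the-fly pager is small. This Python sketch shows it under a few assumptions: fixed-size pages numbered from 1, an `index_N.html` naming scheme, and a made-up `pager_links` helper. The same calculation could just as well live in a client-side script.

```python
import math

def pager_links(page_no, total_mails, per_page, window=2):
    """Build pager HTML from nothing but the page number and mail count.

    window -- how many neighbouring pages to link on each side
    """
    last = max(1, math.ceil(total_mails / per_page))
    lo = max(1, page_no - window)
    hi = min(last, page_no + window)
    parts = []
    for n in range(lo, hi + 1):
        if n == page_no:
            parts.append(f"<b>{n}</b>")  # current page, not a link
        else:
            parts.append(f'<a href="index_{n}.html">{n}</a>')
    return " ".join(parts)

print(pager_links(1, 250, 100, window=1))
# prints: <b>1</b> <a href="index_2.html">2</a>
```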
I quite like this idea although, again, there's the issue of caching.
I'm less enamoured with his idea to have a $message-id.tmpl file for each email which contains the parsed and formatted email and then some placeholders like $INDEX, $NEXT, $PREV etc. When you wanted to regenerate the file you'd just grep for those tokens and replace them with the correct numbers.
The big problem, of course, is that this requires parsing HTML (or just hoping that your marker sequence doesn't appear anywhere else) which is, in and of itself, a slow and tedious process.
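For experimenting, one way the marker collision might be sidestepped is to escape the marker character when the .tmpl file is first written. This is an assumption of mine, not something from the discussion above. The sketch uses Python's standard `string.Template`, which happens to use the same `$NAME` syntax and treats `$$` as a literal dollar sign; `make_template` and `render` are hypothetical helpers.

```python
from string import Template

def make_template(formatted_body):
    """Build the .tmpl text once, when the email is parsed and formatted."""
    # Escape every "$" in the email itself so it can never be mistaken
    # for one of our placeholders later on.
    escaped = formatted_body.replace("$", "$$")
    nav = '<a href="$PREV">prev</a> <a href="$INDEX">index</a> <a href="$NEXT">next</a>\n'
    return nav + escaped

def render(tmpl_text, index, prev, next_):
    """Fill in the placeholders without re-parsing the email."""
    return Template(tmpl_text).substitute(INDEX=index, PREV=prev, NEXT=next_)

tmpl = make_template("<p>The token $INDEX costs $5</p>")
print(render(tmpl, "index_3.html", "41.html", "43.html"))
```

The body's own `$INDEX` and `$5` come out untouched, at the cost of one extra pass over the text when the template is created.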
I really need to sit down and start experimenting with this stuff.