I haven't bothered to play with the timings, but attached are my current efforts. I've got a 10 by X variable-width font engine working, which I'm pleased with. I had it at 8 high to begin with, but 10 lets you have hanging letters, and for what I want to do I can have 5 rows of 11px (10 + 1 spacing) plus 9px for a status bar at the top.
Conveniently it takes almost no RAM and doesn't need much in the way of a buffer. It's just fast enough at 1MHz (fine, because only a few characters change at any one time), and at 8MHz it's more than quick. Fonts can be stored in program memory space (around 2k out of 32k for 8-bit ASCII). The engine renders one character at a time into a 4x10 pixel block, compensates for overflow into the next block and adds an (ASCII) space afterwards. It also pads up to 4 pixels on the end to ensure the entire string prints. The print string function returns an x-offset, so instead of concatenating strings you could do:
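/* printstring(uint8_t *input_string, char intensity, uint8_t x, uint8_t y, uint8_t *x_offset); */
uint8_t offset;
char buf[7]; /* room for a 16-bit decimal value, sign and terminator */
printstring("ADC Value", 15, 0, 0, &offset);
printstring(itoa(ADC, buf, 10), 15, offset, 0, &offset); /* assumes avr-libc style itoa(value, buffer, radix) */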
You might lose up to 3 pixels, since the previous string could have forced at most 3px of padding, but it's not such a big deal.
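To show where those lost pixels come from, here is a minimal sketch of how the returned offset could be worked out. It is not the real engine: glyph_width(), next_offset() and the 1px gap after each glyph are made up for illustration, and it assumes the padding simply rounds the end of the string up to the next 4-pixel block boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_PX 4u /* assumed: the display is written in 4-pixel-wide blocks */

/* Hypothetical width table; a real engine would read widths from the font data. */
static uint8_t glyph_width(char c)
{
    switch (c) {
    case 'i': case 'l': case '.': return 2;
    case 'm': case 'w': case 'M': case 'W': return 7;
    case ' ': return 3;
    default:  return 5;
    }
}

/* Round x up to the next block boundary (unchanged if already aligned). */
static uint8_t round_up_to_block(uint8_t x)
{
    return (uint8_t)((x + BLOCK_PX - 1u) / BLOCK_PX * BLOCK_PX);
}

/* Returns the x-offset a following string should start at, and stores the
 * number of padding pixels that were added in *padding. */
static uint8_t next_offset(const char *s, uint8_t x, uint8_t *padding)
{
    assert(s != NULL && padding != NULL);

    uint8_t end = x;
    while (*s != '\0') {
        /* glyph width plus a 1px gap (the gap size is an assumption) */
        end = (uint8_t)(end + glyph_width(*s) + 1u);
        s++;
    }

    uint8_t aligned = round_up_to_block(end);
    *padding = (uint8_t)(aligned - end);
    return aligned;
}
```

With these made-up widths, "ADC" ends at x = 18 and the next string starts at x = 20, so 2 pixels go unused. The padding can never exceed 3 pixels because an already aligned end is left alone.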
Ignore the little battery thing at the top right; I was experimenting. You can fit nearly 6 rows of text, allowing for some tails being cut off on the bottom row. The number of characters per line varies with the letter widths, but you should be able to get at least 32 full-width characters in the worst case, and often more.
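The row arithmetic can be written down as a small sketch. The names here (row_y() and the constants) are made up for illustration, and the 64-pixel display height is inferred from 9 + 5 x 11 rather than stated anywhere above.

```c
#include <assert.h>
#include <stdint.h>

enum {
    STATUS_BAR_PX = 9,                        /* status bar height */
    GLYPH_PX      = 10,                       /* glyph height */
    ROW_GAP_PX    = 1,                        /* spacing under each row */
    ROW_PITCH_PX  = GLYPH_PX + ROW_GAP_PX,    /* 11px per text row */
    TEXT_ROWS     = 5,
    DISPLAY_PX    = STATUS_BAR_PX + TEXT_ROWS * ROW_PITCH_PX /* 64, inferred */
};

/* y coordinate of the top of text row `row` (0-based), below the status bar. */
static uint8_t row_y(uint8_t row)
{
    assert(row < TEXT_ROWS);
    return (uint8_t)(STATUS_BAR_PX + row * ROW_PITCH_PX);
}
```

Without the status bar, the same 64 pixels hold 5 full rows with 9 pixels left over, which is where the "nearly 6 rows" with clipped tails comes from.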
It's cool now that it works. Debugging programs via the screen is so much easier than simulating in Proteus and staring at logic traces!
Apologies for the blurrycam, my SLR is at home :(