A thought occurred to me about why some users need so much more time (and a larger corpus) to get good results. I was getting very good results (98%+) at 18k words in each corpus, and my home account is getting good results (99.81%) with 3k good words and 6k bad words. Yet someone recently pointed out that he was at 70k words (total) and still at 70%.
Here's the thought/question:
Both at home and at work, I filter my newsletters before they get scored by the Bayesian filter, so I never correct on them or train them into the Bayesian corpus. I'm wondering whether the people who are getting good results either a) don't get many newsletters or b) do get newsletters but filter them and don't train them into the Bayesian filter as good. People who are getting poor results, by contrast, might be training on their newsletters and therefore need a larger corpus before getting satisfactory results.
I ask because a lot of newsletters can have a spammy look to them, and I'm just wondering if this could be a factor. Just a question . . .
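
To make the hunch a bit more concrete, here is a rough sketch in Python. It assumes a simplified Graham-style per-token spam probability, which may not be exactly what any particular filter computes, and every count in it is made up purely for illustration. It compares how spammy a newsletter-ish token such as "unsubscribe" looks when newsletters are kept out of the good corpus versus when they are trained in as good.

```python
def token_spam_prob(good_count, bad_count, n_good, n_bad):
    """Simplified Graham-style spam probability for one token (assumed formula).

    good_count / bad_count: number of good / bad messages containing the token.
    n_good / n_bad: total number of good / bad messages trained.
    """
    g = min(1.0, good_count / n_good)
    b = min(1.0, bad_count / n_bad)
    if g + b == 0:
        return 0.5  # token never seen: treat it as neutral
    return b / (g + b)


# Hypothetical corpus: 1000 good and 1000 bad messages,
# with "unsubscribe" appearing in 600 of the bad ones.
N_BAD = 1000
BAD_WITH_TOKEN = 600

# Case 1: newsletters are filtered out before training,
# so the token shows up in only 10 good messages.
p_filtered = token_spam_prob(10, BAD_WITH_TOKEN, 1000, N_BAD)

# Case 2: 200 newsletters, all containing the token, are trained as good.
p_trained = token_spam_prob(10 + 200, BAD_WITH_TOKEN, 1000 + 200, N_BAD)

print(f"newsletters filtered out:      {p_filtered:.3f}")
print(f"newsletters trained as good:   {p_trained:.3f}")
```

Under these invented numbers the token becomes a noticeably weaker spam indicator once newsletters are trained as good, which is the kind of dilution that might require a larger corpus to overcome. Whether real corpora behave this way is exactly the open question.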