Project documentation
- A set of Frequently Asked Questions (FAQ).
- The SpamBayes wiki exists to let the users
and developers of SpamBayes cooperate to develop documentation, share tips and recipes,
and generally help each other out. It would be great to see documentation improvements,
hints and tips, scripts and recipes, and anything else (related to SpamBayes) that takes
your fancy added here.
- Instructions on installing SpamBayes and integrating it into your mail system.
- The Outlook plugin includes an "About" file and a
"Troubleshooting Guide" that can be accessed via the toolbar.
(Note that the online documentation is always for the latest source version, and so might not correspond exactly with the version you are using.
Always start with the documentation that came with the version you installed.)
- The README-DEVEL.txt file -- information that should be of use to people planning on developing code based on SpamBayes.
- The TESTING.txt file -- Clues about the practice of statistical testing, adapted from Tim's
comments on python-dev.
- There are also a vast number of clues and notes scattered as block comments through the code.
Search the mailing lists
A quick-n-dirty Google search interface for the mailing list archives: put your search terms in the box alongside the existing ones.
A useful(?) glossary of terminology
- Bayesian
- A form of statistical analysis used (in some form) in Paul
Graham's initial "Plan for Spam" approach. Now used as a kind of
catch-all term for this class of filters, no doubt horrifying
statisticians everywhere.
- corpus
- In this context, a body of messages. Usually refers to a
training database. (The plural is corpora.)
- false negative
- A spam that's incorrectly classified as ham. Also
abbreviated as "fn" or "FN".
- false positive
- A ham that's incorrectly classified as spam. Also
abbreviated as "fp" or "FP".
- ham
- The opposite of spam; not necessarily email that you want or
that you asked for, just anything that's not unsolicited bulk email.
There is a second use for the term which means an email message which
SpamBayes classified as good email. That doesn't mean it's so, just
that based upon the evidence provided to the classifier it looked like
good email. (See also: spam, unsure.)
- hapax, hapax legomenon
- A word or form occurring only once in a
document or corpus. (The plural is hapax legomena.)
- spam
- Broadly speaking, any email that's not wanted by the
end-user. More specifically: unsolicited bulk email; email that you
do not want and did not ask for, and was sent to a whole bunch of
people by automated means at the same time it was sent to you. This
definition deliberately excludes viruses and those stupid jokes sent
to you by your Aunt Tillie. There is a second use for the term which
means an email message which SpamBayes classified as bad email. That
doesn't mean it's so, just that based upon the evidence provided to
the classifier it looked like bad email. (See also: ham, unsure.)
- training
- The process of feeding SpamBayes some sample spam and ham
messages, to teach it what to look for.
- unsure
- An email message that could not reliably be classified as
either ham or spam.
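
To make the ham, spam, unsure, false positive and false negative entries above concrete, here is a minimal Python sketch. It is not SpamBayes code: the function names classify and tally are hypothetical, and the cutoff values 0.20 and 0.90 are assumptions chosen for illustration, so check your own configuration before relying on them. It assumes the classifier gives each message a spam score between 0 and 1.

```python
from collections import Counter

# Assumed cutoffs, for illustration only.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam score in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def tally(results):
    """Count outcomes for (true_label, score) pairs.

    A false positive ("fp") is a ham classified as spam; a false
    negative ("fn") is a spam classified as ham.
    """
    counts = Counter()
    for truth, score in results:
        verdict = classify(score)
        if verdict == "unsure":
            counts["unsure"] += 1
        elif truth == "ham" and verdict == "spam":
            counts["fp"] += 1
        elif truth == "spam" and verdict == "ham":
            counts["fn"] += 1
        else:
            counts["correct"] += 1
    return counts


# Hypothetical hand-labelled messages: (what it really is, score it got).
sample = [
    ("ham", 0.01),
    ("ham", 0.95),
    ("spam", 0.99),
    ("spam", 0.05),
    ("spam", 0.50),
]
print(tally(sample))
```

Note that the two meanings of "ham" and "spam" from the glossary both appear here: the first element of each pair is what the message really is, while the return value of classify is only what the evidence made it look like.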