Rspamd Statistics (Bayesian filter)
Statistics are enabled by default, but the Bayesian classifier must be trained before it has any effect.
Until it has enough training samples, Rspamd skips the Bayesian filter and logs messages like the following:
bayes_classify: not classified as ham. The ham class needs more training samples. Currently: 0; minimum 200 required
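To track training progress, you can compare the learned counts against that minimum. The sketch below assumes that rspamc stat prints one line per statfile of the form "Statfile: BAYES_HAM ... learned: N; ..." (check this against your Rspamd version). The function name bayes_ready is hypothetical.

```shell
#!/bin/sh
# bayes_ready [MIN]: read rspamc-stat-style text on stdin and report whether
# both BAYES_SPAM and BAYES_HAM have at least MIN learned messages.
# Assumed input format: "Statfile: BAYES_SPAM type: redis; ... learned: 212; ..."
bayes_ready() {
  # $1 = minimum per class (default 200, as in the log message above)
  awk -v min="${1:-200}" '
    /Statfile: BAYES_(SPAM|HAM)/ {
      match($0, /BAYES_(SPAM|HAM)/)
      name = substr($0, RSTART, RLENGTH)
      # "learned: " is 9 characters; keep only the digits after it
      if (match($0, /learned: [0-9]+/))
        learned[name] = substr($0, RSTART + 9, RLENGTH - 9) + 0
    }
    END {
      ok = 1
      split("BAYES_SPAM BAYES_HAM", names, " ")
      for (i = 1; i <= 2; i++) {
        k = names[i]
        printf "%s learned: %d (need %d)\n", k, learned[k], min
        if (learned[k] + 0 < min + 0) ok = 0
      }
      # Exit status 0 only when both classes reached the minimum
      exit (ok ? 0 : 1)
    }'
}
```

For example: rspamc stat | bayes_ready && echo "Bayes classifier is active".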
Bayes expiry module
Following the Rspamd statistic settings, create /etc/rspamd/local.d/classifier-bayes.conf to set the autolearn thresholds and the expiry time used by the Bayes expiry module:
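# Expire learned tokens after 100 days (value in seconds)
expire = 8640000;
autolearn {
  spam_threshold = 6.0;  # learn as spam at or above this score
  junk_threshold = 4.0;  # may learn as spam at or above this score, under extra conditions
  ham_threshold = -0.5;  # learn as ham at or below this score
  check_balance = true;  # keep the numbers of learned ham and spam balanced
}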
Reload Rspamd.
sudo systemctl reload rspamd
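The expire value in classifier-bayes.conf is given in seconds, so 8640000 corresponds to 100 days (8640000 / 86400). If you want a different retention period, a small helper such as the hypothetical expire_for_days below can compute the value.

```shell
#!/bin/sh
# expire_for_days N: print the Bayes "expire" value (in seconds) for N days.
# Hypothetical helper; 86400 is the number of seconds in one day.
expire_for_days() {
  echo $(( $1 * 86400 ))
}

# expire_to_days SECONDS: convert an expire value back to whole days.
expire_to_days() {
  echo $(( $1 / 86400 ))
}
```

For example, expire_for_days 30 prints 2592000.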
Autolearn
Rspamd automatically learns obvious ham and spam (see Autolearn configuration for more details). A typical log entry looks like this:
rspamd_stat_check_autolearn: <mail id>: autolearn ham for classifier ‘bayes’ as message’s score is negative: -4.80
If you receive enough incoming mail, Rspamd starts using the Bayesian filter once it has learned at least 200 messages each of ham and spam.
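The threshold checks from classifier-bayes.conf can be approximated as follows. This is a simplified sketch, not Rspamd's implementation: it compares only the score against the three thresholds and ignores the additional conditions Rspamd applies (such as the action taken, the number of positive or negative symbols, and check_balance). The function name autolearn_class is hypothetical.

```shell
#!/bin/sh
# autolearn_class SCORE: simplified classification of a message score using
# the thresholds from classifier-bayes.conf. Prints spam, junk, ham, or none.
# Real Rspamd autolearn applies extra conditions that are not modeled here.
autolearn_class() {
  awk -v s="$1" -v spam=6.0 -v junk=4.0 -v ham=-0.5 'BEGIN {
    if (s + 0 >= spam) print "spam"        # clearly spam
    else if (s + 0 >= junk) print "junk"   # may be learned as spam under extra conditions
    else if (s + 0 <= ham) print "ham"     # clearly ham
    else print "none"                      # ambiguous score: not learned
  }'
}
```

With the score from the log line above, autolearn_class -4.80 prints ham, matching the "autolearn ham" decision.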
Manual learn
If you already have ham or spam emails, you can let Rspamd learn from them.
Steps (see the doveadm-search manual for samples and details):
- Extract ham/spam emails from dbox
- Save the emails as .eml files
- Learn the extracted ham/spam emails
Create the following shell script and save it as extract.sh:
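#!/bin/bash
# Mailbox to extract and the user who owns it
mailbox="INBOX"
user="user@example.jp"
# Start from an empty working directory
if [ -d /tmp/eml ]; then
  rm -rf /tmp/eml
fi
mkdir /tmp/eml
# doveadm search prints "<mailbox-guid> <uid>" for each matching message;
# pipe that list into the loop and save each message's text as its own .eml file
doveadm search -u "$user" mailbox "$mailbox" ALL |
while read -r guid uid; do
  doveadm fetch -u "$user" text mailbox "$mailbox" uid "$uid" > "/tmp/eml/${mailbox}_${uid}.eml"
done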
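The script builds each file name from the mailbox name and UID. If the mailbox name contains a "/" (for example INBOX/Junk with a "/" hierarchy separator), the redirect would point into a subdirectory of /tmp/eml that does not exist and fail. A sketch of a hypothetical helper, eml_path, that flattens the name first:

```shell
#!/bin/sh
# eml_path DIR MAILBOX UID: print "DIR/MAILBOX_UID.eml", replacing "/" in the
# mailbox name with "_" so hierarchical names do not need subdirectories.
# Hypothetical helper; inside the loop it could be used as:
#   doveadm fetch ... > "$(eml_path /tmp/eml "$mailbox" "$uid")"
eml_path() {
  # $1 = output directory, $2 = mailbox name, $3 = message UID
  printf '%s/%s_%s.eml\n' "$1" "$(printf '%s' "$2" | tr '/' '_')" "$3"
}
```

Positional parameters are used directly so the helper does not overwrite the script's own mailbox, user, or uid variables.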
Execute this script as root:
# bash ./extract.sh
Then learn the extracted messages as ham:
rspamc learn_ham /tmp/eml
If you also have spam, extract it the same way (for example, by setting mailbox to your spam folder) and learn it as spam:
rspamc learn_spam /tmp/eml
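If you script both learn steps, a small wrapper can reject an unknown class and skip a directory that has no extracted messages (for example, when the mailbox was empty). This is a hypothetical sketch: learn_dir is an illustrative name, RSPAMC is an assumed override variable (set RSPAMC=echo for a dry run), and only the learn_ham and learn_spam commands shown above are called.

```shell
#!/bin/sh
# learn_dir CLASS DIR: run "rspamc learn_<CLASS> DIR" for CLASS ham or spam.
# Skips DIR when it contains no .eml files. RSPAMC can be overridden (e.g. echo).
learn_dir() {
  case $1 in
    ham|spam) ;;
    *) echo "learn_dir: class must be ham or spam" >&2; return 2 ;;
  esac
  # If no .eml file matches, the glob stays literal and [ -e ] is false.
  for f in "$2"/*.eml; do
    if [ -e "$f" ]; then
      "${RSPAMC:-rspamc}" "learn_$1" "$2"
      return    # propagate rspamc's exit status
    fi
    break
  done
  echo "learn_dir: no .eml files in $2, skipping" >&2
  return 0
}
```

For example, learn_dir ham /tmp/eml runs rspamc learn_ham /tmp/eml only when there is something to learn.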
Don’t forget to clean up the emails you extracted.
sudo rm -r /tmp/eml
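As an alternative to a fixed /tmp/eml path with manual cleanup, a script can create a private working directory with mktemp -d and remove it automatically with an EXIT trap. A minimal sketch; with_workdir is a hypothetical name, and the extraction and learn steps are left to whatever command you pass in.

```shell
#!/bin/sh
# with_workdir CMD [ARGS...]: create a private temp dir, run CMD with the
# directory path appended as its last argument, then always remove the dir.
# The body is a subshell, so the trap and variables do not leak to the caller.
with_workdir() (
  workdir=$(mktemp -d) || exit 1
  trap 'rm -rf "$workdir"' EXIT
  "$@" "$workdir"
)
```

For example, with_workdir extract_and_learn, where extract_and_learn is your own function that writes messages into, and learns from, the directory it receives.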