Rspamd Statistics (Bayesian filter)

The statistics module (Bayesian filter) is enabled by default, but it must be trained before it can classify mail.
Until it has learned enough samples, Rspamd skips the Bayesian filter and logs messages like the following.

bayes_classify: not classified as ham. The ham class needs more training samples. Currently: 0; minimum 200 required

Bayes expiry module

Following the Rspamd statistics settings, create /etc/rspamd/local.d/classifier-bayes.conf to configure what Rspamd learns automatically and the expiry time used by the Bayes expiry module. The expire value is in seconds (8640000 seconds is 100 days).

expire = 8640000;

autolearn {
  spam_threshold = 6.0;
  junk_threshold = 4.0;
  ham_threshold = -0.5;
  check_balance = true;
}
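The expire value is given in seconds, so 8640000 corresponds to 100 days. As a rough aid for picking a different retention period, the conversion can be sketched as follows; the helper name days_to_seconds is made up for illustration and is not part of Rspamd.

```shell
#!/bin/sh
# Hypothetical helper: convert a retention period in days to seconds,
# the unit assumed for the "expire" setting.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 100   # prints 8640000
```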

Reload Rspamd.

sudo systemctl reload rspamd

Autolearn

Rspamd automatically learns from messages that are clearly ham or clearly spam, and logs each autolearn event as shown below.
(See Autolearn configuration for more details.)

rspamd_stat_check_autolearn: <mail id>: autolearn ham for classifier 'bayes' as message's score is negative: -4.80

Given enough incoming mail, Rspamd starts using the Bayesian filter once it has learned at least 200 messages each of ham and spam.
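To see how far training has progressed, the learned counts can be read from the output of rspamc stat. The sketch below assumes that output contains lines of the form "Statfile: BAYES_HAM ...; learned: 123; ..."; the exact format may differ between Rspamd versions, and the helper name learned_count is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the "learned" count for one statfile symbol
# (e.g. BAYES_HAM), reading `rspamc stat`-style text from standard input.
# Assumes lines like "Statfile: BAYES_HAM ...; learned: 123; ...".
learned_count() {
  sed -n "s/^Statfile: $1 .*learned: \([0-9][0-9]*\);.*/\1/p"
}

# Usage on a live system (needs access to the Rspamd controller):
#   rspamc stat | learned_count BAYES_HAM
#   rspamc stat | learned_count BAYES_SPAM
```

Both numbers need to reach the minimum of 200 before the classifier is used.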

Manual learn

If you already have ham or spam emails, you can let Rspamd learn from them.

Steps (see the doveadm-search manual for samples and details):

1. Extract ham/spam emails from dbox.
2. Save each email as an .eml file.
3. Let Rspamd learn from the extracted ham/spam emails.

Create the following shell script as extract.sh.

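#!/bin/bash
# Export every message in one mailbox of one user to /tmp/eml as .eml files.
mailbox="INBOX"
user="user@example.jp"

# Start from an empty output directory.
rm -rf /tmp/eml
mkdir /tmp/eml

# doveadm search prints one "<mailbox-guid> <uid>" pair per line;
# pipe the pairs into the loop and fetch each message by UID.
doveadm search -u "$user" mailbox "$mailbox" ALL |
while read -r guid uid; do
  doveadm fetch -u "$user" text mailbox "$mailbox" uid "$uid" > "/tmp/eml/${mailbox}_${uid}.eml"
done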
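The script builds each file name from the mailbox name, which breaks for nested mailboxes if the hierarchy separator is "/" (for example "Archive/2020"), because the redirect would then point into a directory that does not exist. One possible workaround, sketched here with a made-up helper name eml_path, is to flatten the mailbox name before using it:

```shell
#!/bin/sh
# Hypothetical helper: build the output path for one message,
# replacing "/" in the mailbox name so the result is a single file name.
eml_path() {
  dir=$1 mailbox=$2 uid=$3
  safe=$(printf '%s' "$mailbox" | tr '/' '_')
  printf '%s/%s_%s.eml\n' "$dir" "$safe" "$uid"
}

eml_path /tmp/eml "Archive/2020" 7   # prints /tmp/eml/Archive_2020_7.eml
```

Inside the loop, the redirect target would then become "$(eml_path /tmp/eml "$mailbox" "$uid")".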

Execute this script as root:

# bash ./extract.sh

Run the Rspamd learn command on the extracted ham.

rspamc learn_ham /tmp/eml

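If you also have spam, set mailbox in the script to your spam folder, run the script again, and learn the result with the following command. The script empties /tmp/eml first, so the ham files from the previous run are not learned as spam.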

rspamc learn_spam /tmp/eml

Don’t forget to clean up the emails you extracted.

sudo rm -r /tmp/eml
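Since the extracted files are full copies of users' mail, a private scratch directory that removes itself may be preferable to a fixed path under /tmp. This is only a sketch of that idea, not part of the procedure above; the variable name workdir is arbitrary.

```shell
#!/bin/sh
# Create a private scratch directory with an unpredictable name.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/eml.XXXXXX")
# Remove it automatically when the script exits, even after an error.
trap 'rm -rf "$workdir"' EXIT

echo "extracting to $workdir"
# ... run the doveadm loop and `rspamc learn_ham "$workdir"` here ...
```

With this approach the extraction and the learn command have to run in the same script, because the directory disappears when the script ends.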