Production Mail Server Guide 2018

Building a Production Mail Server in 2018

Welcome to Infrastructure Week July 2018! New articles and tools every day this week.

It’s 2018. Do you know where your mail server is[1]?

What has changed in the world of email between 2009 and 2018? Let’s update my mail infrastructure and find out!

My mail machine started as a physical machine in 2009, had its disk duplicated to become a VM, then ended up moving around different hosts to stay alive for almost ten years. Everything was still running, but not quite well enough anymore. It’s time to deploy a new mail system for the next ten years[2].

How long do you think it would take to replace an existing system with basically an upgraded version of the same system? 2 days? 3 days?

It took me 8 days and I ran into 5 code defects resulting in bug reports across 4 different pieces of software. Welcome to the real world.

This page details how my setup works and how you can duplicate it if you want. If you’re impatient you can skip right to the files. Hopefully these details can help save you 5+ days of time if you attempt it yourself.

Big Picture

Here’s my deployment guide for 2018 mail server hosting including:

  • incoming mail server (plain text and TLS)
  • virtual user/domain hosting (system account not required)
  • outgoing mail server authenticated against your virtual user DB (TLS)
  • IMAP server for virtual users (TLS)
  • server-side spam filtering
  • server-side mailbox filing rules
  • automated backups (de-duplicated, compressed, encrypted)
  • real time IMAP login attack mitigation

These Times They Are A Changing

What is changing between 2009 and 2018 in the world of mail servers?

| From Version | To Version | Notes |
| --- | --- | --- |
| CentOS 5.3 | Ubuntu Bionic 18.04 | Ubuntu is more widely supported and actively maintained than CentOS these days. |
| postfix 2.10.2 | postfix 3.3.0 | Always postfix. Never change. |
| dovecot 1.2.2 | dovecot 2.2.33.2 | Doves of a feather. |
| crm114 2008-11-11 | rspamd 1.7.7 | crm114 was abandoned in 2009. I sure know how to pick ’em. |
| borgbackup 1.0.7 | borgbackup 1.1.5 | Previously used duplicity but borg is better in every way[3]. |
| borgmatic ini (2016) | borgmatic yaml (2018) | This is just a wrapper around borgbackup allowing readable configuration files instead of using command line arguments. |
| denyhosts 2.6 | fail2ban 0.10.2 | denyhosts stopped being maintained and fail2ban is the modern replacement with integrations covering more services and actions. |
| procmail 3.22 | sieve via dovecot-pigeonhole 2.2.33.2 | procmail was last updated in 2001 and under modern mail systems is no longer useful. sieve is a bytecode compiled[4] programming language designed for efficient mail filtering. |
| Maildir | mdbox | mdbox is a modern chunked append-only storage format for efficient on-disk mail access. |

Main Changes

The main changes are:

  • CentOS replaced with Ubuntu LTS
  • crm114 text-only filter replaced with rspamd (which has over 400 active, passive, and statistical metrics for determining the spam rating for mails)
  • procmail rules replaced with sieve scripts for server-side mail filtering
  • Maildir replaced with dovecot’s mdbox storage format
  • dovecot’s antispam plugin was abandoned in favor of sieve support
  • RSA keys/certs supplemented with additional EC keys/certs in postfix and dovecot

Architecture: Email Data Flow

How does an email reach your inbox? One of the tricky parts about email infrastructure is each component duplicates functionality from other components:

  • postfix can write messages into mailbox storage, but so can dovecot.
  • postfix can check messages using your spam filter, but so can dovecot.
  • postfix can use spam block lists itself, but so can your actual spam system.

Historical 2009 Email Setup

In 2009, I set things up using system (/etc/passwd) login accounts for email, which limits how easily you can hand out new accounts when you are receiving mail for many different users across many different domain names.

The 2009 email operations looked like:

[2009] Receiving Mail

  1. postfix [text or TLS] message arrives, hands off to
  2. procmail [command]
    1. runs spam filter script
    2. If spam score high, put in Spam mailbox
    3. If spam score medium, put in Unsure mailbox
    4. If spam score low, put in Inbox or use further procmail rules for filing
    5. delivers to ~/Maildir/

[2009] Checking Mail

  1. dovecot [TLS] verifies IMAP user login against system account
  2. dovecot [TLS] logged in user reads mail from their ~/Maildir/
  3. dovecot [TLS] if user remains connected with IMAP IDLE, monitor Maildir for changes allowing instant new mail notification

[2009] Sending Mail

  1. postfix [TLS] verifies user login against system account
  2. postfix [TLS] accepts mail for sending if login success

The system worked, but it has a few deficiencies for being a scalable hosted email platform:

  • it uses system users instead of virtual users, so every user needs a system account
  • because mailboxes are tied to system accounts, all mail must be routed using postfix virtual alias files
  • can’t change email login passphrase without changing system passphrase

But we’ve fixed all those problems in…

Modern 2018 Email Setup

The 2009 setup worked fine for years, but was never quite extensible enough and the spam filtering was never as accurate as I’d like. For example, my Spam mailbox had over 60,000 messages, yet I still got 5-10 “unsure,” but clearly spam, messages every day. The accuracy had plateaued and was no longer improving.

Our 2018 setup is built on the best modern email infrastructure available:

[2018] Receiving Mail

  1. postfix [plain text or TLS] message arrives, consults (i.e. Milter protocol chats) with
  2. rspamd [localhost]
    1. runs over 400 checks on inbound message including:
      1. mail server block lists (known spammers)
      2. URL block lists and de-referencing URL shorteners (known scam URLs)
      3. fuzzy hash block lists (known spam text)
      4. encoding weirdness (is the message ALL base64? Does it switch character sets unnecessarily? Why is your message using IBM850?)
      5. are you spoofing a from address?
        1. Does this message have a From address on your own domain, but it didn’t come from “inside the house” as it were? (rspamd blocks CEO spam right out of the box with no user configuration, which is more protection than gmail gives you)
      6. multiple concurrent statistical spam filters you train
    2. If message is highly spam, rspamd tells postfix to reject the message right on the connection. The message doesn’t even hit your disk and is rejected directly on the SMTP connection.
    3. If message is either maybe spam or not spam, rspamd tells postfix to continue processing the message.
  3. postfix live rejects message completely if rspamd said mail is 100% spam
  4. postfix checks alias mappings to determine if user should be redirected before delivery
  5. postfix sends message to Dovecot LDA for delivery to user
  6. dovecot [localhost]
    1. queries virtual user database (sqlite in our case) for user’s virtual home directory
    2. runs global sieve scripts (if header :is "X-Spam" "Yes" { fileinto "Junk"; })
    3. runs user’s sieve scripts found in virtual home directory
    4. delivers mail into user’s mail storage
    5. if user has IMAP IDLE connection, notify user new mail arrived
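
The global sieve rule in step 6.2 boils down to a single header check. As a rough illustration only (dovecot runs real sieve scripts, not Python), the same decision can be modelled with Python's standard email parser. The function name choose_folder and the "INBOX" fallback are assumptions made for this sketch; the X-Spam header and the Junk folder come from the rule quoted above.

```python
from email import message_from_string


def choose_folder(raw_message: str) -> str:
    """Model of: if header :is "X-Spam" "Yes" { fileinto "Junk"; }"""
    msg = message_from_string(raw_message)
    # Sieve's default comparator ignores ASCII case and the header test
    # trims surrounding whitespace, so this sketch does the same.
    value = (msg.get("X-Spam") or "").strip().lower()
    if value == "yes":
        return "Junk"
    # Assumption: anything not flagged falls through to the inbox,
    # where per-user sieve scripts would get their turn.
    return "INBOX"
```

Calling choose_folder on a message carrying "X-Spam: Yes" returns "Junk"; a message without that header returns "INBOX".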

This email receiving setup looks more complex than the 2009 setup because it has more moving parts, but it has big advantages:

[2018] Checking Mail

  1. dovecot [TLS] verifies IMAP user login against virtual user database (sqlite in our case)
  2. dovecot [TLS] logged in user reads mail from their virtual home directory
  3. dovecot [TLS] since dovecot manages mail storage itself, IMAP IDLE new mail notifications are handled internally without polling the file system

With virtual users, your email accounts don’t need system users, so you can maintain multiple accounts across your system much more easily. You can also use safer password methods like memory-hard hashes or just run SHA512 in a circle for 5 seconds (time doveadm pw -s SHA512-CRYPT -r 6000000).
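
To make the "virtual users plus slow hashes" idea concrete, here is a minimal Python sketch. It is not dovecot's implementation and the stored format is not dovecot-compatible: it uses PBKDF2-HMAC-SHA512 from the standard library as a stand-in for "running SHA512 in a circle", and the iterations$salt$digest string layout, the function names, and the example address are all made up for illustration. The users table with userid and password columns follows the sqlite query shown in the footnotes.

```python
import hashlib
import hmac
import os
import sqlite3

# Assumption: raise this until a single hash is as slow as you can tolerate.
ITERATIONS = 200_000


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Return a salted, iterated hash as 'iterations$salt$digest' (hypothetical format)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and iteration count, then compare."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha512", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate.hex(), digest_hex)


def check_login(db: sqlite3.Connection, userid: str, password: str) -> bool:
    """Look the user up in the virtual user table; no system account involved."""
    row = db.execute(
        "SELECT password FROM users WHERE userid = ?", (userid,)
    ).fetchone()
    return row is not None and verify_password(password, row[0])


# In-memory database standing in for the real virtual user database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (userid TEXT PRIMARY KEY, password TEXT NOT NULL)")
db.execute(
    "INSERT INTO users VALUES (?, ?)",
    ("matt@example.com", hash_password("correct horse")),
)
```

The point of the iteration count is that every guess costs an attacker the same work it costs you once per login, which is why changing an email password no longer has anything to do with a system passphrase.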

[2018] Sending Mail

  1. postfix [TLS] asks dovecot to verify user logins
  2. dovecot [localhost] uses virtual user database to check username and password
  3. postfix consults with rspamd using the milter protocol
  4. rspamd [localhost]
    1. sees mail is from localhost to the outside world, so assumes mail is trusted and doesn’t suggest blocking
      • (you can configure rspamd to act on spaminess of outgoing mail too if you don’t trust your users)
    2. optionally DKIM sign outgoing messages if you set up signing keys[5]
  5. postfix accepts mail (as suggested by the rspamd milter) and sends to recipient(s)

How Well Does rspamd Work?

Here’s my rspamd stats page after almost a month of uptime. Seems to be working pretty well.

rspamd Summary Stats

Each one of those red Reject messages was rejected at postfix before it was even accepted. Those 8k rejects never hit a mailbox.

Detail Stats

Why is my spam level trending down over the past month? I wonder if that’s just natural spam weather patterns or if spammers are starting to remove my address due to 100% rejections at the mail server level.

Doesn’t seem likely since the spam is largely from compromised residential fiber ISPs in China or hacked endpoints in Eastern Europe, but who knows how their systems report back?

“Most Spammy” Email Received

rspamd scores all messages with a point system. The higher your points, the more likely your message is absolute spam.

The default threshold for rejecting messages completely is 15 points. Here’s my rspamd stats console sorted by highest (worst) score. The highest scores usually come from combinations of:

  • known spam text
  • from known spam senders
  • and getting high spam scores on local statistical filters
  • and not having good mail hygiene (reverse DNS, MX records, SPF, etc)

Click And See

In fact, if we click on one of the message rows, rspamd shows us exactly how the score was calculated:

Symbols FUZZY_DENIED(12)[1:b75a1312a0:1.00:txt]
MSBL_EBL(7.5)[yuetanruoza@163.com,7406a20b103fa8b32a8eb345c7791b601b046b2b]
BAYES_SPAM(4)[100.00%]
MX_MISSING(3.5)[no records with this name]
INVALID_POSTFIX_RECEIVED(3)
HFILTER_HOSTNAME_UNKNOWN(2.5)
INTRODUCTION(2)
NEURAL_SPAM_SHORT(1.995264)[0.998,0]
IP_SCORE(1.765717)[ipnet: 112.80.0.0/13(4.84), asn: 4837(3.90), country: CN(0.10)]
SUBJ_EXCESS_BASE64(1.5)
HFILTER_FROMHOST_NORES_A_OR_MX(1.5)[renshxyd.info]
AUTH_NA(1)
FAKE_REPLY(1)
RDNS_NONE(1)
MX_INVALID(0.5)[greylisted]
MV_CASE(0.5)
MIME_BASE64_TEXT(0.1)
MIME_GOOD(-0.1)[text/plain]
PREVIOUSLY_DELIVERED(0)[future@matt.sh]
RCPT_COUNT_ONE(0)[1]
ASN(0)[asn:4837, ipnet:112.80.0.0/13, country:CN]

Look at some of those crazy sections! There are even two penalty points (“INTRODUCTION(2)”) because “Sender tries to introduce themselves.” The “MV_CASE” penalty applies when an email has its MIME header formatted wrong (Mime-Version vs MIME-Version).
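
If you want to poke at a breakdown like that outside the web console, the listing is easy to pick apart. Here is a minimal Python sketch; the line format it assumes (NAME(score), optionally followed by [details]) is inferred only from the listing above, and parse_symbols is a made-up helper, not part of rspamd.

```python
import re

# First NAME(score) pair on a line; NAME is upper-case letters, digits, underscores.
SYMBOL_RE = re.compile(r"([A-Z][A-Z0-9_]*)\((-?\d+(?:\.\d+)?)\)")


def parse_symbols(lines):
    """Map each symbol name to its score, ignoring the bracketed details."""
    scores = {}
    for line in lines:
        match = SYMBOL_RE.search(line)
        if match:  # lines without a NAME(score) pair are skipped
            scores[match.group(1)] = float(match.group(2))
    return scores


# A few lines copied from the listing above (including the "Symbols" label).
sample = """\
Symbols FUZZY_DENIED(12)[1:b75a1312a0:1.00:txt]
BAYES_SPAM(4)[100.00%]
MX_MISSING(3.5)[no records with this name]
INTRODUCTION(2)
MV_CASE(0.5)
MIME_GOOD(-0.1)[text/plain]
""".splitlines()

scores = parse_symbols(sample)
total = sum(scores.values())
print(f"{len(scores)} symbols, total score {total:.2f}")
```

Even this six-line excerpt lands well above the 15 point reject threshold, and by my arithmetic the full listing sums to roughly 45 points.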

On the other hand, if messages are behaving, they get rewarded with negative penalty points. For example, when you have good MX records and reverse DNS and SPF, or you are a “known good” sender who has been replied to before, or your IP range is known to be trusted, etc. With rspamd you’ll often receive messages with “negative badness” scores from good email providers.

Adjust Your Adjusters

rspamd has over 400 individual criteria it uses to judge each message. The initial settings are sane defaults, but if you want to be more or less aggressive against some features, you can easily increase or decrease penalty/reward weights as necessary.

Here’s a sample of default weights from the most-penalized email features:

Again, each feature is just a component of the entire score. It takes either a few big bad features to reach the 15 point reject range or many medium-bad features.

Anything below 15 points gets delivered to your mailbox and your mail filters can decide which part of the (-inf, 15) score represents “good” vs “maybe spam” vs “oops, actual spam got through” ranges.
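
One way to picture carving up that (-inf, 15) range is a simple banding function. This Python sketch is only an illustration: the 15 point reject threshold is the default mentioned above, while the junk and unsure cut-offs and the band names are hypothetical values you would tune yourself (and in practice you would express them as sieve rules, not Python).

```python
REJECT_THRESHOLD = 15.0  # default reject score mentioned above
JUNK_THRESHOLD = 6.0     # hypothetical: file into Junk at or above this
UNSURE_THRESHOLD = 3.0   # hypothetical: "maybe spam" at or above this


def classify(score: float) -> str:
    """Return which band a message's total score falls into."""
    if score >= REJECT_THRESHOLD:
        return "reject"  # refused during the SMTP conversation, never stored
    if score >= JUNK_THRESHOLD:
        return "junk"
    if score >= UNSURE_THRESHOLD:
        return "unsure"
    return "inbox"       # includes negative scores from well-behaved senders
```

For example, classify(45.26) gives "reject", classify(7) gives "junk", and classify(-2.0) gives "inbox".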


A Detour Into Ansible

As you can probably tell, a minimal email setup has multiple moving parts:

  • an external mail receiver (SMTP, postfix)
  • an internal mail receiver (LDA, dovecot)
  • an internal spam filter (milter, rspamd)
  • an external IMAP server (IMAPS, dovecot)

In addition to extra features if you get super fancy:

  • advanced mailbox searching (solr / lucene)
  • antivirus (stop using windows)
  • web mail interface (exploit city!)

We want our email setup to be modern and easy to re-deploy, so we’re going to automate everything and not hand edit any config files on any servers.

Let’s use Ansible[6] to set up, configure, and maintain all our services.

Ansible is a 5th generation configuration management system. Ansible runs 100% over ssh and can run operations in parallel across many servers, plus it doesn’t require local agents to be installed on remote hosts[7].

Ansible lets you build up automation infrastructure at your own pace. You can do as little as just automating SSH commands to groups of servers then collecting all results, or you can go as far as using multiple Ansible files for fully idempotent concurrent deployments.

Terminology Overload

Unfortunately, Ansible has gone a bit overboard with how it names things.

To use Ansible, you have to remember all of these:

  • Roles
    • Tasks
    • Handlers
    • Defaults
    • Vars
    • Meta
    • Files
    • Templates
    • Modules
  • Inventories
  • Hosts
  • Host Vars
  • Group Vars
  • Playbooks

Got all that? Good.

Directory Fetishism

Also unfortunately, Ansible has extreme directory fetishism. Everything is a directory, even if the directories only have one file 99.999% of the time.

From their own docs:

Hey Ansible, instead of needing tasks/main.yml and handlers/main.yml and defaults/main.yml, you should let us just use files named tasks.yml and handlers.yml and defaults.yml in the role directory[8]!

Changes Ahoy

Ansible also isn’t afraid of changing their basic design as they release new versions, so it’s really a 50/50 chance whether any questions you look up online were answered with new vs old syntax.

Example: A simple question on stackoverflow has the question in old/bad syntax and half of the answers are in old/bad syntax even though some answers are kinda new.

Online QA platforms haven’t evolved a way to mark answers as applying to certain versions or release date ranges of software yet. I don’t know about you, but I’d estimate about 50% of my time searching for software answers online involves being confused by outdated solutions.

Abstract Abstractions

Lastly in this Ansible detour, as magical as automatic deploys can be, every config management system is still a false abstraction.

You can’t just say “hey config system, install vim and python3.7 and nginx” because each of those is a package tied to your individual operating system release, operating system package manager (of which operating systems now have 2-4 different ones in each release), individual OS service management mechanisms, etc.

So, at best we can automate our known-OS targets, which is fine for now. Just pick an OS distribution then automate config files, package installs, and service running based on your needs.

Jump Into It

We’ll use Ansible to install, configure, and maintain our mail services on a recent Ubuntu release (as of this writing).

Create a new VM[9] if you want to follow along.

For our mail setup, we need to install and configure:

  • postfix
  • rspamd
  • dovecot

Plus (if you want to do it the right way):

  • deploy TLS keys and certificates
  • configure backups
  • configure mail virtual users
  • configure fail2ban

Luckily those first three are a common setup. I heavily modified Cullum Smith’s mail configuration to support my preferred usage:

  • loosen postfix receive restrictions to support virtual users and wildcard domain hosting
    • otherwise, postfix rejects mail because ijofajoifwea@matt.sh “isn’t a valid user,” but I want to receive mail for any made up address anyway
  • sqlite instead of LDAP for virtual user mappings[10]
    • and disable dovecot password cache because otherwise if you change your password it doesn’t take effect!
  • no external search feature (grepping through gigabytes of text is fast enough)
  • no DKIM
  • prefer using network connections instead of unix domain sockets for IPC
    • improves connectivity and scalability between postfix, rspamd, and dovecot without needing to manage and massage file system permissions
  • use simultaneous RSA and EC certificates for both postfix and dovecot
  • prevent dovecot login abuse with fail2ban and dovecot tcpwrapper support
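
For readers who haven't used fail2ban, the core idea behind the last item is just counting recent failures per source address. Here is a minimal Python sketch of that idea. It is not how fail2ban is implemented: real fail2ban parses log lines and applies firewall actions, while this sketch only takes (ip, timestamp) events. The maxretry and findtime names mirror fail2ban's jail options; the class itself is hypothetical and, for brevity, bans never expire.

```python
from collections import defaultdict, deque


class LoginFailureTracker:
    """Ban an address after maxretry failures within findtime seconds."""

    def __init__(self, maxretry: int = 5, findtime: float = 600.0):
        self.maxretry = maxretry
        self.findtime = findtime
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned = set()

    def record_failure(self, ip: str, timestamp: float) -> bool:
        """Record one failed login; return True if the address is banned."""
        window = self._failures[ip]
        window.append(timestamp)
        # Drop failures that have aged out of the look-back window.
        while window and timestamp - window[0] > self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned.add(ip)
        return ip in self.banned
```

With maxretry=3 and findtime=60, three failures from one address inside a minute trigger a ban, while three failures spread over several minutes do not.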

Automated Deployment

Grab the git repo from mattsta/mailweb and start automating your mail server creation.

Here’s how to use the repository:

  • clone it
  • populate your hosts in file inventory/inventory
    • create directory inventory/host_vars/[hostname] where [hostname] is the hostname you put in your inventory/inventory for your mail deployment
    • copy the variable files from sample/host_vars/* to inventory/host_vars/[hostname]/
    • edit the variable files for your deployment (set interfaces, hostnames, backup parameters if you have them, certs you want deployed on the server, etc—the files are somewhat documented).
  • edit the Ansible playbook mailmash.yml[11] to match your deployment
    • hosts is the ansible group of servers to deploy this playbook into
    • remote_user is a user on the remote machine with the ability to sudo -s
    • become means “run everything as root after sudo”
    • roles is a YAML list of each role to run from the roles/ directory
  • NOW YOU CAN FINALLY DO SOMETHING
    • Running ./runner.sh mailmash will run the mailmash.yml playbook against a host called mailmash
      • obviously do not run this against live or production servers until you figure out what all the requested roles actually do
        • by default, the playbook will reconfigure the target VM with ssh only listening on one interface and disable the firewall because this is supposed to be a three-public-service-only host (SMTP, SMTPS, IMAPS). We don’t need a firewall since we control all the services and know exactly what is listening.
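
The host_vars steps above are mechanical enough to script. This is a minimal Python sketch under the assumption that the repository layout is exactly as described (sample/host_vars/ and inventory/host_vars/[hostname]/); prepare_host_vars is a hypothetical helper and is not part of the mailweb repository. It does not add the host to inventory/inventory or edit any variables for you.

```python
import shutil
from pathlib import Path


def prepare_host_vars(repo: Path, hostname: str) -> Path:
    """Create inventory/host_vars/<hostname>/ and seed it from sample/host_vars/."""
    target = repo / "inventory" / "host_vars" / hostname
    target.mkdir(parents=True, exist_ok=True)
    for sample in sorted((repo / "sample" / "host_vars").iterdir()):
        if not sample.is_file():
            continue
        destination = target / sample.name
        # Never clobber a variable file that has already been edited.
        if not destination.exists():
            shutil.copy2(sample, destination)
    return target
```

Something like prepare_host_vars(Path("mailweb"), "mailmash") would leave you with a directory of sample variable files ready to edit; running it a second time leaves your edits alone.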

Conclusion

Conclusion? We’re still in the middle of things! Sadly, this page has gotten too big. Time to take a break.

So, this is just a conclusion for today. We still need to cover:

  • how to actually add virtual users (dovecot authdb.sqlite database)
    • locations of your mail (/var/mail/vhosts/ and /var/mail/attachments/)
  • how to create and maintain your personal dovecot mail filters with sieve (external tutorial)
  • how to add domains and redirects to postfix (files domains and virtual)
  • how to configure backups (borg init --encryption=repokey-blake2 remotehost:backupdirectory)
  • how to access the rspamd stats console (ssh -J outsidehost -L 11334:localhost:11334 insideVM; open browser to http://localhost:11334)
  • configuring TLS RSA and EC certs integration into postfix and dovecot (based on placing your certs in /etc/ssl/ and keys in /etc/ssl/private/ named after the domain they control)
  • all the bugs I ran into over a week:
    • fail2ban inconsistent lookback-on-startup bug
    • macOS Mail.app just ignoring IMAP password changes after one failed attempt (rdar://41826332)
    • macOS Mail.app refuses to show mail in nested folders (rdar://41824401 duplicate of rdar://15570921)
      • they closed the bug as “dup” — Apple knows Mail.app is losing mail and failing to show you mail you receive, but they just don’t seem to care.
      • look how low the duplicate number is! How many years has Apple refused to fix this mail loss problem?
    • autossh was missing support for ssh -J parameter (supplied fix patch to maintainer over email)
    • Ansible weirdness (maybe not bugs, but just… weirdness — I’ve used unix systems for 23 years and have to keep 8 browser tabs open to figure out how to copy files and start services using YAML syntax)
    • systemd consistency bugs (lol what else is new?)
    • Ubuntu netplan system configuration limitations (filed bug with netplan, but can’t always reproduce consequences)
    • and so much more! does anybody even use software?!

You can get all the integrated config files at mattsta/mailweb to see how all the services work together (how postfix talks to rspamd; how postfix talks to dovecot; how dovecot stores virtual users; how dovecot lets you login with IMAPS; how fail2ban detects dovecot login failures to ban hackers; etc).

Based on the automated service provisioning setup, you can use everything directly through Ansible, or fork it and modify it for your own needs, or just consult the config files to create your own SMTP+SMTPS+IMAPS+spamfilter deployments by hand or with a different config system.

Happy Mailing and Happy Spam-Minimized Mail Receiving!

-Matt@mattsta☁mattsta



  1. sure, keep trusting google that works out great for everybody

  2. which is an optimistic view considering society will be gone kinda soon

  3. except borg requires a live shell on the receiving system, so it can’t backup to S3 or B2 or other object-only stores natively, but rsync.net has borg user discounts since with borg you manage your own snapshots and deduplication

  4. sample output of dumping the bytecode of a sieve filter: sieve-dump .dovecot.svbin

    * Script metadata (block: 0):
    
    class = file
    class.version = 0
    location = /var/mail/vhosts/genges.com/matt/.dovecot.sieve
    
    * Required extensions (block: 1):
    
      0: variables (id: 17)
      1: fileinto (id: 5)
      2: envelope (id: 7)
      3: subaddress (id: 10)
      4: mailbox (id: 21)
      5: regex (id: 13)
    
    * Main program (block: 2):
    
    Address   Line  Code
    00000000:       DEBUG BLOCK: 3
    00000001:       EXTENSIONS [6]:
    00000002:         variables
    00000004:           VARIABLES SCOPE [0] (end: 00000009)
    00000009:         fileinto
    0000000b:         envelope
    0000000d:         subaddress
    0000000f:         mailbox
    00000011:         regex
    00000013:    7: HEADER
    00000016:         match type: regex
    00000018:         header names: STR[7] "Subject"
    00000022:         key list: STR[73] "Notify NYC - (Silver|Road|NYS Missing|Missing|Traffic|Alternate|Amber).*$"
    0000006e:    7: JMPFALSE 12 [0000007b]
    00000073:    8: FILEINTO
    00000074:         folder: STR[3] "nyc"
    0000007a:    9: STOP
    000000ca:   17: ADDRESS
    000000cd:   18:   match type: is
    000000d0:         header list: STR[4] "from"
    000000d7:         key list: STR[24] "notifications@github.com"
    000000f2:   18: JMPFALSE 70 [00000139]
    000000f7:   52: HEADER
    000000fa:         match type: matches
    000000fd:         header names: STR[7] "subject"
    00000107:         key list: STR[8] "*[*/*] *"
    00000112:   52: JMPFALSE 38 [00000139]
    00000117:   54: FILEINTO
    0000011a:         SIDE-EFFECT: create
    0000011c:         folder: CAT-STR [4]:
    0000011e:           STR[13] "INBOX/github/"
    0000012e:           MATCHVAL 2
    00000131:           STR[1] "/"
    00000135:           MATCHVAL 3
    00000138:   55: STOP
    00000139:   58: HEADER
    0000013c:         match type: regex
    0000013e:         header names: STR[7] "Subject"
    00000148:         key list: STR[10] "^Monthly.*"
    00000155:   58: JMPFALSE 38 [0000017c]
    0000015a:   59: FILEINTO
    0000015b:         folder: STR[29] "amazonNotifications-Spurrious"
    0000017b:   60: STOP
    0000017c:   63: ADDRESS
    0000017f:         match type: contains
    00000182:         header list: STRLIST [5] (end: 000001aa)
    00000188:           STR[4] "from"
    0000018f:           STR[2] "to"
    00000194:           STR[2] "cc"
    00000199:           STR[3] "bcc"
    0000019f:           STR[8] "reply-to"
    000001aa:         key list: STR[24] "notifications@github.com"
    000001c5:   63: JMPFALSE 28 [000001e2]
    000001ca:   64: FILEINTO
    000001cb:         folder: STR[19] "githubNotifications"
    000001e1:   65: STOP
    000001e2:   65: [End of code]
  5. though that can come back to bite you in the ass

  6. you know you can trust software websites when they have a big header saying SSH KEYS ARE YOUR FRIENDS

  7. though if you prefer the agent approach, you can use ansible-pull to continuously update your ansible config git repo and apply it locally to all your machines.

  8. but, you may ask, what if someone makes tasks.yml and tasks/main.yml? Well, just make that an error. Or, just include them both. We’ll leave that design decision up to you. Just help us save our wrists by not needing to jump in and out of one-file directories unnecessarily.

  9. password updates are as simple as: sqlite3 authdb.sqlite "update users set password='$(doveadm pw -s SHA512-CRYPT -r 1856250)' where userid='matt@matt.sh';"