Message store options

To reduce the load on RabbitMQ, MailerQ can use an external message store. In this case, only email metadata (the recipient, the envelope address, et cetera) has to be stored in the JSON object in RabbitMQ, while the full MIME data can be stored in the message store.

storage-address:        mongodb://hostname/database/collection
storage-threads:        1
storage-policy:         all
storage-ttl:            3600

This message store is completely optional: if you set the "storage-address" variable to an empty string, MailerQ works just as well (even faster because no extra communication with the storage server is necessary), but the load on RabbitMQ and the network will be much higher.

When configured, MailerQ interacts with the message store in three situations:

Supported storage engines

MailerQ supports multiple storage engines: MongoDB, Couchbase, SQLite, MySQL and PostgreSQL are the supported databases, of which MongoDB is the preferred platform. Although it is possible to use MySQL, SQLite or PostgreSQL, we do not recommend using such relational databases, as those systems are not optimized for document storage. If you want to store the messages on disk, you can use a dir:// based backend. Messages can also be stored in AWS S3 buckets, which can be useful when running MailerQ on AWS. If you set the storage address to an "http" or "https" URL, the bodies will be stored and retrieved via HTTP requests.

The address of the message store can be set with the "storage-address" config file variable. The following addresses are supported:

storage-address:        mongodb://hostname/database/collection
storage-address:        couchbase://password@hostname/bucketname
storage-address:        sqlite:///path/to/database/file
storage-address:        mysql://user:password@hostname/databasename
storage-address:        postgresql://user:password@hostname/databasename
storage-address:        dir:///path/to/directory
storage-address:        s3://accesskey:secretkey@region/bucketname
storage-address:        http://hostname/path/to/dir
storage-address:        https://hostname/path/to/dir

If you have a cluster of Couchbase or MongoDB servers, you can separate the hostnames with semicolons (;). For MongoDB you can also specify the name of the replica set in the address if you have more than one server.

storage-address:        couchbase://password@host1;host2;host3/bucketname
storage-address:        mongodb://host1;host2;host3/replicaset/database/collection

MailerQ relies on third-party libraries being present on your system to communicate with the storage platform. The Couchbase C client library has to be installed if you want to use the Couchbase storage engine, and the Mongo C Driver is needed to connect to MongoDB. For the SQLite, MySQL and PostgreSQL engines MailerQ also relies on the availability of the associated drivers on the system.

MongoDB specifics

MongoDB is the recommended storage platform. The address string to connect to MongoDB is directly passed to the MongoDB driver. All the options that are supported by this driver can be used in the address string.

See the MongoDB documentation for the options that can be used in the address string.

Sometimes MongoDB cannot successfully retrieve data, even though it is available. In this case, it is useful to repeat the fetch operation a couple of times. To enable this feature, you can add a special option to the address:

storage-address:        mongodb://hostname/database/collection?readAttempts=3

The default number of attempts is 1. If you want to repeat failed lookups a couple of times, you can pass in a higher value.
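The effect of "readAttempts" can be pictured as a plain retry loop. The sketch below is only an illustration and not MailerQ's actual implementation: the names `fetch_with_attempts` and `fetch` are hypothetical, and it assumes that a failed lookup is signalled by a `None` result.

```python
from typing import Callable, Optional


def fetch_with_attempts(fetch: Callable[[str], Optional[bytes]],
                        key: str,
                        read_attempts: int = 1) -> Optional[bytes]:
    """Call fetch(key) up to read_attempts times and return the first hit.

    fetch is a hypothetical lookup function that returns None on failure.
    """
    for _ in range(max(1, read_attempts)):
        body = fetch(key)
        if body is not None:
            return body
    # every attempt failed
    return None
```

With `read_attempts=1` (the default) a lookup that fails once is reported as failed right away; with a higher value the same lookup gets further chances before MailerQ gives up.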

To avoid firing many small read operations at MongoDB, MailerQ normally groups fetch operations into a single "multi-get" operation that fetches many documents in one go. This reduces the number of queries that are sent to MongoDB, but if one of the requested documents is hard to find, it also slows down the lookup of all the other documents in the same query. To balance these two effects, you can limit the number of fetch operations that are grouped together:

storage-address:        mongodb://hostname/database/collection?maxQuerySize=10
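The grouping that "maxQuerySize" limits can be sketched as splitting a list of pending keys into batches. This is an illustrative sketch under assumptions, not MailerQ code: `group_keys` is a hypothetical helper, and each yielded batch stands for one multi-get query.

```python
from typing import Iterator, List, Sequence


def group_keys(keys: Sequence[str], max_query_size: int) -> Iterator[List[str]]:
    """Yield batches of at most max_query_size keys, preserving order."""
    if max_query_size < 1:
        raise ValueError("max_query_size must be at least 1")
    for start in range(0, len(keys), max_query_size):
        yield list(keys[start:start + max_query_size])
```

A smaller batch size means more queries, but a single slow document then delays fewer other lookups.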

MongoDB has a limitation of around 16 MB per document (there is some overhead due to the usage of their internal BSON representation). If MailerQ has to store a bigger document, the message is split up into smaller parts that are all individually stored into MongoDB. As a consequence, you can find three types of documents in the database.

Most mails are smaller than 16 MB, so you will mostly see regular messages in the database with the following properties:

If the message is bigger than 16 MB, MailerQ splits up the message and stores a master document with the following properties:

Each individual part of a big message has the following properties:

Directory specifics

If you use the "directory://" storage backend, MailerQ stores all the messages in separate files on the file system. To keep the number of files in a single directory from becoming too large, MailerQ creates a nested directory structure. By default, this directory structure is four levels deep. If you want to use a different depth, you can specify this via a "depth" parameter:

storage-address:        directory:///path/to/directory?depth=3

REST specifics

If you set your storage backend to a "http://" or "https://" address, the message bodies are retrieved via HTTP requests. This allows you to set up a REST service that loads the bodies from storage engines that are not natively supported by MailerQ. Or you can build a REST service that generates message bodies on the fly. Your REST service has to support the following methods:

If, for example, you set your storage address to "http://example.com/path", MailerQ will load all documents from URLs like "http://example.com/path/KEY" where "KEY" corresponds to the identifier of the message body.

The REST service must return full message body strings in MIME format. It is possible to compress them with GZIP compression. If your REST service sends out such compressed messages, make sure that you also add a "content-encoding: gzip" HTTP header. This same header is sent by MailerQ when it uploads message bodies.
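The URL layout and the gzip convention described above can be sketched from the point of view of a client of such a REST service. This is a minimal illustration, not MailerQ's implementation: `body_url` and `decode_body` are hypothetical helper names, and the sketch assumes that any body without a "content-encoding: gzip" header is plain MIME.

```python
import gzip
from typing import Mapping


def body_url(storage_address: str, key: str) -> str:
    """Build the URL of a message body: the storage address plus "/KEY"."""
    return storage_address.rstrip("/") + "/" + key


def decode_body(headers: Mapping[str, str], payload: bytes) -> bytes:
    """Return the MIME data, decompressing it if the response is gzipped."""
    encoding = ""
    for name, value in headers.items():
        # HTTP header names are case-insensitive
        if name.lower() == "content-encoding":
            encoding = value.strip().lower()
    return gzip.decompress(payload) if encoding == "gzip" else payload
```

A REST service that stores bodies in compressed form would send back the compressed bytes together with the header, so that a client like the one above can restore the original MIME string.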

Threads

Strangely enough, some storage engines (MongoDB, MySQL, SQLite and PostgreSQL) only offer synchronous, blocking drivers. With such a driver, storage operations on a connection can only be executed sequentially: each operation has to wait until the previous one has completed. To prevent the entire MailerQ process from being blocked while a storage operation is in progress, MailerQ opens multiple connections to the storage servers, and starts separate threads in which the operations are executed.

The number of threads (and the number of storage connections) can be set using the "storage-threads" variable. A higher value lets more storage operations run in parallel, which improves the throughput of storage operations.
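The general technique of running blocking operations on a pool of worker threads can be sketched as follows. This is not MailerQ's code: `fetch_all` and `blocking_fetch` are hypothetical names, and the standard library thread pool merely stands in for MailerQ's own threads and storage connections.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def fetch_all(blocking_fetch: Callable[[str], bytes],
              keys: Sequence[str],
              storage_threads: int = 1) -> List[bytes]:
    """Run a blocking fetch for every key on storage_threads worker threads.

    Results are returned in the same order as the keys.
    """
    with ThreadPoolExecutor(max_workers=storage_threads) as pool:
        return list(pool.map(blocking_fetch, keys))
```

With one worker the fetches run one after the other; with more workers, several blocking calls can be in flight at the same time while the caller's main loop stays free in a real event-driven design.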

Storage policy

The "storage-policy" config file setting tells MailerQ what type of messages should be stored in the message store. Valid values are "all", "out", "in" and "none". The "none" setting is meaningful if you only want MailerQ to retrieve MIME data from external storage, without ever starting storage operations.

Before MailerQ publishes a message to RabbitMQ (for example, before it sends a received message to the inbox queue, or before it sends a delayed message back to the outbox queue) it checks the storage policy to see whether the MIME data should be sent to RabbitMQ too, or whether the MIME data should be stored in the message store instead.

If you want all messages to be stored in the message store, use the "all" policy. If this policy is enabled, MailerQ checks each JSON object before it gets published to RabbitMQ. If the JSON still contains MIME data, MailerQ removes this data from the JSON and stores it in the message store instead. The JSON data will be updated with a "key" property that refers to the data in the message store.
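The stripping step can be sketched as a small function. This is an illustration under assumptions, not MailerQ's implementation: the JSON field name "mime", the use of a random UUID as identifier, and the dictionary that stands in for the message store are all hypothetical; only the "key" property is taken from the description above.

```python
import uuid
from typing import MutableMapping


def externalize_mime(message: dict, store: MutableMapping[str, str]) -> dict:
    """Move the MIME data of a message into the store.

    Returns a copy of the message in which the (assumed) "mime" field is
    replaced by a "key" property that refers to the stored data.
    """
    if "mime" not in message:
        # nothing to strip: the message carries no MIME data
        return message
    stripped = dict(message)
    key = uuid.uuid4().hex          # hypothetical identifier scheme
    store[key] = stripped.pop("mime")
    stripped["key"] = key
    return stripped
```

The returned object contains only metadata plus the reference, which is what keeps the messages in RabbitMQ small.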

The "in" and "out" policies are more complex. The "out" policy instructs MailerQ to use the message store for outgoing messages only. If a message is greylisted or delayed and is published back to the outbox, MailerQ first strips the MIME data from the JSON, and stores that in the message store. Incoming messages (like the ones that come in on the SMTP port, or the messages dropped in the spool directory) are not checked and the full MIME data is published to RabbitMQ.

The "out" policy is often used, because most emails get delivered at the very first attempt, and it is therefore often a waste of resources to store incoming messages first in a NoSQL environment: the messages will probably be retransmitted a fraction of a second later. With the "out" storage policy, initially injected emails are sent to RabbitMQ in full. Only if the initial delivery fails and the message is sent back to the outbox for later delivery is the full MIME data stripped from the JSON and stored in the separate storage.

The "out" policy especially makes sense in setups where the majority of all deliveries succeed at the first attempt, and rescheduled messages are likely to be passed back and forth between MailerQ and RabbitMQ a number of times.

Only "all", "out" and "none" are meaningful policies. For completeness however, we also support the "in" policy, which does exactly the opposite of the "out" policy: all incoming messages are stored in the message store, and only metadata is published to RabbitMQ. However, when a mail is delayed and has to be published back to the outbox, the MIME data is kept inside RabbitMQ.
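The four policies can be summarized as one decision function. This is a sketch of the rules as described in this section, not MailerQ's code: `uses_message_store` is a hypothetical name, and the direction labels "in" (newly received message) and "out" (message published back to the outbox) are assumptions made for the illustration.

```python
POLICIES = {"all", "out", "in", "none"}
DIRECTIONS = {"in", "out"}


def uses_message_store(policy: str, direction: str) -> bool:
    """Tell whether MIME data goes to the message store before publishing.

    direction is "in" for a newly received message and "out" for a
    message that is published back to the outbox.
    """
    if policy not in POLICIES:
        raise ValueError("unknown storage policy: " + policy)
    if direction not in DIRECTIONS:
        raise ValueError("unknown direction: " + direction)
    # "all" stores everything, "none" stores nothing, and "in" / "out"
    # only store messages travelling in the matching direction
    return policy == "all" or policy == direction
```

For example, under the "out" policy a freshly injected mail is published to RabbitMQ with its MIME data, while a delayed mail on its way back to the outbox has its MIME data moved to the message store.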

Time-to-live

When you store messages, you probably don't want to keep them forever in your message store. To overcome this, every message has a time-to-live value, and expired messages are automatically removed from storage. The "storage-ttl" config option specifies the default time-to-live that is used for messages that are stored in the message store.

Note that the time-to-live is added to the maximum delivery time of the mail. If you send out an email using MailerQ that has to be delivered within 24 hours, and your "storage-ttl" is set to 3600 seconds (one hour), the MIME data will be kept in the message store for at most 25 hours.
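The arithmetic of this example can be written out as a one-line function. The name `max_storage_seconds` is hypothetical and only serves the illustration.

```python
def max_storage_seconds(max_delivery_seconds: int, storage_ttl: int) -> int:
    """Upper bound on how long the MIME data stays in the message store."""
    return max_delivery_seconds + storage_ttl


# 24 hours of delivery time plus a "storage-ttl" of 3600 seconds
hours = max_storage_seconds(24 * 3600, 3600) / 3600
```

Here `hours` evaluates to 25.0, matching the 25 hours mentioned above.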

Timeout

MailerQ uses a timeout for retrieval operations. If the storage server does not send back the message body within this time, the message is published back to RabbitMQ and will be retried a little later.

storage-timeout:        20
storage-reschedule:     120

With the above settings you tell MailerQ to time out a fetch operation after 20 seconds and publish the mail back to RabbitMQ. After 120 seconds the message is loaded again from RabbitMQ and the fetch is retried.

Compression

You may choose to compress the messages you store to reduce the load on your storage server and limit the amount of data you need to send. This feature can be turned on by specifying "compression=gzip" in the storage URL, e.g.:

storage-address:        mongodb://hostname/database/collection?compression=gzip

MailerQ then takes care of compressing all data behind the scenes. Note that this does mean that the contents of your database are no longer human readable.

The compression feature is currently enabled only for

Support for the Couchbase and SQLite backends is still in an experimental state.