BridgeDB needs Nagios checks that the Email Distributor is working. The best way to do this would be to send an email to bridges@torproject.org that says "get help".
I think what is needed here is a passive-style service check.
This check runs on its own schedule via cron or something similar; it sends an email to the Email Distributor
and then periodically checks its own inbox via IMAP.
If we don't receive an email matching the heuristics we are looking for within X minutes, it sends an alert to the Nagios server.
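To make the idea above concrete, here is a minimal sketch of such a check using only the Python standard library. It is not the script that ended up being deployed. The probe mailbox, the SMTP/IMAP hosts, the password, and the "reply comes from the distributor address" heuristic are all assumptions made for illustration; only the distributor address and the "get help" request come from this ticket.

```python
import email
import imaplib
import smtplib
import time
from email.message import EmailMessage

DISTRIBUTOR_ADDR = "bridges@torproject.org"  # from this ticket
PROBE_ADDR = "probe@example.com"             # hypothetical monitoring mailbox
SMTP_HOST = "smtp.example.com"               # hypothetical
IMAP_HOST = "imap.example.com"               # hypothetical
PASSWORD = "change-me"                       # hypothetical
TIMEOUT_MINUTES = 10                         # the "X minutes"; value is a guess


def build_request(sender, recipient):
    """Build the "get help" request message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "get help"
    msg.set_content("get help")
    return msg


def looks_like_reply(raw_bytes, expected_sender):
    """Assumed heuristic: the From header mentions the distributor address."""
    msg = email.message_from_bytes(raw_bytes)
    return expected_sender.lower() in msg.get("From", "").lower()


def send_request():
    """Send the request through the (hypothetical) probe account."""
    with smtplib.SMTP(SMTP_HOST, 587) as smtp:
        smtp.starttls()
        smtp.login(PROBE_ADDR, PASSWORD)
        smtp.send_message(build_request(PROBE_ADDR, DISTRIBUTOR_ADDR))


def reply_arrived():
    """Look through unseen messages in the probe inbox for a matching reply."""
    with imaplib.IMAP4_SSL(IMAP_HOST) as imap:
        imap.login(PROBE_ADDR, PASSWORD)
        imap.select("INBOX")
        _, data = imap.search(None, "UNSEEN")
        for num in data[0].split():
            _, parts = imap.fetch(num, "(RFC822)")
            if looks_like_reply(parts[0][1], DISTRIBUTOR_ADDR):
                return True
    return False


def run_check(send=send_request, arrived=reply_arrived,
              timeout_s=TIMEOUT_MINUTES * 60, poll_s=30,
              sleep=time.sleep, clock=time.monotonic):
    """Send one request, then poll until a reply arrives or the deadline passes."""
    send()
    deadline = clock() + timeout_s
    while clock() < deadline:
        if arrived():
            return True
        sleep(poll_s)
    return False
```

A cron job could call `run_check()` and, when it returns `False`, submit a passive check result to the Nagios server. How that submission happens depends on the local Nagios setup and is left out here.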
After chatting with lunar about it we began discussing additional monitoring for the email distributor. The check_email_delivery nagios plugin was suggested.
I also wondered if we should consider whitelisting tp.o addresses for use by the monitoring system (among other reasons).
We can't safely whitelist torproject.org email addresses because the torproject.org mailserver doesn't do DKIM. Because of this, I started adding an (email_address, gpg_fingerprint) whitelisting feature, requiring that emails from such whitelisted addresses be signed with a particular key. (See #9332 (moved) and note that this feature would present a maintainability nightmare.)
For what it's worth, we're now monitoring BridgeDB's SMTP port with sysmon. We will get notified if the SMTP server disappears but we are unable to detect more subtle, application-layer breakage.
I refactored hiro's "check for emails" script in this commit. The script writes its output to /srv/bridges.torproject.org/check/status. I can set up a cron job that runs this script every, say, six hours. Hiro, can you remind me what will happen if nagios considers BridgeDB's email responder down? Will I be able to see this in the nagios web UI? I'm asking because we will probably encounter a few more hiccups once the script is running continuously in production.
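For illustration, this is roughly how a Nagios-side plugin could consume that status file. It is only a sketch: the format of the file is not described in this thread, so the assumption that a healthy run writes a first line starting with "OK" is hypothetical, as is the seven-hour freshness limit (the proposed six-hour cron interval plus slack). The exit codes 0 to 3 are the standard Nagios plugin return codes.

```python
import os
import time

STATUS_FILE = "/srv/bridges.torproject.org/check/status"  # path from the comment above
MAX_AGE_S = 7 * 3600  # assumption: six-hour cron interval plus one hour of slack

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(path, max_age_s, now=None):
    """Return (exit_code, message) for the status file at `path`."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
        with open(path, encoding="utf-8") as fh:
            first_line = fh.readline().strip()
    except OSError as err:
        return UNKNOWN, f"UNKNOWN: cannot read {path}: {err}"
    if age > max_age_s:
        # A stale file means the cron job itself has stopped running.
        return WARNING, f"WARNING: status is {int(age)}s old"
    if first_line.startswith("OK"):  # hypothetical file format
        return OK, first_line
    return CRITICAL, first_line or "CRITICAL: empty status file"
```

A small wrapper would call `evaluate(STATUS_FILE, MAX_AGE_S)`, print the message, and exit with the returned code. Checking the file's age as well as its content means a silently dead cron job also shows up in the web UI rather than leaving a stale "OK" in place.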