Perl one-liner for adding newlines to HTML

When the rich editor puts all HTML on one line and I want to edit it, I could always use the “tidy” utility, but it does too much. All I want is a newline here and there to make the whole thing readable.

So this simple one-liner does the job:

perl -pe 's/(<\/(?:p|h\d|div|tr|td|table|ul|ol|li)>)/"$1\n"/ge'

Not perfect, but gives something to work with.

Enabling STARTTLS on sendmail with Let’s Encrypt certificate

Introduction

I’ll start with the crucial point: My interest in giving sendmail a Let’s Encrypt certificate (along with a secret key, of course) has nothing to do with security. The real reason is that some mail servers won’t deliver their mail to my server unless the link is encrypted. As of today (March 2026), I know of only one such case, but that’s enough for me to understand that I must at least support an opportunistic TLS upgrade for arriving mails. In other words, my mail server must allow STARTTLS for servers that want to drop a mail at my server.

The problem with servers that won’t play ball without STARTTLS is that the mail is lost, sometimes without the sender being notified. I discovered this issue when one of those confirm-your-email messages didn’t arrive, and I sent the sender’s tech support a complaint. To which they responded with the reason: My server didn’t support a STARTTLS upgrade.

So the goal is to reduce the risk of mail loss, nothing else. And the main concern is that mail will not be delivered in cases where it would have been before adding STARTTLS. For example, where a cleartext connection would have been fine, but the two sides attempted and failed to initiate a STARTTLS encrypted session, and the connection was then terminated altogether, with no mail delivered.

I’m running sendmail 8.14.4 on a Debian 8 machine (yes, it’s really ancient, but don’t play around with a stable system).

As for sources of information, there are a lot of guides out there. For those preferring the horse’s mouth, there’s the README on configuration files at /usr/share/sendmail-cf/README. Look for the section about STARTTLS. An example configuration can be found at /usr/share/sendmail/examples/tls/starttls.m4.

And since this topic revolves around certificates, maybe check out another post of mine which attempts to explain this topic. And my own little tutorial on setting up sendmail.

Is encryption worth anything?

If a mail server refuses to talk with anyone unless a TLS link is established with a mutually verified certificate (both sides check each other), encryption indeed adds security. Otherwise, it’s quite pointless.

Let’s begin with accepting arriving emails: If a server agrees to cleartext connections by clients for dropping off mails, it’s up to the sender to decide the level of security. Even if the arriving connection is encrypted with TLS, that doesn’t mean that the connection is secure. Surely, the server is required to submit a certificate when setting up the TLS session, but did the client verify it? And if the verification failed, did it terminate the connection? If it didn’t, a simple man-in-the-middle attack becomes possible: an eavesdropper can come in the middle, feed the client with a cooked-up certificate, accept the email, and then relay this email to the real destination, this time with a proper TLS connection. This creates an illusion that the mail was transmitted securely.

As a receiver of this mail, you can’t be sure who’s on the other side without checking the client’s certificate. A lot of clients won’t supply a certificate, though (my own server included, more about this below).

As for sending emails, an eavesdropping server might pretend to be the destination (possibly by DNS poisoning of the MX record). In the simplest man-in-the-middle attack, the eavesdropper doesn’t allow STARTTLS, and the message is transmitted in cleartext. If someone bothers to look in the mail server’s logs, this can be detected. Alternatively, the eavesdropper might suggest STARTTLS and offer an invalid certificate. For example, a self-signed certificate might seem like an innocent mistake. If the sending server agrees to send the email nevertheless, the attack is successful (but with a “verify=FAIL” in the logs, which allows spotting the attack, if verifications are usually successful).

So to be really secure, all mail servers must insist on TLS and verify each other’s certificates, or else the mail doesn’t go through. At present, going down this path with a public mail server means a lot of undelivered mails (from legit sources). In particular, insisting that the sender of the email offers a valid certificate is not going to end well.

The situation before I made any changes

With the default configuration, sendmail has no certificates available for use. By no certificates, I mean no certificate to identify itself, but also no root certificates that would allow verifying other servers.

This doesn’t prevent my server from connecting to other servers with TLS for outbound mails. In the mail log, the relevant entry looks like this:

sm-mta: 6237AaWA022257: from=<my@address.com>, size=2503, class=0, nrcpts=1, msgid=<f11f0@address.com>, bodytype=8BITMIME, proto=ESMTP, daemon=IPv4-port-587, relay=localhost.localdomain [127.0.0.1]
sm-mta: 6237AaWA022257: Milter insert (1): header: DKIM-Signature: [ ... ]
sm-mta: STARTTLS=client, relay=mx.other.com., version=TLSv1/SSLv3, verify=FAIL, cipher=ECDHE-RSA-AES128-GCM-SHA256, bits=128/128
sm-mta: 6237AaWA022257: to=<friend@other.com>, ctladdr=<my@address.com> (1000/1000), delay=00:00:03, xdelay=00:00:03, mailer=esmtp, pri=122503, relay=mx.other.com. [128.112.34.45], dsn=2.0.0, stat=Sent (Ok: queued as )

Note the “verify=FAIL” on the third row, discussed at length below. For sending email, the only drawback of not setting up anything encryption-related is that the other server’s identity isn’t verified. Plus, my server didn’t send a certificate if it was asked to do so, but that’s quite usual.

So to the server receiving the email, everything is normal. My server, acting as a client, upgraded the connection to encrypted with STARTTLS, and went through with it. No problem at all.

Should I install root certificates?

In order to allow my server to prevent a man-in-the-middle attack for outbound email, I can install root certificates and make them available to sendmail through configuration parameters. I also need to configure sendmail to refuse anything other than a TLS session with a verified server. Otherwise, it’s pointless, see above.

At the very least, I will need to update the root certificates often enough, so that new root certificates that are generally accepted are recognized by sendmail, and that expired root certificates are replaced by new ones.

And even if I do everything right, some mails will probably not go through because the destination mail server isn’t configured correctly. Or doesn’t support STARTTLS at all, just as my own didn’t, before the changes I describe here.

So clearly, no root certificates on my server. I want all emails to fail verification, so if I mistakenly enable some kind of enforcement, all deliveries will fail, and not one isolated case a couple of weeks after making the mistake.

Now to the practical part

In the existing installation, /etc/mail is where sendmail keeps its configuration file. So I created the /etc/mail/certs/ directory, and populated it with three files:

  • my.pem: The certificate obtained from Let’s Encrypt.
  • my.key: The secret key for which this certificate is made (or to be really accurate, the certificate is made for the public key that pairs with this key). Readable by root only (for security and to make sendmail happy).
  • ca.pem: The intermediate certificate which completes the trust chain from my.pem to the root certificate (Let’s Encrypt’s entire chain consists of just three certificates, root included).

I was a bit sloppy in the description for my.pem, because I use bacme to obtain the certificate from Let’s Encrypt, and that script gives me a certificate file that contains both my.pem and ca.pem, concatenated. The script that separates these two into separate files is shown further below.

And then I added these rows to sendmail.mc:

define(`confCACERT_PATH', `/etc/mail/certs')dnl
define(`confCACERT', `/etc/mail/certs/ca.pem')dnl
define(`confSERVER_CERT', `/etc/mail/certs/my.pem')dnl
define(`confSERVER_KEY', `/etc/mail/certs/my.key')dnl

Note that with this setting, the server doesn’t supply any certificate when acting as a client (i.e. when submitting an outbound email), even if asked for it. To enable this, the confCLIENT_CERT and confCLIENT_KEY options need to be assigned. These options should not be used with Let’s Encrypt’s certificates, as discussed below.

Actually, my initial attempt was to add only the last two rows, defining confSERVER_CERT and confSERVER_KEY. As the certificate file I get from my renewal utility already contains both my own and intermediate certificates, why not give sendmail only this combo file and forget about CAs? I mean, this is how I do it with Apache’s web server!

That idea failed royally in two different ways:

  • If confCACERT_PATH and confCACERT aren’t defined, sendmail will start, but won’t activate the STARTTLS option. Actually, even if both are defined as above, and ca.pem is empty, no STARTTLS. Raising the loglevel with this definition allowed sendmail to complain specifically about this:
    define(`confLOG_LEVEL', `14')dnl
  • When I put an irrelevant certificate in ca.pem, sendmail activated STARTTLS, ignored ca.pem (which is fine, it doesn’t help), but presented only the first certificate in my.pem to the connecting client.

To put it simply, sendmail wants my own certificate in my.pem and the CA’s certificate(s) in ca.pem, and kindly asks me not to fool around. And I have no problem with this, as the intermediate certificate can’t be used by my server to verify other servers’ certificates. So adding it doesn’t work against my decision that all attempts by my server to verify certificates will fail.

But this arrangement requires writing a little script to separate the certificates, which is listed below. As far as I understand, those using the mainstream certbot don’t have this problem, as it generates separate files.

Reloading the daemon

It says everywhere that sendmail must be restarted after updating the certificates. Even though I have the impression that sendmail is designed to shut itself down properly and safely in response to a SIGTERM, and even more importantly, that it’s designed not to lose or duplicate any mails, I didn’t fancy the idea of sending the server the signal that is usually used when shutting down the entire computer.

Instead, it’s possible to send the server a SIGHUP, which makes the server reload its configuration files (except for sendmail.conf, I read somewhere?) and of course the certificates. It’s actually a quick shutdown and restart, so maybe the difference isn’t so great, but reloading is the way it was meant to be. And it’s easily done with this command:

# /etc/init.d/sendmail reload > /dev/null

Redirection to /dev/null silences output (suitable for cron jobs).
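Since the certificate is renewed periodically, the reload fits naturally at the end of a renewal cron job. A sketch of a root crontab entry (the schedule is made up, and the renewal step itself is tool-specific, so it isn’t shown):

```
# Hypothetical crontab entry: runs monthly, after the certificate files
# in /etc/mail/certs have been updated by the renewal script.
30 4 1 * *  /etc/init.d/sendmail reload > /dev/null
```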

It works!

There are two ways to see that mails actually arrive with TLS. One is through the mail logs:

sm-mta: STARTTLS=server, relay=mail-lj1-f180.google.com [209.85.208.180], version=TLSv1/SSLv3, verify=FAIL, cipher=ECDHE-RSA-AES128-GCM-SHA256, bits=128/128

“STARTTLS=server” in the log indicates that a client has connected with STARTTLS for inbound mail. A reminder from above: if it says “STARTTLS=client”, it’s my server that has connected to another one for outbound mail. And along with that, sendmail tells us how the verification of the other side went.

Even easier, the TLS session leaves its tracks in the relevant Received: header in the mail itself:

Received: from mail-lj1-f180.google.com (mail-lj1-f180.google.com
 [209.85.208.180])	by mx.server.com (8.14.4/8.14.4/Debian-8+deb8u2) with
 ESMTP id 6245JWdJ019568	(version=TLSv1/SSLv3
 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=FAIL)	for
 <me@server.com>; Wed, 4 Mar 2026 12:19:34 GMT

No need to be alarmed about the verify=FAIL part. It just indicates that my server failed to verify the certificates that Gmail sent, which is quite natural, as it has no root certificates to go with. Which, as mentioned above, is intentional. See below for more about the verify=FAIL thing.

If something goes wrong…

In order to obtain debug messages, increase the log level to 14 by adding this to sendmail.mc (and compiling with make):

define(`confLOG_LEVEL', `14')dnl

This is the log output after a reload (with SIGHUP), ending with STARTTLS working fine:

sm-mta: restarting /usr/sbin/sendmail-mta due to signal
sm-mta: error: safesasl(/etc/sasl2/Sendmail.conf) failed: No such file or directory
sm-mta: error: safesasl(/etc/sasl/Sendmail.conf) failed: No such file or directory
sm-mta: starting daemon (8.14.4): SMTP+queueing@00:10:00
sm-mta: STARTTLS: CRLFile missing
sm-mta: STARTTLS=server, Diffie-Hellman init, key=1024 bit (1)
sm-mta: STARTTLS=server, init=1
sm-mta started as: /usr/sbin/sendmail-mta -Am -L sm-mta -bd -q10m

Apart from the fact that there are no errors related to STARTTLS, it says init=1, which is the indication that STARTTLS is active.

The complaints about files missing in /etc/sasl2/ can be ignored, as I don’t use any kind of authentication (i.e. asking the client for credentials).

Checking the server for real

The easiest way to test the mail server is with CheckTLS.

This tool connects to the server and attempts to start a STARTTLS session (but doesn’t send a mail). It shows the session with lots of detail, along with clarifications on the meaning of those details. So there’s no need to know all the technicalities to get an idea of how you’re doing. If CheckTLS says all is fine, it’s really fine. If it says otherwise, take the remarks seriously (but never mind if MTA-STS and DANE aren’t tested and are marked yellow).

But even more importantly, CheckTLS gives the details of the certificates that the server provides. This is a good way to verify that the correct certificates are used, and that the certificate chain is valid.

Note however that this only checks how the server behaves when receiving emails. So if something is wrong with how the server sends email, for example it offers a problematic client certificate, that will go undetected.

For a more do-it-yourself approach, the first thing is to check that the server offers STARTTLS:

$ nc localhost 25
220 mx.server.com ESMTP MTA; Wed, 4 Mar 2026 13:14:20 GMT
EHLO there.com
250-mx.server.com Hello localhost.localdomain [127.0.0.1], pleased to meet you
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE
250-STARTTLS
250-DELIVERBY
250 HELP
^C

It’s easier to do this on the local machine, because some ISPs block connections to port 25 (to avoid spamming from their IP addresses). Note that it’s necessary to type an “EHLO” command to get the server to say something.

All fine? Connect to the server (see “man s_client”):

$ openssl s_client -connect localhost:25 -starttls smtp < /dev/null
CONNECTED(00000003)
Can't use SSL_get_servername
depth=1 C = US, O = Let's Encrypt, CN = R12
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = mx.server.com
verify return:1
---
Certificate chain
 0 s:CN = mx.server.com
   i:C = US, O = Let's Encrypt, CN = R12
 1 s:C = US, O = Let's Encrypt, CN = R12
   i:C = US, O = Internet Security Research Group, CN = ISRG Root X1

[ ... ]

And then a lot of mumbo-jumbo follows. The certificates sent by the server are listed in the “Certificate chain” section. The part above it shows openssl’s attempts to validate the certificate chain. In the case shown above, there was no root certificate available on the server, hence the “unable to get local issuer certificate” error.

The reason I focus on the “Certificate chain” section is that if a root certificate is present on the computer that runs openssl, it is listed in the section above it. It will then look like this:

CONNECTED(00000005)
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R12
verify return:1
depth=0 CN = mx.server.com
verify return:1
---
Certificate chain
 0 s:CN = mx.server.com
   i:C = US, O = Let's Encrypt, CN = R12
 1 s:C = US, O = Let's Encrypt, CN = R12
   i:C = US, O = Internet Security Research Group, CN = ISRG Root X1
---

So openssl is happy now, but it lists the root certificate which it picked from its own repository. That is a bit confusing when checking the server.

Notes:

  • The exact same test can be run on port 587, if it’s open for connections.
  • The reason for the </dev/null part is that openssl actually opens a netcat-like session, so /dev/null terminates the session right away.
  • The number in parentheses after CONNECTED is just the file descriptor number of the TCP socket. Nothing really interesting, even though it happens to be different when the certificate chain is validated and when it’s not.
  • Add the -showcerts flag to dump all certificates that the server sent, not just the first one. Certificates from the local machine that help validation but weren’t sent by the server are not dumped. For example:
    openssl s_client -connect localhost:25 -starttls smtp -showcerts < /dev/null

    As shown on a different post of mine, this allows examining them closer by copying each certificate to a separate file, and going

    openssl x509 -in thecertificate.crt -text
  • If the verification of the certificate fails, openssl establishes the session regardless. The status of this verification is indicated on the output line saying “Verify return code: 0 (ok)” for a successful verification, and another numeric code and message otherwise (this part isn’t shown above). It’s also possible to add the -verify_return_error flag, which causes openssl to abort the session if the certificate chain verification fails, and to indicate that with an error exit code.

verify=FAIL? Is that a problem?

The short answer is, no (in my case).

For the longer answer, let’s start with a brief explanation on the establishment of the TLS session. Encrypted mail delivery, regardless of whether it’s initiated with a STARTTLS on port 25 or by connecting to port 587, is based upon the TLS protocol. This is exactly the same protocol used by web browsers when they establish an https connection.

According to the TLS protocol, the server (as in client / server) is required to send its certificate (along with intermediate certificates) in order to prove that it’s indeed the server of the domain requested (domain as in example.com). The purpose is to avoid a man-in-the-middle attack.

The server may request the client to send its certificate (if it has any) in order to identify itself. This might seem odd in a web browser / server connection, but is supported by the commonly used browsers: It’s possible to supply them with a client certificate, which may be used on websites that request such.

This is why the practical step for making a mail server support STARTTLS is to supply it with a certificate. This certificate is part of the TLS handshake. The thing is that it’s verified by the client, not by our server. So we don’t know if the verification was successful or not. The fact that the TLS connection was established doesn’t mean that the client was happy with the certificate it got. It might have continued setting up the encrypted link regardless. Compare with web browsers: they issue a scary warning when they fail to verify the web server’s certificate. But that’s how it is today. In the distant past, the common reaction to an unverified certificate was a slightly different icon in the browser’s address bar, nothing more.

Likewise, a mail client is likely to continue the TLS handshake if the verification of the server’s certificate fails. This is a reasonable choice in particular if the client initiated a STARTTLS session, but would have sent the email in cleartext if this option wasn’t available: An encrypted session with the possibility of a man-in-the-middle attack is better than sending the mail in cleartext, might be a way to think about it. And when looking at tutorials on the Internet from the early 2000s on how to set up a mail server, it’s quite often suggested to use self-signed certificates, which can’t be validated no matter what. This was a reasonable idea before the Let’s Encrypt era, when a certificate was an expensive thing.

It’s somewhat amusing that the TLS protocol doesn’t include a mechanism for the client to say “listen, your certificate stinks, but I’ll continue anyhow”. It could have been useful for server maintainers, but I suppose crypto people didn’t even want to think about this possibility.

Now to the point: The “verify” part in the mail log (as well as in the related Received header row in arriving mails) indicates the result of verifying the other side’s certificate, if such was requested.

As for the meaning of this attribute, here’s a copy-paste from the relevant part in the README file mentioned above:

${verify} holds the result of the verification of the presented cert.
	Possible values are:
	OK	 verification succeeded.
	NO	 no cert presented.
	NOT	 no cert requested.
	FAIL	 cert presented but could not be verified,
		 e.g., the cert of the signing CA is missing.
	NONE	 STARTTLS has not been performed.
	TEMP	 temporary error occurred.
	PROTOCOL protocol error occurred (SMTP level).
	SOFTWARE STARTTLS handshake failed.

Recall from above that I deliberately didn’t give my mail server any root certificates, with the intention that all verifications of certificates will fail.

It’s important to distinguish between two cases:

  • Outbound emails, my server acting as a client, STARTTLS=client in the log: In this case, the certificate is required by the TLS protocol. Had I provided the mail server with root certificates, anything but verify=OK would have indicated a problem, and could have been a reason to terminate the TLS session. As I’ve already mentioned, the reason I didn’t provide the root certificates is to ensure that my server won’t be this picky.
  • Inbound emails, my server acting as server, STARTTLS=server in the log: The certificate isn’t required to establish the TLS connection, but the server is allowed to ask for it.

So there are two normal options:

  • verify=FAIL, meaning that my side got a certificate (as a client in order to establish a TLS session, or as a server because it asked for it), and the other side submitted something in response. And the verification failed, which is normal when you don’t have a root certificate.
  • verify=NO, meaning that the other side didn’t submit any certificate.

Had I supplied root certificates to my server, I should have seen verify=OK most of the time for STARTTLS=client, and possibly verify=NO for STARTTLS=server. Would this information help me? Would this help telling spam servers from legit ones? I doubt that.

It’s a bit questionable why my server asks for certificates it can’t verify in the STARTTLS=server case, but that’s the default setting, and I guess this is how normal servers behave. According to the README file, setting the confTLS_SRV_OPTIONS to ‘V’ tells the server not to ask clients for certificates. Haven’t tried that personally, but I suppose one gets verify=NOT (as opposed to NO).

But what could be the point in asking the client for certificates? One possibility is that sendmail has an access database, which defines rules for who is allowed to talk with the server. It’s possible to add rules related to TLS that depend on a successful verification of the certificate, by making the rules depend on ${verify}, which is the macro containing the result of this verification (it should be “OK”). The rules can of course also depend on other attributes of the sender. This is more useful for relaying mails inside an organization. More about this in the README file.

To summarize, both verify=FAIL and verify=NO indicate no problem in my case, because the server has no way to validate the client’s certificate, and this validation isn’t necessary anyhow in my setting, as the server isn’t configured to refuse unverified partners by default, and I surely didn’t change that. It might have been a bit more elegant to set the confTLS_SRV_OPTIONS option to ‘V’ to spare clients the pointless task of sending a certificate that isn’t checked, but I stayed with the default configuration. I try to make as few changes as possible.

Should my server provide a certificate as a client?

Short answer, no, because Let’s Encrypt’s certificates don’t support TLS client authentication anymore.

Besides, this feature isn’t required, and it opens the possibility that weird things might happen if the other side doesn’t manage to verify my certificate for some reason. One could argue that sending certificates improves my server’s position as a non-spammer, and that it makes my server behave a bit more like Google’s. However, I searched the web for indications that this would improve the spam score, and found none. So even if I had a certificate that allows client authentication, I wouldn’t use it. And the fact that Let’s Encrypt phased it out proves the point: I would most likely not have noticed the change, and my server would have sent an inadequate certificate, and now go figure why my mails aren’t delivered.

That said, it’s possible to configure Gmail (for business?) to accept mails only from servers that have presented a valid certificate.

Let’s get a bit technical about this: This is taken from the output of “openssl x509 -in thecertificate.crt -text” of a Let’s Encrypt certificate, issued in March 2026:

        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE

Compare with the same part on a certificate from Gmail, when acting as a server:

        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE

So far, the same “Extended Key Usage”. But when Gmail’s server identifies as a client, the relevant part is this:

        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE

So “TLS Web Client Authentication” is a thing, and it’s not wise to present a certificate that lacks this option when identifying as a client.

Fun fact, I dug up the certificate for the same server from October 2023 from a backup. And indeed, client authentication was enabled:

        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE

Not surprising, given the phase-out message from Let’s Encrypt from the link above. Otherwise, the old and new certificates are pretty much alike.

I’ve attached the printouts of both Google’s certificates below for reference. Anyhow, the main takeaways are:

  • Even though it says “TLS Web Server Authentication”, it’s fine for mail servers. I would otherwise think “Web Server” refers to https. So Let’s Encrypt’s certificates are really legit for a mail server.
  • When Gmail acts as a client, it indeed has the TLS Web Client Authentication option.
  • Don’t try using Let’s Encrypt’s certificates for client authentication.
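A quick way to check any certificate for the client-authentication bit is to grep the Extended Key Usage section of the openssl printout. As a self-contained illustration, here’s the check run against a throwaway self-signed certificate generated on the spot (not a real Let’s Encrypt one; requires OpenSSL 1.1.1 or newer for the -addext flag):

```shell
# Generate a throwaway self-signed certificate that includes clientAuth
# in its Extended Key Usage (the CN is a made-up name).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=test.example.com" \
  -addext "extendedKeyUsage=serverAuth,clientAuth" \
  -keyout /tmp/test.key -out /tmp/test.crt 2>/dev/null

# Inspect the Extended Key Usage section of the result.
openssl x509 -in /tmp/test.crt -noout -text | grep -A1 'Extended Key Usage'
```

For a certificate suitable for client authentication, the output should include “TLS Web Client Authentication”; a current Let’s Encrypt certificate shows only “TLS Web Server Authentication”.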

The script splitting certificates

I promised the script that splits the PEM certificate file obtained by bacme from Let’s Encrypt into my own certificate and the intermediate certificate. I do such things in Perl, so here it is:

#!/usr/bin/perl
use warnings;
use strict;

local $/; # Slurp mode

my $cert = <>;

my @chunks = ($cert =~ /(-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----)/gs);

my $found = @chunks;

die("$0: Expected to find two certificates, found $found instead.\n")
  unless ($found == 2);

writefile("my.pem", "$chunks[0]\n");
writefile("ca.pem", "$chunks[1]\n");

exit(0);

sub writefile {
  my ($fname, $data) = @_;

  open(my $out, ">", $fname)
    or die "Can't open \"$fname\" for write: $!\n";
  print $out $data;
  close $out;
}

I run this as a regular user, not root, which is why I’m relatively sloppy, just writing out a couple of files in the current directory. Even though the hardcoded filenames make this rather safe anyhow.

This script is given the combined PEM file through standard input (or with the file name as the first argument), and emits the two PEM files to the current directory. Deliberately unsophisticated, and deliberately very picky about the existence of exactly two certificates in the input, so I get notified if something in Let’s Encrypt’s setting suddenly changes.

For example, in the old Let’s Encrypt certificate from 2023 I mentioned above, there were three certificates. The third certificate affirmed ISRG Root X1 with the help of DST Root CA X3, the latter considered the root certificate at the time. The former is nowadays an established root certificate by itself, hence a third certificate is unnecessary. But this can change, and if it does, I suppose the solution will be to concatenate everything but the first certificate into ca.pem. And if that happens, I want to be aware of the change and verify that the server presents the correct intermediate certificates.

Summary

After all said and done, I could have just split the certificate from Let’s Encrypt into two as shown above, added the sendmail configuration options mentioned everywhere in tutorials, and everything would have been just fine. And had I used certbot, like everyone else, I would have had the ready-to-use certificate files directly.

As it turns out, there was no real need to delve into the details. Sendmail does the right thing anyhow. But understanding what’s going on under the hood is still better, and worth the effort, I think. In particular with a crucial component like sendmail.

Appendix: Gmail’s client certificate

As it was quite difficult to obtain this certificate (running tcpdump on my server, feeding the result to Wireshark, exporting the certificate as raw bytes and opening the file with openssl as DER), I thought I’d show its printout:

    Data:
        Version: 3 (0x2)
        Serial Number:
            2b:64:a8:5a:82:a3:d2:c2:10:5b:9b:25:ab:75:c1:af
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Google Trust Services, CN = WR4
        Validity
            Not Before: Feb 10 17:58:28 2026 GMT
            Not After : May 11 17:58:27 2026 GMT
        Subject: CN = smtp.gmail.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:9e:75:cf:b1:84:c9:a8:f2:bb:c8:89:fe:ef:09:
                    ad:71:d7:2a:1e:e2:b0:51:e2:0b:d5:b9:a7:52:70:
                    e8:c1:ff:5b:60:b6:7c:65:c0:b1:8b:90:cb:cd:ab:
                    0c:da:ef:10:8f:17:79:ed:a5:b9:95:57:f2:28:f2:
                    da:3d:d3:1d:ed:03:a2:6f:88:da:7f:0c:cc:b9:f4:
                    f6:44:ac:bc:fa:95:62:c0:7b:31:8d:44:9c:3f:bf:
                    cf:05:66:8b:a2:7d:9a:dd:af:2b:dc:05:16:b8:37:
                    3c:1f:c5:23:9f:4d:2b:15:a4:97:87:ab:a7:70:3a:
                    4a:5d:2a:8d:d4:21:1a:68:48:da:74:89:6e:1a:27:
                    2f:ef:06:4b:38:b5:65:5f:c4:da:49:96:c5:4e:9f:
                    78:7f:cb:2b:6a:61:ff:f7:0f:f6:f3:d4:d0:7d:94:
                    84:a8:0c:21:8a:a2:a4:20:04:f7:83:ac:00:83:85:
                    eb:9e:01:7a:ea:a5:2a:b9:89:3b:ad:94:2d:c4:c1:
                    2f:49:86:17:52:f7:85:1a:97:76:9d:2f:cf:c4:20:
                    a3:9c:c5:7b:74:57:28:f2:35:d8:ab:fa:d8:53:b9:
                    ee:c9:24:cb:f3:aa:d4:0b:f9:1a:8e:3d:b9:ad:16:
                    7c:99:7c:40:ef:3f:25:5a:c7:94:87:e8:20:bb:19:
                    92:6f
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier:
                65:7C:AF:FE:54:FD:A3:0A:53:90:AB:9A:94:E7:AD:DF:DC:B9:8B:58
            X509v3 Authority Key Identifier:
                keyid:9B:C8:11:BC:3D:AA:36:B9:31:8C:4E:8F:44:D5:57:32:2F:C3:C0:61

            Authority Information Access:
                OCSP - URI:http://o.pki.goog/s/wr4/K2Q
                CA Issuers - URI:http://i.pki.goog/wr4.crt

            X509v3 Subject Alternative Name:
                DNS:smtp.gmail.com
            X509v3 Certificate Policies:
                Policy: 2.23.140.1.2.1

            X509v3 CRL Distribution Points: 

                Full Name:
                  URI:http://c.pki.goog/wr4/F-WFK5nQurE.crl

            CT Precertificate SCTs:
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 96:97:64:BF:55:58:97:AD:F7:43:87:68:37:08:42:77:
                                E9:F0:3A:D5:F6:A4:F3:36:6E:46:A4:3F:0F:CA:A9:C6
                    Timestamp : Feb 10 18:58:35.396 2026 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:BE:28:85:4E:52:7D:B5:FC:0C:C7:FA:
                                26:98:AE:D5:C4:86:E1:E1:70:A6:6A:3C:CA:CE:9E:21:
                                17:27:D4:09:BE:02:21:00:89:B7:00:57:51:76:41:FB:
                                D3:73:9B:27:FA:E1:40:2F:51:E1:4F:14:D1:65:18:EE:
                                81:C7:7C:A1:60:BA:6A:BF
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : CB:38:F7:15:89:7C:84:A1:44:5F:5B:C1:DD:FB:C9:6E:
                                F2:9A:59:CD:47:0A:69:05:85:B0:CB:14:C3:14:58:E7
                    Timestamp : Feb 10 18:58:35.450 2026 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:45:02:21:00:B0:B1:6E:A6:C2:1B:49:2A:28:2C:C9:
                                EC:AE:C6:F9:F4:EC:89:64:AC:88:6A:BE:08:86:09:36:
                                17:66:63:49:D0:02:20:5C:CE:E6:21:C3:21:88:15:E1:
                                D9:17:13:D6:0B:E3:F6:54:71:58:C9:55:9F:DA:14:63:
                                F8:69:F1:BC:DD:4B:32
    Signature Algorithm: sha256WithRSAEncryption
         8f:fa:cf:2b:ab:6a:66:07:2a:32:ae:15:39:c8:bf:a6:22:e1:
         b1:55:6d:1f:04:26:4b:34:54:fe:91:cd:61:92:6c:b1:2a:8b:
         47:81:28:84:ee:d1:b7:c2:fc:da:81:fd:74:c4:bf:6e:ba:f1:
         ef:b2:81:77:f1:0b:80:73:78:e1:86:1f:92:c8:92:a7:45:e6:
         26:93:4d:92:a2:2b:d2:02:db:1c:b8:81:4e:56:79:bc:4a:f6:
         8c:6c:f3:2a:a8:09:b2:5f:c2:74:bb:2d:74:0b:ea:3a:50:e7:
         dd:33:61:fa:ed:df:6c:ed:6e:ba:50:8c:54:9d:19:76:03:1b:
         56:7e:55:be:ee:3f:a3:c5:d6:ad:6b:fc:1b:43:ce:aa:50:52:
         af:f6:83:f0:38:f5:62:8d:0b:91:f3:72:f1:b7:10:64:1a:ca:
         02:97:8e:f9:13:a3:5d:1a:1b:ee:5d:01:dd:b0:48:f2:f3:30:
         cf:8d:6a:98:21:8d:83:23:38:c7:80:22:59:97:f0:45:76:fb:
         8c:a9:4e:f8:37:38:de:ba:4e:94:c5:1f:b1:d0:3c:87:69:11:
         ea:90:0d:75:72:82:5a:a3:c3:99:c6:e5:ce:57:05:ed:63:a9:
         2e:20:ab:b6:41:8c:53:e1:92:5c:55:de:bf:3b:d1:d3:ec:08:
         a8:87:9e:c0
-----BEGIN CERTIFICATE-----
MIIFMjCCBBqgAwIBAgIQK2SoWoKj0sIQW5slq3XBrzANBgkqhkiG9w0BAQsFADA7
MQswCQYDVQQGEwJVUzEeMBwGA1UEChMVR29vZ2xlIFRydXN0IFNlcnZpY2VzMQww
CgYDVQQDEwNXUjQwHhcNMjYwMjEwMTc1ODI4WhcNMjYwNTExMTc1ODI3WjAZMRcw
FQYDVQQDEw5zbXRwLmdtYWlsLmNvbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCC
AQoCggEBAJ51z7GEyajyu8iJ/u8JrXHXKh7isFHiC9W5p1Jw6MH/W2C2fGXAsYuQ
y82rDNrvEI8Xee2luZVX8ijy2j3THe0Dom+I2n8MzLn09kSsvPqVYsB7MY1EnD+/
zwVmi6J9mt2vK9wFFrg3PB/FI59NKxWkl4erp3A6Sl0qjdQhGmhI2nSJbhonL+8G
Szi1ZV/E2kmWxU6feH/LK2ph//cP9vPU0H2UhKgMIYqipCAE94OsAIOF654Beuql
KrmJO62ULcTBL0mGF1L3hRqXdp0vz8Qgo5zFe3RXKPI12Kv62FO57skky/Oq1Av5
Go49ua0WfJl8QO8/JVrHlIfoILsZkm8CAwEAAaOCAlIwggJOMA4GA1UdDwEB/wQE
AwIFoDAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwDAYDVR0TAQH/BAIw
ADAdBgNVHQ4EFgQUZXyv/lT9owpTkKualOet39y5i1gwHwYDVR0jBBgwFoAUm8gR
vD2qNrkxjE6PRNVXMi/DwGEwXgYIKwYBBQUHAQEEUjBQMCcGCCsGAQUFBzABhhto
dHRwOi8vby5wa2kuZ29vZy9zL3dyNC9LMlEwJQYIKwYBBQUHMAKGGWh0dHA6Ly9p
LnBraS5nb29nL3dyNC5jcnQwGQYDVR0RBBIwEIIOc210cC5nbWFpbC5jb20wEwYD
VR0gBAwwCjAIBgZngQwBAgEwNgYDVR0fBC8wLTAroCmgJ4YlaHR0cDovL2MucGtp
Lmdvb2cvd3I0L0YtV0ZLNW5RdXJFLmNybDCCAQUGCisGAQQB1nkCBAIEgfYEgfMA
8QB3AJaXZL9VWJet90OHaDcIQnfp8DrV9qTzNm5GpD8PyqnGAAABnEjrcQQAAAQD
AEgwRgIhAL4ohU5SfbX8DMf6Jpiu1cSG4eFwpmo8ys6eIRcn1Am+AiEAibcAV1F2
QfvTc5sn+uFAL1HhTxTRZRjugcd8oWC6ar8AdgDLOPcViXyEoURfW8Hd+8lu8ppZ
zUcKaQWFsMsUwxRY5wAAAZxI63E6AAAEAwBHMEUCIQCwsW6mwhtJKigsyeyuxvn0
7IlkrIhqvgiGCTYXZmNJ0AIgXM7mIcMhiBXh2RcT1gvj9lRxWMlVn9oUY/hp8bzd
SzIwDQYJKoZIhvcNAQELBQADggEBAI/6zyuramYHKjKuFTnIv6Yi4bFVbR8EJks0
VP6RzWGSbLEqi0eBKITu0bfC/NqB/XTEv2668e+ygXfxC4BzeOGGH5LIkqdF5iaT
TZKiK9IC2xy4gU5WebxK9oxs8yqoCbJfwnS7LXQL6jpQ590zYfrt32ztbrpQjFSd
GXYDG1Z+Vb7uP6PF1q1r/BtDzqpQUq/2g/A49WKNC5HzcvG3EGQaygKXjvkTo10a
G+5dAd2wSPLzMM+NapghjYMjOMeAIlmX8EV2+4ypTvg3ON66TpTFH7HQPIdpEeqQ
DXVyglqjw5nG5c5XBe1jqS4gq7ZBjFPhklxV3r870dPsCKiHnsA=
-----END CERTIFICATE-----

For comparison, this is the certificate obtained on the same day, when Gmail responded as a server:

    Data:
        Version: 3 (0x2)
        Serial Number:
            e9:b6:68:79:fa:91:bf:49:10:8d:b9:1e:cc:e8:63:b0
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Google Trust Services, CN = WR2
        Validity
            Not Before: Feb  2 08:37:58 2026 GMT
            Not After : Apr 27 08:37:57 2026 GMT
        Subject: CN = mx.google.com
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:55:ba:49:43:8f:d9:72:9d:f9:d0:fa:1c:76:ec:
                    73:44:39:69:e7:21:68:49:1f:d0:0e:c4:70:bb:1f:
                    61:15:71:58:a7:44:df:bd:9f:d5:f6:e9:d1:8a:77:
                    73:79:ac:82:e7:30:88:53:95:62:ff:f3:cd:32:71:
                    9e:68:21:a7:62
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier:
                51:B6:13:35:D8:FB:85:27:72:70:77:EE:D7:5B:1D:06:5E:63:FD:51
            X509v3 Authority Key Identifier:
                keyid:DE:1B:1E:ED:79:15:D4:3E:37:24:C3:21:BB:EC:34:39:6D:42:B2:30

            Authority Information Access:
                OCSP - URI:http://o.pki.goog/wr2
                CA Issuers - URI:http://i.pki.goog/wr2.crt

            X509v3 Subject Alternative Name:
                DNS:mx.google.com, DNS:smtp.google.com, DNS:aspmx.l.google.com, DNS:alt1.aspmx.l.google.com, DNS:alt2.aspmx.l.google.com, DNS:alt3.aspmx.l.google.com, DNS:alt4.aspmx.l.google.com, DNS:gmail-smtp-in.l.google.com, DNS:alt1.gmail-smtp-in.l.google.com, DNS:alt2.gmail-smtp-in.l.google.com, DNS:alt3.gmail-smtp-in.l.google.com, DNS:alt4.gmail-smtp-in.l.google.com, DNS:gmr-smtp-in.l.google.com, DNS:alt1.gmr-smtp-in.l.google.com, DNS:alt2.gmr-smtp-in.l.google.com, DNS:alt3.gmr-smtp-in.l.google.com, DNS:alt4.gmr-smtp-in.l.google.com, DNS:mx1.smtp.goog, DNS:mx2.smtp.goog, DNS:mx3.smtp.goog, DNS:mx4.smtp.goog, DNS:aspmx2.googlemail.com, DNS:aspmx3.googlemail.com, DNS:aspmx4.googlemail.com, DNS:aspmx5.googlemail.com, DNS:gmr-mx.google.com
            X509v3 Certificate Policies:
                Policy: 2.23.140.1.2.1

            X509v3 CRL Distribution Points: 

                Full Name:
                  URI:http://c.pki.goog/wr2/oQ6nyr8F0m0.crl

            CT Precertificate SCTs:
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : D1:6E:A9:A5:68:07:7E:66:35:A0:3F:37:A5:DD:BC:03:
                                A5:3C:41:12:14:D4:88:18:F5:E9:31:B3:23:CB:95:04
                    Timestamp : Feb  2 09:38:00.395 2026 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:45:02:20:21:B9:8B:BD:E8:4E:B3:F4:24:46:6B:25:
                                17:CF:53:2E:2E:B7:83:A3:F5:DB:7B:F7:91:70:62:A2:
                                D5:74:B8:20:02:21:00:C9:3D:D4:79:5C:05:59:7C:68:
                                ED:6F:EA:45:59:55:D5:A6:9B:F8:9B:A3:62:AD:8B:2B:
                                30:A0:CC:4A:62:A1:EB
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 96:97:64:BF:55:58:97:AD:F7:43:87:68:37:08:42:77:
                                E9:F0:3A:D5:F6:A4:F3:36:6E:46:A4:3F:0F:CA:A9:C6
                    Timestamp : Feb  2 09:38:00.184 2026 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:44:02:20:3A:11:AE:85:B9:06:AF:A9:EF:88:25:64:
                                EB:2A:F3:4B:07:50:AF:B9:63:0F:4C:7A:B0:13:F4:CA:
                                0E:58:55:B7:02:20:50:81:1C:CF:06:47:39:AF:8A:F3:
                                27:00:78:34:FD:40:3F:1E:36:E3:2E:42:08:8E:14:B0:
                                09:B0:CA:CE:FD:B9
    Signature Algorithm: sha256WithRSAEncryption
         5e:bf:fc:22:aa:45:d9:35:37:c7:f3:9b:95:5a:e1:eb:2d:72:
         70:ba:ea:c5:ce:10:2e:53:b6:da:f0:54:77:f4:f4:7d:43:df:
         ff:fe:45:18:f3:cb:85:1c:ae:df:0d:a3:10:f1:01:7a:6f:81:
         03:af:c8:1c:d9:26:2b:4d:69:c1:4a:ef:bf:e2:98:cb:a8:c6:
         42:fe:78:4f:d9:82:d9:2c:39:fc:3e:d3:c2:6f:de:b8:e6:dc:
         82:51:04:00:0d:13:1d:2b:0e:fd:2f:56:7c:bf:73:a6:35:46:
         85:12:99:99:1f:1e:cb:9c:a5:e3:64:7f:b0:66:45:f5:ba:97:
         f0:ac:88:41:7e:c7:b0:7d:7f:04:15:c6:8b:0f:58:cd:19:1e:
         fb:b2:8c:f4:a6:dd:7f:8c:84:98:12:49:60:1b:20:c8:14:da:
         b1:fe:11:06:09:be:92:6b:cc:33:cd:e1:93:7c:bd:ca:1c:c9:
         70:71:cf:46:60:6c:db:22:72:9c:0d:00:e0:6a:72:bc:32:13:
         11:f0:8d:2f:95:d5:d9:20:76:9b:86:dd:73:10:8f:fc:a9:51:
         de:1c:90:d2:c8:a6:f9:ff:ab:a9:a8:5f:75:56:ae:a9:25:6a:
         7f:37:ff:67:5e:53:4e:2b:b7:c0:72:3c:9c:1b:68:f9:9a:0a:
         ef:60:6f:f2
-----BEGIN CERTIFICATE-----
MIIGxDCCBaygAwIBAgIRAOm2aHn6kb9JEI25HszoY7AwDQYJKoZIhvcNAQELBQAw
OzELMAkGA1UEBhMCVVMxHjAcBgNVBAoTFUdvb2dsZSBUcnVzdCBTZXJ2aWNlczEM
MAoGA1UEAxMDV1IyMB4XDTI2MDIwMjA4Mzc1OFoXDTI2MDQyNzA4Mzc1N1owGDEW
MBQGA1UEAxMNbXguZ29vZ2xlLmNvbTBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IA
BFW6SUOP2XKd+dD6HHbsc0Q5aechaEkf0A7EcLsfYRVxWKdE372f1fbp0Yp3c3ms
gucwiFOVYv/zzTJxnmghp2KjggSvMIIEqzAOBgNVHQ8BAf8EBAMCB4AwEwYDVR0l
BAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAdBgNVHQ4EFgQUUbYTNdj7hSdy
cHfu11sdBl5j/VEwHwYDVR0jBBgwFoAU3hse7XkV1D43JMMhu+w0OW1CsjAwWAYI
KwYBBQUHAQEETDBKMCEGCCsGAQUFBzABhhVodHRwOi8vby5wa2kuZ29vZy93cjIw
JQYIKwYBBQUHMAKGGWh0dHA6Ly9pLnBraS5nb29nL3dyMi5jcnQwggKGBgNVHREE
ggJ9MIICeYINbXguZ29vZ2xlLmNvbYIPc210cC5nb29nbGUuY29tghJhc3BteC5s
Lmdvb2dsZS5jb22CF2FsdDEuYXNwbXgubC5nb29nbGUuY29tghdhbHQyLmFzcG14
LmwuZ29vZ2xlLmNvbYIXYWx0My5hc3BteC5sLmdvb2dsZS5jb22CF2FsdDQuYXNw
bXgubC5nb29nbGUuY29tghpnbWFpbC1zbXRwLWluLmwuZ29vZ2xlLmNvbYIfYWx0
MS5nbWFpbC1zbXRwLWluLmwuZ29vZ2xlLmNvbYIfYWx0Mi5nbWFpbC1zbXRwLWlu
LmwuZ29vZ2xlLmNvbYIfYWx0My5nbWFpbC1zbXRwLWluLmwuZ29vZ2xlLmNvbYIf
YWx0NC5nbWFpbC1zbXRwLWluLmwuZ29vZ2xlLmNvbYIYZ21yLXNtdHAtaW4ubC5n
b29nbGUuY29tgh1hbHQxLmdtci1zbXRwLWluLmwuZ29vZ2xlLmNvbYIdYWx0Mi5n
bXItc210cC1pbi5sLmdvb2dsZS5jb22CHWFsdDMuZ21yLXNtdHAtaW4ubC5nb29n
bGUuY29tgh1hbHQ0Lmdtci1zbXRwLWluLmwuZ29vZ2xlLmNvbYINbXgxLnNtdHAu
Z29vZ4INbXgyLnNtdHAuZ29vZ4INbXgzLnNtdHAuZ29vZ4INbXg0LnNtdHAuZ29v
Z4IVYXNwbXgyLmdvb2dsZW1haWwuY29tghVhc3BteDMuZ29vZ2xlbWFpbC5jb22C
FWFzcG14NC5nb29nbGVtYWlsLmNvbYIVYXNwbXg1Lmdvb2dsZW1haWwuY29tghFn
bXItbXguZ29vZ2xlLmNvbTATBgNVHSAEDDAKMAgGBmeBDAECATA2BgNVHR8ELzAt
MCugKaAnhiVodHRwOi8vYy5wa2kuZ29vZy93cjIvb1E2bnlyOEYwbTAuY3JsMIIB
AwYKKwYBBAHWeQIEAgSB9ASB8QDvAHYA0W6ppWgHfmY1oD83pd28A6U8QRIU1IgY
9ekxsyPLlQQAAAGcHbdWSwAABAMARzBFAiAhuYu96E6z9CRGayUXz1MuLreDo/Xb
e/eRcGKi1XS4IAIhAMk91HlcBVl8aO1v6kVZVdWmm/ibo2KtiyswoMxKYqHrAHUA
lpdkv1VYl633Q4doNwhCd+nwOtX2pPM2bkakPw/KqcYAAAGcHbdVeAAABAMARjBE
AiA6Ea6FuQavqe+IJWTrKvNLB1CvuWMPTHqwE/TKDlhVtwIgUIEczwZHOa+K8ycA
eDT9QD8eNuMuQgiOFLAJsMrO/bkwDQYJKoZIhvcNAQELBQADggEBAF6//CKqRdk1
N8fzm5Va4estcnC66sXOEC5TttrwVHf09H1D3//+RRjzy4Ucrt8NoxDxAXpvgQOv
yBzZJitNacFK77/imMuoxkL+eE/ZgtksOfw+08Jv3rjm3IJRBAANEx0rDv0vVny/
c6Y1RoUSmZkfHsucpeNkf7BmRfW6l/CsiEF+x7B9fwQVxosPWM0ZHvuyjPSm3X+M
hJgSSWAbIMgU2rH+EQYJvpJrzDPN4ZN8vcocyXBxz0ZgbNsicpwNAOBqcrwyExHw
jS+V1dkgdpuG3XMQj/ypUd4ckNLIpvn/q6moX3VWrqklan83/2deU04rt8ByPJwb
aPmaCu9gb/I=
-----END CERTIFICATE-----

Note the long list of alternative names. I wasn’t sure whether mail servers respect these as well, but here they are. Actually, I fetched this certificate from alt1.gmail-smtp-in.l.google.com, and not “directly” from mx.google.com.

Using MGTs in FPGA designs: Why the data is organized in packets

Introduction

I’ll start with a correction: Indeed, application logic transmitting data from one FPGA to another is required to organize the data in some kind of packets or frames, but there’s one exception, which I’ll discuss later on: Xillyp2p. Anyhow, let’s take it from the beginning.

Multi-Gigabit Transceivers (MGTs, sometimes also referred to as RocketIO, GTX, GTH, GTY, GTP, GTM, etc.) have long ago become the de facto standard for serialized data communication between digital components. The most famous use cases are connections between a computer and its peripherals (often between the CPU’s companion chip and a peripheral), for example PCIe, SuperSpeed USB (a.k.a. USB 3.x), and SATA. Also related to computers, Gigabit Ethernet (as well as 10GbE) is based upon MGTs, and the DisplayPort protocol can be used for connecting a graphics card to a monitor.

Many FPGAs are equipped with MGTs. These are often used for turning the FPGA into a computer peripheral (with the PCIe protocol, possibly using Xillybus, or with the SuperSpeed USB protocol, possibly using XillyUSB, or as a storage device with SATA). Gigabit Ethernet can also come into play, allowing the FPGA to communicate with a computer through this protocol. Another use of MGTs is for connecting to electronic components, in particular ADC/DAC devices with a very high sampling frequency, hence requiring a high data rate.

But what about communication between FPGAs? At times, there are several FPGAs on a PCB that need to exchange information among themselves, possibly at high rates. In other usage scenarios, there’s a physical distance between the FPGAs. For example, test equipment often has a hand-held probe containing one FPGA that collects information, and a second FPGA that resides inside the table-top unit. If the data rate is high, MGTs on both sides make it possible to avoid heavy, cumbersome and error-prone cabling. In fact, a thin fiber-optic cable is a simple solution when MGTs are used anyhow, and besides being lightweight, it offers an extra benefit in some scenarios: Electrical isolation. This is particularly important in some medical applications (for electrical safety) or when long cables need to be drawn outdoors (avoiding damage by lightning).

Among the annoying things about MGT communication is the fact that the data flow somehow always gets organized in packets (or frames, bursts, pick your name for it), and these packets don’t necessarily align properly with the application data’s natural boundaries. Why is that so?

This post attempts to explain why virtually all protocols (e.g. Interlaken, RapidIO, AMD’s Aurora, and Altera’s SeriaLite) require the application data to be arranged in some kind of packets that are enforced by the protocol. The only exception is Xillyp2p, which presents error-free continuous channels from one FPGA to another (or with packets that are sensible for the application data). This is not to say that packets aren’t used under the hood; it’s just that this packet mechanism is transparent to the application logic.

I’ll discuss a few reasons for the use of packets:

  • Word alignment
  • Error detection and retransmission
  • Clock frequency differences

Reason #1: Word alignment

When working with an MGT, it’s easy to forget that the transmitted data is sent as a serial data stream of bits. The fact that both the transmitting and receiving side have the same data word width might give the false impression that the MGT has some magic way of aligning the word correctly at the receiver side. In reality, there is no such magic. There is no hidden trick allowing the receiver to know which bit is the first or last in a transmitted word. This is something that the protocol needs to take care of, possibly with some help from the MGT’s features.

When 8b/10b encoding is used, the common solution is to transmit a synchronization word, often referred to as a comma, typically the K28.5 symbol. This method takes advantage of the fact that the 8b/10b encoding uses 10 bits on the wire for each 8 bits of payload data. This leaves room for a small number of extra codes that can’t be mistaken for regular data. These extra codes are called K-symbols, and K28.5 is one of them.

Hence if the bit sequence for a K28.5 symbol is encountered on the raw data link, it can’t be a data word. Most MGTs in FPGAs have a feature that automatically aligns the word boundary to a detected K28.5 symbol. So word alignment can be ensured by transmitting a comma symbol. The comma symbol is often used to reset the scrambler as well, if one is used.
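
As a toy illustration of this mechanism, here’s how a receiver could recover word alignment by scanning the raw bit stream for a K28.5 symbol. The two 10-bit codes are the standard K28.5 encodings for both running disparities; everything else (the bit order and the surrounding stream) is made up for the example, so don’t take this as a model of any real transceiver.

```python
# Minimal sketch: recovering word alignment by scanning a serial
# bit stream for the K28.5 comma symbol. The 10-bit codes below are
# the standard K28.5 encodings; the bit order on a real wire depends
# on the transceiver, so this is an illustration only.

K28_5_RDNEG = "0011111010"  # K28.5, running disparity -
K28_5_RDPOS = "1100000101"  # K28.5, running disparity +

def find_alignment(bits):
    """Return the offset (mod 10) at which 10-bit word boundaries
    begin, judging by the first K28.5 found in the bit string."""
    for i in range(len(bits) - 9):
        if bits[i:i+10] in (K28_5_RDNEG, K28_5_RDPOS):
            return i % 10  # word boundaries repeat every 10 bits
    return None  # no comma found, the receiver stays unaligned

# A made-up stream: 7 junk bits, then a comma, then two more words.
stream = "1010101" + K28_5_RDNEG + "0101010101" + "1001110100"
print(find_alignment(stream))  # 7: word boundaries at offset 7
```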

Each protocol defines when the comma is transmitted. There are many variations on this topic, but they all boil down to two alternatives:

  • Transmitting comma symbols occasionally and periodically. Or possibly, as part of the marker for the beginning of a packet.
  • Transmitting comma symbols only as part of an initialization of the channel. This alternative is adopted by protocols like SuperSpeed USB and PCIe, which have specific patterns for initializing the channel, referred to as Ordered Sets for Training and Recovery. These patterns include comma symbols, among others.

Truth be told, if the second approach is taken, the need for word alignment isn’t a reason by itself for dividing the data into packets, as the alignment takes place once and is preserved afterwards. But the concept of initializing the channel is quite complicated, and is not commonly adopted.

There are other methods for achieving word alignment, in particular when 8b/10b encoding isn’t used. The principles remain the same, though.

Reason #2: Error detection and retransmission

When working with an MGT, bit errors must be taken into account. These errors simply mean that a ‘0’ is received for a bit that was transmitted as a ‘1’, or vice versa. In some hardware setups such errors may occur relatively often (with a bit error rate of, say, 10⁻⁹, which usually means more than once per second), and in other setups they may practically never occur. If an error in the application data can’t be tolerated, a detection mechanism for these bit errors must be in place at the very least, in order to prevent delivery of incorrectly received data to the application logic. Even if a link appears to be completely error-free judging by long-term experience, this can’t be guaranteed in the long run, in particular as electronic components from different manufacturing batches are used.

In order to detect errors, some kind of CRC (or other redundant data) must be inserted occasionally, allowing the receiver to check whether the data has arrived correctly. As the CRC is always calculated on a segment (whether of fixed length or not), the information must be divided into packets, even if just for the purpose of attaching a CRC to each.
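
The principle can be sketched in a few lines: the transmitter appends a CRC to each packet, and the receiver recomputes it to decide whether the packet arrived intact. I’m using CRC-32 from Python’s zlib as a stand-in for whatever CRC a real protocol defines.

```python
# Per-packet CRC protection: append a CRC-32 on transmission,
# recompute and compare on reception. zlib.crc32 stands in for
# the protocol's actual CRC polynomial.
import struct
import zlib

def add_crc(payload: bytes) -> bytes:
    return payload + struct.pack("<I", zlib.crc32(payload))

def check_crc(packet: bytes) -> bool:
    payload, (crc,) = packet[:-4], struct.unpack("<I", packet[-4:])
    return zlib.crc32(payload) == crc

pkt = add_crc(b"application data")
print(check_crc(pkt))                          # True: packet intact

corrupted = bytes([pkt[0] ^ 0x01]) + pkt[1:]   # flip one bit
print(check_crc(corrupted))                    # False: error detected
```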

And then we have the question of what to do if an error is detected. There are mainly two possibilities:

  • Requesting a retransmission of the faulty packet. This ensures that an error-free channel is presented to the application logic.
  • Informing the application logic about the error, possibly halting the data flow so that faulty data isn’t delivered. This requires the application logic to somehow recover from this state and restart its operation.

High-end protocols like PCIe, SATA and SuperSpeed USB take the first approach, and ensure that all packets arrive correctly by virtue of a retransmission mechanism.

Gigabit Ethernet takes the second approach — there’s a CRC on the Ethernet packets, but the Ethernet protocol itself doesn’t intervene much if a packet arrives with an incorrect CRC. Such a packet is simply discarded (either by the hardware implementing the protocol or by software), so faulty data doesn’t go further. Even the IP protocol, which is usually one level above, does nothing special about the CRC error and the packet loss that occurred as a result of it. It’s only the TCP protocol that eventually detects the packet loss by virtue of a timeout, and requests retransmission.

What about FPGA-to-FPGA protocols, then? Well, each protocol takes its own approach. Xillyp2p is special in that it requests retransmissions when the physical link is bidirectional, but if the link is unidirectional it only discards the faulty data and halts everything until the application logic resumes operation — a retransmission request is impossible in the latter case.

Reason #3: Clock frequency differences

Clock frequency differences should have been the first topic, because this is the subtle detail that prevents the solution most FPGA engineers would consider first for communication between two FPGAs: One FPGA sends a stream of data words at a regular pace, and the other FPGA receives and processes it. Simple and clean.

But I put it third and last, because it’s the most difficult to deal with, and the explanations became really long. So try to hang on. And if you don’t, here’s the short version: The transmission of data can’t be continuous, because the receiver’s clock might be just a few ppm slower. Hence the rate at which the receiver can process arriving data might be slightly lower than the rate of a transmitter that keeps sending data non-stop. So to prevent the receiver from being flooded with data, the transmitter must pause the flow of application data every now and then to let the receiver catch up. And if there are pauses, the segments between these pauses are some kind of packets.

And now, to the long explanation, starting with the common case: The data link is bidirectional, and the data content in both directions is tightly related. Even if application data goes in one direction primarily, there is often some kind of acknowledgement and/or status information going the other way. All “classic” protocols for computers (PCIe, USB 3.x and SATA) are bidirectional, for bidirectional data as well as acknowledge packets, and there is usually a similar need when connecting two FPGAs.

The local and CDR clocks

I’ll need to make a small detour now and discuss clocks. Tedious, but necessary.

In most applications, each of the two involved FPGAs uses a different reference clock to drive its MGT, and the same reference clock is often used to drive the logic around it. These reference clocks of the two FPGAs have the same nominal frequency, up to a small tolerance. Small, but it causes big trouble.

Each MGT transmits data based upon its own reference clock (I’ll explain below why it’s always this way). The logic in the logic fabric that produces the data for transmission is usually driven by a clock derived from the same reference clock. In other words, the entire transmission chain is derived from the local reference clock.

The natural consequence is that the data which the MGT receives is based upon the other side’s reference clock. The MGT receiving this data stream locks a local clock oscillator on the data rate of the arriving data stream. This mechanism is referred to as clock data recovery, CDR. The MGT’s logic that handles the arriving data stream is clocked by the CDR clock, and is hence synchronized with this data stream’s bits.

Unlike most other IP blocks in an FPGA, the clocks that are used to interface with the MGT are outputs from the MGT block. In other words, the MGT supplies the clock to the logic fabric, and not the other way around. This is a necessary arrangement, not only because the MGT generates the CDR clock: The main reason is that the MGT is responsible for handling the clocks that run at the bit rate, having a frequency of several GHz, which is far above what the logic fabric can handle. Also, the reference clock used to generate these GHz clocks must be very “clean” (low jitter), so the FPGA’s regular clock resources can’t be used. Frequency dividers inside the MGT generate the clock or clocks used to interface with the logic fabric.

In particular, the data words that are transferred from the logic fabric into the MGT for transmission, as well as data words from the MGT to the logic fabric (received data), are clocked by the outputs of these frequency dividers. The fact that these clocks are used in the interface with the logic fabric makes it possible to apply timing constraints on paths between the MGT’s internal logic and the logic fabric.

For the purpose of this discussion, let’s forget about the clocks inside the MGT, and focus only on those accessible by the logic fabric. It’s already clear that there are two clocks involved, one generated from the local oscillator, based upon the local reference clock (“local” clock), and the CDR clock, which is derived from the arriving data stream. Two clocks, two clock domains.

Clock or clocks used for implementing the protocol

As there are two clocks involved, the question is which clock is used by the logic that processes the data. This is the logic that implements the protocol. The answer is obviously one of the two clocks supplied by the MGT. It’s quite pointless to implement the protocol in a foreign clock domain.

In principle, the logic (in the logic fabric) implementing the protocol could be clocked by both clocks. However, the vast majority of this logic is usually clocked by only one of them: It’s difficult to implement a protocol across two clock domains, so even if both clocks are used, the actual protocol implementation is always clocked by one of the clocks, and the other clock is used by a minimal amount of logic.

In all practical implementations, the protocol is implemented on the local clock’s domain (the clock used for transmission). The choice is almost obvious: Given that one needs to choose one of the two clocks, the choice is naturally inclined towards the local clock, which is always present and always stable.

The logic running on the CDR clock usually does some minimal processing on the arriving data, and then pushes it into the local clock domain. And this brings us naturally to the next topic.

Crossing clock domains

Every FPGA engineer knows (or should know) that a dual-clock FIFO is the first solution to consider when a clock domain crossing is required. And indeed, this is the most common solution for crossing the clock domain from the CDR clock towards the local clock. It’s the natural choice when the only need is to hand over the arriving data to the local clock domain.

Therefore, several protocol implementations are clocked only by the local clock, and only this clock is exposed by the MGT. The dual-clock FIFO is implemented inside the MGT, and is usually called an “elastic buffer”. This way, all interaction with the MGT is done in one clock domain, which simplifies the implementation.

It’s also possible to implement the protocol with both clocks, and perform the clock domain crossing in the logic fabric, most likely with the help of a FIFO IP provided by the FPGA tools.

To reiterate, it boils down to two options:

  • Doing the clock domain crossing inside the MGT with an “elastic buffer”, and clocking the logic fabric only with the local clock.
  • Using both clocks in the logic fabric, and accordingly doing the clock domain crossing in the logic fabric.

Preventing overflow / underflow

As mentioned earlier, the two clocks usually have almost the same frequency, with a difference that results from the oscillators’ frequency tolerance. To illustrate the problem, let’s take an example with a bidirectional link of 1 Gbit/s, whose clock oscillators have a tolerance of 10 ppm each, which is considered pretty good. If the transmitter’s clock frequency is 10 ppm above nominal, and the receiver’s frequency is 10 ppm below, there is a 20 ppm difference in the 1 Gbit/s data rate. In other words, the receiver gets 20,000 bits more than it can handle every second: No matter which of the two options mentioned above for clock domain crossing is chosen, there’s a FIFO whose write clock runs 20 ppm faster than the read clock. And soon enough, it overflows.
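
The 20 ppm example boils down to this arithmetic (done with integers just to keep the numbers exact):

```python
# How many excess bits per second the receiver's FIFO accumulates
# when the write clock runs faster than the read clock.
bit_rate = 10**9                    # 1 Gbit/s link
tolerance_ppm = 10                  # each oscillator: +/- 10 ppm
worst_case_ppm = 2 * tolerance_ppm  # one side fast, the other slow
excess_bits_per_second = bit_rate * worst_case_ppm // 10**6
print(excess_bits_per_second)       # 20000 bits the FIFO gains per second
```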

It can also be the other way around: If the write clock is slower than the read clock, this FIFO becomes empty every now and then. This scenario needs to be addressed as well.

There are several solutions to this problem, and they all boil down to the transmitter pausing the flow of application data at regular intervals, and inserting some kind of stuffing in between to indicate these pauses. There is no possibility to stop the physical data stream; the only option is to send data words that are discarded by the receiver instead of ending up in the FIFO. Recall that the protocol is almost always clocked by the local clock, which is the clock reading from the FIFO. So, for example, just inserting some idle time between transmitted packets is not a solution in the vast majority of cases: The packets’ boundaries are detected by the logic that reads from the FIFO, not on the side writing to it. Hence most protocols resort to much simpler ways to mark these pauses.

The most famous mechanism is called skip ordered sets, or skip symbols. It’s the common choice when 8b/10b encoding is used. It takes advantage of the fact mentioned above, that when 8b/10b is used, it’s possible to send K-symbols that are distinguishable from the regular data flow. For example, a SuperSpeed USB transmitter emits two K28.1 symbols at regular intervals. The logic before the FIFO at the receiver discards K28.1 symbols rather than writing them into the FIFO.

It’s also common that the logic reading from the FIFO injects K28.1 symbols when the FIFO is empty. This allows a continuous stream of data towards the protocol logic, even if the local clock is faster than the CDR clock. It’s then up to the protocol logic to discard K28.1 symbols.
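
This behavior can be mocked up in a few lines of simulation. This is not how real logic is written, of course, and the numbers are deliberately exaggerated so the effect shows up in a short run, but it demonstrates the point: skip symbols keep the FIFO’s occupancy bounded despite the frequency difference.

```python
# Toy simulation of the skip-symbol mechanism: the transmitter's
# word clock runs slightly faster than the receiver's, and every
# SKIP_INTERVAL words it sends a skip symbol ("SKP") instead of
# data. The logic before the FIFO drops SKPs, and the logic that
# reads the FIFO injects SKPs when it's empty. All numbers are
# made up for illustration.
from collections import deque

PPM_OFFSET = 100       # exaggerated frequency difference, in ppm
SKIP_INTERVAL = 1000   # one skip symbol per 1000 transmitted words

fifo = deque()
max_depth = 0
write_time, read_time = 0.0, 0.0
write_period = 1.0 / (1.0 + PPM_OFFSET * 1e-6)  # the faster clock
words_sent = 0

for _ in range(200_000):             # process ticks, earliest first
    if write_time <= read_time:      # write-side clock ticks
        words_sent += 1
        if words_sent % SKIP_INTERVAL:   # SKPs never enter the FIFO
            fifo.append("DATA")
        write_time += write_period
    else:                            # read-side clock ticks
        word = fifo.popleft() if fifo else "SKP"  # inject SKP if empty
        read_time += 1.0
    max_depth = max(max_depth, len(fifo))

print(max_depth)   # stays a few words deep at most; no overflow
```

A real elastic buffer typically discards or inserts skip symbols based upon watermarks of the FIFO’s fill level, but the principle is the same.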

There are of course other solutions, in particular when 8b/10b isn’t used. The main point is however that the transmitting side can’t just transmit data continuously. At the very least, there must be some kind of pauses. And as already said, when there are pauses, there are packets between them, even if they don’t have headers and CRCs.

But why not transmit with the CDR clock?

This can sound like an appealing solution, and it’s possible at least in theory: Let one side (“master”) transmit data based upon its local clock, just as described above, and let the other side (“slave”) transmit data based upon the CDR clock. In other words, the slave’s transmission clock follows the master’s clock, so they have exactly the same frequency.

First, why it's a bad idea to use the CDR clock directly for transmission: Jitter. I've already used the word jitter above, but now it deserves an explanation: In theory, a clock signal has a fixed time period between transitions. In practice, the time between transitions varies randomly. It's a slight variation, but it can have a devastating effect on the data link's reliability: As each clock transition sets the time point at which a new bit is presented on the physical link, by virtue of changing the voltage between the wires, randomness in the timing has an effect similar to adding noise.

This is why MGTs should always be driven by “clean” reference clocks, meaning oscillators that are a bit more expensive, a bit more carefully placed on the PCB, and have been designed with focus on low jitter.

So what happens if the slave side uses the CDR clock to transmit data? Well, the transmitter's clock already has a certain amount of jitter, which is the result of the reference clock's own jitter, plus the jitter added by the PLL that created the bit-rate clock from it. The CDR creates a clock based upon the arriving data stream, which usually adds a lot of jitter. That jitter has the same effect as adding noise to the receiver's input, because the receiver samples the analog signal using the CDR clock. However, this effect is inevitable. In order to mitigate it, the PLL that generates the CDR clock is often tuned to produce as little jitter as possible, while still being able to lock on the master's frequency.

As the CDR clock has a relatively high jitter due to how it’s created, using it directly to transmit data is equivalent to adding noise to the physical channel, and is therefore a bad idea.

It’s however possible to take a divided version of the CDR clock (most likely the CDR clock as it appears on the MGT’s output port) and drive one of the FPGA’s output pins with it. That output goes to a “jitter cleaner” component on the PCB, which returns the same clock, but with much less jitter. And the latter clock can then be used as a reference clock to transmit data.

I’ve never heard of anyone attempting the trick with a “jitter cleaner”, let alone tried this myself. I suppose a few skip symbols are much easier than playing around with clocks.

But what if the link is unidirectional?

If there’s a physical data link only in one direction, the CDR clock can be used on the receiving side to clock the protocol logic without any direct penalty. But it’s still a foreign clock. The MGT at the receiving side still needs a local reference clock in order to lock the CDR on the arriving data stream.

And as things usually turn out, the same local reference clock becomes the reference for all logic on the FPGA. So using the local clock for receiving data often saves a clock domain crossing between the protocol logic and the rest of the design. It becomes a question of where the clock domain crossing occurs.

Conclusion

If data is transmitted through an MGT, it will most likely end up divided into packets. At least one of the reasons mentioned above will apply.

It’s possible to avoid the encapsulation, stripping, multiplexing and error checking of packets by using Xillyp2p. Unlike other protocol cores, this IP core takes care of these tasks, and presents the application logic with error-free and continuous application data channels. The packet-related tasks aren’t avoided, but rather taken care of by the IP core instead of the application logic.

This is comparable with using raw Ethernet frames vs TCP/IP: There is no way around using packets for getting information across a network. Choosing raw Ethernet frames requires the application to chop up the data into frames and ensure that they arrive correctly. If TCP/IP is chosen, all this is done and taken care of.

One way or another, there will be packets on the wire.

Messy jots on AppSheets

Introduction

These are the jots I wrote down as I learned my way through AppSheet. So this is a messy and not-so-coherent post. Info is scattered here, not necessarily in a sensible order. This is not a tutorial.

My interest comes from the fact that I’m looking for an apartment, and I want to go through a lot of details in each apartment I visit. So I looked for a tool that allows me to organize the information I collect, from the ad, through the thoughts I have about the apartment before visiting it, and most importantly, to efficiently examine the apartment.

This can be done with a plain paper form, of course, but when there are many details and stages, it's easy to miss one. So the crucial part of my application is a quick final view that assures me I've covered everything, just before thanking for the visit. This is the most important part, and it's also the most difficult to implement.

In hindsight, it took 9 days, which is much more time and effort than I intended and wanted to invest in this. I had the "almost done" feeling more or less from the beginning, but every next step turned out to be much more difficult (and annoying) than I imagined. AppSheet makes you feel like a code monkey that doesn't know coding. And if AppSheet is used for a long-term project that is maintained over time, be sure to have a QA team that tests everything in every possible combination. The framework invites silly bugs everywhere.

General

  • In a nutshell, AppSheet takes data from a database (preferably represented as a spreadsheet in Google Sheets) and presents this data in different Views, allowing both displaying and changing the data. The other side of AppSheet is called Automation, its ability to generate files, send emails / SMSes, submit HTTP requests etc. with the information in the database, based upon a template. This can be a pdf for an invoice, or a summary report of an inspection. Or an XML or JSON file, pushed to a web server to initiate some action on the information. So Views are primarily intended to feed data into the system, and Automation covers the consumption of the data for a useful purpose.
  • AppSheet at no cost is unlimited in time and covers every feature of the platform, with up to 10 test users. One user, myself, is enough. In free mode, the app is limited to Prototype status, which has no significance when it's used only by myself.
  • AppSheet is reminiscent of Drupal in many ways, and that's not necessarily a good thing. Lots of click-click-click around, everything is graphical, and one ends up feeling like a machine doing a silly task a gazillion times, because it's the same as configuring a Windows Server: Every. Single. Detail. Must. Be. Configured. Manually. With a prayer that there will never be a need to change all those settings that have been repeatedly configured. Plus, small bugs creep in everywhere, as there is a whole lot of details to keep track of when making every little change.
  • Another thing it has in common with Drupal is that every little task takes a lot of time, and sometimes the answer is simply "no, that's impossible". At least with Drupal, one can hack the PHP code. For example, the one thing that I considered important, but didn't implement, was to get a list of the items in a checklist that I didn't check. I didn't want to implement the checklist as individual Yes/No (boolean) columns, because adding an item would require modifying the table, regenerating the schema, editing the View and then the report. So I did it with an EnumList, but the only information it emits is what was checked. So all in all, there is no way to implement this rather important feature without turning the checklist into a maintenance nightmare in the long run.
  • There are certain limitations on databases and tables, in particular 1000 rows per database in the Free plan, and, for all plans, 20 tables per database and 100 columns per table. Don't worry, you'll lose your sanity before getting anywhere near these limits.
  • Press the "Save" button at the upper right to make changes visible (e.g. when checking with the mobile app). This also clears existing warnings, whether they were tended to or not.
  • Offline use works great: No problem using the mobile phone even with no Internet coverage. The update to the cloud occurs later (it may be necessary to open the app for that). Actually, the only reason I use AppSheet instead of a home-cooked JavaScript application is the ability to work offline.
  • Backing up an app: Impossible. There’s no way to download the app’s own settings, and deleting an app accidentally is easy. The best solution I found for this is to share the app (or better, a copy of it) with another user (being yourself) and make a copy of the app as the other user (note to self: It’s the “obscura” account). And never log in as the other user again. This way, the copy at the other user is safe from accidental tampering. I speculate that the reason for not being able to download the app itself is that AppSheet’s business model is pay-per-user, and it’s free for less than 10 users. So had it been possible to download and restore the app from a file, a lot of no-cost single-user accounts would have been created and deployed.

Random insights

  • Each time “Save” is clicked, a new version is saved. It’s possible to go back to older versions. Version history is saved 7 days back by default.
  • It's possible to run a preview of the app by clicking the three dots to the right of the app's description in the list of apps. Doing that, the specific version of the app is previewed, ignoring subsequently saved versions of the app. There's a misleading "refresh" icon on the app preview; it probably only updates the data, not the app itself.
  • If a “Map” view is generated automatically, delete it. Otherwise it will show up as the default view for everything (push marketing, it’s called), and it’s as annoying as it’s pointless.

Sources of info

Creating a new app

This describes how to create an app based upon a spreadsheet. The spreadsheet’s first row consists of the description of each column, and the following rows contain data. Multiple sheets are treated as separate tables, each sheet’s name being the AppSheet table name (so don’t rename the sheets in the spreadsheet).

Upload the .ods or .xlsx file to Google Drive, or start a new Google Sheet.

If you're already registered with AppSheet, right-click the file in Google Drive, pick "Open with" and choose AppSheet. An app is generated automatically from the data in the file. The name of the app will be the same as the originating spreadsheet's file name. The column types (text, number, price etc.) are automatically assigned, but can be changed later on in the Data section. It's even possible to change the type to Image, in which case pictures that are uploaded or taken with the mobile phone are stored in a subdirectory of where the Google Sheets item is saved, with a name like appname_Images/. The value in the spreadsheet is the path to the image file, e.g. appname_Images/J1.Silly.140258.jpg.

It's best to delete the .ods or .xlsx from Google Drive at this stage to avoid confusion: A Google Sheets entry is created in the same directory. Changes made with AppSheet go to this entry, not to the file used to create the app.

Doing the same from within AppSheet (not clear if this is better):

  • Navigate to the page listing apps (AppSheet’s main page), click “+ Create”, pick “App” and then “Start with existing data”.
  • Give the app a name. For category, I choose “Inspection & Surveys”, and then click “Choose your data”.
  • Select an existing Google Sheets entry in your Google Drive account as the database (Google Sheets).
  • Trying to use an .ods file as the data source caused AppSheet to hang (wait indefinitely to complete creating the app).

It's also possible to use an AppSheet database for this purpose, but that seems to be a leftover from the days before AppSheet became part of Google. There's no reason I'm aware of to prefer an AppSheet database, and there's no obvious way to make a backup of it, nor to restore one. It's probably possible by exporting to CSV with Automation, I suppose, but I never tried it.

A directory named "appsheet" is created in Google Drive's home directory. It contains a directory structure with an empty file called "empty.txt". A subdirectory is created for each AppSheet app, and files generated by the app go into that subdirectory. It's possible to select a different target for output files, however (see below).

When an app is created, you get an email with an invitation to use it or develop it.

To do after a while

After playing around with an app for a while, it's a good idea to make a few fixes. Go to the Views section (click on the third icon from the top) and then on the settings icon (a gear).

  • In Views > General, choose the starting View.
  • Also in Views > General, set the Image Upload size to the desired level (High for 1600x900 with my phone, or Full).
  • Be sure to have enabled “Allow drawing on images” on all Image columns (unless you don’t like that option).
  • In Offline mode, enable “Store content for offline use”.
  • Go to the Data section, choose the main table, and opt out of deleting rows (the gear icon at the top right for a table's settings). Delete rows directly in the spreadsheet if needed.
  • Change the Default app folder to /theprojectname/appsheet-data or something like that, which should be a valid path in Google Drive, so that the files land somewhere sensible. This is done in the Settings section (gear icon to the left), under "Information".

Views

  • The View displays (or allows editing) a subset of column values of a row of a table. In some cases, several rows are listed in one View. Which columns are displayed, how they are displayed, and if they are editable, is what the View’s configuration is about. But no matter how you twist and turn it, a View shows values of columns (possibly along with the column names). If you want to display anything, turn it into a column (possibly a Virtual column, defined only inside AppSheet, see below).
  • The relation between AppSheet's table, which is used in Views and expressions, and the table in the spreadsheet is not 1:1: Virtual columns are added, either by AppSheet itself (e.g. _RowNumber and Row ID), or by the user. In the latter case, the value of the Virtual Column is an expression that the user supplies. This can be a constant, for example if the purpose is to create a menu button in a View. The expression can also be a formula, which depends on the values of other columns of the same or other tables. In these expressions, [X] is the value of column X of the same row.
  • Clicking / tapping on a displayed item in a View initiates an Action, which often consists of opening a view of a specific row in a table, for read-only access or editing. These are system-generated views with names like "main_detail", "main_form" and "main_inline" for a table named "main". These Views can be modified like any other view. And a whole lot of other actions can be added and used as the response.
  • It’s often difficult to figure out which View appears in response to clicks, as it’s often a system-generated one. Be sure to have the Edit option on (top-right on the development screen). Hover over the relevant area, wait for the pencil icon to appear, click it and pick e.g. “Edit View”.
  • If a system-generated view is deleted, it’s created anew after saving, apparently with default settings.
  • If a column is added to the spreadsheet’s table, it’s not available for use in the AppSheet immediately. The AppSheet’s table schema needs to be regenerated for this to happen (click the Data icon, click the table, three dots, and then “Regenerate Schema”). Virtual columns are not deleted, despite the scary warning one gets.
  • When clicking / tapping an item results in a detailed view or the possibility to edit, these are system-generated views, which appear at the bottom left in the Views subsection. These Views can be modified.
  • Views listed under “Primary Navigation” are accessible directly at the bottom part of the screen. Those under “Menu Navigation” are accessible through the menu. “Reference Views” are invisible to the user, except for when requested from other views, for example in an Action (i.e. a response to clicking / tapping an item).
  • There are Format Rules that allow changing the colors of items etc. depending on boolean expressions. In most cases, they apply to columns and how they are displayed (with a different color, or with a colored icon added). Unfortunately, it's impossible (as far as I know) to write catch-all rules for several columns, as the expression for activating the rule doesn't play ball with the [_THIS] expression, which means "this column". So for example, if you want a rule that marks unfilled columns with a red dot, add (or duplicate) one rule for each and every column, and be sure that the rule for column X doesn't by mistake change the display format of column Y. It's just an invitation to make silly bugs.
  • The column's name is displayed above the column's value in forms and detail Views by default. This can be changed by opening the column's settings in the Data section and changing the "Display name". So use concise names in the spreadsheet.
  • A column of type Show is useful for text, URLs, images etc, that appear in forms for instructions, setting page breaks etc. These are best added as virtual columns with literal values in the expression. Don’t forget to turn off the “Editable” attribute of this column. It’s quite unfortunate that instruction text appears as an extra column along with the spreadsheet’s data, but that’s the way it’s done.
  • To navigate to another view as a result of clicking on a column (possibly a Virtual Column acting as a menu entry):
    • First, an Action needs to be created. Click on the Action icon (looks like an electric zap), click on the "+" at the top left, and add an action.
    • Pick the table related to the view that this action shall work on, and pick “App: go to another view within this app”.
    • As for the Target, choose the expression LINKTOVIEW([targetview]) (given that “targetview” is the name of the column containing the name of the view to navigate to, as a plain text column).
    • Go to the relevant item in the View, and choose the newly created action under the Behavior section.
    • If the item is a Virtual Column just for the purpose of being a menu item, change its Display Name to ” ” (a normal white space inside quotes) so that the column’s name doesn’t appear on the menu, which is both ugly and consumes space. Choosing an empty string, “”, has no effect.

There are several types of Views, but the most important for a plain spreadsheet-based app are:

  • Form: All columns listed in the "Column order" are shown and editable. The title of each input (text box, drop-down menu etc.) is the name of the column in the database or spreadsheet, unless overridden in the column's configuration (in the Data section). Unfortunately, a back-swipe on a mobile phone is interpreted as "Cancel", and there's no way around this. One must explicitly tap on Save, or else all the info is lost.
  • Details: Like Form, but read-only, with an icon at the bottom right for editing the content, which switches to a Form with the same set of fields. Unfortunately, blank fields are not shown unless their Show property is set to the expression TRUE (just checking the checkbox isn’t good enough, it has to be done with an expression).
  • Table: As the name implies, like Details, but with the data shown as a table (with rows being rows, columns being columns). Selecting a row brings up a Details view of it. It's possible to configure which columns appear in the table, possibly only one. So this can be a concise way to display a list of rows. It appears that the column marked as Label won't appear in a Table View, even if chosen to do so. Why is unclear.
  • Card and Deck: Each row gets a small pane with two or a few selected column values shown. Selecting one brings up a Details view.
  • Gallery: Shows the image associated with each row (if such exists) and the value of the column marked as Label.
  • Dashboard: A View containing other Views.

Adding a new table

  • In Appsheet, click the Data icon (the second icon from the top).
  • Click on the “+” (Add new data)
  • Navigate to the relevant Google Sheet.
  • Navigate to the related table, and add it.

Settings for each column in a table

In the Data section, there are several attributes one should look at in a table's configuration:

  • Name: Must match the name in the spreadsheet’s header (or the column name of a database’s table). The app fails to load otherwise.
  • Type: Number, text, image, reference, price, there are a lot of options. The automatic choice is usually correct for the simple textual columns. For more sophisticated input (drop-down menus, images, references etc.) this is the place to configure that.
  • Key: This checkbox is checked for one column only, selecting the key column in the database sense. Relevant in particular when using references.
  • Label: This checkbox is checked for one column only (plus, possibly, an image), selecting which column appears in several views as the row’s main description.
  • Formula: Left blank for columns taken from databases, must contain something for a Virtual Column. When set to something, the value of the column is the expression in this formula.
  • SHOW?: As its name implies, it controls whether the column is shown in views. This checkbox should usually be checked: If a column isn't desired in a view, it should be removed from there. If the column should be displayed in Details Views even when blank, checking this isn't enough: The Show property must be set to TRUE as an expression.
  • EDITABLE?: Can the value of the column change?
  • REQUIRE?: Must the column have a value to allow finishing an edit session containing this column?
  • Initial value: The column’s initial value when a new row is created.
  • Display name: If left blank, the Name from above appears above the value of this column in forms etc. Otherwise, the Display name overrides it.
  • Description: For internal documentation, and is also used instead of the column name if “Display Name” is left blank.
  • Search: Is this column involved in searches? This is a tricky one: Enabling this means that the row will appear in searches even if the column isn't displayed in the View, and vice versa: Even if a column is displayed, the search doesn't take it into consideration unless the Search option is enabled for the column. It would, of course, have made more sense to define this option per View, but that's not the way it works, unfortunately.
  • Scan, NFC, PII: Can this column accept data from these input methods? (Scan means barcode scanning.)

Modifying the spreadsheet

It's possible to modify the spreadsheet even after the app has been created, including adding a column in the middle. The immediate effect is an error ("unable to fetch app definition" on the mobile app, or just an error on the development console). To resolve this, pick the "Data" icon (second from the top) on the development console, and then click the round arrow (as in "reload") to regenerate the structure. A scary warning pops up in response to that, but just go ahead.

Note that the connection between AppSheet and the spreadsheet is through the names of the columns as given in the first row, as well as the name of the sheet. These should not be modified (unless the manipulation is intentional).

References etc.

AppSheet definitely supports relational databases. In fact, by choosing names of columns where one is the plural form of the other (an "owner" column and an "owners" table), the relation is possibly set up automatically. This is however not necessarily a good idea, because the tool chooses which column's value identifies the entire row — and getting this wrong could mess things up later, and it's difficult to fix afterwards.

Rather, if a spreadsheet (or virtual) column (in "table A") is assigned the type "Ref", it's also required to assign the table referred to ("table B"). The value of the column selects the row that has a key column equal to it. In the View showing this column, the label column of the referred table is shown. Clicking / tapping on the label leads to a View of that row: its columns' values as well as other rows referencing it. In order to access a specific column, add a virtual column to table A with a formula like [the_refcol_in_table_A].[the_col_in_table_B]. It's not as difficult as it sounds, as the GUI starts suggesting completions when it sees the brackets and the dot.
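As a concrete sketch of such a dereference (the table and column names here are made up for illustration): if an "apartments" table has a Ref column "owner" pointing at an "owners" table, a Virtual Column in "apartments" can read the referred row's "email" column with:

```
[owner].[email]
```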

Note that in AppSheet, a reference actually means the whole row. In many practical situations, only the label column is displayed, but it's important to remember that conceptually, a Ref column represents the entire row it refers to.

And this brings me to back-references. When a table B is referenced by a column somewhere in table A, a virtual column is automatically added to table B. This column has the type List, and its value is something like REF_ROWS("table A", "referring_column"). The arguments are the name of table A and the name of the column with type Ref in that table. So this column lists all rows in table A that refer to the row in table B. Consider deselecting its "SHOW" checkbox in the Data view if this back-and-forth is undesired.

Also, note that the back-reference is shown as a list of inline elements in the View of table B. In order to determine how it appears (for example, as a table or a deck), configure the system-generated view named e.g. tableA_Inline. Even more importantly, that's where one chooses what happens when one of the elements is clicked: view details (e.g. the tableA_Detail view), edit (e.g. tableA_Form), or do something completely different with an Action.

And when references exist, it’s important to get the keys right. It’s possible to choose anything that is guaranteed to be unique, but because references relate to the key, there might be weird side-effects if the value of the key column changes. For example, if we choose a person’s email address as the key, all references to that row are lost if this person changes this address.

What we really want is a solid reference to the row itself. This is best done by adding a dedicated column for that in the table (i.e. in the spreadsheet), and assigning it a unique value when a new row is created: Use UNIQUEID() as this column's initial value. When AppSheet adds a new row, it calculates a unique ID and puts the value there. Make this column non-editable (and possibly turn off "SHOW" too). In fact, AppSheet does this automatically when I give a column the name "key_id".

Note that calculating UNIQUEID() in a virtual column is pointless. That’s not a solution for creating a unique ID for a row.
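Summing up the paragraphs above, the settings for such a dedicated key column (named "key_id", as mentioned) would look roughly like this sketch:

```
Name:          key_id
Type:          Text
Key:           checked
Initial value: UNIQUEID()
Editable:      unchecked
Show:          unchecked (optional)
Formula:       (left blank: a real spreadsheet column, not a Virtual one)
```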

Other insights:

  • Allowing upload of multiple images (or adding other items): Create a separate table consisting of "key_id" (used as key), "ref", "image" and "weight" columns. The "ref" column is declared as a Ref type, pointing at the main table to which the images belong. Enable "is part of", which makes it possible to add new items from the main view. So all in all, there is a list of images in the main view, and an arbitrary number of items can be added from a view of a row in the main table. Each item becomes a new row in the images table, with a reference back to the main table's row. "weight" is a number, allowing to control the order when displaying the images. Using "key_id" as the key, and not "image" (which is the path to the file containing the image), makes it possible to show the same image in different Views, even from different tables.
  • Drop-down menus with text taken from somewhere else: Prepare a table (a sheet, imported as a table) with numbers as keys and text (or anything else) as values. Then, on the table where we want the drop-down, set the column's type to Ref, and point to the said table as the reference. Select the input type, Buttons or Drop-down. In order to allow adding new possible values, set the type to Enum (or EnumList for multiple values) instead, and the Base type to Ref, choosing the said table as the target for the reference. Note however that in the latter method, all already existing keys must be added manually to the column's definition, exactly like with any Enum type. The only difference is that it's the keys (possibly numbers) that are fed manually, not the text. So if adding new possibilities is required, a plain Enum is probably the best way.
  • It's impossible to get a list of the options not chosen in an EnumList. It's possible to write an expression that subtracts the elements selected by the EnumList from a given list, but that requires keeping this expression and the EnumList's definition in sync. And if one forgets to update the expression after adding an item to the EnumList, that's a nasty bug.
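For what it's worth, the kind of expression just mentioned subtracts the EnumList's selections from a hard-coded copy of the full list. The list items and column name here are made up, and the duplication is exactly the maintenance hazard described:

```
LIST("Windows", "Plumbing", "Electricity") - [checked_items]
```

This yields the unchecked items, but only for as long as the hard-coded list matches the EnumList's definition.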

One long row or relational database?

As there are a lot of pieces of information about an apartment, which can be divided into categories and will be handled by different Views, my database manager self was inclined towards a relational database approach. One row with a few columns for each apartment in the “main table”, and then several tables, one for each information category (i.e. View), referenced by the “main table”.

The alternative approach is one table for all information (except for images and other utilities), making one long row with all the info. It's up to the Views to select which columns to show: Not all columns in the original table (or spreadsheet sheet) have to be visible, and it's possible to make several views of the same table. But this means a lot of columns to handle (add virtual columns to that), making it difficult to keep track of the whole thing.

I went for the one long row approach. The main reason was that this approach makes it easier to reorganize the Views if needed. For example, if fields are added over time, and a View becomes too crowded, it’s not a big deal to split it into two Views, if they both refer to the same table anyhow. Or to move a field to another View.

That said, the separate-table approach would definitely have worked: When a new item is added, that means a row in the main table. A new row is created in each separate table by virtue of a menu button (a virtual column with a specified Action). LINKTOFORM() allows opening a form with specific initial values, so the key column in the child table can be set to the parent table's key column value, ensuring a 1:1 relationship (and preventing duplicate rows).
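For example (the view and column names here are hypothetical), the Action behind such a menu button could use an expression along these lines, opening the child table's form with the reference column pre-filled with the parent row's key:

```
LINKTOFORM("rooms_Form", "parent_ref", [key_id])
```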

For reports, the child table is shown as a back-reference to the main table with a Start/End pair of template tags (maybe there's a simpler way to do this).
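A minimal template fragment of that kind might look like this (assuming the back-reference Virtual Column is named "Related rooms" and the child table has a "room_name" column; both names are made up):

```
<<Start: [Related rooms]>>
Room: <<[room_name]>>
<<End>>
```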

In hindsight, I should have taken the separate-table approach, but in a way I didn't think about at first: There should have been a table with one row for each room, regardless of its type. This would have reduced the number of columns in the main table, and the amount of work and headache is linear in the number of columns. The trick would have been not to show all columns of the rooms table in the form View, but only those suitable for the specific type of room. The rules for which types of rooms exist, and which columns should be shown in the View, would be listed in a separate table. Hence the "Show if" rule would look up the column in this rule table, in relation to the room's type.

Use slices?

The question is how to display a subset of the columns in different Views.

The seemingly natural (and less preferred) way is by using slices. A slice is a virtual table, calculated on the fly, consisting of a configured subset of rows, columns and actions, optionally filtered with a condition on the data (or just not showing all columns). Slices are created with the “+” icon on each data set (inside the “Data” section), as they are considered some form of table. Alternatively, a slice can be created from the View’s configuration as well.

This would have been the preferred way, had it not been for an annoying caveat: If a new column is added to the database (i.e. sheet), it appears in all slices. This makes slices a no-go for the purpose of selecting columns for a view.

Instead, the columns to display should be selected in the “Column order” part, which can be found in the “View Options” section for each View configuration. Note that “Column order” isn’t present for Views that don’t display columns, e.g. Deck and Gallery.

It makes sense to start by generating an empty view, and copying it each time a new view is created.

Creating a pdf file from the data of a row

…or for that matter, a CSV / JSON / XML / HTML / XLSX file.

The important point is that this is done with Automation, and that the execution is triggered by an Event, and not as an Action. In other words, clicking something on a View won’t trigger the generation of the report directly. Rather, clicking the entry in the View causes an Action that modifies a row in a table, or adds a row to a table, and that, in turn, triggers the Event.

There is no direct connection between the Action and the task it requests — the generation of a file, in this case. It can take 20-30 seconds from the triggering event until the file is created.

One option to request the generation of a file from an app: First, add two columns to the table for which the report is desired: “do_report” and “last_report”. In AppSheet, regenerate the schema if necessary, and set the columns as follows:

  • do_report: Type is “Yes/No” (sweet talk for Boolean), turn off “show”, initial value “FALSE”
  • last_report: Type is “Text”, and set Initial Value to “Never”. “Editable” must remain enabled, even though it shouldn’t be accessible by forms: Its intention is to be set by the Bot, and this is possible only if the column is editable.
  • Add a virtual column named “generate_report”, “Text” type, with expression as follows:
    CONCATENATE("Generate report
    Last report: ", [last_report])

    Note that there is a new line in the middle of the string.

  • Create a new Action (Bzzt icon): For “Do this” pick “Data: set the values of some columns in this row”. Pick the “do_report” column and set the expression to TRUE. I’m going to use the Virtual Column defined above as a menu item, so for Position, pick “Inline” and at “Attach to column”, pick “generate_report”. In Behavior, set “Only if this condition is true” to NOT([do_report]), and optionally also enable the request for confirmation.
  • Add the “generate_report” column to the View (or Views) from which the report should be requested.
  • Optionally (and recommended), add a Format rule for generate_report. The condition is [do_report] and if so, the text should be grey (color #888888) and in italics. For “Format these columns and actions” pick generate_report, and possibly also the action created just above. Without this, there is no immediate visual feedback to “pushing the button”.

Now to the part that actually generates the report.

  • Go to the Automation section (robot icon) and create a new Bot.
  • Click “Configure Event” and give it a name.
  • In the pane to the right, the Event source should be “App”. Pick the table from which the Event is triggered.
  • Select only “Updates” for Data change type.
  • Condition: [do_report]
  • Next, define what should be executed — the Process. Add Steps, each of which is more or less like a line in a computer program: Each can perform something, check a condition and branch, or wait for a condition to become true. Or call a process. These are the steps to define (give each some name, as usual) and configure in the pane to the right.
    • Type: “Run a data action”. In the right pane, pick “Set row values”, and set “do_report” to FALSE.
    • Type: “Run a task”. Set up the task as follows in the right pane:
      • Pick Create a new file
      • HTTP Content Type: pdf
      • Template: Click Create (or pick an existing template, if you have one from previous sessions). A “View” button appears after this is completed. More on templates in a separate section below.
      • Optional: Change File Folder Path to “/” (with the quotes), File Name Prefix to “report” (once again, with the quotes) and enable “Disable Timestamp”. This way, report.pdf is created at the project’s root directory (not Google Drive’s root directory in most cases). Expressions can be used for these, of course.
    • Type: “Run a data action”. In the right pane, pick “Set row values”, and set last_report to TEXT(NOW(), “DD/MM/YYYY HH:MM:SS”) for British time format.

Note that it’s possible to create a pdf file even though the template is an HTML file. This makes it much easier to handle the template, and it’s always possible to go back to the HTML file to see what went wrong. The disadvantage of an HTML template is that the file generation fails on the tiniest HTML syntax error. For example, a missing </tr> tag. Or even more ridiculous, a missing </p> tag.

In order to view the generated file by clicking a menu item in a View, create a Virtual Column for this purpose, and assign it an Action when selected. This action should be “External: Open a file”. The “File” entry is a URL to where the file is stored on Google Drive. Obtain this link with the Share feature, so the expression becomes something like “https://drive.google.com/file/d/1d8AWHjLJdK9cyhYGBPnuVJJ0asmLmZVw/view?usp=sharing” (the quotes included). This is not a security risk, as one chooses to share only with oneself (when obtaining the link from Google Drive).

If the execution fails due to a malformed template file (something inside the markups didn’t work fine), the app’s View doesn’t always get updated until a restart of the app (or reload the entire page on a web browser).

Actually, the way the whole thing behaves is that the menu button gets greyed out immediately after being pressed, and then gets back to normal after the file has been generated. But the color is determined by do_report’s status, which is turned back to FALSE before the attempt to generate a file. So if the file generation takes time, it may return to normal before the file is updated — but this has never had any practical consequence for me. But why doesn’t it go back to normal when the file generation fails? do_report is FALSE either way.

To view error messages: Go to the “Monitor” section (icon at the left side, one from the bottom), and pick “Launch Log Analyzer” in the “Audit History” part. After some graphs, there’s a table with the last tasks initiated. Where there is an error, click the binocular icon for details. A JSON text entry appears; the “Errors” key is where the error message is shown. It can take a few minutes before the entry with the error is shown, and before that, there may be entries showing success even if an error has occurred. So wait a few minutes before ruling out a failed run.

Templates

A template is a file in Doc or HTML format. Everything is passed through to the output document, except for markups of the <<expression>> form (or, for an HTML template, &lt;&lt;expression&gt;&gt;). The expression is evaluated exactly the same way as the value of a Virtual Column, and the markup is substituted with the expression’s result. Hence to obtain a substitution with the value of a column named “thecol”, just insert <<[thecol]>>.

No matter if the output is pdf or HTML, when using HTML templates, be sure that the HTML file has all the classic wrapper tags: <html>, <head>, <body> etc. or else AppSheet ignores all <<something>> (or actually, &lt;&lt;something&gt;&gt;) markups altogether and the output is the template file itself.
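So a minimal, valid HTML template would look something like this (the column name “thecol” is just the example used above; note the escaped angle brackets in the HTML source):

```html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<p>The value: &lt;&lt;[thecol]&gt;&gt;</p>
</body>
</html>
```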

There are special kinds of markups:

  • Start/End for looping on a list of keys to a table
  • If/EndIf markups, allowing a segment to be included or not. Apparently not available with HTML templates, as it’s not listed explicitly, and I failed to use it (so I used the IF() function instead).

The automatically created template is simple and not necessarily very helpful: It encompasses all real columns (but not the virtual ones), showing each column’s name (not its display name) and its plain value. Its most valuable part is where it shows the use of Start/End (when relevant, i.e. when there are back-references to the displayed table). Note that as shown in the template, the Start and End tags must be inside a <p> block. They can’t be just a floating piece of code, or within another couple of tags (<div> doesn’t work, for example). Otherwise one gets a misleading error message saying that the table with name “Start:” can’t be found, plus a lot of error messages for everything between the two tags.
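For example, looping on a back-reference in an HTML template might look like this (the table and column names are hypothetical; note that the Start and End markups each sit in their own <p> block, as just mentioned):

```html
<p>&lt;&lt;Start: [Related Items]&gt;&gt;</p>
<p>Item: &lt;&lt;[itemname]&gt;&gt;</p>
<p>&lt;&lt;End&gt;&gt;</p>
```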

The template is annoying in particular in relation to Ref types: The actual value, which is the key itself, is shown for references, and not the referenced value, as seen on the app’s Views. It would have been sweet of AppSheet to look up which of the referenced table’s columns is displayed in the View, and show that. But there’s another catch: Say the Ref column is “choice” and the column to display on the referenced table is “label”. The expression for the value to display is [choice].[label]. But if “choice” happens to be blank, and this expression appears anywhere in the template, the file generation fails. So be sure to assign an initial value to all Ref columns when creating a new row. Plus, possibly make a simple sanity check on the column, just in case:

<<IF(ISBLANK([choice]), "Error!", [choice].[label])>>

Not sure if this is bullet-proof, but it solves the most obvious problem.

I’ve also got the impression that if the expression of an AppSheet markup returns anything containing HTML tags (e.g. <b> and </b>), the file generation fails completely.

Notes to self

These are cryptic notes to myself, for my own apps’ housekeeping tasks.

First of all, the account I’ve shared stuff with is “obscura”.

Adding/removing a new item to a checklist

  • Update the related checkcnt expression with the new number of items, so that the report says all is done when it really is.

Adding a new column

  • Be sure to regenerate the schema of the related table in AppSheet.
  • Possibly add (or duplicate) an “Unfilled X” Format rule (e.g. ISBLANK([streetaddress]) ), if it matters that the field is empty. Mark with Cyan dot, and don’t forget to set the affected column correctly.
  • Set the SHOW property of the column to the expression TRUE if it’s supposed to appear in the detailed view, even if it’s blank. Just clicking on the “SHOW” checkbox isn’t good enough.
  • Set the Display property to something friendly

The struck-out bullets were relevant when I thought about using the Detail view as the summary of it all. I went for a pdf report instead, so they’re no longer relevant.

Adding a new View Form

  • There should be a status_* column in relation to this in the spreadsheet. Its setting inside AppSheet: Ref to const_group_status with Buttons (not “part of”), enable Show, Editable, Require, Initial Value is “0” (with quotes), Display name is “Status”. The four checkboxes to the right are left unchecked.
  • In the main table, add a Virtual Column with the same name as the Form View it relates to, e.g. view_base for status_base (so it’s easier to match between them later). Its type is Text, and its value is the relevant menu item’s description.
  • Only now, create a new Form view. In Column order, pick Manual, select all, unselect those not required. Be sure to keep the relevant status_* column.
  • Go to the “Actions” section (Bzzzt icon) and add a new action (actually, duplicate a similar one). Name it e.g. “Jump to base form”. For a record of “main”, Do this is “App: go to another view within this app”. Set Target to LINKTOROW([key_id], “view_base”) where “view_base” is the name of the View. Set Position to Inline, and Attach to column as the related Virtual Column. Pick a suitable icon too. Unfortunately, there is no catch-all action for navigating.
  • Duplicate a couple of “Mark incomplete” format rules, and adapt them to the new View: Change the expression to match the correct status_* column, and also the “Format these columns” to the relevant “view_*” with the same name. And test it. There’s no way around a couple of rules for each View.

Un-ignore /usr/lib/systemd/ in .gitignore with git repo on root filesystem

Actually, this is about un-ignoring any subdirectory that is grandchild to an ignored directory.

Running Linux Mint 22.2 (based upon Ubuntu 24.04), and having a git repository on the root filesystem to keep track of the computer’s configuration, the vast majority of directories are ignored. One of them is /lib; however, /lib/systemd/ should not be ignored, as it contains crucial files for the system’s configuration.

On other distributions, the relevant part in .gitignore usually goes:

[ ... ]
bin/
boot/
dev/
home/
lib/*
!lib/systemd/
lib64/
lib32/
libx32/
lost+found/
media/
mnt/
opt/
proc/
root/
run/
sbin/
[ ... ]

So lib/ isn’t ignored as a directory, but all its content, including subdirectories, is. That allows un-ignoring lib/systemd/ on the following row, which is why lib/ isn’t ignore-listed like the other directories.

But on Linux Mint 22.2, /lib is a symbolic link to /usr/lib. And since git treats a symbolic link just like a file, /lib/systemd/ is treated as /usr/lib/systemd. Ignoring /lib as a directory has no effect, and un-ignoring /lib/systemd has no effect, because to git, this directory doesn’t even exist.

So go

$ man gitignore

and try to figure out what to do. It’s quite difficult actually, but it boils down to this:

usr/*
!usr/lib/
usr/lib/*
!usr/lib/systemd/

It’s a bit tangled, but the point is that /usr/lib is un-ignored, then all its files are ignored, and then /usr/lib/systemd is un-ignored.

The only good part about this solution is that it works.
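The rules can be sanity-checked with git check-ignore in a throwaway repository. A quick sketch (the file names are made up for the test, and git is assumed to be installed):

```shell
# Sanity-check the .gitignore rules above in a scratch repository.
tmp=$(mktemp -d)
cd "$tmp"
git init -q

cat > .gitignore <<'EOF'
usr/*
!usr/lib/
usr/lib/*
!usr/lib/systemd/
EOF

mkdir -p usr/bin usr/lib/systemd
touch usr/bin/ls usr/lib/libc.so usr/lib/systemd/some.service

# check-ignore exits 0 when a path is ignored, 1 when it isn't
git check-ignore -q usr/bin/ls && echo "usr/bin/ls: ignored"
git check-ignore -q usr/lib/libc.so && echo "usr/lib/libc.so: ignored"
git check-ignore -q usr/lib/systemd/some.service || echo "usr/lib/systemd/some.service: tracked"
```

The first two paths should come out ignored, and the systemd one not, confirming the un-ignore chain works.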

Footnote and whole-page layout with wkhtmltopdf

This HTML code makes wkhtmltopdf create a single page with a footnote. If the external <div> is duplicated, separate pages are generated.

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<div style="height: 1350px; display: flex; flex-direction: column; break-inside: avoid; border:1px solid #668;">
This is just a test.
<div style="margin-top: auto;">
This is a footnote
</div>
</div>
</body>
</html>

So how does it work? The important part is the “style” attribute of the outer <div> tag:

  • height: 1350px: This sets the <div> block’s height to a full A4 page. Why 1350 pixels? I don’t know. I just tweaked this figure until it came out right. It’s possible that another figure is needed on a different version of wkhtmltopdf. I tried setting this with cm as well as pt units, but neither corresponded to the standard figures for an A4 page. So I went with pixels, which makes it clear that it’s a wild guess.
  • display: flex; flex-direction: column: This turns this <div> block into a Flexbox container, with vertical packing. This is needed to push the footnote’s block to the bottom.
  • break-inside: avoid: This tells wkhtmltopdf to avoid page breaks in the middle of the block. This makes no difference for a single page, but if this <div> block is repeated, this style attribute ensures that each block gets a separate page (unless any of the pages exceeds a page’s height).
  • border:1px solid #668: This generates a border around the <div> block’s occupied area. Used only for finding the correct height attribute, and should be removed afterwards (unless this border is desired on every page).

The footnote is pushed to the bottom of the page by virtue of the margin-top: auto style attribute and the fact that the <div> block carrying this attribute is inside a vertically packed Flexbox container.

Notes:

  • This was done with wkhtmltopdf 0.12.4, without the “wkhtmltopdf patches” according to the man page.
  • If the height is too large on any page, all break-inside are ignored. In other words, the whole pdf document gets garbled, not just around the page that went wrong.
  • I tried changing the resolution on my X11 display, and it didn’t make any difference. This might sound like a silly thing to check, but wkhtmltopdf depends on the X11 server.

Notes on installing Linux Mint 22.2 (full disk encryption)

Introduction

These are my notes to self for the next time I install a Linux system. As if I read my previous posts before attempting.

So I installed Linux Mint 22.2 (kernel 6.14.0-29) on a Lenovo V14 G4 IRU laptop. With Cinnamon, of course, not that it’s relevant.

All I wanted was full-disk encryption, while being allowed to choose the setup of the partitions explicitly, and not letting the installation wizard make the choices for me. In particular, I wanted a swap partition with the size I choose, and even more important: Almost all disk space in a /storage mount, so that one can fill the hard disk with junk without risking a system failure because the root partition is full.

Cutting the cake

/boot/efi is where the BIOS reads from. It’s natural to put it as the first partition, and it can be very small (even 1 MB can be enough in some cases, but let’s not push it like I eventually did). If you make it really small, it becomes a FAT16 partition, not FAT32, and that’s OK. Don’t force it into FAT32, because the system won’t boot if the filesystem has fewer clusters than FAT32 requires.

So it goes like this:

  • Create three partitions:
    • First one for /boot/efi (e.g. nvme0n1p1), 10 MB. This must be FAT32 or FAT16 (the latter for 10MB). Note that 10MB is a bit too small, because the BIOS won’t have room for its own backup this way.
    • Second one for /boot, will contain initramfs images, so ~500 MB. Any filesystem that GRUB can read (so ext4 is definitely OK)
    • Third partition for LUKS
  • In the LUKS partition, create an LVM with partitions for / (100G) and swap. The rest is for /storage.

It’s somewhat confusing that /boot/efi is a subdirectory of /boot, but that’s the way it is.

Running the installation wizard

  • Unlock the encrypted partition, if it’s not already (e.g. with the “Disks” GUI utility). This requires giving the passphrase, of course.
  • Double-click the “Install Linux Mint” icon on the desktop.
  • When reaching the “Installation type”, pick “Something else”.
  • Set the following mount points:
    • Set the small FAT partition (nvme0n1p1 in my case) as “EFI System Partition”
    • /boot on the partition allocated for that (non-encrypted partition, possibly ext4).
    • / on the relevant LVM partition inside the encrypted block
    • Set the swap partition
  • Set the “Device for boot loader installation” to the one allocated for “EFI System Partition” (nvme0n1p1 in my case). One may wonder why this isn’t done automatically. Note that it’s the first partition (/dev/nvme0n1p1) and not the entire disk (/dev/nvme0n1).
  • Don’t do anything with the planned /storage partition. As I don’t want to assign it with a mounting point, handle it after the installation is done.

If the installation ends with a failure to install GRUB, run “journalctl” on a terminal window and look for error messages from the grub installer. Don’t ask ChatGPT to help you with solving any issues, and don’t ask me why I know it’s a bad idea.

When I insisted on FAT32

Sometimes I’m too much of a control freak, and when the Disks utility formatted the EFI partition into FAT16, I thought, oh no, it should be FAT32, what if the BIOS won’t play ball?

Well, that was silly of me, and also silly to ignore the warning about a FAT32 filesystem with just 10 MB having too few clusters.

So even though the installer wizard finished successfully, there was no option to boot from the disk. Secure boot was disabled, of course. And yet, there was no suitable option in the BIOS’ boot menu. There was a “UEFI” option there, which is always in black (not possible to select), but that doesn’t seem to be relevant.

Following the warm advice of ChatGPT, I added an entry while in Live USB mode:

# efibootmgr -c -d /dev/nvme0n1p1 -p 1 -L "Linux Mint" -l '\EFI\ubuntu\grubx64.efi' 
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0003,2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:<->[::]:,0,0)RC
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC
Boot0003* Linux Mint	HD(1,GPT,12345678-aaaa-bbbb-cccc-dddddddddddd,0x800,0x5000)/File(\EFI\ubuntu\grubx64.efi)

(some identifying numbers replaced trivially)

Cute, heh? But it made no difference. After rebooting with a Live USB again:

# efibootmgr
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:<->[::]:,0,0)RC
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC

So the entry was gone.

I changed the EFI partition to FAT16, ran through the installation all the way again. And immediately after the installation was done (before booting to start from disk for the first time):

# efibootmgr
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0003,2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:<->[::]:,0,0)RC
Boot0003* Ubuntu	HD(1,GPT,12345678-aaaa-bbbb-cccc-dddddddddddd,0x800,0x5000)/File(\EFI\ubuntu\shimx64.efi)
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC

This time, when the laptop went on, the BIOS came up with a “Please don’t power off while completing system update”. What it actually did was to write its own backup file into the EFI partition, which appears as /boot/efi/BackupSbb.bin. Actually, it doesn’t seem like it was successful, as the space in the partition ran out. So I deleted this file and turned off the “BIOS Self-Healing” option in the BIOS’ configuration (it will be much worse if it attempts to self-heal on a faulty backup file).

At this point, there was an “ubuntu” entry in the list of boot options in BIOS’ boot menu (Not “Ubuntu”, but “ubuntu”, probably referring to the directory and not the name). And the black “UEFI” option remained in the option list, not possible to choose. So this is why I don’t think it’s relevant.

Asking for a passphrase is too much to ask?

Having reached this far, I got a nice Linux Mint logo on the screen, but nothing happened, and then I got thrown into an initramfs rescue shell. In other words, no attempt was made to unlock the encrypted partition.

So I ran the live USB again, unlocked the root partition and mounted it as /mnt/root/.

Then, as root (sudo su), bind-mounted the essential directories into the root filesystem:

# for d in /dev /dev/pts /proc /sys /run; do mount --bind $d /mnt/root/$d ; done

And then chrooted into it.

# chroot /mnt/root

Of course, there was no /etc/crypttab, so no wonder that the installation didn’t take unlocking the encrypted partition into account.

So I followed my own instruction from a previous post. First, mount /boot and /boot/efi with

# mount -a

and then check for the UUID of the encrypted partition:

# cryptsetup luksUUID /dev/nvme0n1p2
11223344-5566-7788-99aa-bbccddeeff00

and then add /etc/crypttab reading

luks-11223344-5566-7788-99aa-bbccddeeff00 UUID=11223344-5566-7788-99aa-bbccddeeff00 none luks

Note that the luks-{UUID} part is the name of the unlocked partition as it appears in /dev/mapper. In this case, this was what the Disks GUI utility chose. Had I done this from the command line, I could have chosen a shorter name. But who cares.
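To avoid retyping the UUID twice by hand, the crypttab line can be generated with a small shell snippet (a sketch; the UUID below is the placeholder from the example above, and on the real system it comes from cryptsetup luksUUID):

```shell
# Generate the /etc/crypttab line from the LUKS UUID. On the real system:
#   UUID=$(cryptsetup luksUUID /dev/nvme0n1p2)
UUID=11223344-5566-7788-99aa-bbccddeeff00

printf 'luks-%s UUID=%s none luks\n' "$UUID" "$UUID"
```

Append the output to /etc/crypttab (as root), then proceed with the initramfs and GRUB updates below.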

And finally, edit /etc/default/grub for your preferences, update initramfs and GRUB, exactly as already mentioned in that post:

# update-initramfs -u
# update-grub
# grub-install

It was really exactly the same as the previous post. And then reboot, and all was finally fine.

And by the way, the initrd file is 77 MB. Running update-initramfs again didn’t make it smaller. Not a big deal with a flash disk, anyhow.

But GRUB can open LUKS too!

GRUB has cryptodisk and luks modules which can open an encrypted partition, so in principle it can read the kernel from an encrypted root partition. However there is no mechanism I’m aware of to pass over the unlocked encrypted partition to the kernel, so it would be necessary to supply the passphrase twice when booting.

This is why I went for two partitions for booting. I guess this still is the only sane way.

Converting vtt to srt subtitles with a simple Perl script

I tried to use ffmpeg to convert a vtt file to srt, but that didn’t work at all:

$ ffmpeg -i in.vtt out.srt
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)

I tried a whole lot of suggestions from the Internet, and eventually I gave up.

So I wrote a simple Perl script to get the job done. It took about 20 minutes, because I made a whole lot of silly mistakes:

#!/usr/bin/perl

use warnings;
use strict;

my $n = 1;
my $l;

my $timestamp_regex = qr/[0-9]+:[0-9]+:[0-9:\.]+/; # Very permissive

while (defined ($l = <>)) {
  my ($header) = ($l =~ /^($timestamp_regex --> $timestamp_regex)/);
  next unless (defined $header);

  $header =~ s/\./,/g;

  print "$n\n";
  print "$header\n";

  $n++;

  while (defined ($l = <>)) {
    last unless ($l =~ /[^ \t\n\r]/); # Stop at a blank (whitespace-only) line

    print $l;
  }
  print "\n";
}

$n--;
print STDERR "Converted $n subtitles\n";

Maybe not a piece of art, and it can surely be made more accurate, but it does the job with simply

$ ./vtt2srt.pl in.vtt > out.srt
Converted 572 subtitles

And here’s why Perl is a pearl.

Long list of IP addresses attacking a phpBB forum in May 2025

A good-old phpBB forum that I run had a massive surge of traffic somewhere in May 2025. This had already started a couple of months earlier, but in May it knocked down the web server due to the number of apache instances. Plus the logs reached ~ 3 GB.

This doesn’t seem to be a DDoS attack, mainly because the access from each IP address was far more sparse than that of a legit bot: There would be several minutes, if not hours, between requests. With the huge number of bots involved, it would have been easy to completely knock out the website with a moderate access pattern from each IP address. Besides, there’s no reason to attack this specific forum. It looks more like a brute-force attempt to hack the forum itself for the purpose of spamming or something.

One could turn to Cloudflare in situations like this, but I tried the DIY approach. Namely, to block the IP addresses of the bots by virtue of the firewall, as I discuss in a separate post of mine. The strategy with these bots was somewhat different: As the phpBB-related bots occasionally did something that no regular user would do (what exactly I’ll keep to myself), it was enough to detect this event and blacklist the IP address right away.

Which I did. After a couple of days, the list landed on ~78,000 IP addresses. Note that this is larger than ipset’s default number of allowed elements, which is 65536. So when creating the ipset, be sure to use maxelem with a larger number (I used 524288, back when I didn’t know how bad the situation was).
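Loading a list of this size one address at a time is slow, so ipset restore is the natural way. A sketch that turns a one-address-per-line file into restore format (the set name “badbots”, the file names and the sample addresses are all made up):

```shell
# Build "ipset restore" input from a plain list of IP addresses.
# The two sample addresses stand in for the real blacklist file.
printf '192.0.2.1\n192.0.2.2\n' > blacklist.txt

{
  echo "create badbots hash:ip maxelem 524288"
  sed 's/^/add badbots /' blacklist.txt
} > ipset.restore

cat ipset.restore
# Load it (as root) with:    ipset restore < ipset.restore
# Match it in the firewall:  iptables -I INPUT -m set --match-set badbots src -j DROP
```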

I checked a few of these addresses, and they appear to originate from Asia (Vietnam, Indonesia etc.).

The nice thing is that 24-48 hours after I enabled the related blacklisting mechanism, the attack virtually stopped. The firewall recorded no more than ~30 dropped packets per minute on the ipset-matching rule. True victory.

The list of IPs can be downloaded here, in case it helps someone understand what’s going on. If you do have an idea, please comment below.

It’s a plain zip of a list of IP addresses, one address per row. It’s possible that a few unrelated IP addresses appear on this list, which could be other kinds of bots that were blacklisted on the same server. Maybe a dozen of these.

Ehm, no

The silence was short-lived. A couple of days later, the bots came back, this time on a much larger scale. Within a few days, I collected roughly ten times as many IP addresses, ~840,000 of them. That’s a really large number. This larger list of IPs can be downloaded here. It was obtained exactly the same way as the previous one.

But after this swarm, it got silent again. For a while. In the days that followed, the same traffic pattern came back, IPs were blacklisted (~250k of them at some point) and so it went on. But with the blacklisting mechanism, the load on the server was quite small, so for now I consider the problem solved. Let’s hope it stays this way.

Measuring how much RAM a Linux service eats

Introduction

Motivation: I wanted to move a service to another server that is dedicated only to that service. But how much RAM does this new server need? RAM is $$$, so too much is a waste of money, too little means problems.

The method is to run the service and expose it to a scenario that causes it to consume RAM. And then look at the maximal consumption.

This can be done with “top” and similar programs, but these show the current use, and I needed the maximal RAM use. Besides, a service may spread its RAM consumption across several processes, and it’s the cumulative consumption that is interesting.

The appealing solution is to use the fact that systemd creates a cgroup for the service. The answer hence lies in the RAM consumption of the cgroup as a whole. It’s also possible to create a dedicated cgroup and run a program within that one, as shown in another post of mine.

This method is somewhat crude, because this memory consumption includes disk cache as well. In other words, this method shows how much RAM is consumed when there’s plenty of memory, and hence when there’s no pressure to reclaim any RAM. Therefore, if the service runs on a server with less RAM (or the service’s RAM consumption is limited in the systemd unit file), it’s more than possible that everything will work just fine. It might run somewhat slower due to disk access that was previously substituted by the cache.

So using a server with as much memory as measured by the test described below (plus some extra for the OS itself) will result in quick execution, but it might be OK to go for less RAM. A tight RAM limit will cause a lot of disk activity at first, and only afterwards will processes be killed by the OOM killer.

Where the information is

All said in this post relates to Linux kernel v4.15. Things are different with later kernels, not necessarily for the better.

There are in principle two versions of the interface with cgroup’s memory management: First, the one I won’t use, which is cgroup-v2 (or maybe this doc for v2 is better?). The sysfs files for this interface for a service named “theservice” reside in /sys/fs/cgroup/unified/system.slice/theservice.service.

I shall be working with the memory control of cgroup-v1. The sysfs files in question are in /sys/fs/cgroup/memory/system.slice/theservice.service/.

If /sys/fs/cgroup/memory/ doesn’t exist, it might be necessary to mount it explicitly. Also, if system.slice doesn’t exist under /sys/fs/cgroup/memory/ it’s most likely because systemd’s memory accounting is not in action. This can be enabled globally, or by setting MemoryAccounting=true on the service’s systemd unit (or maybe any unit?).

Speaking of which, it might be a good idea to set MemoryMax in the service’s systemd unit in order to see what happens when the RAM is really restricted. Or change the limit dynamically, as shown below.
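As a sketch of both settings combined ("theservice" is a placeholder name, and the drop-in path follows systemd's usual override convention), a file like /etc/systemd/system/theservice.service.d/memory.conf could contain:

```
[Service]
# Turn on per-cgroup memory accounting for this unit
MemoryAccounting=true
# Hard memory limit for the test run. On older systemd versions the
# directive is MemoryLimit= instead of MemoryMax=.
MemoryMax=40M
```

followed by a "systemctl daemon-reload" and a restart of the service.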

And there’s always the alternative of creating a separate cgroup and running the service in that group. I’ll refer to my own blog post again.

Getting the info

All files mentioned below are in /sys/fs/cgroup/memory/system.slice/theservice.service/ (assuming that the systemd service in question is theservice).

The maximal memory used: memory.max_usage_in_bytes. As its name implies, this is the maximal amount of RAM used, measured in bytes. This includes the disk cache, so the number is higher than what appears in “top”.

The memory currently used: memory.usage_in_bytes.
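As a sketch, assuming the service is named "theservice" (adjust the path to your own), these two files can be read and made human-readable with coreutils' numfmt:

```shell
#!/bin/sh
# Pretty-print a byte count in powers of 1024 (e.g. 4268224512 -> 4.0GiB)
human() { numfmt --to=iec-i --suffix=B "$1"; }

# Path assumes a systemd service named "theservice"
CG=/sys/fs/cgroup/memory/system.slice/theservice.service

if [ -d "$CG" ]; then
    echo "current: $(human "$(cat "$CG/memory.usage_in_bytes")")"
    echo "peak:    $(human "$(cat "$CG/memory.max_usage_in_bytes")")"
fi
```

The --to=iec-i flag selects the 1024-based suffixes (KiB, MiB, GiB), matching how the kernel counts.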

For more detailed info about memory use: memory.stat. For example:

$ cat memory.stat 
cache 1138688
rss 4268224512
rss_huge 0
shmem 0
mapped_file 516096
dirty 0
writeback 0
pgpgin 36038063
pgpgout 34995738
pgfault 21217095
pgmajfault 176307
inactive_anon 0
active_anon 4268224512
inactive_file 581632
active_file 401408
unevictable 0
hierarchical_memory_limit 4294967296
total_cache 1138688
total_rss 4268224512
total_rss_huge 0
total_shmem 0
total_mapped_file 516096
total_dirty 0
total_writeback 0
total_pgpgin 36038063
total_pgpgout 34995738
total_pgfault 21217095
total_pgmajfault 176307
total_inactive_anon 0
total_active_anon 4268224512
total_inactive_file 581632
total_active_file 401408
total_unevictable 0

Note the “cache” part at the beginning. It’s no coincidence that it’s first. That’s the most important part: How much can be reclaimed just by flushing the cache.

On a 6.1.0 kernel, I’ve seen memory.peak and memory.current instead of memory.max_usage_in_bytes and memory.usage_in_bytes. memory.peak wasn’t writable, however (its permissions didn’t allow it, and attempting a write failed anyway), so it wasn’t possible to reset the max level.

Setting memory limits

It’s possible to set memory limits in systemd’s unit file, but it can be more convenient to do this on the fly. In order to set the hard limit of memory use to 40 MiB, go (as root)

# echo 40M > memory.limit_in_bytes

To disable the limit, pick an unreasonably high number, e.g.

# echo 100G > memory.limit_in_bytes

Note that restarting the systemd service has no effect on these parameters (unless a memory limit is set in the unit file). The cgroup directory remains intact.
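For the record, a sketch of the same thing without first cd-ing into the cgroup directory ("theservice" is a placeholder). Note that "40M" means 40 × 1024 × 1024 bytes, and that the kernel rounds the stored limit to whole pages:

```shell
#!/bin/sh
CG=/sys/fs/cgroup/memory/system.slice/theservice.service

# Needs root and an existing cgroup; silently skipped otherwise
if [ -w "$CG/memory.limit_in_bytes" ]; then
    echo 40M > "$CG/memory.limit_in_bytes"
    cat "$CG/memory.limit_in_bytes"
fi

# "40M" expands to this many bytes:
echo $((40 * 1024 * 1024))
```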

Resetting between tests

To reset the maximal value that has been recorded for RAM use (as root)

# echo 0 > memory.max_usage_in_bytes

But to really start fresh, the disk cache needs to be cleared as well. The sledge-hammer way is going

# echo 1 > /proc/sys/vm/drop_caches

This frees the page caches system-wide, so everything running on the computer will need to re-read things from the disk. There’s a slight and temporary global impact on performance. On a GUI desktop, it gets a bit slow for a while.

A message like this will appear in the kernel log in response:

bash (43262): drop_caches: 1

This is perfectly fine, and indicates no error.

Alternatively, set a low limit for the RAM usage with memory.limit_in_bytes, as shown above. This impacts the cgroup only, forcing a reclaim of disk cache.

Two things that have no effect:

  • Reducing the soft limit (memory.soft_limit_in_bytes). This limit is relevant only when the system is short of RAM overall. Otherwise, it does nothing.
  • Restarting the service with systemd. It wouldn’t make any sense to flush a disk cache when restarting a service.

It’s of course a good idea to get rid of the disk cache before clearing memory.max_usage_in_bytes, so the max value starts without taking the disk cache into account.
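Putting this section's pieces together, a sketch of a reset-between-tests sequence (run as root; the service name is an assumption). The cache is dropped first, so the freshly cleared peak counter doesn't start off with the old cache already counted:

```shell
#!/bin/sh
CG=/sys/fs/cgroup/memory/system.slice/theservice.service

if [ "$(id -u)" -eq 0 ] && [ -d "$CG" ]; then
    sync                               # write out dirty pages so they can be dropped
    echo 1 > /proc/sys/vm/drop_caches  # flush the page cache system-wide
    echo 0 > "$CG/memory.max_usage_in_bytes"
    status=reset
else
    status=skipped                     # needs root and an existing v1 memory cgroup
fi
echo "$status"
```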