Upgrading from Ubuntu 18.04 bionic to 20.04 focal

I recently upgraded from Ubuntu 18.04.5 (bionic) to 20.04.1 (focal) and it was one of the roughest Ubuntu upgrades I've gone through in a while. Here are the notes I took on avoiding or fixing the problems I ran into.

Preparation

Before going through the upgrade, I disabled the configurations which I know interfere with the process:

  • Enable etckeeper auto-commits before install by putting the following in /etc/etckeeper/etckeeper.conf:

      AVOID_COMMIT_BEFORE_INSTALL=0
    
  • Remount /tmp as exectuable:

      mount -o remount,exec /tmp
    

Another step I should have taken but didn't, was to temporarily remove safe-rm since it caused some problems related to a Perl upgrade happening at the same time:

apt remove safe-rm

Network problems

After the upgrade, my network settings weren't really working properly and so I started by switching from ifupdown to netplan.io which seems to be the preferred way of configuring the network on Ubuntu now.

Then I found out that netplan.io is not automatically enabling the systemd-resolved handling of .local hostnames.

I would be able to resolve a hostname using avahi:

$ avahi-resolve --name machine.local
machine.local   192.168.1.5

but not with systemd:

$ systemd-resolve machine.local
machine.local: resolve call failed: 'machine.local' not found

$ resolvectl mdns
Global: no
Link 2 (enp4s0): no

The best solution I found involves keeping systemd-resolved and its /etc/resolv.conf symlink to /run/systemd/resolve/stub-resolv.conf.

I added the following in a new /etc/NetworkManager/conf.d/mdns.conf file:

[connection]
connection.mdns=1

which instructs NetworkManager to resolve mDNS on all network interfaces it manages but not register a hostname since that's done by avahi-daemon.

Then I enabled mDNS globally in systemd-resolved by setting the following in /etc/systemd/resolved.conf:

MulticastDNS=yes

before restarting both services:

systemctl restart NetworkManager.service systemd-resolved.service

With that in place, .local hostnames are resolved properly and I can see that mDNS is fully enabled:

$ resolvectl mdns
Global: yes
Link 2 (enp4s0): yes

Boot problems

For some reason I was able to boot with the kernel I got as part of the focal update, but a later kernel update rendered my machine unbootable.

Adding some missing RAID-related modules to /etc/initramfs-tools/modules:

raid1
dmraid
md-raid1

and then re-creating all initramfs:

update-initramfs -u -k all

seemed to do the trick.

Making an Apache website available as a Tor Onion Service

As part of the #MoreOnionsPorFavor campaign, I decided to follow brave.com's lead and make my homepage available as a Tor onion service.

Tor daemon setup

I started by installing the Tor daemon locally:

apt install tor

and then setting the following in /etc/tor/torrc:

SocksPort 0
SocksPolicy reject *
HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 80 [2600:3c04::f03c:91ff:fe8c:61ac]:80
HiddenServicePort 443 [2600:3c04::f03c:91ff:fe8c:61ac]:443
HiddenServiceVersion 3
HiddenServiceNonAnonymousMode 1
HiddenServiceSingleHopMode 1

in order to create a version 3 onion service without actually running a Tor relay.

Note that since I am making a public website available over Tor, I do not need the location of the website to be hidden and so I used the same settings as Cloudflare in their public Tor proxy.

Also, I explicitly used the external IPv6 address of my server in the configuration in order to prevent localhost bypasses.

After restarting the Tor daemon to reload the configuration file:

systemctl restart tor.service

I looked for the address of my onion service:

$ cat /var/lib/tor/hidden_service/hostname 
ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion

Apache configuration

Next, I enabled a few required Apache modules:

a2enmod mpm_event
a2enmod http2
a2enmod headers

and configured my Apache vhosts in /etc/apache2/sites-enabled/www.conf:

<VirtualHost *:443>
    ServerName fmarier.org
    ServerAlias ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion

    Protocols h2, http/1.1
    Header set Onion-Location "http://ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion%{REQUEST_URI}s"
    Header set alt-svc 'h2="ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion:443"; ma=315360000; persist=1'
    Header add Strict-Transport-Security: "max-age=63072000"

    Include /etc/fmarier-org/www-common.include

    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/fmarier.org/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/fmarier.org/privkey.pem
</VirtualHost>

<VirtualHost *:80>
    ServerName fmarier.org
    Redirect permanent / https://fmarier.org/
</VirtualHost>

<VirtualHost *:80>
    ServerName ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion
    Include /etc/fmarier-org/www-common.include
</VirtualHost>

Note that /etc/fmarier-org/www-common.include contains all of the configuration options that are common to both the HTTP and the HTTPS sites (e.g. document root, caching headers, aliases, etc.).

Finally, I restarted Apache:

apache2ctl configtest
systemctl restart apache2.service

Testing

In order to test that my website is correctly available at its .onion address, I opened the following URLs in a Brave Tor window:

I also checked that the main URL (https://fmarier.org/) exposes a working Onion-Location header which triggers the display of a button in the URL bar (recently merged and available in Brave Nightly):

Testing that the Alt-Svc is working required using the Tor Browser since that's not yet supported in Brave:

  1. Open https://fmarier.org.
  2. Wait 30 seconds.
  3. Reload the page.

On the server side, I saw the following:

2a0b:f4c2:2::1 - - [14/Oct/2020:02:42:20 +0000] "GET / HTTP/2.0" 200 2696 "-" "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0"
2600:3c04::f03c:91ff:fe8c:61ac - - [14/Oct/2020:02:42:53 +0000] "GET / HTTP/2.0" 200 2696 "-" "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0"

That first IP address is from a Tor exit node:

$ whois 2a0b:f4c2:2::1
...
inet6num:       2a0b:f4c2::/40
netname:        MK-TOR-EXIT
remarks:        -----------------------------------
remarks:        This network is used for Tor Exits.
remarks:        We do not have any logs at all.
remarks:        For more information please visit:
remarks:        https://www.torproject.org

which indicates that the first request was not using the .onion address.

The second IP address is the one for my server:

$ dig +short -x 2600:3c04::f03c:91ff:fe8c:61ac
hafnarfjordur.fmarier.org.

which indicates that the second request to Apache came from the Tor relay running on my server, hence using the .onion address.

Using a Let's Encrypt TLS certificate with Asterisk 16.2

In order to fix the following error after setting up SIP TLS in Asterisk 16.2:

asterisk[8691]: ERROR[8691]: tcptls.c:966 in __ssl_setup: TLS/SSL error loading cert file. <asterisk.pem>

I created a Let's Encrypt certificate using certbot:

apt install certbot
certbot certonly --standalone -d hostname.example.com

To enable the asterisk user to load the certificate successfuly (it doesn't have permission to access the certificates under /etc/letsencrypt/), I copied it to the right directory:

cp /etc/letsencrypt/live/hostname.example.com/privkey.pem /etc/asterisk/asterisk.key
cp /etc/letsencrypt/live/hostname.example.com/fullchain.pem /etc/asterisk/asterisk.cert
chown asterisk:asterisk /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
chmod go-rwx /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key

Then I set the following variables in /etc/asterisk/sip.conf:

tlscertfile=/etc/asterisk/asterisk.cert
tlsprivatekey=/etc/asterisk/asterisk.key

Automatic renewal

The machine on which I run asterisk has a tricky Apache setup:

  • a webserver is running on port 80
  • port 80 is restricted to the local network

This meant that the certbot domain ownership checks would get blocked by the firewall, and I couldn't open that port without exposing the private webserver to the Internet.

So I ended up disabling the built-in certbot renewal mechanism:

systemctl disable certbot.timer certbot.service
systemctl stop certbot.timer certbot.service

and then writing my own script in /etc/cron.daily/certbot-francois:

#!/bin/bash
TEMPFILE=`mktemp`

# Stop Apache and backup firewall.
/bin/systemctl stop apache2.service
/usr/sbin/iptables-save > $TEMPFILE

# Open up port 80 to the whole world.
/usr/sbin/iptables -D INPUT -j LOGDROP
/usr/sbin/iptables -A INPUT -p tcp --dport 80 -j ACCEPT
/usr/sbin/iptables -A INPUT -j LOGDROP

# Renew all certs.
/usr/bin/certbot renew --quiet

# Restore firewall and restart Apache.
/usr/sbin/iptables -D INPUT -p tcp --dport 80 -j ACCEPT
/usr/sbin/iptables-restore < $TEMPFILE
/bin/systemctl start apache2.service

# Copy certificate into asterisk.
cp /etc/letsencrypt/live/hostname.example.com/privkey.pem /etc/asterisk/asterisk.key
cp /etc/letsencrypt/live/hostname.example.com/fullchain.pem /etc/asterisk/asterisk.cert
chown asterisk:asterisk /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
chmod go-rwx /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key

# Commit changes to etckeeper and restart asterisk.
pushd /etc/ > /dev/null
/usr/bin/git add letsencrypt asterisk
DIFFSTAT="$(/usr/bin/git diff --cached --stat)"
if [ -n "$DIFFSTAT" ] ; then
    /usr/bin/git commit --quiet -m "Renewed letsencrypt certs." letsencrypt asterisk
    echo "$DIFFSTAT"
    /bin/systemctl restart asterisk.service
fi
popd > /dev/null
Copying a GnuBee's root partition onto a new drive

Here is the process I followed when I moved my GnuBee's root partition from one flaky Kingston SSD drive to a brand new Samsung SSD.

It was relatively straightforward, but there are two key points:

  1. Make sure you label the root partition GNUBEE-ROOT.
  2. Make sure you copy the network configuration from the SSD, not the tmpfs mount.

Copy the partition table

First, with both drives plugged in, I replicated the partition table of the first drive (/dev/sda):

# fdisk -l /dev/sda
Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: KINGSTON SA400S3
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 799CD830-526B-42CE-8EE7-8C94EF098D46

Device       Start       End   Sectors   Size Type
/dev/sda1     2048   8390655   8388608     4G Linux swap
/dev/sda2  8390656 234441614 226050959 107.8G Linux filesystem

onto the second drive (/dev/sde):

# fdisk /dev/sde

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xd011eaba.

Command (m for help): g
Created a new GPT disklabel (GUID: 83F70325-5BE0-034E-A9E1-1965FEFD8E9F).

Command (m for help): n
Partition number (1-128, default 1): 
First sector (2048-488397134, default 2048): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-488397134, default 488397134): +4G

Created a new partition 1 of type 'Linux filesystem' and of size 4 GiB.

Command (m for help): t
Selected partition 1
Partition type (type L to list all types): 19
Changed type of partition 'Linux filesystem' to 'Linux swap'.

Command (m for help): n
Partition number (2-128, default 2): 
First sector (8390656-488397134, default 8390656): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (8390656-488397134, default 488397134): 234441614

Created a new partition 2 of type 'Linux filesystem' and of size 107.8 GiB.

Command (m for help): p
Disk /dev/sde: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 860 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 83F70325-5BE0-034E-A9E1-1965FEFD8E9F

Device       Start       End   Sectors   Size Type
/dev/sde1     2048   8390655   8388608     4G Linux swap
/dev/sde2  8390656 234441614 226050959 107.8G Linux filesystem

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

I wasted a large amount of space on the second drive, but that was on purpose in case I decide to later on move to a RAID-1 root partition with the Kingston SSD.

Format the partitions

Second, I formated the new partitions:

# mkswap /dev/sde1 
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=7a85fbce-2493-45c1-a548-4ec6e827ec29

# mkfs.ext4 /dev/sde2 
mke2fs 1.44.5 (15-Dec-2018)
Discarding device blocks: done                            
Creating filesystem with 28256369 4k blocks and 7069696 inodes
Filesystem UUID: 732a76df-d369-4e7b-857a-dd55fd461bbc
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done   

and labeled the root partition so that the GnuBee can pick it up as it boots up:

e2label /dev/sde2 GNUBEE-ROOT

since GNUBEE-ROOT is what uboot will be looking for.

Copy the data over

Finally, I copied the data over from the original drive to the new one:

# umount /etc/network
# mkdir /mnt/root
# mount /dev/sde2 /mnt/root
# rsync -aHx --delete --exclude=/dev/* --exclude=/proc/* --exclude=/sys/* --exclude=/tmp/* --exclude=/mnt/* --exclude=/lost+found/* --exclude=/media/* --exclude=/rom/* /* /mnt/root/
# sync

Note that if you don't unmount /etc/network/, you'll be copying the override provided at boot time instead of the underlying config that's on the root partition. The reason that this matters is that the script that renames the network interfaces to ethblack and ethblue expects specific files in order to produce a working network configuration. If you copy the final modified config files then you end up with an bind-mounted empty directory as /etc/network, and the network interfaces can't be brought up successfully.