RSS Atom Add a new post titled:
Removing a corrupted data pack in a Restic backup

I recently ran into a corrupted data pack in a Restic backup on my GnuBee. It led to consistent failures during the prune operation:

incomplete pack file (will be removed): b45afb51749c0778de6a54942d62d361acf87b513c02c27fd2d32b730e174f2e
incomplete pack file (will be removed): c71452fa91413b49ea67e228c1afdc8d9343164d3c989ab48f3dd868641db113
incomplete pack file (will be removed): 10bf128be565a5dc4a46fc2fc5c18b12ed2e77899e7043b28ce6604e575d1463
incomplete pack file (will be removed): df282c9e64b225c2664dc6d89d1859af94f35936e87e5941cee99b8fbefd7620
incomplete pack file (will be removed): 1de20e74aac7ac239489e6767ec29822ffe52e1f2d7f61c3ec86e64e31984919
hash does not match id: want 8fac6efe99f2a103b0c9c57293a245f25aeac4146d0e07c2ab540d91f23d3bb5, got 2818331716e8a5dd64a610d1a4f85c970fd8ae92f891d64625beaaa6072e1b84
github.com/restic/restic/internal/repository.Repack
        github.com/restic/restic/internal/repository/repack.go:37
main.pruneRepository
        github.com/restic/restic/cmd/restic/cmd_prune.go:242
main.runPrune
        github.com/restic/restic/cmd/restic/cmd_prune.go:62
main.glob..func19
        github.com/restic/restic/cmd/restic/cmd_prune.go:27
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra/command.go:838
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra/command.go:943
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra/command.go:883
main.main
        github.com/restic/restic/cmd/restic/main.go:86
runtime.main
        runtime/proc.go:204
runtime.goexit
        runtime/asm_amd64.s:1374

Thanks to the excellent support forum, I was able to resolve this issue by dropping a single snapshot.

First, I identified the snapshot which contained the offending pack:

$ restic -r sftp:hostname.local: find --pack 8fac6efe99f2a103b0c9c57293a245f25aeac4146d0e07c2ab540d91f23d3bb5
repository b0b0516c opened successfully, password is correct
Found blob 2beffa460d4e8ca4ee6bf56df279d1a858824f5cf6edc41a394499510aa5af9e
 ... in file /home/francois/.local/share/akregator/Archive/http___udd.debian.org_dmd_feed_
     (tree 602b373abedca01f0b007fea17aa5ad2c8f4d11f1786dd06574068bf41e32020)
 ... in snapshot 5535dc9d (2020-06-30 08:34:41)

Then, I could simply drop that snapshot:

$ restic -r sftp:hostname.local: forget 5535dc9d
repository b0b0516c opened successfully, password is correct
[0:00] 100.00%  1 / 1 files deleted

and run the prune command to remove the snapshot, as well as the incomplete packs that were also mentioned in the above output but could never be removed due to the other error:

$ restic -r sftp:hostname.local: prune
repository b0b0516c opened successfully, password is correct
counting files in repo
building new index for repo
[20:11] 100.00%  77439 / 77439 packs
incomplete pack file (will be removed): b45afb51749c0778de6a54942d62d361acf87b513c02c27fd2d32b730e174f2e
incomplete pack file (will be removed): c71452fa91413b49ea67e228c1afdc8d9343164d3c989ab48f3dd868641db113
incomplete pack file (will be removed): 10bf128be565a5dc4a46fc2fc5c18b12ed2e77899e7043b28ce6604e575d1463
incomplete pack file (will be removed): df282c9e64b225c2664dc6d89d1859af94f35936e87e5941cee99b8fbefd7620
incomplete pack file (will be removed): 1de20e74aac7ac239489e6767ec29822ffe52e1f2d7f61c3ec86e64e31984919
repository contains 77434 packs (2384522 blobs) with 367.648 GiB
processed 2384522 blobs: 1165510 duplicate blobs, 47.331 GiB duplicate
load all snapshots
find data that is still in use for 15 snapshots
[1:11] 100.00%  15 / 15 snapshots
found 1006062 of 2384522 data blobs still in use, removing 1378460 blobs
will remove 5 invalid files
will delete 13728 packs and rewrite 15140 packs, this frees 142.285 GiB
[4:58:20] 100.00%  15140 / 15140 packs rewritten
counting files in repo
[18:58] 100.00%  50164 / 50164 packs
finding old index files
saved new indexes as [340cb68f 91ff77ef ee21a086 3e5fa853 084b5d4b 3b8d5b7a d5c385b4 5eff0be3 2cebb212 5e0d9244 29a36849 8251dcee 85db6fa2 29ed23f6 fb306aba 6ee289eb 0a74829d]
remove 190 old index files
[0:00] 100.00%  190 / 190 files deleted
remove 28868 old packs
[1:23] 100.00%  28868 / 28868 files deleted
done
Recovering from a corrupt MariaDB index page

I ran into a corrupt MariaDB index page the other day and had to restore my MythTV database from the automatic backups I make as part of my regular maintainance tasks.

Signs of trouble

My troubles started when my daily backup failed on this line:

mysqldump --opt mythconverg -umythtv -pPASSWORD > mythconverg-200200923T1117.sql

with this error message:

mysqldump: Error 1034: Index for table 'recordedseek' is corrupt; try to repair it when dumping table `recordedseek` at row: 4059895

Comparing the dump that was just created to the database dumps in /var/backups/mythtv/, it was clear that it was incomplete since it was about 100 MB smaller.

I first tried a gentle OPTIMIZE TABLE recordedseek as suggested in this StackExchange answer but that caused the database to segfault:

mysqld[9141]: 2020-09-23 15:02:46 0 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace mythconverg/recordedseek page [page id: space=115871, page number=11373]. You may have to recover from a backup.
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
mysqld[9141]:  len 16384; hex 06177fa70000...
mysqld[9141]:  C     K     c      {\;
mysqld[9141]: InnoDB: End of page dump
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 102203303, calculated checksums for field1: crc32 806650270, innodb 1139779342,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 102203303, calculated checksums for field2: crc32 806650270, innodb 3322209073, none 3735928559,  page LSN 148 2450029404, low 4 bytes of LSN at page end 2450029404, page number (if stored to page already) 11373, space id (if created with >= MySQL-4.1.1 and stored already) 115871
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Page may be an index page where index id is 697207
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Index 697207 is `PRIMARY` in table `mythconverg`.`recordedseek`
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
mysqld[9141]: 200923 15:02:46 2020-09-23 15:02:46 0 [ERROR] InnoDB: Failed to read file './mythconverg/recordedseek.ibd' at offset 11373: Page read from tablespace is corrupted.
mysqld[9141]: [ERROR] mysqld got signal 11 ;
mysqld[9141]: Core pattern: |/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h ...
kernel: [820233.893658] mysqld[9186]: segfault at 90 ip 0000557a229f6d90 sp 00007f69e82e2dc0 error 4 in mysqld[557a224ef000+803000]
kernel: [820233.893665] Code: c4 20 83 bd e4 eb ff ff 44 48 89 ...
systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV
systemd[1]: mariadb.service: Failed with result 'signal'.
systemd-coredump[9240]: Process 9141 (mysqld) of user 107 dumped core.#012#012Stack trace of thread 9186: ...
systemd[1]: mariadb.service: Service RestartSec=5s expired, scheduling restart.
systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 1.
mysqld[9260]: 2020-09-23 15:02:52 0 [Warning] Could not increase number of max_open_files to more than 16364 (request: 32186)
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=638234502026
...
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Recovered page [page id: space=115875, page number=5363] from the doublewrite buffer.
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Starting final batch to recover 2 pages from redo log.
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Waiting for purge to start
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] Recovering after a crash using tc.log
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] Starting crash recovery...
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] Crash recovery finished.

and so I went with the nuclear option of dropping the MythTV database and restoring from backup.

Dropping the corrupt database

First of all, I shut down MythTV as root:

kilall mythfrontend
systemctl stop mythtv-status.service
systemctl stop mythtv-backend.service

and took a full copy of my MariaDB databases just in case:

systemctl stop mariadb.service
cd /var/lib
apack /root/var-lib-mysql-20200923T1215.tgz mysql/
systemctl start mariadb.service

before dropping the MythTV databse (mythconverg):

$ mysql -pPASSWORD

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| mythconverg        |
| performance_schema |
+--------------------+
4 rows in set (0.000 sec)

MariaDB [(none)]> drop database mythconverg;
Query OK, 114 rows affected (25.564 sec)

MariaDB [(none)]> quit
Bye

Restoring from backup

Then I re-created an empty database:

mysql -pPASSWORD < /usr/share/mythtv/sql/mc.sql

and restored the last DB dump prior to the detection of the corruption:

sudo -i -u mythtv
/usr/share/mythtv/mythconverg_restore.pl --directory /var/backups/mythtv --filename mythconverg-1350-20200923010502.sql.gz

In order to restart everything properly, I simply rebooted the machine:

systemctl reboot
Copying a GnuBee's root partition onto a new drive

Here is the process I followed when I moved my GnuBee's root partition from one flaky Kingston SSD drive to a brand new Samsung SSD.

It was relatively straightforward, but there are two key points:

  1. Make sure you label the root partition GNUBEE-ROOT.
  2. Make sure you copy the network configuration from the SSD, not the tmpfs mount.

Copy the partition table

First, with both drives plugged in, I replicated the partition table of the first drive (/dev/sda):

# fdisk -l /dev/sda
Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: KINGSTON SA400S3
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 799CD830-526B-42CE-8EE7-8C94EF098D46

Device       Start       End   Sectors   Size Type
/dev/sda1     2048   8390655   8388608     4G Linux swap
/dev/sda2  8390656 234441614 226050959 107.8G Linux filesystem

onto the second drive (/dev/sde):

# fdisk /dev/sde

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xd011eaba.

Command (m for help): g
Created a new GPT disklabel (GUID: 83F70325-5BE0-034E-A9E1-1965FEFD8E9F).

Command (m for help): n
Partition number (1-128, default 1): 
First sector (2048-488397134, default 2048): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-488397134, default 488397134): +4G

Created a new partition 1 of type 'Linux filesystem' and of size 4 GiB.

Command (m for help): t
Selected partition 1
Partition type (type L to list all types): 19
Changed type of partition 'Linux filesystem' to 'Linux swap'.

Command (m for help): n
Partition number (2-128, default 2): 
First sector (8390656-488397134, default 8390656): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (8390656-488397134, default 488397134): 234441614

Created a new partition 2 of type 'Linux filesystem' and of size 107.8 GiB.

Command (m for help): p
Disk /dev/sde: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 860 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 83F70325-5BE0-034E-A9E1-1965FEFD8E9F

Device       Start       End   Sectors   Size Type
/dev/sde1     2048   8390655   8388608     4G Linux swap
/dev/sde2  8390656 234441614 226050959 107.8G Linux filesystem

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

I wasted a large amount of space on the second drive, but that was on purpose in case I decide to later on move to a RAID-1 root partition with the Kingston SSD.

Format the partitions

Second, I formated the new partitions:

# mkswap /dev/sde1 
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=7a85fbce-2493-45c1-a548-4ec6e827ec29

# mkfs.ext4 /dev/sde2 
mke2fs 1.44.5 (15-Dec-2018)
Discarding device blocks: done                            
Creating filesystem with 28256369 4k blocks and 7069696 inodes
Filesystem UUID: 732a76df-d369-4e7b-857a-dd55fd461bbc
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done   

and labeled the root partition so that the GnuBee can pick it up as it boots up:

e2label /dev/sde2 GNUBEE-ROOT

since GNUBEE-ROOT is what uboot will be looking for.

Copy the data over

Finally, I copied the data over from the original drive to the new one:

# umount /etc/network
# mkdir /mnt/root
# mount /dev/sde2 /mnt/root
# rsync -aHx --delete --exclude=/dev/* --exclude=/proc/* --exclude=/sys/* --exclude=/tmp/* --exclude=/mnt/* --exclude=/lost+found/* --exclude=/media/* --exclude=/rom/* /* /mnt/root/
# sync

Note that if you don't unmount /etc/network/, you'll be copying the override provided at boot time instead of the underlying config that's on the root partition. The reason that this matters is that the script that renames the network interfaces to ethblack and ethblue expects specific files in order to produce a working network configuration. If you copy the final modified config files then you end up with an bind-mounted empty directory as /etc/network, and the network interfaces can't be brought up successfully.

Using a Let's Encrypt TLS certificate with Asterisk 16.2

In order to fix the following error after setting up SIP TLS in Asterisk 16.2:

asterisk[8691]: ERROR[8691]: tcptls.c:966 in __ssl_setup: TLS/SSL error loading cert file. <asterisk.pem>

I created a Let's Encrypt certificate using certbot:

apt install certbot
certbot certonly --standalone -d hostname.example.com

To enable the asterisk user to load the certificate successfuly (it doesn't permission to access to the certificates under /etc/letsencrypt/), I copied it to the right directory:

cp /etc/letsencrypt/live/hostname.example.com/privkey.pem /etc/asterisk/asterisk.key
cp /etc/letsencrypt/live/hostname.example.com/fullchain.pem /etc/asterisk/asterisk.cert
chown asterisk:asterisk /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
chmod go-rwx /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key

Then I set the following variables in /etc/asterisk/sip.conf:

tlscertfile=/etc/asterisk/asterisk.cert
tlsprivatekey=/etc/asterisk/asterisk.key

Automatic renewal

The machine on which I run asterisk has a tricky Apache setup:

  • a webserver is running on port 80
  • port 80 is restricted to the local network

This meant that the certbot domain ownership checks would get blocked by the firewall, and I couldn't open that port without exposing the private webserver to the Internet.

So I ended up disabling the built-in certbot renewal mechanism:

systemctl disable certbot.timer certbot.service
systemctl stop certbot.timer certbot.service

and then writing my own script in /etc/cron.daily/certbot-francois:

#!/bin/bash
TEMPFILE=`mktemp`

# Stop Apache and backup firewall.
/bin/systemctl stop apache2.service
/usr/sbin/iptables-save > $TEMPFILE

# Open up port 80 to the whole world.
/usr/sbin/iptables -D INPUT -j LOGDROP
/usr/sbin/iptables -A INPUT -p tcp --dport 80 -j ACCEPT
/usr/sbin/iptables -A INPUT -j LOGDROP

# Renew all certs.
/usr/bin/certbot renew --quiet

# Restore firewall and restart Apache.
/usr/sbin/iptables -D INPUT -p tcp --dport 80 -j ACCEPT
/usr/sbin/iptables-restore < $TEMPFILE
/bin/systemctl start apache2.service

# Copy certificate into asterisk.
cp /etc/letsencrypt/live/hostname.example.com/privkey.pem /etc/asterisk/asterisk.key
cp /etc/letsencrypt/live/hostname.example.com/fullchain.pem /etc/asterisk/asterisk.cert
chown asterisk:asterisk /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
chmod go-rwx /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
/bin/systemctl restart asterisk.service

# Commit changes to etckeeper.
pushd /etc/ > /dev/null
/usr/bin/git add letsencrypt asterisk
DIFFSTAT="$(/usr/bin/git diff --cached --stat)"
if [ -n "$DIFFSTAT" ] ; then
    /usr/bin/git commit --quiet -m "Renewed letsencrypt certs."
    echo "$DIFFSTAT"
fi
popd > /dev/null
Making an Apache website available as a Tor Onion Service

As part of the #MoreOnionsPorFavor campaign, I decided to follow brave.com's lead and make my homepage available as a Tor onion service.

Tor daemon setup

I started by installing the Tor daemon locally:

apt install tor

and then setting the following in /etc/tor/torrc:

SocksPort 0
SocksPolicy reject *
HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 80 [2600:3c04::f03c:91ff:fe8c:61ac]:80
HiddenServicePort 443 [2600:3c04::f03c:91ff:fe8c:61ac]:443
HiddenServiceVersion 3
HiddenServiceNonAnonymousMode 1
HiddenServiceSingleHopMode 1

in order to create a version 3 onion service without actually running a Tor relay.

Note that since I am making a public website available over Tor, I do not need the location of the website to be hidden and so I used the same settings as Cloudflare in their public Tor proxy.

Also, I explicitly used the external IPv6 address of my server in the configuration in order to prevent localhost bypasses.

After restarting the Tor daemon to reload the configuration file:

systemctl restart tor.service

I looked for the address of my onion service:

$ cat /var/lib/tor/hidden_service/hostname 
ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion

Apache configuration

Next, I enabled a few required Apache modules:

a2enmod mpm_event
a2enmod http2
a2enmod headers

and configured my Apache vhosts in /etc/apache2/sites-enabled/www.conf:

<VirtualHost *:443>
    ServerName fmarier.org
    ServerAlias ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion

    Protocols h2, http/1.1
    Header set Onion-Location "http://ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion%{REQUEST_URI}s"
    Header set alt-svc 'h2="ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion:443"; ma=315360000; persist=1'
    Header add Strict-Transport-Security: "max-age=63072000"

    Include /etc/fmarier-org/www-common.include

    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/fmarier.org/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/fmarier.org/privkey.pem
</VirtualHost>

<VirtualHost *:80>
    ServerName fmarier.org
    Redirect permanent / https://fmarier.org/
</VirtualHost>

<VirtualHost *:80>
    ServerName ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion
    Include /etc/fmarier-org/www-common.include
</VirtualHost>

Note that /etc/fmarier-org/www-common.include contains all of the configuration options that are common to both the HTTP and the HTTPS sites (e.g. document root, caching headers, aliases, etc.).

Finally, I restarted Apache:

apache2ctl configtest
systemctl restart apache2.service

Testing

In order to test that my website is correctly available at its .onion address, I opened the following URLs in a Brave Tor window:

I also checked that the main URL (https://fmarier.org/) exposes a working Onion-Location header which triggers the display of a button in the URL bar (recently merged and available in Brave Nightly):

Testing that the Alt-Svc is working required using the Tor Browser since that's not yet supported in Brave:

  1. Open https://fmarier.org.
  2. Wait 30 seconds.
  3. Reload the page.

On the server side, I saw the following:

2a0b:f4c2:2::1 - - [14/Oct/2020:02:42:20 +0000] "GET / HTTP/2.0" 200 2696 "-" "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0"
2600:3c04::f03c:91ff:fe8c:61ac - - [14/Oct/2020:02:42:53 +0000] "GET / HTTP/2.0" 200 2696 "-" "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0"

That first IP address is from a Tor exit node:

$ whois 2a0b:f4c2:2::1
...
inet6num:       2a0b:f4c2::/40
netname:        MK-TOR-EXIT
remarks:        -----------------------------------
remarks:        This network is used for Tor Exits.
remarks:        We do not have any logs at all.
remarks:        For more information please visit:
remarks:        https://www.torproject.org

which indicates that the first request was not using the .onion address.

The second IP address is the one for my server:

$ dig +short -x 2600:3c04::f03c:91ff:fe8c:61ac
hafnarfjordur.fmarier.org.

which indicates that the second request to Apache came from the Tor relay running on my server, hence using the .onion address.

Upgrading from Ubuntu 18.04 bionic to 20.04 focal

I recently upgraded from Ubuntu 18.04.5 (bionic) to 20.04.1 (focal) and it was one of the roughest Ubuntu upgrades I've gone through in a while. Here are the notes I took on avoiding or fixing the problems I ran into.

Preparation

Before going through the upgrade, I disabled the configurations which I know interfere with the process:

  • Enable etckeeper auto-commits before install by putting the following in /etc/etckeeper/etckeeper.conf:

      AVOID_COMMIT_BEFORE_INSTALL=0
    
  • Remount /tmp as exectuable:

      mount -o remount,exec /tmp
    

Another step I should have taken but didn't, was to temporarily remove safe-rm since it caused some problems related to a Perl upgrade happening at the same time:

apt remove safe-rm

Network problems

After the upgrade, my network settings weren't really working properly and so I started by switching from ifupdown to netplan.io which seems to be the preferred way of configuring the network on Ubuntu now.

Then I found out that netplan.io is not automatically enabling the systemd-resolved handling of .local hostnames.

I would be able to resolve a hostname using avahi:

$ avahi-resolve --name machine.local
machine.local   192.168.1.5

but not with systemd:

$ systemd-resolve machine.local
machine.local: resolve call failed: 'machine.local' not found

$ resolvectl mdns
Global: no
Link 2 (enp4s0): no

The best solution I found involves keeping systemd-resolved and its /etc/resolv.conf symlink to /run/systemd/resolve/stub-resolv.conf.

I added the following in a new /etc/NetworkManager/conf.d/mdns.conf file:

[connection]
connection.mdns=1

which instructs NetworkManager to resolve mDNS on all network interfaces it manages but not register a hostname since that's done by avahi-daemon.

Then I enabled mDNS globally in systemd-resolved by setting the following in /etc/systemd/resolved.conf:

MulticastDNS=yes

before restarting both services:

systemctl restart NetworkManager.service systemd-resolved.service

With that in place, .local hostnames are resolved properly and I can see that mDNS is fully enabled:

$ resolvectl mdns
Global: yes
Link 2 (enp4s0): yes

Boot problems

For some reason I was able to boot with the kernel I got as part of the focal update, but a later kernel update rendered my machine unbootable.

Adding some missing RAID-related modules to /etc/initramfs-tools/modules:

raid1
dmraid
md-raid1

and then re-creating all initramfs:

update-initramfs -u -k all

seemed to do the trick.

Repairing a corrupt ext4 root partition

I ran into filesystem corruption (ext4) on the root partition of my backup server which caused it to go into read-only mode. Since it's the root partition, it's not possible to unmount it and repair it while it's running. Normally I would boot from an Ubuntu live CD / USB stick, but in this case the machine is using the mipsel architecture and so that's not an option.

Repair using a USB enclosure

I had to pull the shutdown the server and then pull the SSD drive out. I then moved it to an external USB enclosure and connected it to my laptop.

I started with an automatic filesystem repair:

fsck.ext4 -pf /dev/sde2

which failed for some reason and so I moved to an interactive repair:

fsck.ext4 -f /dev/sde2

Once all of the errors were fixed, I ran a full surface scan to update the list of bad blocks:

fsck.ext4 -c /dev/sde2

Finally, I forced another check to make sure that everything was fixed at the filesystem level:

fsck.ext4 -f /dev/sde2

Fix invalid alternate GPT

The other thing I noticed is this messge in my dmesg log:

scsi 8:0:0:0: Direct-Access     KINGSTON  SA400S37120     SBFK PQ: 0 ANSI: 6
sd 8:0:0:0: Attached scsi generic sg4 type 0
sd 8:0:0:0: [sde] 234441644 512-byte logical blocks: (120 GB/112 GiB)
sd 8:0:0:0: [sde] Write Protect is off
sd 8:0:0:0: [sde] Mode Sense: 31 00 00 00
sd 8:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 8:0:0:0: [sde] Optimal transfer size 33553920 bytes
Alternate GPT is invalid, using primary GPT.
 sde: sde1 sde2

I therefore checked to see if the partition table looked fine and got the following:

$ fdisk -l /dev/sde
GPT PMBR size mismatch (234441643 != 234441647) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
Disk /dev/sde: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: KINGSTON SA400S3
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 799CD830-526B-42CE-8EE7-8C94EF098D46

Device       Start       End   Sectors   Size Type
/dev/sde1     2048   8390655   8388608     4G Linux swap
/dev/sde2  8390656 234441614 226050959 107.8G Linux filesystem

It turns out that all I had to do, since only the backup / alternate GPT partition table was corrupt and the primary one was fine, was to re-write the partition table:

$ fdisk /dev/sde

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

GPT PMBR size mismatch (234441643 != 234441647) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.

Command (m for help): w

The partition table has been altered.
Syncing disks.

Run SMART checks

Since I still didn't know what caused the filesystem corruption in the first place, I decided to do one last check: SMART errors.

I couldn't do this via the USB enclosure since the SMART commands aren't forwarded to the drive and so I popped the drive back into the backup server and booted it up.

First, I checked whether any SMART errors had been reported using smartmontools:

smartctl -a /dev/sda

That didn't show any errors and so I kicked off an extended test:

smartctl -t long /dev/sda

which ran for 30 minutes and then passed without any errors.

The mystery remains unsolved.

Setting up and testing an NPR modem on Linux

After acquiring a pair of New Packet Radio modems on behalf of VECTOR, I set it up on my Linux machine and ran some basic tests to check whether it could achieve the advertised 500 kbps transfer rates, which are much higher than AX25) packet radio.

The exact equipment I used was:

Radio setup

After connecting the modems to the power supply and their respective antennas, I connected both modems to my laptop via micro-USB cables and used minicom to connect to their console on /dev/ttyACM[01]:

minicom -8 -b 921600 -D /dev/ttyACM0
minicom -8 -b 921600 -D /dev/ttyACM1

To confirm that the firmware was the latest one, I used the following command:

ready> version
firmware: 2020_02_23
freq band: 70cm

then I immediately turned off the radio:

radio off

which can be verified with:

status

Following the British Columbia 70 cm band plan, I picked the following frequency, modulation (bandwidth of 360 kHz), and power (0.05 W):

set frequency 433.500
set modulation 22
set RF_power 7

and then did the rest of the configuration for the master:

set callsign VA7GPL_0
set is_master yes
set DHCP_active no
set telnet_active no

and the client:

set callsign VA7GPL_1
set is_master no
set DHCP_active yes
set telnet_active no

and that was enough to get the two modems to talk to one another.

On both of them, I ran the following:

save
reboot

and confirmed that they were able to successfully connect to each other:

who

Monitoring RF

To monitor what is happening on the air and quickly determine whether or not the modems are chatting, you can use a software-defined radio along with gqrx with the following settings:

frequency: 433.500 MHz
filter width: user (80k)
filter shape: normal
mode: Raw I/Q

I found it quite helpful to keep this running the whole time I was working with these modems. The background "keep alive" sounds are quite distinct from the heavy traffic sounds.

IP setup

The radio bits out of the way, I turned to the networking configuration.

On the master, I set the following so that I could connect the master to my home network (192.168.1.0/24) without conflicts:

set def_route_active yes
set DNS_active no
set modem_IP 192.168.1.254
set IP_begin 192.168.1.225
set master_IP_size 29
set netmask 255.255.255.0

(My router's DHCP server is configured to allocate dynamic IP addresses from 192.168.1.100 to 192.168.1.224.)

At this point, I connected my laptop to the client using a CAT-5 network cable and the master to the ethernet switch, essentially following Annex 5 of the Advanced User Guide.

My laptop got assigned IP address 192.168.1.225 and so I used another computer on the same network to ping my laptop via the NPR modems:

ping 192.168.1.225

This gave me a round-trip time of around 150-250 ms.

Performance test

Having successfully established an IP connection between the two machines, I decided to run a quick test to measure the available bandwidth in an ideal setting (i.e. the two antennas very close to each other).

On both computers, I installed iperf:

apt install iperf

and then setup the iperf server on my desktop computer:

sudo iptables -A INPUT -s 192.168.1.0/24 -p TCP --dport 5001 -j ACCEPT
sudo iptables -A INPUT -s 192.168.1.0/24 -u UDP --dport 5001 -j ACCEPT
iperf --server

On the laptop, I set the MTU to 750 in NetworkManager:

and restarted the network.

Then I created a new user account (npr with a uid of 1001):

sudo adduser npr

and made sure that only that account could access the network by running the following as root:

# Flush all chains.
iptables -F

# Set defaults policies.
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP

# Don't block localhost and ICMP traffic.
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# Don't re-evaluate already accepted connections.
iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

# Allow connections to/from the test user.
iptables -A OUTPUT -m owner --uid-owner 1001 -m conntrack --ctstate NEW -j ACCEPT

# Log anything that gets blocked.
iptables -A INPUT -j LOG
iptables -A OUTPUT -j LOG
iptables -A FORWARD -j LOG

then I started the test as the npr user:

sudo -i -u npr
iperf --client 192.168.1.8

Results

The results were as good as advertised both with modulation 22 (360 kHz bandwidth):

$ iperf --client 192.168.1.8 --time 30
------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58462 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-34.5 sec  1.12 MBytes   274 Kbits/sec

------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58468 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-42.5 sec  1.12 MBytes   222 Kbits/sec

------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58484 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-38.5 sec  1.12 MBytes   245 Kbits/sec

and modulation 24 (1 MHz bandwitdh):

$ iperf --client 192.168.1.8 --time 30
------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58148 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-31.1 sec  1.88 MBytes   506 Kbits/sec

------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58246 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.5 sec  2.00 MBytes   550 Kbits/sec

------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58292 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.0 sec  2.00 MBytes   559 Kbits/sec

Setting the default web browser on Debian and Ubuntu

If you are wondering what your default web browser is set to on a Debian-based system, there are several things to look at:

$ xdg-settings get default-web-browser
brave-browser.desktop

$ xdg-mime query default x-scheme-handler/http
brave-browser.desktop

$ xdg-mime query default x-scheme-handler/https
brave-browser.desktop

$ ls -l /etc/alternatives/x-www-browser
lrwxrwxrwx 1 root root 29 Jul  5  2019 /etc/alternatives/x-www-browser -> /usr/bin/brave-browser-stable*

$ ls -l /etc/alternatives/gnome-www-browser
lrwxrwxrwx 1 root root 29 Jul  5  2019 /etc/alternatives/gnome-www-browser -> /usr/bin/brave-browser-stable*

Debian-specific tools

The contents of /etc/alternatives/ is system-wide defaults and must therefore be set as root:

sudo update-alternatives --config x-www-browser
sudo update-alternatives --config gnome-www-browser

The sensible-browser tool (from the sensible-utils package) will use these to automatically launch the most appropriate web browser depending on the desktop environment.

Standard MIME tools

The others can be changed as a normal user. Using xdg-settings:

xdg-settings set default-web-browser brave-browser-beta.desktop

will also change what the two xdg-mime commands return:

$ xdg-mime query default x-scheme-handler/http
brave-browser-beta.desktop

$ xdg-mime query default x-scheme-handler/https
brave-browser-beta.desktop

since it puts the following in ~/.config/mimeapps.list:

[Default Applications]
text/html=brave-browser-beta.desktop
x-scheme-handler/http=brave-browser-beta.desktop
x-scheme-handler/https=brave-browser-beta.desktop
x-scheme-handler/about=brave-browser-beta.desktop
x-scheme-handler/unknown=brave-browser-beta.desktop

Note that if you delete these entries, then the system-wide defaults, defined in /etc/mailcap, will be used, as provided by the mime-support package.

Changing the x-scheme-handler/http (or x-scheme-handler/https) association directly using:

xdg-mime default brave-browser-nightly.desktop x-scheme-handler/http

will only change that particular one. I suppose this means you could have one browser for insecure HTTP sites (hopefully with HTTPS Everywhere installed) and one for HTTPS sites though I'm not sure why anybody would want that.

Summary

In short, if you want to set your default browser everywhere (using Brave in this example), do the following:

sudo update-alternatives --config x-www-browser
sudo update-alternatives --config gnome-www-browser
xdg-settings set default-web-browser brave-browser.desktop
Extending GPG key expiry

Extending the expiry on a GPG key is not very hard, but it's easy to forget a step. Here's how I did my last expiry bump.

Update the expiry on the main key and the subkey:

gpg --edit-key KEYID
> expire
> key 1
> expire
> save

Upload the updated key to the keyservers:

gpg --export KEYID | curl -T - https://keys.openpgp.org
gpg --keyserver keyring.debian.org --send-keys KEYID