How I Audited 41 Servers for CVE-2026-31431 "Copy Fail"

Last week, Canonical disclosed CVE-2026-31431, nicknamed “Copy Fail” — a HIGH severity (CVSS 7.8) local privilege escalation in the Linux kernel. Any unprivileged local user on an affected system can gain full root access. If you run Ubuntu servers, this one matters.

I manage 41 Ubuntu servers across versions 20.04, 22.04, and 24.04. I spent an evening auditing every single one, found one with the vulnerable module already loaded, applied the mitigation, and ran a forensic compromise check. This post documents the entire process.

What Is CVE-2026-31431 “Copy Fail”?

The vulnerability lives in algif_aead, the Linux kernel module that exposes AEAD (Authenticated Encryption with Associated Data) operations to userspace through AF_ALG sockets. Think AES-GCM, AES-CCM, and ChaCha20-Poly1305: the AEAD constructions behind modern TLS and WireGuard.
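algif_aead can only hand userspace the AEAD algorithms that are registered with the kernel crypto API, and /proc/crypto lists every registered algorithm together with its type. The sketch below (the function name is my own) prints the names of the entries whose type is aead, which is a quick way to see, for example, whether an authencesn instance already exists on a host. Template instances such as authencesn(...) only show up after something has instantiated them, so a missing entry does not mean the template is unavailable.

```shell
#!/bin/sh
# aead_algorithms [FILE]: list the names of AEAD algorithms registered with the
# kernel crypto API. FILE defaults to /proc/crypto; a path is accepted for testing.
aead_algorithms() {
  # /proc/crypto is a series of "key : value" blocks separated by blank lines,
  # so awk's paragraph mode (RS = "") hands us one algorithm per record.
  awk 'BEGIN { RS = ""; FS = "\n" }
  {
    name = ""; type = ""
    for (i = 1; i <= NF; i++) {
      split($i, kv, "[ \t]*:[ \t]*")     # kv[1] = key, kv[2] = value
      if (kv[1] == "name") name = kv[2]
      else if (kv[1] == "type") type = kv[2]
    }
    if (type == "aead") print name
  }' "${1:-/proc/crypto}" | sort -u      # several drivers can share one name
}
```

For example, `aead_algorithms | grep authencesn` shows whether that template has been instantiated since boot.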

How the Exploit Works

The attack chains three kernel components together:

AF_ALG Socket: An unprivileged user opens a kernel crypto socket bound to authencesn(hmac(sha256),cbc(aes)) and performs AEAD decryption.
Splice Mechanism: The exploit uses splice() to feed page cache pages from readable system files (like /usr/bin/su) directly into the crypto subsystem’s scatterlist — without copying the data.
Scratch Space Abuse: During decryption, authencesn performs a 4-byte write past the AEAD tag into adjacent page cache pages, crossing into the target binary’s cached memory.

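The attacker repeats sendmsg/splice pairs, each one writing 4 bytes of shellcode into the cached image of a system binary (the setuid /usr/bin/su, for example), and then runs that binary. The kernel executes the corrupted cached pages, so the shellcode runs as root.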

Why It’s Stealthy

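This is what makes “Copy Fail” particularly nasty: the on-disk file is never modified. The exploit corrupts only the kernel’s page cache, the in-memory copy of the file, and the modified pages are never written back to disk. The file’s size and timestamps stay the same, the disk still holds the original bytes, and the evidence disappears once the pages are evicted or the machine reboots. Checks that look at on-disk contents or file metadata will find nothing wrong.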

Which Ubuntu Versions Are Affected?

Ubuntu Version	Status
20.04 LTS (Focal)	⚠️ Affected
22.04 LTS (Jammy)	⚠️ Affected
24.04 LTS (Noble)	⚠️ Affected
26.04 (Resolute)	✅ Not affected

All three Ubuntu LTS versions I run in production are affected. The patched kernel from Canonical is rolling out — until it arrives, the recommended interim fix is to disable the vulnerable module.
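To place a host in the table above, I read VERSION_ID from /etc/os-release. This is a minimal sketch with a hypothetical helper name that encodes only the four rows shown; it says nothing about whether Canonical's patched kernel is already installed, which is what ultimately decides exposure on an affected release.

```shell
#!/bin/sh
# release_status [OS_RELEASE]: map this host's Ubuntu release to the table above.
# Covers only the four listed releases; anything else is reported as not in the table.
release_status() {
  local os_release="${1:-/etc/os-release}"
  local version
  # os-release is shell-compatible, so source it in a subshell and read VERSION_ID
  version=$( . "$os_release" 2>/dev/null && printf '%s' "$VERSION_ID" )
  case "$version" in
    20.04|22.04|24.04) echo "$version: affected" ;;
    26.04)             echo "$version: not affected" ;;
    *)                 echo "${version:-unknown}: not in the table" ;;
  esac
}
```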

Auditing 41 Servers in Under 3 Minutes

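The first question: is the algif_aead module loaded? It loads on demand, the first time something binds an AF_ALG socket of type “aead”, so a loaded module tells you that something on the server has used the interface since boot. “Not loaded” is not the same as safe, though: the kernel will auto-load the module for any unprivileged user who asks for it, so an attacker can load it as the first step of the exploit. Until the patched kernel is installed, only blocking the module closes the hole.

# Check if the vulnerable module is loaded
grep -qE '^algif_aead ' /proc/modules && echo "STATUS: LOADED" || echo "STATUS: NOT LOADED (but still loadable on demand)"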

# Check for open AF_ALG sockets. ss cannot list AF_ALG sockets, so grepping its
# output finds nothing; lsof normally labels them "protocol: ALG"
lsof -n 2>/dev/null | grep 'protocol: ALG'

# Check whether OpenSSL can load its AF_ALG engine (the engine id is "afalg")
openssl engine -v afalg 2>/dev/null && echo "afalg engine present" || echo "No afalg engine"

I ran this across all 41 servers simultaneously using parallel SSH subshells:

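# SERVERS: bash array of host IPs (populate it first)
for IP in "${SERVERS[@]}"; do
    (
        # BatchMode fails fast instead of hanging on a password prompt
        RESULT=$(ssh -o BatchMode=yes -o ConnectTimeout=10 "root@$IP" \
            'grep -qE "^algif_aead " /proc/modules && echo "LOADED" || echo "not loaded"' 2>&1)
        echo "$IP: $RESULT"
    ) &
done
wait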
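To turn the per-host lines into counts, I pipe the loop's output into a small summarizer. This is a sketch with a hypothetical function name: it counts LOADED, treats both “safe” and “not loaded” as the not-loaded case, and counts anything else as an error, so an unreachable host shows up instead of silently dropping out of the totals. It assumes IPv4 addresses; an IPv6 address's colons would land that host in ERROR.

```shell
#!/bin/sh
# summarize_audit: tally "IP: RESULT" lines from the parallel SSH loop on stdin.
summarize_audit() {
  awk '
    NF == 0 { next }                           # ignore blank lines
    {
      sub(/^[^:]*: /, "")                      # drop the leading "IP: "
      if ($0 == "LOADED") loaded++
      else if ($0 == "safe" || $0 == "not loaded") notloaded++
      else errors++                            # ssh failures, timeouts, etc.
    }
    END { printf "LOADED\t%d\nNOT LOADED\t%d\nERROR\t%d\n", loaded, notloaded, errors }
  '
}
```

Usage: wrap the loop and its `wait` in a brace group and pipe the group in, as in `{ <the loop above>; } | summarize_audit`.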

Results: 40 Not Loaded, 1 Loaded

Result	Count
NOT LOADED	40 servers
LOADED	1 server

One server came back with the module loaded: a busy Ubuntu 24.04 machine running dozens of Docker containers and CI/CD runners. Critically, no AF_ALG sockets were open and no processes were using the module. Something had loaded it at some point, but nothing was holding it when I checked.

Workloads That Actually Use algif_aead

Before disabling anything, it’s worth understanding what legitimately depends on this module. The answer for most web stacks: very little.

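Workload	Needs algif_aead?	Why
IPSec / StrongSwan	Rarely	Kernel ESP (including authencesn) calls the in-kernel crypto API directly; only userspace crypto done over AF_ALG sockets needs the module
WireGuard	No	Uses its own in-kernel ChaCha20-Poly1305, not AF_ALG sockets
OpenSSL with AF_ALG engine	No	The afalg engine only offloads AES-CBC, which goes through algif_skcipher
OpenVPN	No	Uses userspace OpenSSL, not AF_ALG sockets
Nginx / Apache with TLS	No	Reach AF_ALG only through OpenSSL's afalg engine
Standard web apps	No	No direct AF_ALG usage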

I also run several OpenVPN servers. None of them depends on the module: OpenVPN does its crypto in userspace through OpenSSL and never opens AF_ALG sockets. I verified this on each one with the same /proc/modules, open-socket, and OpenSSL engine checks, and all came back clean.

Applying the Mitigation

Until the patched kernel ships, disable algif_aead entirely. No reboot required.

Step 1 — Block it from loading on boot

echo "install algif_aead /bin/false" | sudo tee /etc/modprobe.d/disable-algif_aead.conf

This tells modprobe: whenever anything requests algif_aead, run /bin/false instead. It exits with failure every time, so the module never loads — even if another module or the kernel itself tries to pull it in.

Step 2 — Unload from the running kernel

sudo rmmod algif_aead 2>/dev/null

Step 3 — Verify

grep -qE '^algif_aead ' /proc/modules \
  && echo "Module is STILL loaded" \
  || echo "Module is NOT loaded — safe"
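To apply these three steps on several hosts, I find it helps to have them in a form that is safe to re-run. This is a sketch under that assumption; the function name and its optional arguments are mine, not part of Ubuntu's guidance. It appends the block rule only if it is missing, unloads the module only if it is loaded, and reports the end state. On a real host it needs root.

```shell
#!/bin/sh
# apply_mitigation [CONF_DIR] [MODULES_FILE]: idempotent version of Steps 1-3.
# The optional arguments exist only so the logic can be tried against scratch
# files instead of /etc/modprobe.d and /proc/modules.
apply_mitigation() {
  local conf_dir="${1:-/etc/modprobe.d}"
  local modules_file="${2:-/proc/modules}"
  local conf="$conf_dir/disable-algif_aead.conf"
  local rule="install algif_aead /bin/false"
  local loaded blocked

  # Step 1: add the block rule only if that exact line is not already there
  if ! grep -qxF "$rule" "$conf" 2>/dev/null; then
    printf '%s\n' "$rule" >> "$conf"
  fi

  # Step 2: unload only when loaded; rmmod refuses while something holds it
  if grep -qE '^algif_aead ' "$modules_file"; then
    rmmod algif_aead || echo "rmmod failed: module still in use" >&2
  fi

  # Step 3: report the resulting state
  if grep -qE '^algif_aead ' "$modules_file"; then loaded=yes; else loaded=no; fi
  if grep -qxF "$rule" "$conf" 2>/dev/null; then blocked=yes; else blocked=no; fi
  echo "loaded=$loaded blocked=$blocked"
}
```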

Checking for Compromise

Because “Copy Fail” corrupts only the page cache and never the disk, checks of on-disk contents or metadata won’t catch it, and a reboot clears the page cache along with the evidence. But if an attacker already gained root and established persistence before the reboot, that persistence survives. So I checked for persistence artifacts.

Page Cache Integrity Check

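Compare each critical binary’s hash as the page cache serves it (an ordinary read, which is what actually gets executed) with its hash read straight from disk using O_DIRECT, which bypasses the page cache. Two ordinary reads, such as md5sum and cat | md5sum, would both come from the page cache and always agree. A mismatch is the smoking gun:

for bin in /usr/bin/su /usr/bin/sudo /bin/bash /usr/bin/passwd; do
    # Ordinary read: served from the page cache
    cache_md5=$(md5sum "$bin" | awk '{print $1}')
    # O_DIRECT read: bypasses the page cache and reads the blocks on disk
    # (needs a filesystem with O_DIRECT support, such as ext4 or XFS)
    disk_md5=$(dd if="$bin" iflag=direct bs=1M status=none | md5sum | awk '{print $1}')
    if [ "$disk_md5" = "$cache_md5" ]; then
        echo "$bin: OK"
    else
        echo "$bin: MISMATCH, POSSIBLE COMPROMISE"
    fi
done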
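As an independent cross-check, dpkg stores an MD5 for every file a package installs, and `dpkg --verify` re-hashes those files with ordinary reads, so it sees whatever the page cache holds. A corrupted cached binary would therefore show up as a checksum mismatch even while the disk is clean. In dpkg's default output, a 5 as the third flag character marks a failed checksum; the small filter below (the function name is my own) pulls out just those paths. `dpkg -S` tells you which package owns a given binary.

```shell
#!/bin/sh
# checksum_mismatches: read `dpkg --verify` output on stdin and print the paths
# whose MD5 no longer matches the package database ('5' as the 3rd flag char).
checksum_mismatches() {
  awk '$1 ~ /^..5/ { print $NF }'
}

# Usage (as root):
#   dpkg --verify <packages owning the binaries> | checksum_mismatches
```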

Persistence Indicators

# Only root should have UID 0
awk -F: '$3==0 {print}' /etc/passwd

# Check for unauthorized SSH keys (root and every user under /home)
grep -H . /root/.ssh/authorized_keys /home/*/.ssh/authorized_keys 2>/dev/null

# Systemd services created or changed in the last 30 days (ctime, unlike mtime,
# can't be reset with touch; widen the window if needed)
find /etc/systemd /lib/systemd -name "*.service" -ctime -30 -ls

# Suspicious files dropped in temp directories since the last boot
find /tmp /dev/shm /var/tmp -type f -newermt "$(uptime -s)" -ls

# Who is currently logged in and recent login history
who && last | head -20
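Eyeballing authorized_keys works for a handful of hosts; across a fleet I would rather compare them against a list of known team keys. This sketch assumes a hypothetical allowlist file in authorized_keys format. It compares only the base64 key blob, so comments and options may differ, and it can be fooled by an option value that itself contains a field starting with ssh-.

```shell
#!/bin/sh
# key_blobs FILE...: print the base64 blob of each public key, skipping comments
# and blank lines. The blob follows the key-type field; any options come before it.
key_blobs() {
  awk '!/^[ \t]*(#|$)/ {
    for (i = 1; i < NF; i++)
      if ($i ~ /^(ssh-|ecdsa-|sk-)/) { print $(i + 1); break }
  }' "$@"
}

# unknown_keys ALLOWLIST FILE...: report keys in FILE(s) whose blob is not in ALLOWLIST.
unknown_keys() {
  local allowlist="$1" allowed f blob
  shift
  allowed=$(key_blobs "$allowlist")
  for f in "$@"; do
    [ -r "$f" ] || continue           # skip users without an authorized_keys file
    key_blobs "$f" | while IFS= read -r blob; do
      printf '%s\n' "$allowed" | grep -qxF -- "$blob" ||
        printf '%s: unknown key %.20s...\n' "$f" "$blob"
    done
  done
}

# Usage (the allowlist path is hypothetical):
#   unknown_keys /root/team_keys.pub /root/.ssh/authorized_keys /home/*/.ssh/authorized_keys
```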

My Results

Check	Result
Page cache integrity (su, sudo, bash, passwd)	✅ All match disk
UID 0 accounts	✅ Only root
Unauthorized SSH keys	✅ All known team keys
New systemd services	✅ All pre-existing
Open AF_ALG sockets	✅ None
Suspicious temp files	✅ Explained (eBPF artifacts from a CI/CD pipeline)

Verdict: no sign of compromise. Something had loaded the module at some point, but none of the checks found evidence that it was exploited.

Clearing the Page Cache Fleet-Wide

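As a final step, I flushed the page cache on all 41 servers. This discards clean cached pages, including any that might have been corrupted without my noticing, and files are re-read from disk on next access. It causes no data loss: sync writes dirty data out first, and drop_caches never frees dirty pages. It has one important limit: the kernel keeps pages that running processes have mapped, such as the code of a bash or sudo process that is currently executing, so only a reboot guarantees a completely fresh cache. Expect a short burst of disk reads afterwards while the cache refills: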

sync; echo 3 > /proc/sys/vm/drop_caches

41/41 confirmed OK.

Summary

Action	Status
Audited all 41 servers for module exposure	✅ Done
Identified 1 exposed server	✅ Done
Verified VPN servers are safe	✅ Done
Applied interim mitigation on exposed server	✅ Done
Forensic compromise check	✅ Clean
Flushed page cache fleet-wide	✅ Done
Patched kernel	⏳ Pending — auto-update scheduled

Quick Reference — All Commands

# 1. Check if module is loaded
grep -qE '^algif_aead ' /proc/modules && echo "LOADED" || echo "not loaded"

# 2. Block from loading on boot
echo "install algif_aead /bin/false" | sudo tee /etc/modprobe.d/disable-algif_aead.conf

# 3. Unload from running kernel
sudo rmmod algif_aead 2>/dev/null

# 4. Verify it's gone
grep -qE '^algif_aead ' /proc/modules && echo "Still loaded" || echo "Safe"

# 5. Flush page cache (safe, no data loss)
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null

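# 6. Permanent fix: install the patched kernel, then reboot into it
#    (cloud/HWE hosts use another metapackage, e.g. linux-image-aws; check with dpkg -l 'linux-image-*')
sudo apt update && sudo apt install --only-upgrade linux-image-generic
sudo reboot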
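The patched kernel only protects a host once it is actually running. This sketch (a hypothetical helper name) compares the running kernel with the newest vmlinuz image in /boot using GNU sort's version ordering. It assumes one kernel flavour per host, since mixing, say, generic and aws images would make “newest” ambiguous.

```shell
#!/bin/sh
# kernel_reboot_pending [BOOT_DIR] [RUNNING]: compare the running kernel with the
# newest installed one. Defaults: /boot and `uname -r`. Needs GNU sort (-V).
kernel_reboot_pending() {
  local boot_dir="${1:-/boot}"
  local running="${2:-$(uname -r)}"
  local newest
  # Strip the "vmlinuz-" prefix from each image name and keep the highest version
  newest=$(for f in "$boot_dir"/vmlinuz-*; do
             [ -e "$f" ] && printf '%s\n' "${f##*/vmlinuz-}"
           done | sort -V | tail -n 1)
  if [ -n "$newest" ] && [ "$newest" != "$running" ]; then
    echo "reboot pending: running $running, newest installed $newest"
  else
    echo "running newest installed kernel: $running"
  fi
}
```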

References: Ubuntu Security Blog · Ubuntu CVE Tracker · xint.io Technical Analysis
