Pi‑hole Zero 2 W: Beyond NIC Flapping — The Freeze Mystery

TL;DR — Early on I fixed a flaky USB NIC (Realtek r8152) that caused short dropouts. Weeks later the box began hard freezing. Logs from the previous boot showed SD host timeouts (sdhost-bcm2835 ... timeout waiting for hardware interrupt) and ext4 read stalls. The fix: back up, move to a new endurance SD card, and restore Pi‑hole + Tailscale configs.

If you missed the first post about the USB NIC:
Quick Fix for NIC flapping · Full write‑up


What was happening

  • Pi‑hole (zero.local) would hard freeze every few days; only a power‑cycle revived it.
  • Power and thermals were clean (vcgencmd get_throttled reported throttled=0x0, temps ~40 °C).
  • The earlier NIC flapping was mitigated (100 Mb/s lock + USB autosuspend off) but freezes kept happening.
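
For reference, get_throttled prints a hex bitmask, and 0x0 means no flag is set. Here is a minimal sketch for decoding a non-zero value, assuming the bit layout given in the Raspberry Pi documentation. The name decode_throttled is a hypothetical helper, not part of vcgencmd.

```shell
# Decode the bitmask printed by `vcgencmd get_throttled`.
# Accepts either "throttled=0x50005" or a bare "0x50005".
decode_throttled() {
  raw=${1#throttled=}          # strip the "throttled=" prefix if present
  mask=$(($raw))               # hex string -> integer
  if [ "$mask" -eq 0 ]; then
    echo "no flags set"
    return 0
  fi
  # One "bit label" pair per line; print the label when that bit is set.
  while read -r bit label; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      echo "$label"
    fi
  done <<EOF
0 under-voltage detected now
1 ARM frequency capped now
2 currently throttled
3 soft temperature limit active now
16 under-voltage has occurred
17 ARM frequency capping has occurred
18 throttling has occurred
19 soft temperature limit has occurred
EOF
}
```

On the Pi it could be called as decode_throttled "$(vcgencmd get_throttled)". The "has occurred" bits cover the time since boot, so they can catch a brown-out that happened hours earlier.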

How I proved it

  1. Enable persistent logs (so “previous boot” is readable):
    sudo mkdir -p /var/log/journal
    sudo sed -i 's/^#\?Storage=.*/Storage=persistent/' /etc/systemd/journald.conf
    sudo systemctl restart systemd-journald
    
  2. After a freeze + reboot, check the previous boot:
    sudo journalctl --list-boots
    sudo journalctl -b -1 -p 0..3 --no-pager
    sudo journalctl -b -1 -k --no-pager | \
      grep -Ei 'r8152|eth0|carrier|reset high-speed|under-volt|mmc|i/o error|ext4|panic|oom|hung task|soft lockup'
    
  3. The smoking gun showed up repeatedly:
    sdhost-bcm2835 ... timeout waiting for hardware interrupt
    EXT4 ... readpages / mmc_rescan backtraces
    

    No undervoltage, no OOM, no kernel panic: everything pointed at storage I/O stalls.
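
To make that elimination less eyeball-driven, the grep patterns from step 2 can be grouped by suspect and counted. This is a rough sketch, not a diagnostic tool: classify_kernel_log is a hypothetical name, and the grouping of patterns into four buckets is my assumption.

```shell
# Read kernel log lines on stdin and count matches per suspect.
# Patterns are the ones from the grep above, matched case-insensitively.
classify_kernel_log() {
  awk '
    { line = tolower($0) }
    line ~ /mmc|sdhost|i\/o error|ext4/          { storage++ }
    line ~ /r8152|carrier|reset high-speed/      { network++ }
    line ~ /under-volt/                          { power++ }
    line ~ /oom|panic|hung task|soft lockup/     { kernel++ }
    END {
      printf "storage=%d network=%d power=%d kernel=%d\n",
             storage, network, power, kernel
    }
  '
}
```

Usage after a freeze: sudo journalctl -b -1 -k --no-pager | classify_kernel_log. A line can land in more than one bucket, and short patterns such as oom can match unrelated words, so treat the counts as a pointer to where to read, not as a verdict.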

Back up before it dies

Small, safe backups that matter:

# Pi-hole & Tailscale configs (portable)
sudo mkdir -p /root/backup   # tar will not create the target directory
sudo tar czf /root/backup/pihole_etc_$(date +%F).tgz \
  /etc/pihole /etc/dnsmasq.d /var/lib/tailscale /etc/hosts /etc/hostname

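# Full card image, compressed, streamed to NAS over SSH.
# Replace /dev/sdX with the card's device (usually /dev/mmcblk0 on the Pi itself).
# $(date +%F) is single-quoted, so it expands on the NAS, not locally.
sudo dd if=/dev/sdX bs=4M status=progress | gzip | \
  ssh root@nas 'cat > /mnt/archive/pi-backups/pihole-$(date +%F).img.gz'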

Tip: move the tarball out of /root so you can scp it as a normal user.
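
A backup written to a dying card is worth checking before you trust it. This is a minimal sketch, assuming the tarball was created with the tar command above, so member names have the leading / stripped. The name verify_backup is hypothetical.

```shell
# Check that a config tarball is readable and contains the key directories.
verify_backup() {
  archive=$1
  # gzip -t reads the whole compressed stream without extracting anything
  if ! gzip -t "$archive" 2>/dev/null; then
    echo "corrupt: $archive"
    return 1
  fi
  listing=$(tar tzf "$archive") || return 1
  for want in etc/pihole var/lib/tailscale; do
    if ! printf '%s\n' "$listing" | grep -q "^$want"; then
      echo "missing: $want"
      return 1
    fi
  done
  echo "ok: $archive"
}
```

Run it on the copy that left the Pi (the one on your laptop or NAS), since that is the file you will restore from.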

Rebuild & restore (clean slate)

  1. Flash Raspberry Pi OS Lite onto a new A1/A2 or high‑endurance SD card.
  2. Install Pi‑hole + Tailscale:
    curl -sSL https://install.pi-hole.net | bash
    curl -fsSL https://tailscale.com/install.sh | sh   # adds Tailscale's apt repo, then installs the package
    sudo tailscale up --accept-dns=false
    
  3. Restore configs and restart:
    scp pihole_etc_YYYY-MM-DD.tgz admin@newpi:/home/admin/
    ssh admin@newpi 'sudo tar xzf /home/admin/pihole_etc_*.tgz -C /'
    ssh admin@newpi 'sudo systemctl restart pihole-FTL tailscaled'
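
One caveat with the pihole_etc_*.tgz glob: if several dated tarballs sit in the same directory, tar treats every match after the first as a member name to extract, not as another archive. Below is a small sketch that picks the newest one by name. latest_backup is a hypothetical helper, and it relies on the date +%F stamps sorting chronologically as plain strings.

```shell
# Print the newest pihole_etc_YYYY-MM-DD.tgz in a directory (default: current dir).
latest_backup() {
  latest=
  for f in "${1:-.}"/pihole_etc_*.tgz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    latest=$f                 # globs expand in sorted order, so the last match is the newest
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

On the new Pi that would look like: sudo tar xzf "$(latest_backup /home/admin)" -C /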
    

Hardening (nice extras)

  • Reduce SD writes: shorter Pi‑hole query retention; consider tmpfs for logs.
  • A watchdog that restarts pihole-FTL if it dies, plus a small timer‑driven network probe that logs loss of the WAN or the gateway.
  • Keep the NIC tweaks if you use a USB adapter:
    # Lock to 100 Mb/s and disable EEE if supported
    sudo ethtool -s eth0 speed 100 duplex full autoneg off || true
    sudo ethtool --set-eee eth0 eee off || true
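
To see whether the write‑reduction tweaks actually help, you can read the kernel's per‑device counters. This is a minimal sketch, assuming the card shows up as mmcblk0. sd_written_mib is a hypothetical helper. It relies on field 7 of /sys/block/<dev>/stat being sectors written, which the kernel counts in 512‑byte units.

```shell
# Print MiB written since boot, given the path to a block device's stat file.
sd_written_mib() {
  # Field 7 of the stat line is "sectors written", always in 512-byte units.
  awk '{ printf "%d\n", $7 * 512 / 1048576 }' "$1"
}
```

Example: sd_written_mib /sys/block/mmcblk0/stat. Sample it once before and once after a day of normal use; the difference is roughly what the card absorbed in that window.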
    

Lessons learned

  • Two issues overlapped: NIC flaps (annoying, recoverable) and later SD I/O stalls (fatal).
  • journalctl -b -1 is gold for post‑mortems.
  • SD cards are consumables. Use high‑endurance media for 24/7 boxes like Pi‑hole.

Quick Fix box (copy/paste)

# After reboot from a freeze
sudo journalctl -b -1 -p 0..3 --no-pager
sudo journalctl -b -1 -k --no-pager | grep -Ei 'mmc|sdhost|i/o error|ext4'
# If you see sdhost/ext4 timeouts → back up now and replace the card.