I had a bit of a scare this week. My setup is HAOS running on Proxmox. I have a Sonoff USB Zigbee gateway (plus a Coral for Frigate and a USB SSD attached to Proxmox).
Friday night, the server stops for no reason. I dig it out from the cupboard and I can hear the fan short cycling. I disconnect everything and take it to a screen so I can see what’s happening - it boots fine, WTH?
Must be a USB thing. I add the devices back one by one, and as soon as I reconnect the gateway the problem returns. Now I get worried. I switch to a different USB port, remap it to HAOS and boom! Back up and running. Panic over; cold house (the radiators are Zigbee) and angry wife and children avoided.
All of which has led me to conclude that my HA setup is really ‘Mission Critical’ and I need some recovery strategies beyond a daily backup. I think the gateway can be swapped, but I’m not sure whether the key to the Zigbee mesh is stored in the hardware or in software.
This is the question: what are your recovery strategies? Do they include hardware or just software? I’m thinking maybe I need a second dongle and a couple of low-powered machines in the Proxmox cluster. I won’t be able to get my homelab back up immediately, but if I can get HA running again on a different node with a backup dongle I’d be OK.
This reminds me that it’s a new month, and time for a backup. Thanks!
As long as you have backups, the Zigbee gateway can be swapped without a problem if you are using ZHA or Z2M, by having the software run its built-in recovery steps.
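Since everything above hinges on "as long as you have backups", here is a minimal sketch of a freshness check you could run from another machine. It assumes the backups are synced as `.tar` files into a single folder; the function names, the folder path and the 26-hour threshold (a daily backup plus some slack) are illustrative assumptions, not anything HA provides.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, pattern="*.tar", now=None):
    """Return the age in hours of the most recently modified backup file.

    Returns None when no file in backup_dir matches pattern.
    The "*.tar" default is an assumption about how the backups are named.
    """
    files = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    current = time.time() if now is None else now
    return (current - newest_mtime) / 3600.0


def backup_is_fresh(backup_dir, max_age_hours=26.0, pattern="*.tar", now=None):
    """True if at least one backup exists and the newest is recent enough."""
    age = newest_backup_age_hours(backup_dir, pattern, now)
    return age is not None and age <= max_age_hours
```

Something like `backup_is_fresh("/mnt/nas/ha-backups")` (a made-up path) could run from cron on a second box and send a notification when it returns `False`, so a silently failing backup job gets noticed before the day you need to restore.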
I haven’t run into any other issues in the hardware<>software configs that concern me personally. I have a bunch of Matter devices now, so we’ll see what a recovery looks like when it comes time for that.
I was thinking about Matter yesterday; I like the idea of being able to have multiple controllers. My house is half Wi-Fi devices and half Zigbee. I’d been favouring Zigbee recently because I don’t want to swamp my network with device packets, but maybe that needs a rethink. At the very least, my Wi-Fi devices all have ESPHome configs that could be set up to fall back to defaults.
I don’t really have a recovery strategy in place, but what I do have is that all the smart stuff in my home can be controlled manually too. Light switches work just like dumb ones, thermostats have manual buttons and so on. So even if the server goes down I can still control everything manually. Obviously automations won’t work, but the house isn’t crippled if that single Raspberry Pi decides to go belly up.
I’ve got a decent number of local manual controls, but not all of them. For example, some of my wall switches operate the relay because they are just turning on and off the power. Others I have disabled the relay on because the lights themselves are WW/CW tuneable and HA controls the colour during the day.
I’m wondering about having another look at Zigbee groups and commands for the simpler automations in the house. I avoided these because they aren’t really visible to HA and I didn’t like having two automation ‘languages’ at the same time.
Overall, how long do you think you could cope without your HA platform before it becomes an issue?
It will never become an issue. As I mentioned, all the smart things I have can still be controlled manually. Sure, things like timing energy consumption to cheaper hours and turning on outside lights when it gets dark either stop working or need to be controlled manually, but it would be more an annoyance than an issue.
And when planning for expansions I’m pretty strict that things stay that way. Everything has to work without HA, internet connectivity or anything at all besides, obviously, electricity. Automations are just icing on the cake: they can save a few bucks here and there and offer quality-of-life functionality, but I’d never rely on those alone. Manual override always has to be an option.