Josephny
Member
Topic Author
Posts: 495
Joined: Tue Sep 20, 2022 12:11 am

Watchdog, or alternative?

Thu Mar 28, 2024 11:53 am

I have one location that occasionally loses power and (I don't know why but) when the power is restored, connectivity to the MT device is not.

It requires physically going to the location and power cycling the MT device.

I was thinking of using Watchdog to have the device reboot itself. Maybe use a public DNS server as the target IP to test for connectivity.

Is this the best way, or is there a better alternative?

Thanks!
 
Amm0
Forum Guru
Posts: 3591
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Watchdog, or alternative?

Thu Mar 28, 2024 4:09 pm

I think watchdog with ping is a good idea if you have dynamic WANs like PPPoE or LTE. BUT you do NOT want to set its timeout too aggressively (e.g. use 3-10 minutes). Otherwise, if it takes a while to reboot/connect/etc., it could reboot again before it finishes. A longer timeout also gives you some time to get in remotely and fix a config error if it's just the ping check that fails...

What you may want to do is enable supout & email notification on it. That tracks when this happens, and you'd have some data from the supout.rif provided via email. This may provide some clues on why/when it's happening.
 
Josephny
Member
Topic Author
Posts: 495
Joined: Tue Sep 20, 2022 12:11 am

Re: Watchdog, or alternative?

Thu Mar 28, 2024 4:12 pm

Great, I'll do exactly that.

Thank you.
 
Josephny
Member
Topic Author
Posts: 495
Joined: Tue Sep 20, 2022 12:11 am

Re: Watchdog, or alternative?

Sat Mar 30, 2024 3:23 pm

Does this look good:
/system watchdog
set auto-send-supout=yes ping-start-after-boot=10m ping-timeout=2m send-email-from=joseph@xxxxx.com \
    send-email-to=joseph@xxxx.com send-smtp-server=smtp.gmail.com watch-address=1.1.1.1
Do I understand correctly that by including the watch-address I have enabled both the "system unresponsive" watchdog as well as the "unable to ping 1.1.1.1" watchdog?

I chose 10m for the start of each of the 6 pings necessary to cause a reboot as an intentionally high number. My understanding is that this means that the router would have to be unable to ping 1.1.1.1 for 60 consecutive minutes before the "ping" watchdog causes the router to reboot itself. Is this correct?

Thanks.
 
anav
Forum Guru
Posts: 19610
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Watchdog, or alternative?

Sat Mar 30, 2024 3:35 pm

How long does power go out? A UPS may be better.
 
Josephny
Member
Topic Author
Posts: 495
Joined: Tue Sep 20, 2022 12:11 am

Re: Watchdog, or alternative?

Sat Mar 30, 2024 3:48 pm

LOL! Yes indeed, a UPS is a good thing.

The problem is that I don't know what exactly is going on.

I just had someone go again this morning to power cycle the entire network environment, and now it is up and running.

Just trying to add as many layers of protection as possible.

Disagree?
 
Amm0
Forum Guru
Posts: 3591
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Watchdog, or alternative?

Sat Mar 30, 2024 4:06 pm

Fair point @anav. It's more that LTE can be finicky for a variety of reasons. And watchdog will capture the logs in supout.rif, kinda ready for a support case, since LTE should recover after a hard power failure. I kinda view a watchdog reboot as something that really shouldn't happen. But with LTE, there are a lot of things going on between RouterOS and the internet.

On this point, the most common reason an LTE interface doesn't "come back" is version mismatches (or running an older version that may have some bug, etc.). All three (RouterOS software, RouterBOOT "BIOS", and LTE firmware) should be at stable, unless you have good reason not to be. That may be why it's not recovering. Other possible problems include the modem having some issue with a specific carrier, or protocol/modem logic errors.

Yes, adding a watch-address is what enables the ping feature.

I think your settings look fine. What ping-timeout=2m does is send 6 pings over two minutes. After 6 failed pings in a row (so 2 minutes), it will reboot. So basically it reboots every 12 minutes (10m + 2m) if LTE/ping isn't working. You might lower ping-start-after-boot since LTE should take under 1 minute to come up in the normal case, perhaps ping-start-after-boot=5m. That means a reboot every 7 minutes (5m + 2m).
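
To make that arithmetic concrete, here is a rough Python sketch (not RouterOS code) of the timing as described in this thread. It assumes the 6-pings-per-window figure quoted above and ignores the time the router itself takes to boot. The function names and the constant are made up for illustration.

```python
from datetime import timedelta

# Number of pings per ping-timeout window, as quoted in this thread.
# This is an assumption taken from the discussion, not verified here.
PINGS_PER_WINDOW = 6


def reboot_cycle(ping_start_after_boot: timedelta, ping_timeout: timedelta) -> timedelta:
    """Approximate time from one boot to the next watchdog reboot
    when every ping fails (boot duration itself is ignored)."""
    return ping_start_after_boot + ping_timeout


def ping_spacing(ping_timeout: timedelta) -> timedelta:
    """Approximate gap between pings inside one ping-timeout window."""
    return ping_timeout / PINGS_PER_WINDOW
```

With ping-start-after-boot=10m and ping-timeout=2m, `reboot_cycle` gives 12 minutes; with 5m and 2m it gives 7 minutes. Under the same assumption, `ping_spacing` for a 2m window works out to one ping roughly every 20 seconds.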

Basically you want to make sure you have enough time to access the router IF LTE worked... but just the ping to 1.1.1.1 fails (say the route to Cloudflare was down/misconfigured/etc., or ping was throttled).
 
Josephny
Member
Topic Author
Posts: 495
Joined: Tue Sep 20, 2022 12:11 am

Re: Watchdog, or alternative?

Sat Mar 30, 2024 4:18 pm

My situation does not use LTE. Mine are Verizon FIOS and Charter Spectrum cable connections.

I'd really like to play around with LTE connectivity, but I could never get a clear answer that there is a straightforward way to get this done here in the USA.

But now I'm confused about how the ping-timeout works.

Do I understand correctly that the ping-timeout value is the total time during which the watch-address is not pingable before the MT device reboots? And, that it will try 6 times during that period?

And, ping-start-after-boot is the amount of time before the first ping-timeout clock starts ticking?

So, with (for example), a ping-start-after-boot of 10m and a ping-timeout of 2m, the system will wait 10m after a boot, then start pinging the watch-address. If after 6 attempts in 2 minutes the watch-address does not respond to the ping, the system reboots. Correct?
 
Amm0
Forum Guru
Posts: 3591
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: Watchdog, or alternative?

Sat Mar 30, 2024 5:20 pm

So, with (for example), a ping-start-after-boot of 10m and a ping-timeout of 2m, the system will wait 10m after a boot, then start pinging the watch-address. If after 6 attempts in 2 minutes the watch-address does not respond to the ping, the system reboots. Correct?
Yup.

My situation does not use LTE. Mine are Verizon FIOS and Charter Spectrum cable connections.
I get confused on the posts sometimes. Just generally LTE is what needs watchdog help more. Watchdog is still a good idea IMO.

But that's a potential clue. DOCSIS cable modems typically go through some 192.168.100.x local network and need to register the MAC address. I don't know which comes up first, but I suspect it's some timing issue when power comes back up. Still, it should recover, I'd imagine...

FIOS I have less of a clue about. AFAIK the ONT passes a public IP or subnet without much complication. But it's been a while with Verizon FIOS.

I'd really like to play around with LTE connectivity, but I could never get a clear answer that there is a straightforward way to get this done here in the USA.
The Mikrotik LTE units work with AT&T, or as best they can since they use an older modem (on AT&T, the -LTE6-US things will do CA with Band 2+12). T-Mobile should work too, just with fewer overlapping bands. And Verizon, well, not so much (no Band 13, and Verizon won't turn on a new line using the Mikrotik LTE modem's IMEI#). Note the refreshed 2023 and 2024 LTE models will NOT work in the US. Only the ones marked US will.

You can use your own modem in devices with miniPCIe, which is what I do, but it's a PITA for just one or two.

On routers with USB, some carrier hotspots can be connected via USB as an LTE modem and will be picked up as an LTE interface.
