Caution: nerd stuffs
Baby’s first while loop.
So my server has been rebooting recently for seemingly no reason. I thought I had it nailed down to a collection of corrupt files, but I’m not so sure anymore. In the meantime I wanted to monitor when the box goes down, so I conjured up an idea for a simple looping script to send me a notification when it happens. I’ve become pretty handy with scripting but never had a reason to use a loop. Now I do. And this will leave me an email record of every time the box went down.
This is pretty much how I learn everything. I “know that it exists” but can’t quite commit it to memory just by poring over it. It’s not till I have an actual application for it that I can learn it. I try to implement it and fail several times, and in that process I learn “how shit’s done, s0n.” Code behind the cut if you’re interested.
Script runs from an external machine, obviously. Firewall in my case. The script sends out a handful of pings, and emails me if they fail. The awk column singles out the value of “packets received”. I shouldn’t get any dropped requests since it’s on a LAN, but presumably I could make it ping say 20 times, and change the test to “-lt 10” to allow for some packet loss. After sending the email, the script goes into a sleep loop, checking periodically if the device is still down (the commented line is in case I want a log of how many times it missed polls for some stupid reason) without bugging me with more emails. Once the device comes back up, the inner test fails and the outer loop continues as normal, resuming notification next time the device goes down. Please excuse the Strongbad reference.
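To make the awk step and the “-lt 10” idea a bit more concrete, here’s a rough sketch that runs against a canned summary line instead of a real ping. The helper names received_count and is_down and the sample line are made up for illustration, and treating an empty result as zero replies is my assumption, not something the script below does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ping output on stdin and print the 4th field of
# the summary line, which is the number of packets received.
received_count() {
    awk '/packets/ {print $4}'
}

# Hypothetical helper: succeed when fewer than $2 replies came back.
# Assumption: an empty count (no summary line at all) is treated as 0.
is_down() {
    local received="$1" minimum="$2"
    [ "${received:-0}" -lt "$minimum" ]
}

# Made-up sample of a summary line: 20 pings sent, 12 answered.
sample="20 packets transmitted, 12 packets received, 40.0% packet loss"

received=$(printf '%s\n' "$sample" | received_count)
if is_down "$received" 10; then
    echo "down"
else
    echo "up"
fi
```

With 12 of 20 replies and a threshold of 10, this prints “up”, so a lossy link wouldn’t trigger an email but a dead box still would.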
#!/usr/local/bin/bash
# systemdown - a script that pings $target and sends an email when it goes down
target="192.168.2.2"
admin="iggdawg@gmail.com"
while true; do
    # Field 4 of ping's summary line is the number of packets received.
    if [ "$(ping -c 3 "$target" | grep packets | awk '{print $4}')" -lt 3 ] ; then
        echo "$target is down" | mail -s "The system is down" "$admin"
        # Already notified: poll quietly until the target answers again.
        while [ "$(ping -c 3 "$target" | grep packets | awk '{print $4}')" -lt 3 ] ; do
            # echo "$target still down" >&2
            sleep 300
        done
    else
        sleep 180
    fi
done
exit 0
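If the “email once, then shut up until it comes back” part is hard to see in the nested loops, here’s a simplified model of it. This is not the script above: it swaps the inner while loop for a state flag, and the results array is made-up poll data (1 means the target answered, 0 means it didn’t) standing in for real pings.

```shell
#!/usr/bin/env bash
# Made-up poll results standing in for ping: 1 = answered, 0 = no answer.
results=(1 0 0 0 1 1 0 1)
notifications=0
state="up"

for answered in "${results[@]}"; do
    if [ "$answered" -eq 0 ] && [ "$state" = "up" ]; then
        # First failed poll of an outage: this is where the email would go out.
        notifications=$((notifications + 1))
        state="down"
    elif [ "$answered" -eq 1 ]; then
        # Target answered, so the next failure counts as a new outage.
        state="up"
    fi
done

echo "outages reported: $notifications"
```

The sample data has two separate outages (three missed polls, then one), so it prints “outages reported: 2” rather than one notification per missed poll.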
The system is down