Monitoring

Monitoring of the Pi park

What happened before

I tried several apps, such as PiHelper and ServerAlarms, and I installed Nagios. All of them show nice dashboards with gauges for temperature, CPU usage and memory. Some show a lot more. What they have in common is that they require me to navigate somewhere and log in somewhere. Then I do that, and I see that all is fine.

What I want

I prefer a little red dot with a white number in it that says: come and take a look, you need to do something here. But don't show the annoying red thingy when nothing is wrong.

What the solution is now

The temperature was never too high, and if it were, I would not know what to do about it. So I stick to the topics that matter:

* get an alert when a Raspberry Pi cannot be reached
* get an alert when a process is not active
* get an alert when a logfile contains an error, or when it should have been modified yesterday but its last modification date is older

On every Pi create a program for pinging the other Raspberries.

import subprocess

def pingfunction(address):
    # Send three echo requests; ping exits with 0 when the host replied.
    res = subprocess.call(['ping', '-c', '3', address])
    if res == 0:
        print("ping to", address, "OK")
    elif res == 1:
        # With the Linux (iputils) ping, exit code 1 means no reply was received.
        print("no response from", address)
        message = "xxxx cannot ping to " + address
        subprocess.Popen(["sudo", "python3", "/home/pi/scripts/telegram.py", message])
    else:
        # Any other exit code is an error, for example an unknown host.
        print("ping to", address, "failed!")
        message = "xxxx cannot ping to " + address
        subprocess.Popen(["sudo", "python3", "/home/pi/scripts/telegram.py", message])


addresses = ["192.168.5.1", "192.168.5.2", "192.168.5.3", "192.168.5.4"]
for host in addresses:
    pingfunction(host)
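The exit-code handling can also be separated from the ping call itself, which makes it possible to try the logic without a network. This is only a minimal sketch, assuming the iputils convention (0 = reply, 1 = no reply, anything else = error). The names `classify_ping` and `ping_report` and the `run` hook are hypothetical and are not part of the script above.

```python
import subprocess

def classify_ping(returncode):
    """Map a ping exit status to a short label (iputils convention assumed)."""
    if returncode == 0:
        return "ok"
    if returncode == 1:
        return "no response"
    return "failed"

def ping_report(addresses, run=None):
    """Ping every address and return a dict {address: label}.

    `run` is a hypothetical hook that takes an address and returns an exit
    code; by default it calls the real ping command and hides its output.
    """
    if run is None:
        run = lambda address: subprocess.call(
            ["ping", "-c", "3", address],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return {address: classify_ping(run(address)) for address in addresses}
```

Only the addresses whose label is not "ok" would then need a Telegram message.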

On every Pi, create a program that checks the status of the processes on that Pi that perform a critical function. Or maybe not critical, but a process that you want to restart manually or investigate further when it has stopped running.

import os
import sys
import time
import subprocess

# Two arguments are expected: the service name and the check mode.
if len(sys.argv) < 3:
    sys.exit()

service = sys.argv[1]  # the service whose status will be checked
check = sys.argv[2]    # "report" also sends a message when active, otherwise alert only
retryafter = 30        # check again after x seconds, maybe it was just some slowness
host = "xxxx"          # shows in the message on which machine we are checking

status = os.system('systemctl is-active --quiet ' + service)
# status is 0 when the service is active, non-zero otherwise
print(host + " start monitor process " + service)
if status != 0:
    print(service + " not active")
    time.sleep(retryafter)
    repeatstatus = os.system('systemctl is-active --quiet ' + service)
    if repeatstatus != 0:
        # still not active after the retry: really wrong
        alert = "'" + host + " " + service + " not active'"
        subprocess.Popen(["sudo", "python3", "/home/pi/scripts/telegram.py", alert])
else:
    print(service + " active")
    if check == "report":
        alert = "'" + host + " " + service + " active'"
        subprocess.Popen(["sudo", "python3", "/home/pi/scripts/telegram.py", alert])
print(host + " end monitor process " + service)
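The core idea of the script is to alert only when two checks in a row fail, so that a short hiccup does not trigger a message. A minimal sketch of that pattern follows. The function names `systemd_is_active` and `confirmed_down` and the `sleep` parameter are hypothetical and only there to make the logic easy to try out.

```python
import subprocess
import time

def systemd_is_active(service):
    """True when `systemctl is-active --quiet <service>` exits with 0."""
    return subprocess.call(["systemctl", "is-active", "--quiet", service]) == 0

def confirmed_down(is_active, retry_after=30, sleep=time.sleep):
    """Return True only if two checks in a row report 'not active'.

    `is_active` is any function without arguments that returns True or False.
    """
    if is_active():
        return False
    sleep(retry_after)       # give a slow service some time to come back
    return not is_active()   # still down after the retry: worth an alert
```

On a Pi this could be called as `confirmed_down(lambda: systemd_is_active("ssh"))`, where "ssh" is just an example service name.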

On every Pi where batch jobs run periodically, create a program that checks the last modification date of a logfile.

import subprocess 
import os.path, time 
from datetime import datetime, timedelta 

def checklogdate(path, logfile):
    # Compare the current time with the last modification time of the logfile (both in UTC).
    now = datetime.utcnow()
    file_time = datetime.utcfromtimestamp(os.path.getmtime(path+logfile))
    #print (file_time)
    # timedelta(1) is one day: alert when the log was last written more than a day ago.
    if (now - file_time) > timedelta(1):
        print ("log from before yesterday", logfile)
        message = logfile+" on host xxx is from before yesterday"
        subprocess.Popen(["sudo","python3","/home/pi/scripts/telegram.py",message])


path = "/home/pi/log/" 
logfiles = ["mvphotos.log", "imresize.log"]
for logfile in logfiles: 
    checklogdate(path,logfile)
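The list of alerts also mentions a logfile that contains an error, which the script above does not cover. A minimal sketch of such a check could look like this, assuming that a plain, case-insensitive search for the word "error" is good enough. The function names `count_error_lines` and `count_errors_in_file` are hypothetical.

```python
def count_error_lines(lines, keyword="error"):
    """Count the lines that contain the keyword, ignoring upper/lower case."""
    needle = keyword.lower()
    return sum(1 for line in lines if needle in line.lower())

def count_errors_in_file(logpath, keyword="error"):
    """Open a logfile and count its matching lines.

    Undecodable bytes are replaced so that a damaged log does not stop the check.
    """
    with open(logpath, "r", errors="replace") as handle:
        return count_error_lines(handle, keyword)
```

A result greater than zero would be the trigger for a Telegram message, in the same way as the date check does it.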

When a situation needs attention, call a program that posts a message to a Telegram channel.

import requests
import sys

def telegram_bot_sendtext(bot_message):
    bot_token = 'the token'
    bot_chatID = 'the id of the channel'
    url = 'https://api.telegram.org/bot' + bot_token + '/sendMessage'
    # Pass the fields as params so that requests URL-encodes the message text.
    params = {'chat_id': bot_chatID, 'parse_mode': 'Markdown', 'text': bot_message}
    response = requests.get(url, params=params)
    return response.json()

# The message to send is the first command line argument.
msg = sys.argv[1]
test = telegram_bot_sendtext(msg)
print(test)
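Alert texts contain spaces and sometimes characters like `&`, which have a special meaning in a URL, so the text has to be URL-encoded before it goes into the query string. The sketch below shows what that encoding looks like using only the standard library. The helper name `build_send_url` is hypothetical; token and chat id are placeholders.

```python
from urllib.parse import urlencode

def build_send_url(bot_token, chat_id, text):
    """Build a sendMessage URL with a properly encoded query string."""
    query = urlencode({
        "chat_id": chat_id,
        "parse_mode": "Markdown",
        "text": text,          # spaces, '&' and similar are escaped here
    })
    return "https://api.telegram.org/bot" + bot_token + "/sendMessage?" + query

# Only builds and prints the URL, nothing is sent.
print(build_send_url("TOKEN", "12345", "xxxx cannot ping to 192.168.5.2"))
```

Without the encoding, a `&` inside the message would cut the text off at that point, because the rest would be read as a new parameter.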

What to expect in the future

Self-healing
