Monitoring OpenStack with a Shell Script

naleejang 2022. 12. 9. 13:56

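With this one script you can check the hardware status of every node, from the director node used to deploy OpenStack to the controller and compute nodes, and check service status in the same run. The drawback is that the output is plain text, but I think it works very well for an engineer who needs a quick look across all overcloud nodes. It also writes a log file, so you get two benefits from one run.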

#!/bin/bash

CON="ctrl01 ctrl02 ctrl03"
COM="cn01 cn02 cn03 cn04 cn05"
LOG_FILE=""
#------------------------
# Make Log File
#------------------------
function make_logs()
{
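  DATE=$(date +%Y%m%d%H%M)
  LOG_FILE="/var/log/daily_chk/chk_overcloud_$DATE.log"
  # Create the log directory first so touch does not fail on a fresh host
  sudo mkdir -p "$(dirname "$LOG_FILE")"
  sudo touch "$LOG_FILE"
  sudo chmod 777 "$LOG_FILE"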
}

#------------------------
# Print message
#------------------------
function print_msg()
{
  Message=$1
  Date=$(date "+%Y-%m-%d %H:%M")
  echo "$Date [Daily_chk] $Message" >> $LOG_FILE
  echo "$Date $Message"
}

#------------------------
# Print message1 (no timestamp; used for multi-line command output)
#------------------------
function print_msg1()
{
  Message=$1
  echo "$Message" >> $LOG_FILE
  echo "$Message"
}

make_logs

print_msg "#----------------------------"
print_msg "# Check IDM Ping"
print_msg "#----------------------------"

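# A successful reply prints one line containing icmp_seq,
# so a count of 1 or more is treated as reachable.
idm_ping=$(ping -c 1 idm-host | grep icmp_seq | wc -l)
if [ "$idm_ping" -ge 1 ]; then
  print_msg "IDM ping status is normal"
else
  print_msg "Please check IDM ping status"
fi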

print_msg "#----------------------------"
print_msg "# Check Director Network"
print_msg "#----------------------------"

net_stat=$(ip a | grep "state UP" | grep mq | wc -l)
if [ $net_stat -eq 4 ]
then
  print_msg "Network status is normal"
else
  print_msg "Please check network status"
  print_msg "$(ip a)"
fi 

print_msg "#----------------------------"
print_msg "# Check Director Service logs"
print_msg "#----------------------------"
log_stat=$(sudo sh chk-log.sh | wc -l)

if [ $log_stat -eq 0 ]
then
  print_msg "No error service logs. This system status is normal."
else
  error_msg=$(sudo sh chk-log.sh)
  print_msg "Please check system logs and container status."
  print_msg "$error_msg"
fi

print_msg "#----------------------------"
print_msg "# Check Overcloud Power"
print_msg "#----------------------------"

for i in {1..3}
do
  print_msg "ctrl0$i"
  power_stat=$(fence_rhevm -o status -a 192.168.1.15 -l admin@internal -p passwd -n ctrl0$i --shell-timeout=30 --ssl-insecure -z --disable-http-filter)
  print_msg "$power_stat"
done

for i in {31..35}
do
  print_msg "cn0$i"
  # -I selects the IPMI interface (lanplus); the BMC address is kept as published
  power_stat=$(ipmitool -H 192.168.141.15 -I lanplus -U admin -P passwd power status)
  print_msg "$power_stat"
done

print_msg "#----------------------------"
print_msg "# Controller"
print_msg "#----------------------------"

for i in $CON
do
  print_msg ">>>>>> $i <<<<<<<<"

  print_msg "#----------------------------"
  print_msg "# Check Network"
  print_msg "#----------------------------"
  net_stat=$(ssh -q heat-admin@$i ip a | grep "state UP" | grep mq | wc -l)

  if [ $net_stat -eq 7 ]
  then
    print_msg "Network status is normal"
  else
    print_msg "Please check network status"
    print_msg "$(ssh -q heat-admin@$i sudo ip a)"
  fi

  # Run the cluster check once, from the first controller listed in CON
  if [ "$i" = "ctrl01" ]
  then
    print_msg "#----------------------------"
    print_msg "# Check Clustering"
    print_msg "#----------------------------"
    cluster_stat=$(ssh -q heat-admin@$i sudo pcs status | grep -i 'failed' | wc -l)

    if [ $cluster_stat -eq 0 ]
    then
      print_msg "Pacemaker status is normal"
    else
      print_msg "Please check pacemaker"
      print_msg "$(ssh -q heat-admin@$i sudo pcs status)"
    fi
  fi

  print_msg "#----------------------------"
  print_msg "# Check CPU"
  print_msg "#----------------------------"
  cpu_stat=$(ssh -q heat-admin@$i sudo mpstat | grep all | awk '{print $4}')
  print_msg "CPU usage is $cpu_stat. If CPU usage is high, please check system CPU status"

  print_msg "#----------------------------"
  print_msg "# Check Memory"
  print_msg "#----------------------------"
  mem_stat=$(ssh -q heat-admin@$i sudo free -h | grep -i mem | awk '{print $4}')
  print_msg "Free memory amount is $mem_stat. If free memory amount is low, please check system memory status"  

  print_msg "#----------------------------"
  print_msg "# Check Container"
  print_msg "#----------------------------"
  container_stat=$(ssh -q heat-admin@$i "sudo systemctl list-units 'tripleo_*'" | grep failed | wc -l)

  if [ $container_stat -eq 0 ]
  then
    print_msg "Container status is normal."
  else
    print_msg "Please check container status"
    print_msg "$(ssh -q heat-admin@$i 'sudo systemctl list-units tripleo_*')"
  fi

  print_msg "#----------------------------"
  print_msg "# Check NFS - glance"
  print_msg "#----------------------------"
  nfs_stat=$(ssh -q heat-admin@$i sudo df -h | grep glance | wc -l)
  
  if [ $nfs_stat -eq 1 ]
  then
    print_msg "NFS status is normal."
  else
    print_msg "Please check network status and nfs status"
  fi

  print_msg "#----------------------------"
  print_msg "# Check Service logs"
  print_msg "#----------------------------"
  log_stat=$(ssh -q heat-admin@$i sudo sh chk-log.sh | wc -l)

  if [ $log_stat -eq 0 ]
  then
    print_msg "No error service logs. This system status is normal."
  else
    error_msg=$(ssh -q heat-admin@$i sudo sh chk-log.sh)
    print_msg "Please check system logs and container status."
    print_msg "$error_msg"
  fi
done

print_msg "#----------------------------"
print_msg "# Compute"
print_msg "#----------------------------"

for i in $COM
do
  print_msg ">>>>>> $i <<<<<<<<"

  print_msg "#----------------------------"
  print_msg "# Check Network"
  print_msg "#----------------------------"
  net_stat=$(ssh -q heat-admin@$i ip a | grep "state UP" | grep mq | wc -l)

  if [ $net_stat -eq 10 ]
  then
    print_msg "Network status is normal"
  else
    print_msg "Please check network status"
    print_msg "$(ssh -q heat-admin@$i sudo ip a)"
  fi

  print_msg "#----------------------------"
  print_msg "# Check CPU"
  print_msg "#----------------------------"
  cpu_stat=$(ssh -q heat-admin@$i sudo mpstat | grep all | awk '{print $4}')
  print_msg "CPU usage is $cpu_stat. If CPU usage is high, please check system CPU status"

  print_msg "#----------------------------"
  print_msg "# Check Memory"
  print_msg "#----------------------------"
  mem_stat=$(ssh -q heat-admin@$i sudo free -h | grep -i mem | awk '{print $4}')
  print_msg "Free memory amount is $mem_stat. If free memory amount is low, please check system memory status"

  print_msg "#----------------------------"
  print_msg "# Check Container"
  print_msg "#----------------------------"
  container_stat=$(ssh -q heat-admin@$i "sudo systemctl list-units 'tripleo_*'" | grep failed | wc -l)

  if [ $container_stat -eq 0 ]
  then
    print_msg "Container status is normal."
  else
    print_msg "Please check container status"
    print_msg "$(ssh -q heat-admin@$i 'sudo systemctl list-units tripleo_*')"
  fi
  
  print_msg "#----------------------------"
  print_msg "# Check Service logs"
  print_msg "#----------------------------"
  log_stat=$(ssh -q heat-admin@$i sudo sh chk-log.sh | wc -l)

  if [ $log_stat -eq 0 ]
  then
    print_msg "No error service logs. This system status is normal."
  else
    error_msg=$(ssh -q heat-admin@$i sudo sh chk-log.sh)
    print_msg "Please check system logs and container status."
    print_msg "$error_msg"
  fi
done

source /home/stack/overcloudrc

print_msg "#----------------------------"
print_msg "# Overcloud compute service"
print_msg "#----------------------------"
print_msg1 "$(openstack compute service list -c Binary -c Host -c Zone -c Status -c 'Updated At' --sort-column Host)"

print_msg "#----------------------------"
print_msg "# Overcloud volume service"
print_msg "#----------------------------"
print_msg1 "$(openstack volume service list)"

print_msg "#----------------------------"
print_msg "# Overcloud network service"
print_msg "#----------------------------"
print_msg1 "$(openstack network agent list -c Host -c 'Agent Type' -c Alive -c State --sort-column Host)"

print_msg "#----------------------------"
print_msg "# Overcloud hypervisor service"
print_msg "#----------------------------"
print_msg1 "$(openstack hypervisor list --long -c 'Hypervisor Hostname' -c 'Host IP' -c State -c 'vCPUs Used' -c vCPUs -c 'Memory MB Used' -c 'Memory MB' --sort-column 'Hypervisor Hostname')"

print_msg "#----------------------------"
print_msg "# Instance count per hypervisor"
print_msg "#----------------------------"
print_msg1 "$(openstack server list --all --long --status ACTIVE -c Host --sort-column Host -f value | uniq -c)"
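
A note on the last command: uniq -c only merges adjacent identical lines, so the per-hypervisor count is correct only when the host column arrives already grouped, which is what --sort-column Host is there for. The sketch below shows the same counting idea on a made-up host list. The hostnames and the count_per_host helper are assumptions for illustration, and an explicit sort is added so the result does not depend on the input order.

```shell
#!/bin/bash
# Made-up host list standing in for the output of
# "openstack server list ... -c Host -f value"; one line per instance.
hosts="cn02
cn01
cn02
cn03
cn01
cn02"

# Hypothetical helper: uniq -c only merges adjacent duplicates, so sort first.
# awk swaps the columns to print "host count" without uniq's padding.
count_per_host() {
  sort | uniq -c | awk '{print $2, $1}'
}

printf '%s\n' "$hosts" | count_per_host
```

If the sort step is left out, the unsorted list above would report cn02 and cn01 more than once, each with a partial count.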

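The script is rather long, and when I have time it would be worth refactoring the repeated checks into functions. Even so, once a script like this is in place you no longer have to log in to every node each time to check its status, which is very convenient.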
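
As one possible starting point for that refactoring, here is a minimal sketch of a helper that replaces the repeated "compare a count, then print normal or please-check" blocks. The function name check_count and its arguments are hypothetical and not part of the script above. The print_msg defined here is a stand-in so the sketch runs on its own.

```shell
#!/bin/bash
# Stand-in for the script's print_msg so this sketch is self-contained;
# it prints to stdout only and does not write a log file.
print_msg() {
  echo "$(date "+%Y-%m-%d %H:%M") $1"
}

# Hypothetical helper (not in the original script).
# Usage: check_count <label> <actual> <expected>
# Returns 0 when the counts match, 1 otherwise.
check_count() {
  local label=$1 actual=$2 expected=$3
  if [ "$actual" -eq "$expected" ]; then
    print_msg "$label status is normal"
    return 0
  else
    print_msg "Please check $label status (got $actual, expected $expected)"
    return 1
  fi
}

# Example calls with fixed numbers; in the real script the second
# argument would come from a pipeline such as ip a | grep ... | wc -l.
check_count "Network" 7 7
# "|| true" keeps the example going after a failed check.
check_count "Container" 2 0 || true
```

Because the helper returns a status code, the caller can decide what detail to print on failure, for example the full ip a or systemctl output, while the helper itself stays generic.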
