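Troubleshooting an unresponsive libvirtd service under CloudStack
On CloudStack 4.5.2 the libvirtd service occasionally stops responding. When this happens, virsh commands no longer work and the CloudStack master loses its connection to that slave host. I first suspected libvirtd itself or its version. After analysis and troubleshooting, the cause was traced to cloudstack-agent. I found no matching bug report on the official site, and the problem may also exist in later versions, so a root-cause analysis will need more time. Below is a record of how I handled it, for anyone who follows or uses CloudStack.
As everyone knows, the CloudStack community is far less active than OpenStack's, so why choose CloudStack at all? I'll talk about that another time. Back to the topic.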
Environment
Host OS: CentOS 6.5 x64 (2.6.32-431.el6.x86_64)
CloudStack version: 4.5.2
libvirt version: libvirt-0.10.2-54.el6_7.2.x86_64
Problem description
The alert generated from the CloudStack API listHosts call reported:
node5.cloud.rtmap:192.168.14.20 state is Down at 2016-05-13T07:19:04+0800
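# How to use the CloudStack API is covered in another article and is not repeated here.
Log in to the affected host and check:
[root@node5 log]# virsh list --all
No response; exit with Ctrl+C.
At this point the VMs keep working normally, but they are out of the platform's control.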
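A hung virsh like this can be detected without waiting at the terminal for Ctrl+C. The following is a minimal sketch, assuming GNU coreutils `timeout` is available (it ships with CentOS 6). The function name `is_responsive` and the 10-second threshold in the usage comment are hypothetical choices, not part of CloudStack or libvirt.

```shell
#!/usr/bin/env bash
# Sketch (assumption): detect a hanging command such as `virsh list --all`
# with coreutils `timeout`. is_responsive is a hypothetical helper.

# is_responsive SECONDS CMD [ARGS...]
# Returns 0 if CMD exits within SECONDS (whatever its own exit status),
# and 1 if it had to be killed because it ran past the deadline.
is_responsive() {
    local secs=$1
    shift
    # -k 2: if CMD ignores SIGTERM, follow up with SIGKILL two seconds later.
    timeout -k 2 "$secs" "$@" >/dev/null 2>&1
    local rc=$?
    # timeout exits 124 after SIGTERM and 137 (128+9) if SIGKILL was needed.
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        return 1
    fi
    return 0
}

# Usage on the KVM host (hypothetical 10-second threshold):
#   is_responsive 10 virsh list --all || echo "libvirtd is not answering virsh"
```

A command that legitimately exits with status 124 or 137 on its own would be misreported as hung. For virsh this is unlikely to matter.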
Try restarting the libvirtd service:
[root@node5 log]# service libvirtd stop
Stopping libvirtd daemon: [FAILED]    #the libvirtd service cannot be stopped
Try restarting the cloudstack-agent service:
[root@node5 libvirt]# service cloudstack-agent restart
Stopping Cloud Agent:
Starting Cloud Agent:
libvirtd is still failing.
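Because `service libvirtd stop` reported [FAILED], the safer habit is to check whether the process actually disappeared instead of trusting the init script's message. Below is a minimal sketch of such a check, assuming bash. `wait_for_exit` is a hypothetical helper. The pidfile path in the usage comment is the `/var/run/libvirtd.pid` that appears in the libvirtd log quoted later in this article.

```shell
#!/usr/bin/env bash
# Sketch (assumption): confirm that a daemon really exited after a stop
# request. wait_for_exit is a hypothetical helper.

# wait_for_exit PID TIMEOUT_SECONDS
# Returns 0 as soon as PID no longer exists, 1 if it is still alive
# once TIMEOUT_SECONDS have elapsed.
wait_for_exit() {
    local pid=$1
    local deadline=$(( SECONDS + $2 ))
    # `kill -0` sends no signal; it only checks that the PID exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep 0.2
    done
    return 0
}

# Usage on the host:
#   pid=$(cat /var/run/libvirtd.pid)
#   service libvirtd stop
#   wait_for_exit "$pid" 15 || echo "libvirtd ($pid) is still running"
```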
Quick workaround
[root@node5 ping]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.
[root@node5 log]# libvirtd -d
This one succeeds, and virsh list --all can now list and operate the VMs:
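The `Unable to initialize network sockets` failure is consistent with the `Unable to bind to port: Address already in use` records in the syslog excerpt later in this article. The `-l` (`--listen`) option makes libvirtd open its TCP listener, on port 16509 by default, and the hung instance may still be holding that port. Plain `libvirtd -d` only opens its local UNIX sockets, which would explain why it starts. Below is a minimal sketch for checking who owns the port. It assumes `netstat -antp` output in the column layout shown in this article, and `listeners_on_port` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (assumption): read `netstat -antp` output on stdin and print the
# PID/Program column of LISTEN sockets on the given TCP port.
# listeners_on_port is a hypothetical helper.
listeners_on_port() {
    local port=$1
    awk -v port="$port" '
        $6 == "LISTEN" {
            # $4 is the local address; the port is after the last ":"
            # (this also works for IPv6 forms such as :::16509).
            n = split($4, a, ":")
            if (a[n] == port) print $7
        }'
}

# Usage on the host:
#   netstat -antp | listeners_on_port 16509 | sort -u
```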
[root@node5 log]# virsh list --all
 Id Name State
----------------------------------------------------
 2 i-4-185-VM running
The VM is running normally and can be managed from the command line again. From the CloudStack platform's point of view, however, the host is still Down and its VMs remain out of control.
The temporary workaround is to reboot the server during some other major upgrade or maintenance window. A real fix requires analyzing the specific cause.
Analysis and troubleshooting
Check the process
[root@node5 log]# ps ax |grep libvirtd
6485 ? R 863:37 libvirtd --daemon -l    #the process stays in R (running) state
[root@node5 log]# top -p 6485
top - 09:19:41 up 12 days, 22:27, 1 user, load average: 3.05, 5.07, 6.64
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.8%us, 1.4%sy, 0.0%ni, 93.1%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 264420148k total, 182040780k used, 82379368k free, 834232k buffers
Swap: 8388600k total, 92k used, 8388508k free, 100453708k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6485 root 20 0 984m 12m 4440 R 100.2 0.0 844:22.68 libvirtd    #CPU usage stays at 100% and is never released, which threatens system stability
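A process that stays in state R at 100% CPU and, as the next step shows, survives `kill -9` is worth inspecting through /proc. A task that is looping inside kernel code does not act on SIGKILL until it leaves that code. One possible explanation is therefore that libvirtd is spinning in the kernel, although the original investigation did not confirm this. Below is a minimal sketch that reads the state letter directly from /proc/PID/stat, assuming Linux /proc. `proc_state` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (assumption): print the one-letter scheduler state of a PID
# (R running, S sleeping, D uninterruptible, Z zombie, T stopped)
# straight from /proc. proc_state is a hypothetical helper.
proc_state() {
    local stat
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so drop everything up to the last ") " before reading the state.
    stat=${stat##*") "}
    printf '%s\n' "${stat%% *}"
}

# Usage on the host:
#   proc_state 6485            # e.g. R for the hung libvirtd
#   cat /proc/6485/wchan; echo # kernel function it is waiting in, if any
```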
Kill the process
[root@node5 log]# kill -9 6485
[root@node5 log]# kill -9 6485
[root@master log]# ps ax |grep libvirtd    #the process is still there
6485 ? R 863:37 libvirtd --daemon -l
[root@node5 ~]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.
[root@node5 ~]# netstat -antp |grep 16509
tcp 0 0 0.0.0.0:16509 0.0.0.0:* LISTEN 3658/libvirtd
tcp 1 0 192.168.14.25:16509 192.168.14.22:8717 CLOSE_WAIT -
tcp 1 0 192.168.14.25:16509 192.168.14.20:5152 CLOSE_WAIT -
tcp 1 0 192.168.14.25:16509 192.168.14.10:39359 CLOSE_WAIT -
tcp 0 0 :::16509 :::* LISTEN 3658/libvirtd
tcp 39 0 ::1:16509 ::1:19715 CLOSE_WAIT -
After these steps, the initial conclusion is that libvirtd is hung.
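The CLOSE_WAIT entries on port 16509 mean that the remote peers have closed their side of the connection, but libvirtd has not yet called close() on its side. That fits a daemon that is no longer servicing its sockets. Below is a minimal sketch for counting such entries, under the same assumption about the `netstat -antp` column layout. `count_close_wait` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (assumption): count sockets in CLOSE_WAIT whose *local* port
# matches, reading `netstat -antp` output on stdin.
# count_close_wait is a hypothetical helper.
count_close_wait() {
    local port=$1
    awk -v port="$port" '
        $6 == "CLOSE_WAIT" {
            n = split($4, a, ":")   # local address; port after the last ":"
            if (a[n] == port) c++
        }
        END { print c + 0 }'
}

# Usage on the host:
#   netstat -antp | count_close_wait 16509
```

A count that keeps growing between runs would suggest that the daemon has stopped reaping its client connections.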
Trace the process
[root@node5 log]# strace -f libvirtd
[pid 107570] close(23058) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23059) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23060) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23061) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23062) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23063) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23064) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23065) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23066) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23067) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23068) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23069) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23070) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23071) = -1 EBADF (Bad file descriptor)
^C[pid 107570] close(23072 <unfinished ...>
Process 107559 detached
Process 107560 detached
Process 107561 detached
Process 107562 detached
Process 107563 detached
Process 107564 detached
Process 107565 detached
Process 107566 detached
Process 107567 detached
Process 107568 detached
Process 107569 detached
Process 107570 detached
The parent process 6485 keeps creating and closing child processes, and the calls return errors. What causes the Bad file descriptor errors (how are they triggered, and by whom)? Why does the loop never exit? How can the problem be reproduced?
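Two hedged observations about this trace. First, `strace -f libvirtd` as run above starts a new libvirtd under the tracer. To observe the hung instance itself, you would attach to it with `strace -f -p 6485`, optionally adding `-o FILE` to write the trace to a file. Second, a run of close() calls on consecutive descriptor numbers that fail with EBADF is also the pattern a freshly forked child produces when it closes every descriptor it might have inherited before exec, so on its own it does not prove a fault. Below is a minimal sketch that tallies the failing close() calls per PID from saved strace output. `ebadf_closes_by_pid` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (assumption): read strace output on stdin and print "PID COUNT"
# for close() calls that failed with EBADF. Handles both the terminal form
# "[pid N] close(...)" and the "N close(...)" form that `strace -f -o FILE`
# writes. ebadf_closes_by_pid is a hypothetical helper.
ebadf_closes_by_pid() {
    awk '
        /^([[]pid +[0-9]+]|[0-9]+) +close[(]/ && /EBADF/ {
            match($0, /[0-9]+/)               # first number on the line is the PID
            count[substr($0, RSTART, RLENGTH)]++
        }
        END { for (p in count) print p, count[p] }' | sort -n
}

# Usage on the host (attach to the existing daemon, stop with Ctrl+C):
#   strace -f -p 6485 -o /tmp/libvirtd.trace
#   ebadf_closes_by_pid < /tmp/libvirtd.trace
```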
Gathering more clues
The official documentation, which records diagnoses and fixes for many libvirtd failures in great detail:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-libvirtd_failed_to_start
Enable system logging
Change libvirt's logging in /etc/libvirt/libvirtd.conf by enabling the line below. To enable the setting, open the /etc/libvirt/libvirtd.conf file in a text editor, remove the hash (or #) symbol from the beginning of the following line, and save the change:
log_outputs="3:syslog:libvirtd"
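The edit described above can be scripted. Below is a minimal sketch that assumes GNU sed (for `-i.bak`) and a libvirtd.conf that contains the stock commented-out line `#log_outputs="3:syslog:libvirtd"`. In libvirt's numbering, level 3 means warnings and errors. `enable_syslog_output` is a hypothetical helper, and libvirtd must be restarted before the setting takes effect.

```shell
#!/usr/bin/env bash
# Sketch (assumption): uncomment the stock log_outputs line in libvirtd.conf,
# keeping a .bak copy. enable_syslog_output is a hypothetical helper;
# GNU sed is assumed for -i.bak.
enable_syslog_output() {
    local conf=${1:-/etc/libvirt/libvirtd.conf}
    sed -i.bak 's/^#[[:space:]]*\(log_outputs="3:syslog:libvirtd"\)/\1/' "$conf"
    # Succeed only if the setting is now active.
    grep -q '^log_outputs="3:syslog:libvirtd"' "$conf"
}

# Usage on the host:
#   enable_syslog_output && service libvirtd restart
```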
With this configuration in place, restart the server and wait for the next failure to inspect the log:
......
Jun 1 12:42:26 node5 abrtd: New client connected
Jun 1 12:42:26 node5 abrtd: Directory 'pyhook-2016-06-01-12:42:26-70065' creation detected
Jun 1 12:42:26 node5 abrt-server[70066]: Saved Python crash dump of pid 70065 to /var/spool/abrt/pyhook-2016-06-01-12:42:26-70065
Jun 1 12:42:26 node5 abrtd: Package 'cloudstack-common' isn't signed with proper key
Jun 1 12:42:26 node5 abrtd: 'post-create' on '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065' exited with 1
Jun 1 12:42:26 node5 abrtd: Deleting problem directory '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065'
Jun 1 12:43:26 node5 abrt: detected unhandled Python exception in '/usr/share/cloudstack-common/scripts/vm/network/security_group.py'
......
Jun 6 10:36:21 node5 libvirtd: 102840: warning : qemuDomainObjBeginJobInternal:878 : Cannot start job (modify, none) for domain i-4-30-VM; current job is (modify, none) owned by (102925, 0)
Jun 6 10:36:21 node5 libvirtd: 102840: error : qemuDomainObjBeginJobInternal:883 : Timed out during operation: cannot acquire state change lock
Jun 6 10:39:59 node5 libvirtd: 114071: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:39:59 node5 libvirtd: 114071: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
Jun 6 10:40:46 node5 libvirtd: 114147: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:40:46 node5 libvirtd: 114147: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
Jun 6 10:42:15 node5 libvirtd: 114204: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:42:15 node5 libvirtd: 114204: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
Jun 6 10:47:05 node5 libvirtd: 114375: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:47:05 node5 libvirtd: 114375: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
Jun 6 10:47:23 node5 libvirtd: 114412: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:47:23 node5 libvirtd: 114412: error : virNetSocketNewListenTCP:312 : Unable to bind to port: Address already in use
......
Jun 12 03:08:02 node5 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3111" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Jun 12 09:20:40 node5 libvirtd: 72575: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 12 09:20:40 node5 libvirtd: 72575: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
No fatal error or further clues turned up. Still, this logging option is well worth enabling, because many problems can be pinpointed with it.
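Even without a fatal error, summarizing the libvirtd records shows which failure dominates. In the excerpt above, for instance, repeated `virNetSocketNewListenTCP` bind failures stand out next to the `cannot acquire state change lock` timeout. Below is a minimal sketch that assumes the `libvirtd: PID: LEVEL : FUNCTION:LINE : message` record format seen in this log. `libvirtd_error_summary` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (assumption): read syslog lines (e.g. /var/log/messages) on stdin
# and count libvirtd "error" records by the FUNCTION:LINE that emitted them,
# most frequent first. libvirtd_error_summary is a hypothetical helper.
libvirtd_error_summary() {
    awk '
        / libvirtd: / && / error : / {
            split($0, parts, " error : ")   # parts[2] = "FUNC:LINE : message"
            split(parts[2], f, " : ")       # f[1]     = "FUNC:LINE"
            count[f[1]]++
        }
        END { for (k in count) print count[k], k }' | sort -rn
}

# Usage on the host:
#   libvirtd_error_summary < /var/log/messages
```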
解決過(guò)程
解決思路
嘗試和找到終止進(jìn)程、重啟服務(wù)的方法
提交bug,等待補(bǔ)丁升級(jí)
分析源代碼,再現(xiàn)問(wèn)題,解決問(wèn)題(投入研發(fā)和時(shí)間)
由于不能再現(xiàn)問(wèn)題,還是從簡(jiǎn)入繁吧。觸發(fā)這些子進(jìn)程的元兇是誰(shuí)?還是cloudstack-agent的嫌疑最大,但之前重啟過(guò)該服務(wù)并沒(méi)有解決問(wèn)題,那么agent服務(wù)是怎么一回事呢?
看下啟動(dòng)腳本可以基本了解,
[root@node5 libvirt]# cat /etc/rc.d/init.d/cloudstack-agent
#!/bin/bash

# chkconfig: 35 99 10
# description: Cloud Agent

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# WARNING: if this script is changed, then all other initscripts MUST BE changed to match it as well

. /etc/rc.d/init.d/functions

# set environment variables

SHORTNAME=$(basename $0 | sed -e 's/^[SK][0-9][0-9]//')
PIDFILE=/var/run/"$SHORTNAME".pid
LOCKFILE=/var/lock/subsys/"$SHORTNAME"
LOGDIR=/var/log/cloudstack/agent
LOGFILE=${LOGDIR}/agent.log
PROGNAME="Cloud Agent"
CLASS="com.cloud.agent.AgentShell"
JSVC=`which jsvc 2>/dev/null`;

# exit if we don't find jsvc
if [ -z "$JSVC" ]; then
    echo no jsvc found in path;
    exit 1;
fi

unset OPTIONS
[ -r /etc/sysconfig/"$SHORTNAME" ] && source /etc/sysconfig/"$SHORTNAME"

# The first existing directory is used for JAVA_HOME (if JAVA_HOME is not defined in $DEFAULT)
JDK_DIRS="/usr/lib/jvm/jre /usr/lib/jvm/java-7-openjdk /usr/lib/jvm/java-7-openjdk-i386 /usr/lib/jvm/java-7-openjdk-amd64 /usr/lib/jvm/java-6-openjdk /usr/lib/jvm/java-6-openjdk-i386 /usr/lib/jvm/java-6-openjdk-amd64 /usr/lib/jvm/java-6-sun"

for jdir in $JDK_DIRS; do
    if [ -r "$jdir/bin/java" -a -z "${JAVA_HOME}" ]; then
        JAVA_HOME="$jdir"
    fi
done
export JAVA_HOME

ACP=`ls /usr/share/cloudstack-agent/lib/*.jar | tr '\n' ':' | sed s'/.$//'`
PCP=`ls /usr/share/cloudstack-agent/plugins/*.jar 2>/dev/null | tr '\n' ':' | sed s'/.$//'`

# We need to append the JSVC daemon JAR to the classpath
# AgentShell implements the JSVC daemon methods
export CLASSPATH="/usr/share/java/commons-daemon.jar:$ACP:$PCP:/etc/cloudstack/agent:/usr/share/cloudstack-common/scripts"

start() {
    echo -n $"Starting $PROGNAME: "
    if hostname --fqdn >/dev/null 2>&1 ; then
        $JSVC -Xms256m -Xmx2048m -cp "$CLASSPATH" -pidfile "$PIDFILE" \
            -errfile $LOGDIR/cloudstack-agent.err -outfile $LOGDIR/cloudstack-agent.out $CLASS
        RETVAL=$?
        echo
    else
        failure
        echo
        echo The host name does not resolve properly to an IP address. Cannot start "$PROGNAME". > /dev/stderr
        RETVAL=9
    fi
    [ $RETVAL = 0 ] && touch ${LOCKFILE}
    return $RETVAL
}

stop() {
    echo -n $"Stopping $PROGNAME: "
    $JSVC -pidfile "$PIDFILE" -stop $CLASS
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f ${LOCKFILE} ${PIDFILE}
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status -p ${PIDFILE} $SHORTNAME
        RETVAL=$?
        ;;
    restart)
        stop
        sleep 3
        start
        ;;
    condrestart)
        if status -p ${PIDFILE} $SHORTNAME >&/dev/null; then
            stop
            sleep 3
            start
        fi
        ;;
    *)
    echo $"Usage: $SHORTNAME {start|stop|restart|condrestart|status|help}"
    RETVAL=3
esac

exit $RETVAL
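Two details of this script matter for the rest of the story. First, `stop()` relies solely on the exit status of `jsvc -stop`. Second, `restart` runs `start` three seconds later whether or not the stop worked. Below is a minimal sketch of a stricter stop step, assuming bash and a pidfile that holds the PID to stop, as `/var/run/cloudstack-agent.pid` does here. `stop_by_pidfile` is a hypothetical helper, not a drop-in replacement for jsvc's own `-stop` handling. Escalating to SIGKILL is a design choice you may not want for every service.

```shell
#!/usr/bin/env bash
# Sketch (assumption): a stricter stop step. Send SIGTERM to the PID in
# PIDFILE, wait up to SECS seconds, escalate to SIGKILL, and report success
# (and remove PIDFILE) only once the process is really gone.
# stop_by_pidfile is a hypothetical helper, not part of the CloudStack script.
stop_by_pidfile() {
    local pidfile=$1 secs=${2:-10} pid i
    [ -r "$pidfile" ] || return 0                  # nothing recorded, nothing to stop
    pid=$(cat "$pidfile")
    if ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$pidfile"                           # stale pidfile
        return 0
    fi
    kill -TERM "$pid"
    for (( i = 0; i < secs * 5; i++ )); do         # poll every 0.2 s
        kill -0 "$pid" 2>/dev/null || { rm -f "$pidfile"; return 0; }
        sleep 0.2
    done
    kill -KILL "$pid" 2>/dev/null
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid survived SIGKILL; inspect /proc/$pid/stat" >&2
        return 1                                   # leave the pidfile for diagnosis
    fi
    rm -f "$pidfile"
    return 0
}

# Usage on the host:
#   stop_by_pidfile /var/run/cloudstack-agent.pid 30 && service cloudstack-agent start
```

Chaining `start` with `&&` is the point. A failed stop then stays visible instead of being hidden behind a restart.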
[root@node5 libvirt]# ps ax |grep jsvc.exec
6655 ? Ss 0:00 jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:...:/usr/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
6657 ? Sl 0:05 jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:...:/usr/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
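The listing shows two jsvc.exec processes with the same command line: 6655 (state Ss) and 6657 (state Sl, multithreaded). As far as I know, commons-daemon's jsvc runs a controller process plus a child process that hosts the JVM. The `status` output in the next step reports 6657, so both PIDs have to be accounted for when the agent is stopped by hand. Below is a minimal sketch that extracts the PID and state of jsvc.exec processes from `ps ax` output while skipping the grep itself. `jsvc_pids` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (assumption): read `ps ax` output (PID TTY STAT TIME COMMAND) on
# stdin and print "PID STAT" for processes whose command is jsvc.exec.
# A "grep jsvc.exec" line has "grep" in column 5, so it is skipped.
# jsvc_pids is a hypothetical helper.
jsvc_pids() {
    awk '$5 == "jsvc.exec" { print $1, $3 }'
}

# Usage on the host:
#   ps ax | jsvc_pids
```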
Restart the service
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid 6657) is running...
[root@node5 bin]# service cloudstack-agent stop
Stopping Cloud Agent:
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid 6657) is running...
ps ax |grep jsvc.exec also confirms that the processes are still there.
Something clicked here. It also exposed the problem with using restart earlier: the failed stop had been masked all along. Annoying, isn't it? There was no time to dwell on it, though, because what came next was far from that simple...
[root@node5 bin]# kill -9 6655 6657
[root@node5 bin]# kill -9 6655 6657
-bash: kill: (6655) - No such process
-bash: kill: (6657) - No such process
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent dead but pid file exists
[root@node5 bin]# rm /var/run/cloudstack-agent.pid
rm: remove regular file `/var/run/cloudstack-agent.pid'? y
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent dead but subsys locked
[root@node5 bin]# service cloudstack-agent start
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid 109382) is running...
[root@node5 bin]# netstat -antp |grep 8250
tcp 0 0 192.168.14.20:22220 192.168.14.10:8250 ESTABLISHED 109382/jsvc.exec
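On CentOS 6 the `dead but pid file exists` and `dead but subsys locked` messages come from the `status` function in /etc/rc.d/init.d/functions, which checks the pidfile and the lock file under /var/lock/subsys. Below is a minimal sketch that clears both files only when the recorded PID is really gone, assuming bash. `clean_stale_service_state` is a hypothetical helper. The paths in the usage comment are the PIDFILE and LOCKFILE defined in the cloudstack-agent init script.

```shell
#!/usr/bin/env bash
# Sketch (assumption): remove a stale pidfile and subsys lock file, but only
# if the PID recorded in the pidfile is no longer alive.
# clean_stale_service_state is a hypothetical helper.
clean_stale_service_state() {
    local pidfile=$1 lockfile=$2 pid=""
    [ -r "$pidfile" ] && pid=$(cat "$pidfile")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid is still running; leaving state files alone" >&2
        return 1
    fi
    rm -f "$pidfile" "$lockfile"
    return 0
}

# Usage on the host:
#   clean_stale_service_state /var/run/cloudstack-agent.pid /var/lock/subsys/cloudstack-agent
```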
After this, the agent returned to normal, but libvirtd still could not be killed. Soon the connection shown by netstat -antp |grep 8250 disappeared again, and on the CloudStack master the host's monitoring status changed from Up to Disconnected. Still, Disconnected is not Down, so this was progress.
Start another libvirtd -d and take a look:
[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
6485 ? R 863:37 libvirtd --daemon -l
130057 ? Sl 0:38 libvirtd -d
28904 pts/0 S+ 0:00 grep libvirtd
Then, on the CloudStack master, I manually forced a reconnect of the host, and it succeeded: the host status changed from Disconnected to Up. Killing 6485 still failed at this point. Next, from the CloudStack master console, I tried issuing a pause command for a VM, and the VM paused successfully. Back on the server, the previously hung libvirtd process was gone.
[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
130057 ? Sl 0:38 libvirtd -d
28904 pts/0 S+ 0:00 grep libvirtd
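At this point the platform had regained control of the host, and the abnormal libvirtd process was gone. The problem is tentatively attributed to a minor flaw in how cloudstack-agent sends signals to libvirtd. I will analyze the jsvc process separately later, to reproduce the problem and fix it at the root.
Lessons learned
When dealing with a misbehaving service, don't use restart on the command line. Use stop and kill so that you can debug each step. I learned this the hard way!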