Help needed with a Hosted-Engine hyperconverged deployment

mengzyou · 1 participant · 0 views

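Hi everyone. I deployed three hosts using the Hosted-Engine hyperconverged setup. The three hosts form a GlusterFS cluster that provides three volumes (engine, vmstore, and data), and all three can also run the engine VM. Afterwards I added one more host, giving a cluster of four hosts.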

The problem is that the oVirt engine web interface frequently returns a 503 error when I access it. Checking the engine state with `hosted-engine --vm-status` then shows the following:

    [root@vhost1 ~]# hosted-engine --vm-status

    --== Host vhost1.yhmk.lan (id: 1) status ==--

    conf_on_shared_storage : True
    Status up-to-date : True
    Hostname : vhost1.alatest.lan
    Host ID : 1
    Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
    Score : 0
    stopped : False
    Local maintenance : False
    crc32 : 1f25baff
    local_conf_timestamp : 1253650
    Host timestamp : 1253649
    Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1253649 (Thu Apr 8 08:05:48 2021)
    host-id=1
    score=0
    vm_conf_refresh_time=1253650 (Thu Apr 8 08:05:48 2021)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Thu Jan 15 20:23:29 1970

    --== Host vhost2.yhmk.lan (id: 2) status ==--

    conf_on_shared_storage : True
    Status up-to-date : True
    Hostname : vhost2.alatest.lan
    Host ID : 2
    Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
    Score : 3400
    stopped : False
    Local maintenance : False
    crc32 : 539fc30c
    local_conf_timestamp : 1253343
    Host timestamp : 1253343
    Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1253343 (Thu Apr 8 08:05:46 2021)
    host-id=2
    score=3400
    vm_conf_refresh_time=1253343 (Thu Apr 8 08:05:46 2021)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

    --== Host vhost3.yhmk.lan (id: 3) status ==--

    conf_on_shared_storage : True
    Status up-to-date : True
    Hostname : vhost3.alatest.lan
    Host ID : 3
    Engine status : {"reason": "bad vm status", "health": "bad", "vm": "up", "detail": "Powering up"}
    Score : 3400
    stopped : False
    Local maintenance : False
    crc32 : 4072e0b8
    local_conf_timestamp : 1252345
    Host timestamp : 1252345
    Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1252345 (Thu Apr 8 08:05:42 2021)
    host-id=3
    score=3400
    vm_conf_refresh_time=1252345 (Thu Apr 8 08:05:42 2021)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False

After a while the engine web interface becomes reachable again. When I look at the host status in the interface, one or two hosts are always flagged as unavailable because of their HA score, but they recover after some time. Running `hosted-engine --vm-status` again sometimes shows that the engine has automatically migrated to another host.
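To compare runs and see which host holds the engine VM at a given moment, the status output above can be condensed to one record per host. The following is only a sketch, not part of oVirt: the function name `summarize_vm_status` is made up for illustration, and it assumes the plain-text layout shown above (a `--== Host ... ==--` header per host, an `Engine status` line carrying a JSON object, a `Score` line, and a `state=` line).

```python
import json
import re

# Matches header lines such as "--== Host vhost1.yhmk.lan (id: 1) status ==--"
HOST_HEADER = re.compile(r"^--== Host (?P<name>\S+) \(id: (?P<id>\d+)\) status ==--$")


def summarize_vm_status(text):
    """Return one dict per host with its score, agent state and engine status.

    Hypothetical helper: it relies only on the text layout pasted in this post.
    """
    hosts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.match(line)
        if header:
            current = {"host": header.group("name"), "id": int(header.group("id"))}
            hosts.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first host block
        if line.startswith("Engine status") and ":" in line:
            # Everything after the first colon is a JSON object
            current["engine"] = json.loads(line.split(":", 1)[1])
        elif line.startswith("Score") and ":" in line:
            current["score"] = int(line.split(":", 1)[1])
        elif line.startswith("state="):
            current["state"] = line.split("=", 1)[1]
    return hosts


sample = """\
--== Host vhost1.yhmk.lan (id: 1) status ==--
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score : 0
state=EngineUnexpectedlyDown
"""

for h in summarize_vm_status(sample):
    print(h["host"], h["score"], h["state"], h["engine"]["vm"])
```

Feeding it the captured output of `hosted-engine --vm-status` at regular intervals would give a per-host timeline of score and state, which may make it easier to match the 503 errors against engine restarts or migrations.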

This keeps happening, though, and hosted-engine feels constantly unstable. Has anyone run into this before?