A Death-Immunity Token: OpenClaw + keepalived

张开发
2026/4/16 11:37:14, 15 min read

Contents: Background, Solution, Checking the IP, Health-check script, keepalived configuration, Fault drill, openclaw-gateway.service

Background

The problem started with the little lobster killing itself. I asked OpenClaw to update some configuration, and it ran an openclaw gateway stop command, which shut down the OpenClaw service. I was left staring at the screen, waiting like a fool. It did not even say goodbye…

```shell
openclaw gateway stop
```

Solution

Use keepalived to keep the OpenClaw service running, so that the service is restarted automatically whenever it stops. It is also a chance to learn how keepalived works.

Checking the IP

```
ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 172.16.0.9  netmask 255.255.0.0  broadcast 172.16.255.255
        inet6 fe80::f816:3eff:fedc:b5a8  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:dc:b5:a8  txqueuelen 1000  (Ethernet)
        RX packets 67391273  bytes 27877705315 (27.8 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 110078632  bytes 24769427866 (24.7 GB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 2288521  bytes 373008683 (373.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2288521  bytes 373008683 (373.0 MB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
```

Health-check script

The check_openclaw.sh script checks whether the OpenClaw service is running. If the unit is not active, it tries to start it.

```shell
#!/bin/bash
export XDG_RUNTIME_DIR=/run/user/0

if systemctl --user is-active openclaw-gateway.service >/dev/null 2>&1; then
    # The unit is active: confirm that the gateway actually answers HTTP
    if curl -s -f -o /dev/null http://127.0.0.1:18789/health 2>/dev/null; then
        #logger -t openclaw-health "Health check PASSED"
        exit 0
    else
        logger -t openclaw-health "Health check FAILED - service not responding"
        exit 1
    fi
else
    logger -t openclaw-health "Health check FAILED - service is not active"
    # Try to start the service
    logger -t openclaw-health "Attempting to start service"
    systemctl --user start openclaw-gateway.service
    sleep 3
    # Check whether the start succeeded
    if systemctl --user is-active openclaw-gateway.service >/dev/null 2>&1; then
        logger -t openclaw-health "Service started successfully"
        exit 0
    else
        logger -t openclaw-health "Failed to start service"
        exit 1
    fi
fi
```

keepalived configuration

The keepalived configuration lives in /etc/keepalived/keepalived.conf. Because there is only one server, state is set to MASTER.

```
global_defs {
    router_id OPENCLAW_MONITOR
    script_user root
    enable_script_security
}

vrrp_script chk_openclaw {
    script "/usr/local/bin/check_openclaw.sh"
    interval 10    # run the check every 10 seconds
    timeout 5      # script execution times out after 5 seconds
    weight -20     # lower the priority by 20 when the check fails
    fall 2         # 2 consecutive failures mark the service as faulty
    rise 1         # 1 success marks it as recovered
}

vrrp_instance OPENCLAW_MONITOR {
    state MASTER             # a single host must use MASTER
    interface eth0           # use eth0
    virtual_router_id 51
    priority 100             # priority
    advert_int 2             # advertisement (heartbeat) interval of 2 seconds

    # Virtual IP configuration
    virtual_ipaddress {
        172.16.0.100/16 dev eth0    # bind the VIP to eth0
    }

    track_script {
        chk_openclaw
    }
}
```

Start the keepalived service:

```shell
# Restart keepalived
systemctl restart keepalived
# Enable the keepalived service at boot
systemctl enable keepalived
# Check the keepalived status
systemctl status keepalived
```

Check the VIP (virtual IP):

```
ip addr show eth0 | grep 172.16.0.100
    inet 172.16.0.100/16 scope global secondary eth0
```

Fault drill

```shell
# Stop the OpenClaw service
openclaw gateway stop
```

Follow the log with journalctl -t openclaw-health -f:

```
Apr 02 16:06:24 lavm-0sdc09108n openclaw-health[1567604]: Attempting to start service
Apr 02 16:06:27 lavm-0sdc09108n openclaw-health[1567623]: Service started successfully
Apr 02 16:06:34 lavm-0sdc09108n openclaw-health[1567638]: Health check FAILED - service not responding
```

Verify with nc -z localhost 18789:

```
Connection to localhost (127.0.0.1) 18789 port [tcp/*] succeeded!
```

You can also run openclaw gateway status to check the service state:

```
OpenClaw 2026.3.24 (cff6dc9)
I can grep it, git blame it, and gently roast it—pick your coping mechanism.
│
◇
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-02.log
Command: /root/.nvm/versions/node/v22.22.1/bin/node /root/.pnpm-global/5/.pnpm/openclaw2026.3.24_napi-rscanvas0.1.97/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Service config looks out of date or non-standard.
Service config issue: Gateway service PATH includes version managers or package managers; recommend a minimal PATH. (/root/.nvm/versions/node/v22.22.1/bin)
Service config issue: Gateway service uses Node from a version manager; it can break after upgrades. (/root/.nvm/versions/node/v22.22.1/bin/node)
Service config issue: System Node 22 LTS (22.14) or Node 24 not found; install it before migrating away from version managers.
Recommendation: run openclaw doctor (or openclaw doctor --repair).
Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 1567606, state active, sub running, last exit 0, reason 0)
RPC probe: ok

Listening: 127.0.0.1:18789
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting
```

openclaw-gateway.service

There is an even simpler approach: set Restart=always in /etc/systemd/system/openclaw-gateway.service, and systemd will restart the service whenever its process exits. Keep in mind that Restart= covers the process exiting or being killed; systemd does not restart a unit that was stopped deliberately with systemctl stop.

```
[Unit]
Description=OpenClaw Gateway
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
#User=$(whoami)
User=root
EnvironmentFile=/opt/openclaw/.env
WorkingDirectory=/home/$(whoami)/.openclaw
ExecStart=$(which openclaw) gateway --force
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
```
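
The unit file above contains $(whoami) and $(which openclaw). systemd does not perform shell command substitution when it reads a unit file, so these need to be replaced with concrete values before the file is installed. The following is a minimal sketch of one way to do that, by rendering the unit from a shell here-document. The function name render_unit and its three arguments are hypothetical and exist only for this illustration; the unit contents are the same as above.

```shell
#!/bin/bash
# render_unit USER WORKDIR OPENCLAW_BIN  (hypothetical helper, not part of OpenClaw)
# Prints an openclaw-gateway.service unit with concrete values substituted
# for the $(whoami) and $(which openclaw) placeholders.
render_unit() {
    local user="$1"
    local workdir="$2"
    local bin="$3"
    # Unquoted EOF: the shell expands ${user}, ${workdir} and ${bin} here,
    # so the file that systemd reads contains only literal values.
    cat <<EOF
[Unit]
Description=OpenClaw Gateway
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=${user}
EnvironmentFile=/opt/openclaw/.env
WorkingDirectory=${workdir}
ExecStart=${bin} gateway --force
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
EOF
}
```

On the host described in this article (running as root, with the configuration under ~/.openclaw), it could be used roughly as render_unit root /root/.openclaw "$(command -v openclaw)" > /etc/systemd/system/openclaw-gateway.service, followed by systemctl daemon-reload and systemctl enable --now openclaw-gateway.service. ExecStart needs an absolute path, so it is worth checking the rendered file before enabling the unit.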
