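Hermes Agent in Practice 04 | Model Routing in Practice: The Full config, thinking Injection, and the Source-Level Root Causes of 401/503

# Model Routing in Practice: The Full config, thinking Injection, and the Source-Level Root Causes of 401/503

Part 04 of the series. A Hermes Agent runs more than one model: a primary model, fallbacks, and a dozen or so "auxiliary tasks" that each get their own. This post is a complete teardown of Hermes model configuration: how to write each relevant block of config.yaml, why thinking is not injected by default in Hermes, and the source-level root causes of 401/503 errors from auxiliary models. Every snippet is meant to be copied as is.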

# 1. Primary model and provider (the `model` block)

# ~/.hermes/config.yaml
model:
  default: kimi-k2.5
  provider: custom:newapi               # custom OpenAI-compatible gateway
  base_url: http://<newapi-host>:3000/v1
  api_key: sk-***
fallback_providers:                     # tried in order when the primary model fails
  - deepseek-v4-pro

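A few Hermes-specific points:

- `provider: custom:newapi` is the "custom provider" syntax. The name after the `custom:` prefix is only a label; what actually determines routing is `base_url`.
- `fallback_providers` is a list, tried in order. Each element is either a model name (reusing the provider above) or a full provider block.
- Profile-level config in `profiles/<name>/config.yaml` overrides global keys of the same name. I aligned all 12 Profiles to one baseline, so "auditing the config" reduces to a single diff.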
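
The profile override and the "audit is just a diff" idea can be sketched in a few lines of Python. This is a minimal illustration, not Hermes code: the function names `overlay`, `flatten`, and `drift` are made up here, and merging nested blocks key by key (rather than replacing a whole top-level block) is an assumption about how the override behaves.

```python
from typing import Any


def overlay(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return base with override applied; nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = value
    return merged


def flatten(cfg: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys so two configs can be compared."""
    flat: dict[str, Any] = {}
    for key, value in cfg.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def drift(baseline: dict[str, Any], effective: dict[str, Any]) -> dict[str, tuple]:
    """Dotted keys whose value differs from the baseline (a missing key reads as None)."""
    a, b = flatten(baseline), flatten(effective)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


# Hypothetical global config and one profile that overrides a single key.
global_cfg = {
    "model": {"default": "kimi-k2.5", "provider": "custom:newapi"},
    "fallback_providers": ["deepseek-v4-pro"],
}
profile_cfg = {"model": {"default": "glm-5"}}

effective = overlay(global_cfg, profile_cfg)
print(drift(global_cfg, effective))
```

A profile that is fully aligned with the baseline produces an empty `drift` result, which is what makes the audit cheap.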

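# 2. thinking capability matrix (measured against the newapi gateway)

Hermes has no way of knowing which models support thinking: /v1/models does not expose that capability, so the only option is to send requests yourself and inspect the response fields: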

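| Model | thinking | On by default | Trigger | Response field |
| --- | --- | --- | --- | --- |
| `deepseek-v4-pro` | ✅ | Yes | `reasoning_effort:high` deepens reasoning noticeably | `reasoning_content` |
| `deepseek-v4-flash` | ✅ | Yes | `reasoning_effort:high` deepens reasoning slightly | `reasoning_content` |
| `minimax-m2.5` | ✅ | Yes | Levels low/medium/high; `none` does not turn it off | `reasoning` |
| `glm-5` | ✅ | No | Requires `reasoning_effort:high` | `reasoning` |
| `kimi-k2.5` | ✅ | No | Requires `reasoning_effort:high` | `reasoning` |
| `qwen3-coder` (30b/480b/next) | ❌ | n/a | Nothing has any effect (no thinking mode by design) | n/a |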

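Key points:

- `reasoning_effort` (a top-level field taking low/medium/high) is the switch these channels actually honor; the Anthropic/Moonshot-style `thinking:{type:"enabled"}` is silently dropped.
- `minimax-m2.5` accepts only low/medium/high/none; passing minimal/xhigh/max/default returns HTTP 400 immediately.
- DeepSeek uses `reasoning_content` while the rest use `reasoning`, so check both fields when reading a response.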

# Probe it yourself; do not trust any second-hand conclusion
curl -s http://<newapi-host>:3000/v1/chat/completions \
  -H "Authorization: Bearer sk-***" -H "Content-Type: application/json" \
  -d '{"model":"kimi-k2.5","reasoning_effort":"high",
       "messages":[{"role":"user","content":"1+1=? Answer briefly"}]}' \
  | jq '.choices[0].message | {reasoning, reasoning_content}'
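
Because the reasoning text arrives under either `reasoning_content` or `reasoning` depending on the model, client code that reads responses may want to check both. Below is a minimal sketch of that; `extract_reasoning` is a name invented for this example, and the two sample payloads are hand-written for illustration, not captured gateway output.

```python
from typing import Optional, Tuple

# Field names taken from the matrix above: DeepSeek vs. everything else.
REASONING_FIELDS = ("reasoning_content", "reasoning")


def extract_reasoning(response: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (field_name, text) for the first non-empty reasoning field, else (None, None)."""
    message = response["choices"][0]["message"]
    for field in REASONING_FIELDS:
        text = message.get(field)
        if text:
            return field, text
    return None, None


# Hand-written sample payloads in the OpenAI chat-completions shape.
deepseek_like = {"choices": [{"message": {"content": "2", "reasoning_content": "1 plus 1 is 2."}}]}
kimi_like = {"choices": [{"message": {"content": "2", "reasoning": "Add the two ones."}}]}
no_thinking = {"choices": [{"message": {"content": "2"}}]}

for sample in (deepseek_like, kimi_like, no_thinking):
    print(extract_reasoning(sample))
```

A `(None, None)` result for a model that should think is the signal that the switch never reached the gateway.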

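# 3. The source-level truth: Hermes does not inject thinking for custom providers by default

This is the deepest pitfall, and it is invisible at the config layer. Hermes decides in run_agent.py whether to auto-inject reasoning parameters: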

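- `_supports_reasoning_extra_body()` returns True only for the nousresearch / github / openrouter / lmstudio providers; for custom:newapi it returns False → the auto-injection logic never fires.
- The other shortcut, `is_kimi`, matches only the hosts api.kimi.com / moonshot.ai / moonshot.cn; if you go through a self-hosted gateway, the host does not match, so it does not fire either.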
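
The two gates described above can be restated as a small decision function. This is an illustration reconstructed from the description in this post, not code copied from run_agent.py: the function names, the way the two checks are combined, and the host-matching rule (exact host or subdomain) are all assumptions.

```python
from urllib.parse import urlparse

# Provider names and hosts as listed above; everything else here is hypothetical.
REASONING_PROVIDERS = {"nousresearch", "github", "openrouter", "lmstudio"}
KIMI_HOSTS = ("api.kimi.com", "moonshot.ai", "moonshot.cn")


def supports_reasoning_extra_body(provider: str) -> bool:
    """First gate: only a fixed set of providers qualifies."""
    return provider in REASONING_PROVIDERS


def is_kimi_host(base_url: str) -> bool:
    """Second gate: the base_url host must be one of the Kimi/Moonshot hosts."""
    host = urlparse(base_url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in KIMI_HOSTS)


def auto_injects_thinking(provider: str, base_url: str) -> bool:
    """True if either gate would trigger automatic reasoning injection."""
    return supports_reasoning_extra_body(provider) or is_kimi_host(base_url)


# A self-hosted gateway fails both gates, even when it serves a Kimi model.
print(auto_injects_thinking("custom:newapi", "http://10.0.0.5:3000/v1"))
print(auto_injects_thinking("openrouter", "https://openrouter.ai/api/v1"))
print(auto_injects_thinking("custom:kimi", "https://api.moonshot.cn/v1"))
```

The point of the sketch is that the model name never enters the decision: only the provider label and the host do.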

Conclusion: the only way to enable thinking for a custom:newapi model in Hermes is to put extra_body into custom_providers by hand; it is forwarded to the gateway verbatim (see hermes_cli/config.py):

# ~/.hermes/config.yaml
custom_providers:
  - name: NewAPI-kimi-thinking
    base_url: http://<newapi-host>:3000/v1
    api_key: sk-***
    model: kimi-k2.5
    extra_body:
      reasoning_effort: high      # the only way to make a newapi model think
  - name: NewAPI
    base_url: http://<newapi-host>:3000/v1
    api_key: sk-***
    model: qwen3-coder

Stop tuning the thinking level for qwen3-coder in the WebUI: it has no thinking mode at the source, so no setting will ever produce any reasoning output.
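
To make "forwarded verbatim" concrete, here is a sketch of what the outgoing request body plausibly looks like once `extra_body` is applied. It assumes the keys are merged into the top level of the JSON body, which is how `extra_body` behaves in the OpenAI Python SDK; `build_payload` is a hypothetical helper, not a Hermes function.

```python
import json
from typing import Optional


def build_payload(model: str, messages: list, extra_body: Optional[dict] = None) -> dict:
    """Chat-completions body with extra_body keys merged in at the top level."""
    payload = {"model": model, "messages": messages}
    payload.update(extra_body or {})
    return payload


messages = [{"role": "user", "content": "1+1=?"}]

# Mirrors the NewAPI-kimi-thinking entry: reasoning_effort lands next to "model".
thinking = build_payload("kimi-k2.5", messages, {"reasoning_effort": "high"})
# Mirrors the plain NewAPI entry: no extra_body, so no reasoning switch is sent.
plain = build_payload("qwen3-coder", messages)

print(json.dumps(thinking, indent=2))
print("reasoning_effort" in plain)
```

Under this assumption, the first payload matches the body of the curl probe from section 2, which is why the probe is a reliable preview of what the config will do.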

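# 4. Auxiliary models (the `auxiliary` block): a whole section people overlook

Beyond the primary model, Hermes runs a whole set of small background tasks, each with its own model setting; running them on the primary model is expensive and unnecessary: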

auxiliary:
  vision:           { provider: custom:newapi, model: qwen3-vl-235b, base_url: http://<host>:3000/v1, api_key: sk-*** }
  compression:      { provider: custom:newapi, model: qwen3-coder,   base_url: http://<host>:3000/v1, api_key: sk-*** }
  title_generation: { provider: custom:newapi, model: qwen3-coder,   base_url: http://<host>:3000/v1, api_key: sk-*** }
  skills_hub:       { provider: custom:newapi, model: qwen3-32b,      base_url: http://<host>:3000/v1, api_key: sk-*** }
  curator:          { provider: custom:newapi, model: qwen3-coder,   base_url: http://<host>:3000/v1, api_key: sk-*** }
  approval:         { provider: custom:newapi, model: qwen3-coder,   base_url: http://<host>:3000/v1, api_key: sk-*** }
  web_extract:      { provider: custom:newapi, model: qwen3-coder,   base_url: http://<host>:3000/v1, api_key: sk-*** }
  session_search:   { provider: custom:newapi, model: qwen3-coder,   base_url: http://<host>:3000/v1, api_key: sk-*** }
  # roles that run purely locally or may have an empty key only need provider: auto
  triage_specifier:  { provider: auto, model: '', api_key: '' }
  kanban_decomposer: { provider: auto, model: '', api_key: '' }
  profile_describer: { provider: auto, model: '', api_key: '' }
  flush_memories:    { provider: auto, model: '', api_key: '' }

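What each role does (ordered by call frequency): title_generation generates session titles, compression compresses over-long context, session_search runs semantic search over past sessions, skills_hub retrieves skills, curator organizes skills, approval assesses dangerous commands, vision reads images, and web_extract extracts the main text of web pages. When this block is misconfigured, the symptom is "the main conversation works but one feature fails for no obvious reason", which is extremely hard to track down.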

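# 5. Source-level root causes of 401 / 503

I hit both of these back to back while fixing curator, and neither is visible on the surface:

# 5.1 HTTP 503 "no available channel" = model name does not match `/v1/models`

The gateway went through a rename (dotted style → hyphenated style). The primary model name was updated, but every auxiliary.<task>.model was missed and still pointed at a dead name → every call returned 503: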

✅ minimax-m2.5      deepseek-v3.2       qwen3-vl-235b
❌ minimax.minimax-m2.5   deepseek.v3.2   qwen.qwen3-vl-235b-a22b-instruct

# Treat the ids returned by the gateway as the single source of truth; check each aux model name against them
curl -s http://<newapi-host>:3000/v1/models \
  -H "Authorization: Bearer sk-***" | jq -r '.data[].id' | sort
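
The name check can be automated once the `/v1/models` response is in hand. The sketch below assumes the auxiliary block has already been loaded into a dict and that the response has the usual `{"data": [{"id": ...}]}` shape the jq filter above relies on; `ids_from_models_response` and `stale_aux_models` are names invented for this example.

```python
def ids_from_models_response(body: dict) -> set:
    """Same selection as jq '.data[].id': the set of model ids the gateway serves."""
    return {item["id"] for item in body["data"]}


def stale_aux_models(auxiliary: dict, live_ids: set) -> dict:
    """Map task -> configured model for every task whose model is not served.

    Tasks with an empty model (the provider: auto roles) are skipped.
    """
    return {
        task: cfg["model"]
        for task, cfg in auxiliary.items()
        if cfg.get("model") and cfg["model"] not in live_ids
    }


# Illustrative data built from the names shown above.
models_body = {"data": [{"id": "minimax-m2.5"}, {"id": "deepseek-v3.2"},
                        {"id": "qwen3-vl-235b"}, {"id": "qwen3-coder"}]}
auxiliary = {
    "vision": {"provider": "custom:newapi", "model": "qwen.qwen3-vl-235b-a22b-instruct"},
    "compression": {"provider": "custom:newapi", "model": "qwen3-coder"},
    "flush_memories": {"provider": "auto", "model": ""},
}

print(stale_aux_models(auxiliary, ids_from_models_response(models_body)))
```

Any task that shows up in the result is a 503 waiting to happen the next time that feature runs.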

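# 5.2 HTTP 401 "Invalid token" = auxiliary role key left empty

Leaving an auxiliary.<task>.api_key empty triggers a 401. The root cause is in the source: an empty custom-provider key is replaced with the local placeholder "no-key-required", and a real gateway rejects that placeholder → 401.

Rule: only roles with provider: auto may have an empty key; every role going through custom:newapi must have its key filled in (just reuse the primary key).

Addendum: custom:newapi also has a credential pool that backfills empty keys (the log shows source=pool:custom:newapi), so filling the key by hand is a "double safeguard" rather than the only path. Still, do not rely on the credential pool; filling the key in explicitly is the most robust option.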
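
The empty-key rule is easy to lint before restarting anything. The sketch below is a simplified model of the behavior described above: `effective_key` is a hypothetical stand-in for the substitution in the Hermes source, and the credential pool is deliberately ignored, in line with the advice not to depend on it.

```python
PLACEHOLDER_KEY = "no-key-required"  # placeholder string quoted above


def effective_key(api_key: str) -> str:
    """Hypothetical stand-in: an empty key turns into the local placeholder."""
    return api_key.strip() or PLACEHOLDER_KEY


def tasks_missing_key(auxiliary: dict) -> list:
    """Non-auto auxiliary tasks that would send the placeholder to a real gateway."""
    return sorted(
        task
        for task, cfg in auxiliary.items()
        if cfg.get("provider") != "auto"
        and effective_key(cfg.get("api_key") or "") == PLACEHOLDER_KEY
    )


# Illustrative auxiliary block: curator has the bug, flush_memories is fine.
auxiliary = {
    "curator": {"provider": "custom:newapi", "model": "qwen3-coder", "api_key": ""},
    "approval": {"provider": "custom:newapi", "model": "qwen3-coder", "api_key": "sk-***"},
    "flush_memories": {"provider": "auto", "model": "", "api_key": ""},
}

print(tasks_missing_key(auxiliary))
```

An empty result means every role that talks to the gateway carries an explicit key.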

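After the change, remember: every gateway plus the system-level WebUI has to be restarted (10 hermes-gateway* user units + hermes-webui), because config is loaded at process startup.

# 6. A few more related switches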

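smart_model_routing:        # send short, simple messages to a cheap model (off by default)
  enabled: false
  max_simple_chars: 160
  max_simple_words: 28
  cheap_model: {}
model_catalog:              # online model catalog, used by the UI for selection and capability hints
  enabled: true
  ttl_hours: 1
compression:                # context-compression trigger threshold
  enabled: true
  threshold: 0.75           # compress once 75% of the context is used
  protect_last_n: 20        # never compress the last 20 messages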
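
One plausible reading of these thresholds, written out as code. This is a guess at the semantics implied by the key names and comments, not the Hermes implementation: that both "simple" limits must hold at once, that words are whitespace-separated, and that the comparison against `threshold` is inclusive are all assumptions.

```python
def is_simple_message(text: str, max_simple_chars: int = 160, max_simple_words: int = 28) -> bool:
    """A message counts as simple only if it is within both limits (assumed AND)."""
    return len(text) <= max_simple_chars and len(text.split()) <= max_simple_words


def should_compress(used_tokens: int, context_window: int, threshold: float = 0.75) -> bool:
    """Trigger compression once the used share of the context reaches the threshold."""
    return used_tokens / context_window >= threshold


def compressible(messages: list, protect_last_n: int = 20) -> list:
    """Messages eligible for compression: everything except the last protect_last_n."""
    return messages[:-protect_last_n] if protect_last_n else list(messages)


print(is_simple_message("restart the gateway"))
print(should_compress(96_000, 128_000))
print(compressible(list(range(25))))
```

With `protect_last_n: 20`, a conversation of 20 messages or fewer has nothing eligible, so compression cannot shrink it regardless of how full the context is.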

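# 7. Reproducible checklist

1. Pull the real model list with curl .../v1/models and treat it as the single source of truth for all config.
2. Primary model + fallback: provider, base_url, and key are all present.
3. thinking verification: for each model you plan to use, send a curl with reasoning_effort:high and confirm which of reasoning/reasoning_content comes back populated; custom providers must inject it via custom_providers.extra_body (Hermes will not inject it automatically).
4. Auxiliary block alignment: the model of every auxiliary.<task> matches /v1/models, and the key is non-empty (unless provider: auto).
5. After changes, restart all gateways + WebUI; keep a .bak backup.

The next post covers skill engineering: how to write SKILL.md, how to use .curated_manifest.json as the single source of truth, the orphan trap in .usage.json, and a complete script that lets the agent audit its own skill library every week.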

#AI Agent #Hermes #LLM #model-routing #config

Last updated: 6/21/2026
