
Factor 7. Contact humans with tools

By default, LLM APIs rely on a fundamental, high-stakes token choice: do we return plaintext content, or do we return structured data?


You're putting a lot of weight on that choice of first token, which, in the "the weather in tokyo" case, is

"the"

but in the fetch_weather case, it's some special token to denote the start of a JSON object.

|JSON>

You may get better results by having the LLM always output JSON and then declare its intent with natural-language tokens such as request_human_input or done_for_now, rather than with a "proper" tool such as check_weather_in_city.
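A minimal sketch of this idea, assuming the model is prompted to reply with exactly one JSON object that carries an intent field. The intent names come from the paragraph above; the dataclass fields and the parse_next_step helper are hypothetical illustrations, not an API from any library.

```python
import json
from dataclasses import dataclass
from typing import Literal, Union

# Hypothetical step types: every model reply is JSON with an "intent" field,
# including the natural-language intents that are not "real" tools.

@dataclass
class RequestHumanInput:
    intent: Literal["request_human_input"]
    question: str

@dataclass
class DoneForNow:
    intent: Literal["done_for_now"]
    message: str

@dataclass
class CheckWeatherInCity:
    intent: Literal["check_weather_in_city"]
    city: str

NextStep = Union[RequestHumanInput, DoneForNow, CheckWeatherInCity]

# Map each intent string to the dataclass that holds its fields.
INTENTS = {
    "request_human_input": RequestHumanInput,
    "done_for_now": DoneForNow,
    "check_weather_in_city": CheckWeatherInCity,
}

def parse_next_step(raw: str) -> NextStep:
    """Parse the model's JSON output into a typed step, rejecting unknown intents."""
    data = json.loads(raw)
    cls = INTENTS.get(data.get("intent"))
    if cls is None:
        raise ValueError(f"unknown intent: {data.get('intent')!r}")
    return cls(**data)  # raises TypeError on missing or unexpected fields
```

For example, parse_next_step('{"intent": "done_for_now", "message": "all set"}') returns a DoneForNow instance, while an unrecognized intent raises ValueError instead of silently falling through. Because human-facing replies go through the same JSON path as tool calls, the model never has to choose between plaintext and structured output as its first token.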

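Again, this may bring no performance gain at all, but you should experiment, and make sure you are free to try unconventional approaches to get the best results.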
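When the model chooses request_human_input, the agent loop has to stop and hand the question to a person. A minimal sketch, assuming an in-memory dict stands in for durable storage and a list stands in for a Slack or email sender; Thread, save_state, notify_human and handle_step are hypothetical names used only for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Thread:
    events: list = field(default_factory=list)

# Hypothetical stand-ins: a real system would persist threads in a database
# and deliver the question through Slack, email, SMS, etc.
STATE: dict = {}
OUTBOX: list = []

def save_state(thread: Thread) -> str:
    """Store the thread and return an id that the human's reply can carry back."""
    thread_id = str(uuid.uuid4())
    STATE[thread_id] = thread
    return thread_id

def notify_human(step: dict, thread_id: str) -> None:
    """Queue the question, tagged with the thread id, for delivery to a person."""
    OUTBOX.append({"thread_id": thread_id, "question": step["question"]})

def handle_step(thread: Thread, step: dict) -> Optional[str]:
    """Return a thread id if the loop should pause, else None."""
    if step["intent"] == "request_human_input":
        thread.events.append({"type": "human_contact_sent", "data": step})
        thread_id = save_state(thread)
        notify_human(step, thread_id)
        return thread_id  # break the loop; a later webhook resumes it
    # ... other intents (tool calls, done_for_now) would be handled here
    return None

thread = Thread()
paused_id = handle_step(thread, {"intent": "request_human_input",
                                 "question": "Deploy v1.2.3 to production?"})
```

The thread id is the key design choice: it travels out with the question and comes back with the answer, so the agent process does not need to stay alive while it waits.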

Later, you might receive a webhook from a system that handles Slack, email, SMS, or other events.

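from fastapi import FastAPI, Request

app = FastAPI()

# load_state, determine_next_step, thread_to_prompt and handle_next_step
# are assumed to be defined elsewhere (see the other factors)

@app.post('/webhook')
async def webhook(req: Request):
  body = await req.json()
  thread_id = body["threadId"]
  thread = await load_state(thread_id)
  # record the human's reply as a new event on the thread
  thread.events.append({
    "type": 'response_from_human',
    "data": body,
  })
  # ... simplified for brevity, you likely don't want to block the web worker here
  next_step = await determine_next_step(thread_to_prompt(thread))
  thread.events.append(next_step)
  result = await handle_next_step(thread, next_step)
  # todo - loop or break or whatever you want

  return {"status": "ok"}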
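The todo in the handler is where the loop resumes. One way to fill it in, as a framework-free sketch: append the human's reply, then keep asking for next steps until the model either asks a human again or says it is done. Here determine_next_step and handle_tool are passed in as plain functions, and the scripted "model" in the usage is a hypothetical stand-in for a real LLM call.

```python
from typing import Callable

# Intents that end the current run: ask a human again, or finish.
PAUSE_INTENTS = {"request_human_input", "done_for_now"}

def resume(events: list, reply: dict,
           determine_next_step: Callable[[list], dict],
           handle_tool: Callable[[dict], dict],
           max_steps: int = 10) -> dict:
    """Resume a paused thread with a human reply; return the step that stops it."""
    events.append({"type": "response_from_human", "data": reply})
    for _ in range(max_steps):
        step = determine_next_step(events)
        events.append({"type": step["intent"], "data": step})
        if step["intent"] in PAUSE_INTENTS:
            return step  # break: wait for the next webhook, or finish
        # Any other intent is treated as a tool call; record its result as an event.
        events.append({"type": step["intent"] + "_result", "data": handle_tool(step)})
    raise RuntimeError("step limit reached without pausing")

# Usage with a scripted "model" standing in for a real LLM call.
script = iter([
    {"intent": "deploy_backend", "tag": "v1.2.3", "environment": "production"},
    {"intent": "done_for_now", "message": "Deployed v1.2.3 to production."},
])
events: list = []
last = resume(events, {"response": "yes please proceed", "approved": True},
              determine_next_step=lambda evs: next(script),
              handle_tool=lambda step: {"status": "success"})
```

After this runs, last is the done_for_now step and events holds four entries: the human reply, the deploy_backend call, its result, and the final step. The max_steps cap is an added safeguard, not something the text prescribes.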

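The above includes patterns from Factor 5 - Unify execution state and business state, Factor 8 - Own your control flow, Factor 3 - Own your context window, and Factor 4 - Tools are just structured outputs, as well as several others.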

If we use the XML-like format from Factor 3 - Own your context window, our context window after a few turns might look like this:

(snipped for brevity)

<slack_message>
    From: @alex
    Channel: #deployments
    Text: Can you deploy backend v1.2.3 to production?
    Thread: []
</slack_message>

<request_human_input>
    intent: "request_human_input"
    question: "Would you like to proceed with deploying v1.2.3 to production?"
    context: "This is a production deployment that will affect live users."
    options: {
        urgency: "high"
        format: "yes_no"
    }
</request_human_input>

<human_response>
    response: "yes please proceed"
    approved: true
    timestamp: "2024-03-15T10:30:00Z"
    user: "alex@company.com"
</human_response>

<deploy_backend>
    intent: "deploy_backend"
    tag: "v1.2.3"
    environment: "production"
</deploy_backend>

<deploy_backend_result>
    status: "success"
    message: "Deployment v1.2.3 to production completed successfully."
    timestamp: "2024-03-15T10:30:00Z"
</deploy_backend_result>
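A context window like the one above can be produced by serializing each event as a tag named after its type. A minimal sketch, assuming each event is a dict with a type and a data dict; this thread_to_prompt is an illustrative helper, not a library function, and strings are quoted with json.dumps so that true and nested braces come out as in the example.

```python
import json

def render_value(value, indent: str) -> str:
    """Render nested dicts as indented braces, other values as JSON literals."""
    if isinstance(value, dict):
        inner = "\n".join(f"{indent}    {k}: {render_value(v, indent + '    ')}"
                          for k, v in value.items())
        return "{\n" + inner + "\n" + indent + "}"
    return json.dumps(value)

def event_to_prompt(event: dict) -> str:
    """Wrap one event's fields in <type>...</type> tags."""
    tag = event["type"]
    lines = [f"    {k}: {render_value(v, '    ')}" for k, v in event["data"].items()]
    return f"<{tag}>\n" + "\n".join(lines) + f"\n</{tag}>"

def thread_to_prompt(events: list) -> str:
    """Join all events into one XML-like context window string."""
    return "\n\n".join(event_to_prompt(e) for e in events)
```

Rendering a single deploy_backend event with intent, tag and environment fields reproduces the deploy_backend block shown above line for line. Plain tags keep the window compact and readable for both the model and a human debugging it.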

Benefits:

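Clear instructions: Tools for different types of human contact allow the LLM to be more specific.
Inner vs. outer loop: Enables agent workflows outside of the traditional ChatGPT-style interface, where the control flow and context initialization may be Agent->Human rather than Human->Agent (think agents kicked off by a cron job or an event).
Multiple human access: Input from different humans can be easily tracked and coordinated through structured events.
Multi-agent: The same simple abstraction extends easily to support Agent->Agent requests and responses.
Durable: Combined with Factor 6 - Launch/pause/resume with simple APIs, this makes for durable, reliable, and introspectable multiplayer workflows.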


Works great with Factor 11 - Trigger from anywhere, meet users where they are.
