xxm/opro_demo

xxm dd5339de32 Original code

2025-12-05 07:11:25 +00:00

Project API Documentation

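This project provides a REST API for OPRO-style prompt optimization and session-based interaction. All API endpoints exchange application/json, and no authentication is required. The examples use the default local address http://127.0.0.1:8010.

Base path: /
Frontend pages: /ui/ (three-pane interface), /ui/react (React demo page), /ui/offline (offline fallback page)
Content type: Content-Type: application/json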

Unified response format

Every JSON endpoint returns the following wrapper:

{
  "code": 0,
  "msg": "ok",
  "data": {}
}

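Success: code is always 0, msg is a short status message (default "ok"), and the business data is in the data field.
Failure: the HTTP status code is preserved (e.g. 400/404/500), code mirrors that status code, msg holds the error message, and data is null.

Errors are converted into this wrapper by the exception handlers in _qwen_xinference_demo/api.py:23-31; successful responses are wrapped by ok() in _qwen_xinference_demo/api.py:21-22.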
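Because success and failure share one wrapper, a client can handle both by checking code before reading data. The Python sketch below relies only on the wrapper shape described above; the unwrap helper and the ApiError class are hypothetical names, not part of the project.

```python
import json


class ApiError(Exception):
    """Hypothetical error type for envelopes whose code is not 0."""

    def __init__(self, code, msg):
        super().__init__(f"{code}: {msg}")
        self.code = code
        self.msg = msg


def unwrap(body):
    """Return the data field of a {code, msg, data} envelope.

    Accepts raw JSON (str or bytes) or an already-parsed dict and raises
    ApiError when code != 0 (on failure, code mirrors the HTTP status).
    """
    envelope = json.loads(body) if isinstance(body, (str, bytes)) else body
    if envelope.get("code") != 0:
        raise ApiError(envelope.get("code"), envelope.get("msg"))
    return envelope.get("data")


print(unwrap('{"code": 0, "msg": "ok", "data": {"status": "ok"}}'))  # {'status': 'ok'}
```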

Health check

Method and path: GET /health
Purpose: service availability check
Example response:

{
  "code": 0,
  "msg": "ok",
  "data": { "status": "ok" }
}

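Model management

List available models

Method and path: GET /models
Purpose: list the Ollama models available for inference (embedding and reranker models are filtered out)
Example response: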

{
  "code": 0,
  "msg": "ok",
  "data": {
    "models": ["qwen3:8b", "qwen3:14b", "qwen3:32b"]
  }
}
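The filtering step could look roughly like the sketch below. It assumes the input has the shape of Ollama's GET /api/tags response ({"models": [{"name": ...}, ...]}) and that embedding and reranker models can be recognised by a keyword in their name; both the function name filter_chat_models and that matching rule are assumptions, and the project's actual rule may differ.

```python
def filter_chat_models(tags_response, excluded=("embedding", "reranker")):
    """Return model names that contain none of the excluded keywords.

    Assumes the Ollama /api/tags shape {"models": [{"name": ...}, ...]};
    the case-insensitive substring rule is an assumption.
    """
    names = [m["name"] for m in tags_response.get("models", [])]
    return [n for n in names if not any(k in n.lower() for k in excluded)]


# Illustrative input only, not real output from any server.
tags = {
    "models": [
        {"name": "qwen3:8b"},
        {"name": "qwen3-embedding:4b"},
        {"name": "example-reranker:latest"},
        {"name": "qwen3:14b"},
    ]
}
print({"models": filter_chat_models(tags)})  # {'models': ['qwen3:8b', 'qwen3:14b']}
```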

Set the model for the current session

Method and path: POST /set_model
Request body:

{
  "session_id": "<SID>",
  "model_name": "qwen3:8b"
}

Success response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "session_id": "<SID>",
    "model_name": "qwen3:8b"
  }
}

Note: model_name must appear in the list returned by /models; otherwise a 400 error is returned.
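A minimal sketch of the check behind that 400 error, using an in-memory dict as a stand-in for the session store. HttpError, set_model, and the dict layout are hypothetical; the 404 branch mirrors the "session not found" error listed under error codes and is an assumption for this particular route.

```python
class HttpError(Exception):
    """Hypothetical error carrying an HTTP status; the envelope sets code = status."""

    def __init__(self, status, msg):
        super().__init__(msg)
        self.status = status
        self.msg = msg


def set_model(sessions, session_id, model_name, available_models):
    """Validate model_name against the /models list, then store it on the session."""
    if session_id not in sessions:
        raise HttpError(404, "session not found")  # assumed for this route
    if model_name not in available_models:
        raise HttpError(400, f"model not available: {model_name}")
    sessions[session_id]["model_name"] = model_name
    return {"session_id": session_id, "model_name": model_name}
```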

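Sessions and candidate generation (prompt optimization)

Prompt optimization works as follows: build a "rewrite/mutate" instruction from the user's question or the most recent message → call Qwen to generate a batch of candidates → compute semantic embeddings via Xinference (falling back to Ollama embeddings on failure) → cluster the embeddings to remove near-duplicates and select the Top-K (default 5).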
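The dedupe-and-select step can be illustrated with a simple greedy clustering over cosine distance. This is a swapped-in technique, not necessarily the project's clustering algorithm: it assumes the 0.15 clustering threshold from the notes is a cosine distance (1 minus cosine similarity), keeps the first candidate of each cluster as its representative, and stops once Top-K = 5 candidates are kept. The function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def dedupe_top_k(candidates, embeddings, threshold=0.15, top_k=5):
    """Greedy leader clustering: a candidate starts a new cluster (and is kept)
    only if its distance to every kept candidate exceeds the threshold."""
    kept, kept_vecs = [], []
    for text, vec in zip(candidates, embeddings):
        if all(cosine_distance(vec, other) > threshold for other in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
            if len(kept) == top_k:
                break
    return kept


cands = ["a", "a2", "b", "c"]
vecs = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.7, 0.7]]
print(dedupe_top_k(cands, vecs))  # ['a', 'b', 'c'] ("a2" is a near-duplicate of "a")
```

Keeping the first member of each cluster preserves the generator's ordering; a real implementation might instead pick the member closest to the cluster centroid.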

Generate the first candidates (create a session)

Method and path: POST /query
Request body (new session):

{ "query": "I want to buy apples" }

Success response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "session_id": "<SID>",
    "round": 0,
    "candidates": ["...", "..."]
  }
}

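Notes:
- Creates a new session and records the user's original question and the first round of candidates. The response reports round 0 for this first batch; the session's round counter is incremented by 1 once the candidates are stored.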

Continue optimizing (regenerate candidates from the latest message)

Method and path: POST /query_from_message
Request body:

{ "session_id": "<SID>" }

Success response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "session_id": "<SID>",
    "round": 1,
    "candidates": ["...", "..."]
  }
}

Notes:
- Generates new candidates using the session's most recent user message, or the original question, as the baseline.
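Choosing the baseline could be sketched as below. It assumes the session stores its history as the list of {"role", "content"} objects returned by GET /session/{sid}, and that the original question is used only when no user message exists; pick_baseline is a hypothetical name.

```python
def pick_baseline(session):
    """Return the most recent non-empty user message, else the original query."""
    for msg in reversed(session.get("history", [])):
        if msg.get("role") == "user" and msg.get("content"):
            return msg["content"]
    return session["original_query"]
```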

Select a candidate and get an answer

Method and path: POST /select
Request body:

{
  "session_id": "<SID>",
  "choice": "the selected prompt"
}

Success response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "prompt": "the selected prompt",
    "answer": "the model's answer"
  }
}

Notes:
- Records choice as the session's selected_prompt and generates an answer using that prompt.
- Appends the user's choice and the answer to outputs/user_feedback.jsonl.
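The feedback log uses the JSON Lines format: one JSON object per line, appended on each selection. The sketch below shows that append pattern; the record fields (session_id, round, choice, answer, ts) and the helper name append_feedback are assumptions, since the document only states that the choice and the answer are appended.

```python
import json
import time
from pathlib import Path


def append_feedback(path, session_id, round_no, choice, answer):
    """Append one record as a single JSON line (hypothetical field names)."""
    record = {
        "session_id": session_id,
        "round": round_no,
        "choice": choice,
        "answer": answer,
        "ts": time.time(),
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create outputs/ if missing
    with path.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII (e.g. Chinese) text readable in the file
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```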

Reject a candidate and regenerate

Method and path: POST /reject
Request body:

{
  "session_id": "<SID>",
  "candidate": "the unsuitable candidate",
  "reason": "optional rejection reason"
}

Success response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "session_id": "<SID>",
    "round": 2,
    "candidates": ["...", "..."]
  }
}

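Notes:
- Adds the rejected candidate to the session history, then generates a new round of candidates that avoid repeating it and are more diverse.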
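One way to make regeneration steer away from rejected candidates is to list them explicitly in the rewrite instruction. The instruction wording, the default of 10 generated prompts, and the function name build_rewrite_instruction below are all hypothetical; the project's actual instruction text is not documented here.

```python
def build_rewrite_instruction(baseline, rejected, reason=None, n=10):
    """Compose a hypothetical "rewrite/mutate" instruction that names the
    rejected candidates so the generator avoids them."""
    lines = [
        f"Rewrite the following user request into {n} diverse, clearer prompts.",
        f"Request: {baseline}",
    ]
    if rejected:
        lines.append("Do not repeat or closely paraphrase these rejected prompts:")
        lines.extend(f"- {r}" for r in rejected)
    if reason:
        lines.append(f"The user rejected the last one because: {reason}")
    return "\n".join(lines)
```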

Direct answer + candidates (optional flow)

Method and path: POST /answer
Request body:

{ "query": "I want to buy apples" }

Success response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "session_id": "<SID>",
    "answer": "the direct answer",
    "candidates": ["...", "..."]
  }
}

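Notes:
- First answers the user's question directly, then generates prompt-optimization candidates. This route always uses the model configured on the backend.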

Regenerate (legacy endpoint, subject to MAX_ROUNDS)

Method and path: POST /next
Request body:

{ "session_id": "<SID>" }

Success response (maximum number of rounds reached):

{
  "code": 0,
  "msg": "ok",
  "data": { "final": true, "answer": "the final answer" }
}

Success response (maximum number of rounds not yet reached):

{
  "code": 0,
  "msg": "ok",
  "data": { "session_id": "<SID>", "round": 1, "candidates": ["...", "..."] }
}

Notes:
- MAX_ROUNDS is currently 3 and applies only to this route; the frontend does not use this route by default.
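The branching of /next can be sketched as follows. generate_candidates and answer are injected stand-ins for the real model calls, and next_step is a hypothetical name. Two details are assumptions: the comparison round >= MAX_ROUNDS, and answering with the selected prompt (or the original query) once the limit is reached. The returned round is the value before the increment, matching the "incremented after the candidates are stored" behaviour described for /query.

```python
MAX_ROUNDS = 3  # value stated in this document for /next


def next_step(session, generate_candidates, answer):
    """Sketch of the legacy /next branching.

    generate_candidates(session) -> list[str] and answer(prompt) -> str are
    placeholders for the real model calls.
    """
    if session["round"] >= MAX_ROUNDS:  # assumed comparison
        prompt = session.get("selected_prompt") or session["original_query"]
        return {"final": True, "answer": answer(prompt)}
    round_no = session["round"]
    candidates = generate_candidates(session)
    session["candidates"] = candidates
    session["round"] += 1  # incremented after the candidates are stored
    return {"session_id": session["session_id"], "round": round_no, "candidates": candidates}
```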

Session chat

Send a message and get an answer

Method and path: POST /message
Request body:

{
  "session_id": "<SID>",
  "message": "a follow-up question or additional details"
}

Success response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "session_id": "<SID>",
    "answer": "the model's answer",
    "history": [
      {"role": "user", "content": "..."},
      {"role": "assistant", "content": "..."}
    ]
  }
}

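Notes:
- The answer is generated by appending this message to the selected prompt (or to the original question if no prompt has been selected).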
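A sketch of how the chat prompt and the returned history might be assembled. The blank-line separator and the helper names build_chat_prompt and record_turn are assumptions; the document only says the message is appended to the selected prompt or the original question.

```python
def build_chat_prompt(session, message):
    """Prefix the message with the selected prompt, else the original query."""
    base = session.get("selected_prompt") or session["original_query"]
    return f"{base}\n\n{message}"  # separator is an assumption


def record_turn(session, message, answer):
    """Append the user/assistant pair in the {"role", "content"} history format."""
    history = session.setdefault("history", [])
    history.append({"role": "user", "content": message})
    history.append({"role": "assistant", "content": answer})
    return history
```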

Session management

List sessions

Method and path: GET /sessions
Success response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "sessions": [
      {
        "session_id": "<SID>",
        "round": 2,
        "selected_prompt": "...",
        "original_query": "I want to buy apples"
      }
    ]
  }
}

Session details

Method and path: GET /session/{sid}
Success response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "session_id": "<SID>",
    "round": 2,
    "original_query": "I want to buy apples",
    "selected_prompt": "...",
    "candidates": ["...", "..."],
    "user_feedback": [{"round": 1, "choice": "..."}],
    "rejected": ["...", "..."],
    "history": [
      {"role": "user", "content": "..."},
      {"role": "assistant", "content": "..."}
    ]
  }
}

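Static pages and redirects

GET / → redirects to /ui/
GET /ui/ → three-pane frontend page (served from the frontend static directory mounted by the backend)
GET /ui/react → React demo page
GET /ui/offline → offline page (no CDN dependencies)
GET /react → entry point equivalent to /ui/react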

Error codes and common responses

Error wrapper:
- HTTP 404: {"code": 404, "msg": "session not found", "data": null}
- HTTP 400: {"code": 400, "msg": "model not available: <name>", "data": null} or {"code": 400, "msg": "ollama error: <message>", "data": null}
- HTTP 500: {"code": 500, "msg": "internal error", "data": null}

Usage examples (curl)

# Create a session and generate the first round of candidates
curl -X POST http://127.0.0.1:8010/query \
  -H 'Content-Type: application/json' \
  -d '{"query": "I want to buy apples"}'

# Select a candidate and get an answer
curl -X POST http://127.0.0.1:8010/select \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "<SID>", "choice": "the selected prompt"}'

# Reject a candidate and regenerate
curl -X POST http://127.0.0.1:8010/reject \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "<SID>", "candidate": "the unsuitable candidate", "reason": "too vague"}'

# Continue optimizing based on the latest message
curl -X POST http://127.0.0.1:8010/query_from_message \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "<SID>"}'

# Plain chat
curl -X POST http://127.0.0.1:8010/message \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "<SID>", "message": "Are there any sweeter varieties?"}'

# Get session details
curl http://127.0.0.1:8010/session/<SID>
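The same calls can be made from Python using only the standard library. This sketch assumes the default base URL above; build_request and call are hypothetical helpers, and the 120-second timeout is an arbitrary choice to allow for slow model generation. Because urlopen raises HTTPError for 4xx/5xx responses, call reads the error body too, since failures still carry the {code, msg, data} wrapper.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8010"  # default local address used in the curl examples


def build_request(path, payload=None):
    """GET when payload is None, otherwise POST with a UTF-8 JSON body."""
    url = BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method="POST", headers={"Content-Type": "application/json"}
    )


def call(path, payload=None, timeout=120):
    """Send the request and return the wrapper's data field, raising on code != 0."""
    try:
        with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        body = err.read()  # error responses still contain the JSON wrapper
    envelope = json.loads(body.decode("utf-8"))
    if envelope.get("code") != 0:
        raise RuntimeError(f"{envelope.get('code')}: {envelope.get('msg')}")
    return envelope.get("data")
```

With the server running, call("/query", {"query": "I want to buy apples"})["session_id"] returns the new session ID, which can then be passed to /select, /reject, or /message.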

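Notes

Candidate Top-K defaults to 5; the clustering threshold defaults to 0.15.
Embeddings use Xinference first (http://127.0.0.1:9997/...) and automatically fall back to Ollama embeddings (qwen3-embedding:4b) if that fails.
Answers use qwen3:8b in Ollama by default; a different model can be set per session via /set_model.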
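The Xinference-first, Ollama-fallback behaviour amounts to a try/except around the primary embedder. In this sketch both embedders are injected callables, because the real client code and the elided Xinference URL are not shown here; embed_with_fallback and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("embeddings")


def embed_with_fallback(texts, primary, fallback):
    """Embed texts with primary (e.g. an Xinference client); on any exception,
    log a warning and use fallback (e.g. Ollama with qwen3-embedding:4b).

    Returns (vectors, source), where source is "primary" or "fallback".
    """
    try:
        return primary(texts), "primary"
    except Exception as exc:  # broad on purpose: any primary failure triggers fallback
        logger.warning("primary embedder failed (%s); falling back", exc)
        return fallback(texts), "fallback"
```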
