feat: implement true OPRO with Gemini-style UI

- Add true OPRO system instruction optimization (vs query rewriting) - Implement iterative optimization with performance trajectory - Add new OPRO API endpoints (/opro/create, /opro/generate_and_evaluate, /opro/execute) - Create modern Gemini-style chat UI (frontend/opro.html) - Optimize performance: reduce candidates from 20 to 10 (2x faster) - Add model selector in UI toolbar - Add collapsible sidebar with session management - Add copy button for instructions - Ensure all generated prompts use simplified Chinese - Update README with comprehensive documentation - Add .gitignore for local_docs folder
2025-12-06 17:24:28 +08:00
parent 8f52fad41c
commit 1376d60ed5
10 changed files with 1817 additions and 13 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -147,6 +147,7 @@ cython_debug/
 outputs/
 *.jsonl
 *.log
 local_docs/
 # Node modules (if any frontend dependencies)
 node_modules/
--- a/README.md
+++ b/README.md
@@ -1,3 +1,218 @@
 # OPRO Prompt Optimizer
 ## 功能概述
 OPRO (Optimization by PROmpting) 是一个基于大语言模型的提示词优化系统。本项目实现了真正的 OPRO 算法，通过迭代优化系统指令（System Instructions）来提升 LLM 在特定任务上的性能。
 ### 核心功能
 - **系统指令优化**：使用 LLM 作为优化器，基于历史性能轨迹生成更优的系统指令
 - **多轮迭代优化**：支持多轮优化，每轮基于前一轮的性能反馈生成新的候选指令
 - **智能候选选择**：通过语义聚类和多样性选择，从大量候选中筛选出最具代表性的指令
 - **性能评估**：支持自定义测试用例对系统指令进行自动评估
 - **会话管理**：支持多个优化任务的并行管理和历史记录
 ### 用户界面
 - **现代化聊天界面**：类似 Google Gemini 的简洁设计
 - **侧边栏会话管理**：可折叠的侧边栏，支持多会话切换
 - **实时优化反馈**：每轮优化生成 3-5 个候选指令，用户可选择继续优化或执行
 - **模型选择**：支持在界面中选择不同的 LLM 模型
 ## 主要优化改进
 ### 1. 真正的 OPRO 实现
 原始代码实现的是查询重写（Query Rewriting），而非真正的 OPRO。我们添加了完整的 OPRO 功能：
 - **系统指令生成**：`generate_system_instruction_candidates()` - 生成多样化的系统指令候选
 - **性能评估**：`evaluate_system_instruction()` - 基于测试用例评估指令性能
 - **轨迹优化**：基于历史 (instruction, score) 轨迹生成更优指令
 - **元提示工程**：专门设计的元提示用于指导 LLM 生成和优化系统指令
 ### 2. 性能优化
 - **候选池大小优化**：从 20 个候选减少到 10 个，速度提升约 2 倍
 - **智能聚类选择**：使用 AgglomerativeClustering 从候选池中选择最具多样性的 Top-K
 - **嵌入服务回退**：Xinference → Ollama 自动回退机制，确保服务可用性
 ### 3. API 架构改进
 - **新增 OPRO 端点**：
  - `POST /opro/create` - 创建 OPRO 优化任务
  - `POST /opro/generate_and_evaluate` - 生成并自动评估候选
  - `POST /opro/execute` - 执行系统指令
  - `GET /opro/runs` - 获取所有优化任务
  - `GET /opro/run/{run_id}` - 获取特定任务详情
 - **会话状态管理**：完整的 OPRO 运行状态跟踪（轨迹、测试用例、迭代次数）
 - **向后兼容**：保留原有查询重写功能，标记为 `opro-legacy`
 ### 4. 前端界面重构
 - **Gemini 风格设计**：简洁的白色/灰色配色，圆角设计，微妙的阴影效果
 - **可折叠侧边栏**：默认折叠，支持会话列表管理
 - **多行输入框**：支持多行文本输入，底部工具栏包含模型选择器
 - **候选指令卡片**：每个候选显示编号、内容、分数，提供"继续优化"、"复制"、"执行"按钮
 - **简体中文界面**：所有 UI 文本和生成的指令均使用简体中文
 ## 快速开始
 ### 环境要求
 - **Python** ≥ 3.10（推荐使用 conda 虚拟环境）
 - **Ollama** 本地服务及模型（如 `qwen3:8b`、`qwen3-embedding:4b`）
 - **可选**：Xinference embedding 服务
 ### 安装依赖
 ```bash
 # 创建 conda 环境（推荐）
 conda create -n opro python=3.10
 conda activate opro
 # 安装 Python 依赖
 pip install fastapi uvicorn requests numpy scikit-learn pydantic
 ```
 ### 启动 Ollama 服务
 ```bash
 # 确保 Ollama 已安装并运行
 ollama serve
 # 拉取所需模型
 ollama pull qwen3:8b
 ollama pull qwen3-embedding:4b
 ```
 ### 启动应用
 ```bash
 # 启动后端服务
 uvicorn _qwen_xinference_demo.api:app --host 127.0.0.1 --port 8010
 # 或使用 0.0.0.0 允许外部访问
 uvicorn _qwen_xinference_demo.api:app --host 0.0.0.0 --port 8010
 ```
 ### 访问界面
 - **OPRO 优化界面**：http://127.0.0.1:8010/ui/opro.html
 - **传统三栏界面**：http://127.0.0.1:8010/ui/
 - **API 文档**：http://127.0.0.1:8010/docs
 - **OpenAPI JSON**：http://127.0.0.1:8010/openapi.json
 ### 使用示例
 1. **创建新会话**：在 OPRO 界面点击"新建会话"或侧边栏的 + 按钮
 2. **输入任务描述**：例如"将中文翻译成英文"
 3. **查看候选指令**：系统生成 3-5 个优化的系统指令
 4. **继续优化**：点击"继续优化"进行下一轮迭代
 5. **执行指令**：点击"执行此指令"测试指令效果
 6. **复制指令**：点击"复制"按钮将指令复制到剪贴板
 ## 配置说明
 配置文件：`config.py`
 ### 关键配置项
 ```python
 # Ollama 服务配置
 OLLAMA_HOST = "http://127.0.0.1:11434"
 DEFAULT_CHAT_MODEL = "qwen3:8b"
 DEFAULT_EMBED_MODEL = "qwen3-embedding:4b"
 # OPRO 优化参数
 GENERATION_POOL_SIZE = 10  # 生成候选池大小
 TOP_K = 5                   # 返回给用户的候选数量
 CLUSTER_DISTANCE_THRESHOLD = 0.15  # 聚类距离阈值
 # Xinference 配置（可选）
 XINFERENCE_EMBED_URL = "http://127.0.0.1:9997/models/bge-base-zh/embed"
 ```
 ## 项目结构
 ```
 .
 ├── _qwen_xinference_demo/
 │   ├── api.py                          # FastAPI 主应用
 │   └── opro/
 │       ├── user_prompt_optimizer.py    # OPRO 核心逻辑
 │       ├── prompt_utils.py             # 元提示生成
 │       ├── session_state.py            # 会话状态管理
 │       ├── ollama_client.py            # Ollama 客户端
 │       └── xinference_client.py        # Xinference 客户端
 ├── frontend/
 │   ├── opro.html                       # OPRO 优化界面
 │   └── index.html                      # 传统三栏界面
 ├── examples/
 │   ├── opro_demo.py                    # OPRO 功能演示
 │   └── client_demo.py                  # API 调用示例
 ├── config.py                           # 全局配置
 ├── API.md                              # API 文档
 └── README.md                           # 本文件
 ```
 ## API 端点
 ### OPRO 相关（推荐使用）
 - `POST /opro/create` - 创建优化任务
 - `POST /opro/generate_and_evaluate` - 生成并评估候选
 - `POST /opro/execute` - 执行系统指令
 - `GET /opro/runs` - 获取所有任务
 - `GET /opro/run/{run_id}` - 获取任务详情
 ### 传统端点（向后兼容）
 - `POST /query` - 查询重写（首轮）
 - `POST /select` - 选择候选并回答
 - `POST /reject` - 拒绝并重新生成
 - `POST /message` - 聊天消息
 ### 通用端点
 - `GET /health` - 健康检查
 - `GET /version` - 版本信息
 - `GET /models` - 可用模型列表
 - `POST /set_model` - 设置模型
 详细 API 文档请访问：http://127.0.0.1:8010/docs
 ## 常见问题
 ### 1. 无法连接 Ollama 服务
 确保 Ollama 服务正在运行：
 ```bash
 ollama serve
 ```
 检查配置文件中的 `OLLAMA_HOST` 是否正确。
 ### 2. 模型不可用
 通过 `/models` 端点查看可用模型列表，使用 `/set_model` 切换模型。
 ### 3. 生成速度慢
 - 调整 `GENERATION_POOL_SIZE` 减少候选数量
 - 使用更小的模型（如 `qwen3:4b`）
 - 确保 Ollama 使用 GPU 加速
 ### 4. 界面显示异常
 硬刷新浏览器缓存：
 - **Mac**: `Cmd + Shift + R`
 - **Windows/Linux**: `Ctrl + Shift + R`
 ---
 <details>
 <summary><b>原始 README（点击展开）</b></summary>
 - 项目简介
 - OPRO Prompt Optimizer：面向提示优化的交互式系统，支持多轮拒选/再生成、语义聚类去重与 Top‑K 代表选择。
@@ -65,3 +280,5 @@
 - 第二轮无相关候选：使用 POST /query_from_message 基于最近消息再生候选 _qwen_xinference_demo/api.py:193-206
 - 立即回答诉求：用 POST /answer 先答后给候选 _qwen_xinference_demo/api.py:211-219
 - 端口与地址访问差异：在启动命令中明确 --host 0.0.0.0 --port 8010 ，本地浏览器建议访问 127.0.0.1
 </details>
--- a/_qwen_xinference_demo/api.py
+++ b/_qwen_xinference_demo/api.py
@@ -2,14 +2,30 @@ from fastapi import FastAPI, HTTPException, Request
 from fastapi.responses import RedirectResponse, FileResponse, JSONResponse
 from fastapi.staticfiles import StaticFiles
 from pydantic import BaseModel
 from typing import List, Tuple, Optional
 import config
 # Legacy session management (query rewriting)
 from .opro.session_state import create_session, get_session, update_session_add_candidates, log_user_choice
 from .opro.session_state import log_user_reject
 from .opro.session_state import set_selected_prompt, log_chat_message
 from .opro.session_state import set_session_model
 from .opro.session_state import USER_FEEDBACK_LOG
 # True OPRO session management
 from .opro.session_state import (
    create_opro_run, get_opro_run, update_opro_iteration,
    add_opro_evaluation, get_opro_trajectory, set_opro_test_cases,
    complete_opro_run, list_opro_runs
 )
 # Optimization functions
 from .opro.user_prompt_optimizer import generate_candidates
 from .opro.user_prompt_optimizer import (
    generate_system_instruction_candidates,
    evaluate_system_instruction
 )
 from .opro.ollama_client import call_qwen
 from .opro.ollama_client import list_models
@@ -23,8 +39,9 @@ app = FastAPI(
    openapi_tags=[
        {"name": "health", "description": "健康检查"},
        {"name": "models", "description": "模型列表与设置"},
-        {"name": "sessions", "description": "会话管理"},
+        {"name": "sessions", "description": "会话管理（旧版查询重写）"},
-        {"name": "opro", "description": "提示优化候选生成与选择/拒绝"},
+        {"name": "opro-legacy", "description": "旧版提示优化（查询重写）"},
        {"name": "opro-true", "description": "真正的OPRO（系统指令优化）"},
        {"name": "chat", "description": "会话聊天"},
        {"name": "ui", "description": "静态页面"}
    ]
@@ -89,14 +106,69 @@ class SetModelReq(BaseModel):
    session_id: str
    model_name: str
-@app.post("/start", tags=["opro"])
+
 # ============================================================================
 # TRUE OPRO REQUEST MODELS
 # ============================================================================
 class TestCase(BaseModel):
    """A single test case for OPRO evaluation."""
    input: str
    expected_output: str
 class CreateOPRORunReq(BaseModel):
    """Request to create a new OPRO optimization run."""
    task_description: str
    test_cases: Optional[List[TestCase]] = None
    model_name: Optional[str] = None
 class OPROIterateReq(BaseModel):
    """Request to run one OPRO iteration."""
    run_id: str
    top_k: Optional[int] = None
 class OPROEvaluateReq(BaseModel):
    """Request to evaluate a system instruction."""
    run_id: str
    instruction: str
 class OPROAddTestCasesReq(BaseModel):
    """Request to add test cases to an OPRO run."""
    run_id: str
    test_cases: List[TestCase]
 class OPROGenerateAndEvaluateReq(BaseModel):
    """Request to generate and auto-evaluate candidates (for chat-like UX)."""
    run_id: str
    top_k: Optional[int] = None
    pool_size: Optional[int] = None
    auto_evaluate: Optional[bool] = True  # If False, use diversity-based selection only
 class OPROExecuteReq(BaseModel):
    """Request to execute a system instruction with user input."""
    instruction: str
    user_input: str
    model_name: Optional[str] = None
 # ============================================================================
 # LEGACY ENDPOINTS (Query Rewriting - NOT true OPRO)
 # ============================================================================
@app.post("/start", tags=["opro-legacy"])
 def start(req: StartReq):
    sid = create_session(req.query)
    cands = generate_candidates(req.query, [], model_name=get_session(sid).get("model_name"))
    update_session_add_candidates(sid, cands)
    return ok({"session_id": sid, "round": 0, "candidates": cands})
-@app.post("/next", tags=["opro"])
+@app.post("/next", tags=["opro-legacy"])
 def next_round(req: NextReq):
    s = get_session(req.session_id)
    if not s:
@@ -110,7 +182,7 @@ def next_round(req: NextReq):
    update_session_add_candidates(req.session_id, cands)
    return ok({"session_id": req.session_id, "round": s["round"], "candidates": cands})
-@app.post("/select", tags=["opro"])
+@app.post("/select", tags=["opro-legacy"])
 def select(req: SelectReq):
    s = get_session(req.session_id)
    if not s:
@@ -138,7 +210,7 @@ def select(req: SelectReq):
        pass
    return ok({"prompt": req.choice, "answer": ans})
-@app.post("/reject", tags=["opro"])
+@app.post("/reject", tags=["opro-legacy"])
 def reject(req: RejectReq):
    s = get_session(req.session_id)
    if not s:
@@ -151,7 +223,7 @@ class QueryReq(BaseModel):
    query: str
    session_id: str | None = None
-@app.post("/query", tags=["opro"])
+@app.post("/query", tags=["opro-legacy"])
 def query(req: QueryReq):
    if req.session_id:
        s = get_session(req.session_id)
@@ -240,7 +312,7 @@ def message(req: MessageReq):
 class QueryFromMsgReq(BaseModel):
    session_id: str
-@app.post("/query_from_message", tags=["opro"])
+@app.post("/query_from_message", tags=["opro-legacy"])
 def query_from_message(req: QueryFromMsgReq):
    s = get_session(req.session_id)
    if not s:
@@ -258,7 +330,7 @@ def query_from_message(req: QueryFromMsgReq):
 class AnswerReq(BaseModel):
    query: str
-@app.post("/answer", tags=["opro"])
+@app.post("/answer", tags=["opro-legacy"])
 def answer(req: AnswerReq):
    sid = create_session(req.query)
    log_chat_message(sid, "user", req.query)
@@ -282,3 +354,287 @@ def set_model(req: SetModelReq):
        raise AppException(400, f"model not available: {req.model_name}", "MODEL_NOT_AVAILABLE")
    set_session_model(req.session_id, req.model_name)
    return ok({"session_id": req.session_id, "model_name": req.model_name})
 # ============================================================================
 # TRUE OPRO ENDPOINTS (System Instruction Optimization)
 # ============================================================================
@app.post("/opro/create", tags=["opro-true"])
 def opro_create_run(req: CreateOPRORunReq):
    """
    Create a new OPRO optimization run.
    This starts a new system instruction optimization process for a given task.
    """
    # Convert test cases from Pydantic models to tuples
    test_cases = None
    if req.test_cases:
        test_cases = [(tc.input, tc.expected_output) for tc in req.test_cases]
    run_id = create_opro_run(
        task_description=req.task_description,
        test_cases=test_cases,
        model_name=req.model_name
    )
    run = get_opro_run(run_id)
    return ok({
        "run_id": run_id,
        "task_description": run["task_description"],
        "num_test_cases": len(run["test_cases"]),
        "iteration": run["iteration"],
        "status": run["status"]
    })
@app.post("/opro/iterate", tags=["opro-true"])
 def opro_iterate(req: OPROIterateReq):
    """
    Run one OPRO iteration: generate new system instruction candidates.
    This generates optimized system instructions based on the performance trajectory.
    """
    run = get_opro_run(req.run_id)
    if not run:
        raise AppException(404, "OPRO run not found", "RUN_NOT_FOUND")
    # Get trajectory for optimization
    trajectory = get_opro_trajectory(req.run_id)
    # Generate candidates
    top_k = req.top_k or config.TOP_K
    try:
        candidates = generate_system_instruction_candidates(
            task_description=run["task_description"],
            trajectory=trajectory if trajectory else None,
            top_k=top_k,
            model_name=run["model_name"]
        )
    except Exception as e:
        raise AppException(500, f"Failed to generate candidates: {e}", "GENERATION_ERROR")
    # Update run with new candidates
    update_opro_iteration(req.run_id, candidates)
    return ok({
        "run_id": req.run_id,
        "iteration": run["iteration"] + 1,
        "candidates": candidates,
        "num_candidates": len(candidates),
        "best_score": run["best_score"]
    })
@app.post("/opro/evaluate", tags=["opro-true"])
 def opro_evaluate(req: OPROEvaluateReq):
    """
    Evaluate a system instruction on the test cases.
    This scores the instruction and updates the performance trajectory.
    """
    run = get_opro_run(req.run_id)
    if not run:
        raise AppException(404, "OPRO run not found", "RUN_NOT_FOUND")
    if not run["test_cases"]:
        raise AppException(400, "No test cases defined for this run", "NO_TEST_CASES")
    # Evaluate the instruction
    try:
        score = evaluate_system_instruction(
            system_instruction=req.instruction,
            test_cases=run["test_cases"],
            model_name=run["model_name"]
        )
    except Exception as e:
        raise AppException(500, f"Evaluation failed: {e}", "EVALUATION_ERROR")
    # Add to trajectory
    add_opro_evaluation(req.run_id, req.instruction, score)
    # Get updated run info
    run = get_opro_run(req.run_id)
    return ok({
        "run_id": req.run_id,
        "instruction": req.instruction,
        "score": score,
        "best_score": run["best_score"],
        "is_new_best": score == run["best_score"] and score > 0
    })
@app.get("/opro/runs", tags=["opro-true"])
 def opro_list_runs():
    """
    List all OPRO optimization runs.
    """
    runs = list_opro_runs()
    return ok({"runs": runs, "total": len(runs)})
@app.get("/opro/run/{run_id}", tags=["opro-true"])
 def opro_get_run(run_id: str):
    """
    Get detailed information about an OPRO run.
    """
    run = get_opro_run(run_id)
    if not run:
        raise AppException(404, "OPRO run not found", "RUN_NOT_FOUND")
    # Get sorted trajectory
    trajectory = get_opro_trajectory(run_id)
    return ok({
        "run_id": run_id,
        "task_description": run["task_description"],
        "iteration": run["iteration"],
        "status": run["status"],
        "best_score": run["best_score"],
        "best_instruction": run["best_instruction"],
        "num_test_cases": len(run["test_cases"]),
        "test_cases": [{"input": tc[0], "expected_output": tc[1]} for tc in run["test_cases"]],
        "trajectory": [{"instruction": inst, "score": score} for inst, score in trajectory[:10]],  # Top 10
        "current_candidates": run["current_candidates"]
    })
@app.post("/opro/test_cases", tags=["opro-true"])
 def opro_add_test_cases(req: OPROAddTestCasesReq):
    """
    Add or update test cases for an OPRO run.
    """
    run = get_opro_run(req.run_id)
    if not run:
        raise AppException(404, "OPRO run not found", "RUN_NOT_FOUND")
    # Convert test cases
    test_cases = [(tc.input, tc.expected_output) for tc in req.test_cases]
    # Update test cases
    set_opro_test_cases(req.run_id, test_cases)
    return ok({
        "run_id": req.run_id,
        "num_test_cases": len(test_cases),
        "test_cases": [{"input": tc[0], "expected_output": tc[1]} for tc in test_cases]
    })
@app.post("/opro/generate_and_evaluate", tags=["opro-true"])
 def opro_generate_and_evaluate(req: OPROGenerateAndEvaluateReq):
    """
    Generate candidates and auto-evaluate them (for chat-like UX).
    This is the main endpoint for the chat interface. It:
    1. Generates candidates based on trajectory
    2. Auto-evaluates them (if test cases exist and auto_evaluate=True)
    3. Returns top-k sorted by score (or diversity if no evaluation)
    """
    run = get_opro_run(req.run_id)
    if not run:
        raise AppException(404, "OPRO run not found", "RUN_NOT_FOUND")
    top_k = req.top_k or config.TOP_K
    pool_size = req.pool_size or config.GENERATION_POOL_SIZE
    # Get trajectory for optimization
    trajectory = get_opro_trajectory(req.run_id)
    # Generate candidates
    try:
        candidates = generate_system_instruction_candidates(
            task_description=run["task_description"],
            trajectory=trajectory if trajectory else None,
            top_k=pool_size,  # Generate pool_size candidates first
            pool_size=pool_size,
            model_name=run["model_name"]
        )
    except Exception as e:
        raise AppException(500, f"Failed to generate candidates: {e}", "GENERATION_ERROR")
    # Decide whether to evaluate
    should_evaluate = req.auto_evaluate and len(run["test_cases"]) > 0
    if should_evaluate:
        # Auto-evaluate all candidates
        scored_candidates = []
        for candidate in candidates:
            try:
                score = evaluate_system_instruction(
                    system_instruction=candidate,
                    test_cases=run["test_cases"],
                    model_name=run["model_name"]
                )
                scored_candidates.append({"instruction": candidate, "score": score})
                # Add to trajectory
                add_opro_evaluation(req.run_id, candidate, score)
            except Exception as e:
                # If evaluation fails, assign score 0
                scored_candidates.append({"instruction": candidate, "score": 0.0})
        # Sort by score (highest first)
        scored_candidates.sort(key=lambda x: x["score"], reverse=True)
        # Return top-k
        top_candidates = scored_candidates[:top_k]
        # Update iteration
        update_opro_iteration(req.run_id, [c["instruction"] for c in top_candidates])
        return ok({
            "run_id": req.run_id,
            "candidates": top_candidates,
            "iteration": run["iteration"] + 1,
            "evaluated": True,
            "best_score": run["best_score"]
        })
    else:
        # No evaluation - use diversity-based selection (already done by clustering)
        # Just return the candidates without scores
        top_candidates = [
            {"instruction": candidate, "score": None}
            for candidate in candidates[:top_k]
        ]
        # Update iteration
        update_opro_iteration(req.run_id, [c["instruction"] for c in top_candidates])
        return ok({
            "run_id": req.run_id,
            "candidates": top_candidates,
            "iteration": run["iteration"] + 1,
            "evaluated": False,
            "best_score": run["best_score"]
        })
@app.post("/opro/execute", tags=["opro-true"])
 def opro_execute(req: OPROExecuteReq):
    """
    Execute a system instruction with user input.
    This uses the selected instruction as a system prompt and calls the LLM.
    """
    try:
        # Construct full prompt with system instruction
        full_prompt = f"{req.instruction}\n\n{req.user_input}"
        # Call LLM
        response = call_qwen(
            full_prompt,
            temperature=0.2,
            max_tokens=1024,
            model_name=req.model_name
        )
        return ok({
            "instruction": req.instruction,
            "user_input": req.user_input,
            "response": response
        })
    except Exception as e:
        raise AppException(500, f"Execution failed: {e}", "EXECUTION_ERROR")
--- a/_qwen_xinference_demo/opro/prompt_utils.py
+++ b/_qwen_xinference_demo/opro/prompt_utils.py
@@ -1,4 +1,14 @@
 from typing import List, Tuple
 # ============================================================================
 # OLD FUNCTIONS (Query Rewriting - NOT true OPRO, kept for compatibility)
 # ============================================================================
 def refine_instruction(query: str) -> str:
    """
    LEGACY: Generates query rewrites (NOT true OPRO).
    This is query expansion, not system instruction optimization.
    """
    return f"""
 你是一个“问题澄清与重写助手”。
 请根据用户的原始问题：
@@ -7,6 +17,9 @@ def refine_instruction(query: str) -> str:
 """
 def refine_instruction_with_history(query: str, rejected_list: list) -> str:
    """
    LEGACY: Generates query rewrites with rejection history (NOT true OPRO).
    """
    rejected_text = "\n".join(f"- {r}" for r in rejected_list) if rejected_list else ""
    return f"""
 你是一个“问题澄清与重写助手”。
@@ -18,3 +31,100 @@ def refine_instruction_with_history(query: str, rejected_list: list) -> str:
 请从新的角度重新生成至少20条不同的改写问题，每条单独一行。
 """
 # ============================================================================
 # TRUE OPRO FUNCTIONS (System Instruction Optimization)
 # ============================================================================
 def generate_initial_system_instruction_candidates(task_description: str, pool_size: int = None) -> str:
    """
    TRUE OPRO: Generates initial candidate System Instructions for a new OPRO run.
    Args:
        task_description: Description of the task the LLM should perform
        pool_size: Number of candidates to generate (defaults to config.GENERATION_POOL_SIZE)
    Returns:
        Meta-prompt that instructs the optimizer LLM to generate system instruction candidates
    """
    import config
    pool_size = pool_size or config.GENERATION_POOL_SIZE
    return f"""
 你是一个"系统指令生成助手"。
 目标任务描述：
 【{task_description}】
 请根据以上任务，生成 {pool_size} 条高质量、风格各异的"System Instruction"候选指令。
 要求：
 1. 每条指令必须有明显不同的风格和侧重点
 2. 覆盖不同的实现策略（例如：简洁型、详细型、示例型、角色扮演型、步骤型等）
 3. 这些指令应指导LLM的行为和输出格式，以最大化任务性能
 4. 每条指令单独成行，不包含编号或额外说明
 5. 所有生成的指令必须使用简体中文
 生成 {pool_size} 条指令：
 """
 def generate_optimized_system_instruction(
    task_description: str,
    trajectory: List[Tuple[str, float]],
    pool_size: int = None
 ) -> str:
    """
    TRUE OPRO: Analyzes performance trajectory and generates optimized System Instructions.
    This is the core OPRO function that uses an LLM as an optimizer to improve
    system instructions based on historical performance scores.
    Args:
        task_description: Description of the task the LLM should perform
        trajectory: List of (instruction, score) tuples, sorted by score (highest first)
        pool_size: Number of candidates to generate (defaults to config.GENERATION_POOL_SIZE)
    Returns:
        Meta-prompt that instructs the optimizer LLM to generate better system instructions
    """
    import config
    pool_size = pool_size or config.GENERATION_POOL_SIZE
    if not trajectory:
        # If no trajectory, fall back to initial generation
        return generate_initial_system_instruction_candidates(task_description, pool_size)
    # Format the trajectory for the Optimizer LLM
    formatted_history = "\n".join(
        f"--- Instruction Score: {score:.4f}\n{instruction}"
        for instruction, score in trajectory
    )
    # Determine the current highest score to set the optimization goal
    highest_score = max(score for _, score in trajectory)
    # Construct the Meta-Prompt (The OPRO Instruction)
    return f"""
 你是一个"System Prompt 优化器"。
 你的任务是改进一个LLM的系统指令，以最大化其在以下任务中的性能：
 【{task_description}】
 ---
 **历史性能轨迹 (Instructions and Scores):**
 {formatted_history}
 ---
 **当前最高得分: {highest_score:.4f}**
 请分析得分最高的指令的特点和得分最低指令的缺陷。
 然后，生成 {pool_size} 条新的、有潜力超越 {highest_score:.4f} 分的System Instruction。
 要求：
 1. 每条指令必须有明显不同的改进策略
 2. 结合高分指令的优点，避免低分指令的缺陷
 3. 探索新的优化方向和表达方式
 4. 每条指令单独成行，不包含编号或额外说明
 5. 所有生成的指令必须使用简体中文
 生成 {pool_size} 条优化后的指令：
 """
--- a/_qwen_xinference_demo/opro/session_state.py
+++ b/_qwen_xinference_demo/opro/session_state.py
@@ -1,8 +1,14 @@
 import uuid
 from typing import List, Tuple, Dict, Any
 # Legacy session storage (for query rewriting)
 SESSIONS = {}
 USER_FEEDBACK_LOG = []
 # OPRO session storage (for system instruction optimization)
 OPRO_RUNS = {}
 OPRO_RUN_LOG = []
 def create_session(query: str) -> str:
    sid = uuid.uuid4().hex
    SESSIONS[sid] = {
@@ -54,3 +60,167 @@ def set_session_model(sid: str, model_name: str | None):
    s = SESSIONS.get(sid)
    if s is not None:
        s["model_name"] = model_name
 # ============================================================================
 # TRUE OPRO SESSION MANAGEMENT
 # ============================================================================
 def create_opro_run(
    task_description: str,
    test_cases: List[Tuple[str, str]] = None,
    model_name: str = None
 ) -> str:
    """
    Create a new OPRO optimization run.
    Args:
        task_description: Description of the task to optimize for
        test_cases: List of (input, expected_output) tuples for evaluation
        model_name: Optional model name to use
    Returns:
        run_id: Unique identifier for this OPRO run
    """
    run_id = uuid.uuid4().hex
    OPRO_RUNS[run_id] = {
        "task_description": task_description,
        "test_cases": test_cases or [],
        "model_name": model_name,
        "iteration": 0,
        "trajectory": [],  # List of (instruction, score) tuples
        "best_instruction": None,
        "best_score": 0.0,
        "current_candidates": [],
        "created_at": uuid.uuid1().time,
        "status": "active"  # active, completed, failed
    }
    return run_id
 def get_opro_run(run_id: str) -> Dict[str, Any]:
    """Get OPRO run by ID."""
    return OPRO_RUNS.get(run_id)
 def update_opro_iteration(
    run_id: str,
    candidates: List[str],
    scores: List[float] = None
 ):
    """
    Update OPRO run with new iteration results.
    Args:
        run_id: OPRO run identifier
        candidates: List of system instruction candidates
        scores: Optional list of scores (if evaluated)
    """
    run = OPRO_RUNS.get(run_id)
    if not run:
        return
    run["iteration"] += 1
    run["current_candidates"] = candidates
    # If scores provided, update trajectory
    if scores and len(scores) == len(candidates):
        for candidate, score in zip(candidates, scores):
            run["trajectory"].append((candidate, score))
            # Update best if this is better
            if score > run["best_score"]:
                run["best_score"] = score
                run["best_instruction"] = candidate
    # Log the iteration
    OPRO_RUN_LOG.append({
        "run_id": run_id,
        "iteration": run["iteration"],
        "num_candidates": len(candidates),
        "best_score": run["best_score"]
    })
 def add_opro_evaluation(
    run_id: str,
    instruction: str,
    score: float
 ):
    """
    Add a single evaluation result to OPRO run.
    Args:
        run_id: OPRO run identifier
        instruction: System instruction that was evaluated
        score: Performance score
    """
    run = OPRO_RUNS.get(run_id)
    if not run:
        return
    # Add to trajectory
    run["trajectory"].append((instruction, score))
    # Update best if this is better
    if score > run["best_score"]:
        run["best_score"] = score
        run["best_instruction"] = instruction
 def get_opro_trajectory(run_id: str) -> List[Tuple[str, float]]:
    """
    Get the performance trajectory for an OPRO run.
    Returns:
        List of (instruction, score) tuples sorted by score (highest first)
    """
    run = OPRO_RUNS.get(run_id)
    if not run:
        return []
    trajectory = run["trajectory"]
    return sorted(trajectory, key=lambda x: x[1], reverse=True)
 def set_opro_test_cases(
    run_id: str,
    test_cases: List[Tuple[str, str]]
 ):
    """
    Set or update test cases for an OPRO run.
    Args:
        run_id: OPRO run identifier
        test_cases: List of (input, expected_output) tuples
    """
    run = OPRO_RUNS.get(run_id)
    if run:
        run["test_cases"] = test_cases
 def complete_opro_run(run_id: str):
    """Mark an OPRO run as completed."""
    run = OPRO_RUNS.get(run_id)
    if run:
        run["status"] = "completed"
 def list_opro_runs() -> List[Dict[str, Any]]:
    """
    List all OPRO runs with summary information.
    Returns:
        List of run summaries
    """
    return [
        {
            "run_id": run_id,
            "task_description": run["task_description"][:100] + "..." if len(run["task_description"]) > 100 else run["task_description"],
            "iteration": run["iteration"],
            "best_score": run["best_score"],
            "num_test_cases": len(run["test_cases"]),
            "status": run["status"]
        }
        for run_id, run in OPRO_RUNS.items()
    ]
--- a/_qwen_xinference_demo/opro/user_prompt_optimizer.py
+++ b/_qwen_xinference_demo/opro/user_prompt_optimizer.py
@@ -1,12 +1,18 @@
 import re
 import numpy as np
 from typing import List, Tuple
 from sklearn.cluster import AgglomerativeClustering
 from sklearn.metrics.pairwise import cosine_similarity
 import config
 from .ollama_client import call_qwen
 from .xinference_client import embed_texts
-from .prompt_utils import refine_instruction, refine_instruction_with_history
+from .prompt_utils import (
    refine_instruction,
    refine_instruction_with_history,
    generate_initial_system_instruction_candidates,
    generate_optimized_system_instruction
 )
 def parse_candidates(raw: str) -> list:
    lines = [l.strip() for l in re.split(r'\r?\n', raw) if l.strip()]
@@ -44,6 +50,10 @@ def cluster_and_select(candidates: list, top_k=config.TOP_K, distance_threshold=
    return selected[:top_k]
 def generate_candidates(query: str, rejected=None, top_k=config.TOP_K, model_name=None):
    """
    LEGACY: Query rewriting function (NOT true OPRO).
    Kept for backward compatibility with existing API endpoints.
    """
    rejected = rejected or []
    if rejected:
        prompt = refine_instruction_with_history(query, rejected)
@@ -53,3 +63,87 @@ def generate_candidates(query: str, rejected=None, top_k=config.TOP_K, model_nam
    raw = call_qwen(prompt, temperature=0.9, max_tokens=1024, model_name=model_name)
    all_candidates = parse_candidates(raw)
    return cluster_and_select(all_candidates, top_k=top_k)
 # ============================================================================
 # TRUE OPRO FUNCTIONS (System Instruction Optimization)
 # ============================================================================
 def generate_system_instruction_candidates(
    task_description: str,
    trajectory: List[Tuple[str, float]] = None,
    top_k: int = config.TOP_K,
    pool_size: int = None,
    model_name: str = None
 ) -> List[str]:
    """
    TRUE OPRO: Generates optimized system instruction candidates.
    This is the core OPRO function that generates system instructions based on
    performance trajectory (if available) or initial candidates (if starting fresh).
    Args:
        task_description: Description of the task the LLM should perform
        trajectory: Optional list of (instruction, score) tuples from previous iterations
        top_k: Number of diverse candidates to return (default: config.TOP_K = 5)
        pool_size: Number of candidates to generate before clustering (default: config.GENERATION_POOL_SIZE = 10)
        model_name: Optional model name to use for generation
    Returns:
        List of top-k diverse system instruction candidates
    """
    pool_size = pool_size or config.GENERATION_POOL_SIZE
    # Generate the meta-prompt based on whether we have trajectory data
    if trajectory and len(trajectory) > 0:
        # Sort trajectory by score (highest first)
        sorted_trajectory = sorted(trajectory, key=lambda x: x[1], reverse=True)
        meta_prompt = generate_optimized_system_instruction(task_description, sorted_trajectory, pool_size)
    else:
        # No trajectory yet, generate initial candidates
        meta_prompt = generate_initial_system_instruction_candidates(task_description, pool_size)
    # Use the optimizer LLM to generate candidates
    raw = call_qwen(meta_prompt, temperature=0.9, max_tokens=1024, model_name=model_name)
    # Parse the generated candidates
    all_candidates = parse_candidates(raw)
    # Cluster and select diverse representatives
    return cluster_and_select(all_candidates, top_k=top_k)
 def evaluate_system_instruction(
    system_instruction: str,
    test_cases: List[Tuple[str, str]],
    model_name: str = None
 ) -> float:
    """
    TRUE OPRO: Evaluates a system instruction's performance on test cases.
    Args:
        system_instruction: The system instruction to evaluate
        test_cases: List of (input, expected_output) tuples
        model_name: Optional model name to use for evaluation
    Returns:
        Performance score (0.0 to 1.0)
    """
    if not test_cases:
        return 0.0
    correct = 0
    total = len(test_cases)
    for input_text, expected_output in test_cases:
        # Construct the full prompt with system instruction
        full_prompt = f"{system_instruction}\n\n{input_text}"
        # Get LLM response
        response = call_qwen(full_prompt, temperature=0.2, max_tokens=512, model_name=model_name)
        # Simple exact match scoring (can be replaced with more sophisticated metrics)
        if expected_output.strip().lower() in response.strip().lower():
            correct += 1
    return correct / total
--- a/config.py
+++ b/config.py
@@ -14,6 +14,7 @@ DEFAULT_EMBED_MODEL = "qwen3-embedding:4b"
 XINFERENCE_EMBED_URL = "http://127.0.0.1:9997/models/bge-base-zh/embed"
 # Clustering/selection
-TOP_K = 5
+GENERATION_POOL_SIZE = 10  # Generate this many candidates before clustering
 TOP_K = 5                   # Return this many diverse candidates to user
 CLUSTER_DISTANCE_THRESHOLD = 0.15
--- a/examples/opro_demo.py
+++ b/examples/opro_demo.py
@@ -0,0 +1,164 @@
 """
 TRUE OPRO Demo Script
 This script demonstrates the true OPRO (Optimization by PROmpting) functionality.
 It shows how to:
 1. Generate initial system instruction candidates
 2. Evaluate them on test cases
 3. Use the performance trajectory to generate better candidates
 """
 import sys
 sys.path.insert(0, '.')
 from _qwen_xinference_demo.opro.user_prompt_optimizer import (
    generate_system_instruction_candidates,
    evaluate_system_instruction
 )
 import config
 def demo_opro_workflow():
    """
    Demonstrates a complete OPRO optimization workflow.
    """
    print("=" * 80)
    print("TRUE OPRO Demo - System Instruction Optimization")
    print("=" * 80)
    print(f"Pool Size: {config.GENERATION_POOL_SIZE} candidates → Clustered to Top {config.TOP_K}")
    # Define the task
    task_description = """
 任务：将用户输入的中文句子翻译成英文。
 要求：翻译准确、自然、符合英语表达习惯。
 """
    print(f"\n📋 Task Description:\n{task_description}")
    # Define test cases for evaluation
    test_cases = [
        ("你好，很高兴见到你", "Hello, nice to meet you"),
        ("今天天气真好", "The weather is really nice today"),
        ("我喜欢学习编程", "I like learning programming"),
        ("这本书很有趣", "This book is very interesting"),
    ]
    print(f"\n🧪 Test Cases: {len(test_cases)} examples")
    for i, (input_text, expected) in enumerate(test_cases, 1):
        print(f"  {i}. '{input_text}' → '{expected}'")
    # Iteration 1: Generate initial candidates
    print("\n" + "=" * 80)
    print("🔄 Iteration 1: Generating Initial System Instruction Candidates")
    print("=" * 80)
    print("\n⏳ Generating candidates... (this may take a moment)")
    candidates_round1 = generate_system_instruction_candidates(
        task_description=task_description,
        trajectory=None,  # No history yet
        top_k=3,
        model_name=None  # Use default model
    )
    print(f"\n✅ Generated {len(candidates_round1)} candidates:")
    for i, candidate in enumerate(candidates_round1, 1):
        print(f"\n  Candidate {i}:")
        print(f"  {candidate[:100]}..." if len(candidate) > 100 else f"  {candidate}")
    # Evaluate each candidate
    print("\n" + "-" * 80)
    print("📊 Evaluating Candidates on Test Cases")
    print("-" * 80)
    trajectory = []
    for i, candidate in enumerate(candidates_round1, 1):
        print(f"\n⏳ Evaluating Candidate {i}...")
        score = evaluate_system_instruction(
            system_instruction=candidate,
            test_cases=test_cases,
            model_name=None
        )
        trajectory.append((candidate, score))
        print(f"  Score: {score:.2%}")
    # Sort by score
    trajectory.sort(key=lambda x: x[1], reverse=True)
    print("\n📈 Performance Summary (Round 1):")
    for i, (candidate, score) in enumerate(trajectory, 1):
        print(f"  {i}. Score: {score:.2%} - {candidate[:60]}...")
    best_score = trajectory[0][1]
    print(f"\n🏆 Best Score: {best_score:.2%}")
    # Iteration 2: Generate optimized candidates based on trajectory
    print("\n" + "=" * 80)
    print("🔄 Iteration 2: Generating Optimized System Instructions")
    print("=" * 80)
    print(f"\n💡 Using performance trajectory to generate better candidates...")
    print(f"   Goal: Beat current best score of {best_score:.2%}")
    print("\n⏳ Generating optimized candidates...")
    candidates_round2 = generate_system_instruction_candidates(
        task_description=task_description,
        trajectory=trajectory,  # Use performance history
        top_k=3,
        model_name=None
    )
    print(f"\n✅ Generated {len(candidates_round2)} optimized candidates:")
    for i, candidate in enumerate(candidates_round2, 1):
        print(f"\n  Candidate {i}:")
        print(f"  {candidate[:100]}..." if len(candidate) > 100 else f"  {candidate}")
    # Evaluate new candidates
    print("\n" + "-" * 80)
    print("📊 Evaluating Optimized Candidates")
    print("-" * 80)
    for i, candidate in enumerate(candidates_round2, 1):
        print(f"\n⏳ Evaluating Optimized Candidate {i}...")
        score = evaluate_system_instruction(
            system_instruction=candidate,
            test_cases=test_cases,
            model_name=None
        )
        trajectory.append((candidate, score))
        print(f"  Score: {score:.2%}")
        if score > best_score:
            print(f"  🎉 NEW BEST! Improved from {best_score:.2%} to {score:.2%}")
            best_score = score
    # Final summary
    trajectory.sort(key=lambda x: x[1], reverse=True)
    print("\n" + "=" * 80)
    print("🏁 Final Results")
    print("=" * 80)
    print(f"\n🏆 Best System Instruction (Score: {trajectory[0][1]:.2%}):")
    print(f"\n{trajectory[0][0]}")
    print("\n📊 All Candidates Ranked:")
    for i, (candidate, score) in enumerate(trajectory[:5], 1):
        print(f"\n  {i}. Score: {score:.2%}")
        print(f"     {candidate[:80]}...")
    print("\n" + "=" * 80)
    print("✅ OPRO Demo Complete!")
    print("=" * 80)
 if __name__ == "__main__":
    print("\n⚠️  NOTE: This demo requires:")
    print("   1. Ollama running locally (http://127.0.0.1:11434)")
    print("   2. A Qwen model available (e.g., qwen3:8b)")
    print("   3. An embedding model (e.g., qwen3-embedding:4b)")
    print("\n   Press Ctrl+C to cancel, or Enter to continue...")
    try:
        input()
        demo_opro_workflow()
    except KeyboardInterrupt:
        print("\n\n❌ Demo cancelled by user.")
        sys.exit(0)
--- a/frontend/opro.html
+++ b/frontend/opro.html
@@ -0,0 +1,507 @@
 <!DOCTYPE html>
 <html lang="zh-CN">
 <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
    <meta http-equiv="Pragma" content="no-cache">
    <meta http-equiv="Expires" content="0">
    <title>OPRO - System Instruction Optimizer</title>
    <script crossorigin src="https://unpkg.com/react@18/umd/react.production.min.js"></script>
    <script crossorigin src="https://unpkg.com/react-dom@18/umd/react-dom.production.min.js"></script>
    <script src="https://cdn.tailwindcss.com"></script>
    <style>
        body {
            margin: 0;
            font-family: 'Google Sans', 'Segoe UI', Roboto, sans-serif;
            background: #f8f9fa;
        }
        .chat-container { height: 100vh; display: flex; }
        .scrollbar-hide::-webkit-scrollbar { display: none; }
        .scrollbar-hide { -ms-overflow-style: none; scrollbar-width: none; }
        .sidebar-collapsed { width: 60px; }
        .sidebar-expanded { width: 260px; }
        .instruction-card {
            transition: all 0.15s ease;
            border: 1px solid #e8eaed;
        }
        .instruction-card:hover {
            border-color: #dadce0;
            box-shadow: 0 1px 3px rgba(60,64,67,0.15);
        }
        .loading-dots::after {
            content: '...';
            animation: dots 1.5s steps(4, end) infinite;
        }
        @keyframes dots {
            0%, 20% { content: '.'; }
            40% { content: '..'; }
            60%, 100% { content: '...'; }
        }
    </style>
 </head>
 <body>
    <div id="root"></div>
    <script>
        const { useState, useEffect, useRef } = React;
        const API_BASE = 'http://127.0.0.1:8010';
        // Main App Component
        function App() {
            const [sidebarOpen, setSidebarOpen] = useState(false);
            const [runs, setRuns] = useState([]);
            const [currentRunId, setCurrentRunId] = useState(null);
            const [messages, setMessages] = useState([]);
            const [inputValue, setInputValue] = useState('');
            const [loading, setLoading] = useState(false);
            const [models, setModels] = useState([]);
            const [selectedModel, setSelectedModel] = useState('');
            const chatEndRef = useRef(null);
            // Load runs and models on mount
            useEffect(() => {
                loadRuns();
                loadModels();
            }, []);
            async function loadModels() {
                try {
                    const res = await fetch(`${API_BASE}/models`);
                    const data = await res.json();
                    if (data.success && data.data.models) {
                        setModels(data.data.models);
                        if (data.data.models.length > 0) {
                            setSelectedModel(data.data.models[0]);
                        }
                    }
                } catch (err) {
                    console.error('Failed to load models:', err);
                }
            }
            // Auto-scroll chat
            useEffect(() => {
                chatEndRef.current?.scrollIntoView({ behavior: 'smooth' });
            }, [messages]);
            async function loadRuns() {
                try {
                    const res = await fetch(`${API_BASE}/opro/runs`);
                    const data = await res.json();
                    if (data.success) {
                        setRuns(data.data.runs || []);
                    }
                } catch (err) {
                    console.error('Failed to load runs:', err);
                }
            }
            async function createNewRun(taskDescription) {
                setLoading(true);
                try {
                    // Create run
                    const res = await fetch(`${API_BASE}/opro/create`, {
                        method: 'POST',
                        headers: { 'Content-Type': 'application/json' },
                        body: JSON.stringify({
                            task_description: taskDescription,
                            test_cases: [],
                            model_name: selectedModel || undefined
                        })
                    });
                    const data = await res.json();
                    if (!data.success) {
                        throw new Error(data.error || 'Failed to create run');
                    }
                    const runId = data.data.run_id;
                    setCurrentRunId(runId);
                    // Add user message
                    setMessages([{ role: 'user', content: taskDescription }]);
                    // Generate and evaluate candidates
                    await generateCandidates(runId);
                    // Reload runs list
                    await loadRuns();
                } catch (err) {
                    alert('创建任务失败: ' + err.message);
                } finally {
                    setLoading(false);
                }
            }
            async function generateCandidates(runId) {
                setLoading(true);
                try {
                    const res = await fetch(`${API_BASE}/opro/generate_and_evaluate`, {
                        method: 'POST',
                        headers: { 'Content-Type': 'application/json' },
                        body: JSON.stringify({
                            run_id: runId,
                            top_k: 5,
                            auto_evaluate: false  // Use diversity-based selection
                        })
                    });
                    const data = await res.json();
                    if (!data.success) {
                        throw new Error(data.error || 'Failed to generate candidates');
                    }
                    // Add assistant message with candidates
                    setMessages(prev => [...prev, {
                        role: 'assistant',
                        type: 'candidates',
                        candidates: data.data.candidates,
                        iteration: data.data.iteration
                    }]);
                } catch (err) {
                    alert('生成候选指令失败: ' + err.message);
                } finally {
                    setLoading(false);
                }
            }
            async function executeInstruction(instruction, userInput) {
                setLoading(true);
                try {
                    const res = await fetch(`${API_BASE}/opro/execute`, {
                        method: 'POST',
                        headers: { 'Content-Type': 'application/json' },
                        body: JSON.stringify({
                            instruction: instruction,
                            user_input: userInput || '请执行任务',
                            model_name: selectedModel || undefined
                        })
                    });
                    const data = await res.json();
                    if (!data.success) {
                        throw new Error(data.error || 'Failed to execute');
                    }
                    // Add execution result
                    setMessages(prev => [...prev, {
                        role: 'assistant',
                        type: 'execution',
                        instruction: instruction,
                        response: data.data.response
                    }]);
                } catch (err) {
                    alert('执行失败: ' + err.message);
                } finally {
                    setLoading(false);
                }
            }
            function handleSendMessage() {
                const msg = inputValue.trim();
                if (!msg || loading) return;
                setInputValue('');
                if (!currentRunId) {
                    // Create new run with task description
                    createNewRun(msg);
                } else {
                    // Continue optimization or execute
                    // For now, just show message
                    setMessages(prev => [...prev, { role: 'user', content: msg }]);
                }
            }
            function handleContinueOptimize() {
                if (!currentRunId || loading) return;
                generateCandidates(currentRunId);
            }
            function handleExecute(instruction) {
                if (loading) return;
                const userInput = prompt('请输入要处理的内容（可选）:');
                executeInstruction(instruction, userInput);
            }
            function handleCopyInstruction(instruction) {
                navigator.clipboard.writeText(instruction).then(() => {
                    // Could add a toast notification here
                    console.log('Instruction copied to clipboard');
                }).catch(err => {
                    console.error('Failed to copy:', err);
                });
            }
            function handleNewTask() {
                setCurrentRunId(null);
                setMessages([]);
                setInputValue('');
            }
            async function loadRun(runId) {
                setLoading(true);
                try {
                    const res = await fetch(`${API_BASE}/opro/run/${runId}`);
                    const data = await res.json();
                    if (!data.success) {
                        throw new Error(data.error || 'Failed to load run');
                    }
                    const run = data.data;
                    setCurrentRunId(runId);
                    // Reconstruct messages from run data
                    const msgs = [
                        { role: 'user', content: run.task_description }
                    ];
                    if (run.current_candidates && run.current_candidates.length > 0) {
                        msgs.push({
                            role: 'assistant',
                            type: 'candidates',
                            candidates: run.current_candidates.map(c => ({ instruction: c, score: null })),
                            iteration: run.iteration
                        });
                    }
                    setMessages(msgs);
                } catch (err) {
                    alert('加载任务失败: ' + err.message);
                } finally {
                    setLoading(false);
                }
            }
            return React.createElement('div', { className: 'chat-container' },
                // Sidebar
                React.createElement('div', {
                    className: `bg-white border-r border-gray-200 transition-all duration-300 flex flex-col ${sidebarOpen ? 'sidebar-expanded' : 'sidebar-collapsed'}`
                },
                    // Header area - Collapse button only
                    React.createElement('div', { className: 'p-3 border-b border-gray-200 flex items-center justify-between' },
                        sidebarOpen ? React.createElement('button', {
                            onClick: () => setSidebarOpen(false),
                            className: 'p-2 text-gray-600 hover:bg-gray-100 rounded-lg transition-colors'
                        },
                            React.createElement('svg', { width: '20', height: '20', viewBox: '0 0 24 24', fill: 'none', stroke: 'currentColor', strokeWidth: '2' },
                                React.createElement('path', { d: 'M15 18l-6-6 6-6' })
                            )
                        ) : React.createElement('button', {
                            onClick: () => setSidebarOpen(true),
                            className: 'w-full p-2 text-gray-600 hover:bg-gray-100 rounded-lg transition-colors flex items-center justify-center'
                        },
                            React.createElement('svg', { width: '20', height: '20', viewBox: '0 0 24 24', fill: 'none', stroke: 'currentColor', strokeWidth: '2' },
                                React.createElement('path', { d: 'M3 12h18M3 6h18M3 18h18' })
                            )
                        )
                    ),
                    // Content area
                    React.createElement('div', { className: 'flex-1 overflow-y-auto scrollbar-hide p-2 flex flex-col' },
                        sidebarOpen ? React.createElement(React.Fragment, null,
                            // New task button (expanded)
                            React.createElement('button', {
                                onClick: handleNewTask,
                                className: 'mb-3 px-4 py-2.5 bg-white border border-gray-300 hover:bg-gray-50 rounded-lg transition-colors flex items-center justify-center gap-2 text-gray-700 font-medium'
                            },
                                React.createElement('span', { className: 'text-lg' }, '+'),
                                React.createElement('span', null, '新建会话')
                            ),
                            // Sessions list
                            runs.length > 0 && React.createElement('div', { className: 'text-xs text-gray-500 mb-2 px-2' }, '会话列表'),
                            runs.map(run =>
                                React.createElement('div', {
                                    key: run.run_id,
                                    onClick: () => loadRun(run.run_id),
                                    className: `p-3 mb-1 rounded-lg cursor-pointer transition-colors flex items-center gap-2 ${
                                        currentRunId === run.run_id ? 'bg-gray-100' : 'hover:bg-gray-50'
                                    }`
                                },
                                    React.createElement('svg', {
                                        width: '16',
                                        height: '16',
                                        viewBox: '0 0 24 24',
                                        fill: 'none',
                                        stroke: 'currentColor',
                                        strokeWidth: '2',
                                        className: 'flex-shrink-0 text-gray-500'
                                    },
                                        React.createElement('path', { d: 'M21 15a2 2 0 0 1-2 2H7l-4 4V5a2 2 0 0 1 2-2h14a2 2 0 0 1 2 2z' })
                                    ),
                                    React.createElement('div', { className: 'text-sm text-gray-800 truncate flex-1' },
                                        run.task_description
                                    )
                                )
                            )
                        ) : React.createElement('button', {
                            onClick: handleNewTask,
                            className: 'p-2 text-gray-600 hover:bg-gray-100 rounded-lg transition-colors flex items-center justify-center',
                            title: '新建会话'
                        },
                            React.createElement('svg', { width: '24', height: '24', viewBox: '0 0 24 24', fill: 'none', stroke: 'currentColor', strokeWidth: '2' },
                                React.createElement('path', { d: 'M12 5v14M5 12h14' })
                            )
                        )
                    )
                ),
                // Main Chat Area
                React.createElement('div', { className: 'flex-1 flex flex-col bg-white' },
                    // Header
                    React.createElement('div', { className: 'px-4 py-3 border-b border-gray-200 bg-white flex items-center gap-3' },
                        React.createElement('h1', { className: 'text-lg font-normal text-gray-800' },
                            'OPRO'
                        )
                    ),
                    // Chat Messages
                    React.createElement('div', { className: 'flex-1 overflow-y-auto scrollbar-hide p-6 space-y-6 max-w-4xl mx-auto w-full' },
                        messages.map((msg, idx) => {
                            if (msg.role === 'user') {
                                return React.createElement('div', { key: idx, className: 'flex justify-end' },
                                    React.createElement('div', { className: 'max-w-2xl bg-gray-100 text-gray-800 rounded-2xl px-5 py-3' },
                                        msg.content
                                    )
                                );
                            } else if (msg.type === 'candidates') {
                                return React.createElement('div', { key: idx, className: 'flex justify-start' },
                                    React.createElement('div', { className: 'w-full' },
                                        React.createElement('div', { className: 'mb-3' },
                                            React.createElement('div', { className: 'text-sm text-gray-600' },
                                                `优化后的提示词（第 ${msg.iteration} 轮）`
                                            ),
                                        ),
                                        msg.candidates.map((cand, cidx) =>
                                            React.createElement('div', {
                                                key: cidx,
                                                className: 'instruction-card bg-white rounded-xl p-5 mb-3'
                                            },
                                                React.createElement('div', { className: 'flex items-start gap-3' },
                                                    React.createElement('div', { className: 'flex-shrink-0 w-7 h-7 bg-gray-200 text-gray-700 rounded-full flex items-center justify-center text-sm font-medium' },
                                                        cidx + 1
                                                    ),
                                                    React.createElement('div', { className: 'flex-1' },
                                                        React.createElement('div', { className: 'text-gray-800 mb-4 whitespace-pre-wrap leading-relaxed' },
                                                            cand.instruction
                                                        ),
                                                        cand.score !== null && React.createElement('div', { className: 'text-xs text-gray-500 mb-3' },
                                                            `评分: ${cand.score.toFixed(4)}`
                                                        ),
                                                        React.createElement('div', { className: 'flex gap-2' },
                                                            React.createElement('button', {
                                                                onClick: handleContinueOptimize,
                                                                disabled: loading,
                                                                className: 'px-4 py-2 bg-white border border-gray-300 text-gray-700 rounded-lg hover:bg-gray-50 disabled:bg-gray-100 disabled:text-gray-400 disabled:cursor-not-allowed transition-colors text-sm font-medium'
                                                            }, '继续优化'),
                                                            React.createElement('button', {
                                                                onClick: () => handleCopyInstruction(cand.instruction),
                                                                className: 'px-4 py-2 bg-white border border-gray-300 text-gray-700 rounded-lg hover:bg-gray-50 transition-colors text-sm font-medium flex items-center gap-1'
                                                            },
                                                                React.createElement('svg', { width: '16', height: '16', viewBox: '0 0 24 24', fill: 'none', stroke: 'currentColor', strokeWidth: '2' },
                                                                    React.createElement('rect', { x: '9', y: '9', width: '13', height: '13', rx: '2', ry: '2' }),
                                                                    React.createElement('path', { d: 'M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1' })
                                                                ),
                                                                '复制'
                                                            ),
                                                            React.createElement('button', {
                                                                onClick: () => handleExecute(cand.instruction),
                                                                disabled: loading,
                                                                className: 'px-4 py-2 bg-gray-900 text-white rounded-lg hover:bg-gray-800 disabled:bg-gray-300 disabled:cursor-not-allowed transition-colors text-sm font-medium'
                                                            }, '执行此指令')
                                                        )
                                                    )
                                                )
                                            )
                                        )
                                    )
                                );
                            } else if (msg.type === 'execution') {
                                return React.createElement('div', { key: idx, className: 'flex justify-start' },
                                    React.createElement('div', { className: 'max-w-2xl bg-gray-50 border border-gray-200 rounded-2xl p-5' },
                                        React.createElement('div', { className: 'text-xs text-gray-600 mb-2 font-medium' },
                                            '执行结果'
                                        ),
                                        React.createElement('div', { className: 'text-gray-800 whitespace-pre-wrap leading-relaxed' },
                                            msg.response
                                        )
                                    )
                                );
                            }
                        }),
                        loading && React.createElement('div', { className: 'flex justify-start' },
                            React.createElement('div', { className: 'bg-gray-100 rounded-2xl px-5 py-3 text-gray-600' },
                                React.createElement('span', { className: 'loading-dots' }, '思考中')
                            )
                        ),
                        React.createElement('div', { ref: chatEndRef })
                    ),
                    // Input Area
                    React.createElement('div', { className: 'p-6 bg-white max-w-4xl mx-auto w-full' },
                        React.createElement('div', { className: 'relative' },
                            React.createElement('div', { className: 'bg-white border border-gray-300 rounded-3xl shadow-sm hover:shadow-md transition-shadow focus-within:shadow-md focus-within:border-gray-400' },
                                // Textarea
                                React.createElement('textarea', {
                                    value: inputValue,
                                    onChange: (e) => setInputValue(e.target.value),
                                    onKeyPress: (e) => {
                                        if (e.key === 'Enter' && !e.shiftKey) {
                                            e.preventDefault();
                                            handleSendMessage();
                                        }
                                    },
                                    placeholder: currentRunId ? '输入消息...' : '在此输入提示词',
                                    disabled: loading,
                                    rows: 3,
                                    className: 'w-full px-5 pt-4 pb-2 bg-transparent focus:outline-none disabled:bg-transparent text-gray-800 placeholder-gray-500 resize-none'
                                }),
                                // Toolbar
                                React.createElement('div', { className: 'flex items-center justify-between px-4 pb-3 pt-1 border-t border-gray-100' },
                                    // Left side - Model selector
                                    React.createElement('div', { className: 'flex items-center gap-2' },
                                        React.createElement('label', { className: 'text-xs text-gray-600' }, '模型:'),
                                        React.createElement('select', {
                                            value: selectedModel,
                                            onChange: (e) => setSelectedModel(e.target.value),
                                            className: 'text-sm px-2 py-1 border border-gray-300 rounded-lg bg-white text-gray-700 focus:outline-none focus:border-gray-400 cursor-pointer'
                                        },
                                            models.map(model =>
                                                React.createElement('option', { key: model, value: model }, model)
                                            )
                                        )
                                    ),
                                    // Right side - Send button
                                    React.createElement('button', {
                                        onClick: handleSendMessage,
                                        disabled: loading || !inputValue.trim(),
                                        className: 'p-2.5 bg-gray-100 text-gray-700 rounded-full hover:bg-gray-200 disabled:bg-gray-50 disabled:text-gray-300 disabled:cursor-not-allowed transition-colors flex items-center justify-center'
                                    },
                                        React.createElement('svg', {
                                            width: '20',
                                            height: '20',
                                            viewBox: '0 0 24 24',
                                            fill: 'currentColor'
                                        },
                                            React.createElement('path', { d: 'M2.01 21L23 12 2.01 3 2 10l15 2-15 2z' })
                                        )
                                    )
                                )
                            ),
                            !currentRunId && React.createElement('div', { className: 'text-xs text-gray-500 mt-3 px-4' },
                                '输入任务描述后，AI 将为你生成优化的系统指令'
                            )
                        )
                    )
                )
            );
        }
        // Render App
        const root = ReactDOM.createRoot(document.getElementById('root'));
        root.render(React.createElement(App));
    </script>
 </body>
 </html>
--- a/test_opro_api.py
+++ b/test_opro_api.py
@@ -0,0 +1,184 @@
 #!/usr/bin/env python3
 """
 Test script for TRUE OPRO API endpoints.
 This script tests the complete OPRO workflow:
 1. Create OPRO run
 2. Generate initial candidates
 3. Evaluate candidates
 4. Generate optimized candidates
 5. View results
 Usage:
    python test_opro_api.py
 """
 import requests
 import json
 import time
 BASE_URL = "http://127.0.0.1:8010"
 def print_section(title):
    """Print a section header."""
    print("\n" + "=" * 60)
    print(f"  {title}")
    print("=" * 60)
 def test_opro_workflow():
    """Test the complete OPRO workflow."""
    print_section("1. Create OPRO Run")
    # Create a new OPRO run
    create_req = {
        "task_description": "将用户输入的中文翻译成英文，要求准确自然",
        "test_cases": [
            {"input": "你好", "expected_output": "Hello"},
            {"input": "谢谢", "expected_output": "Thank you"},
            {"input": "早上好", "expected_output": "Good morning"},
            {"input": "晚安", "expected_output": "Good night"},
            {"input": "再见", "expected_output": "Goodbye"}
        ]
    }
    response = requests.post(f"{BASE_URL}/opro/create", json=create_req)
    result = response.json()
    if not result.get("success"):
        print(f"❌ Failed to create OPRO run: {result}")
        return
    run_id = result["data"]["run_id"]
    print(f"✅ Created OPRO run: {run_id}")
    print(f"   Task: {result['data']['task_description']}")
    print(f"   Test cases: {result['data']['num_test_cases']}")
    # ========================================================================
    print_section("2. Generate Initial Candidates")
    iterate_req = {"run_id": run_id, "top_k": 5}
    response = requests.post(f"{BASE_URL}/opro/iterate", json=iterate_req)
    result = response.json()
    if not result.get("success"):
        print(f"❌ Failed to generate candidates: {result}")
        return
    candidates = result["data"]["candidates"]
    print(f"✅ Generated {len(candidates)} initial candidates:")
    for i, candidate in enumerate(candidates, 1):
        print(f"\n   [{i}] {candidate[:100]}...")
    # ========================================================================
    print_section("3. Evaluate Candidates")
    scores = []
    for i, candidate in enumerate(candidates, 1):
        print(f"\n   Evaluating candidate {i}/{len(candidates)}...")
        eval_req = {
            "run_id": run_id,
            "instruction": candidate
        }
        response = requests.post(f"{BASE_URL}/opro/evaluate", json=eval_req)
        result = response.json()
        if result.get("success"):
            score = result["data"]["score"]
            scores.append(score)
            is_best = "🏆" if result["data"]["is_new_best"] else ""
            print(f"   ✅ Score: {score:.4f} {is_best}")
        else:
            print(f"   ❌ Evaluation failed: {result}")
        time.sleep(0.5)  # Small delay to avoid overwhelming the API
    print(f"\n   Average score: {sum(scores)/len(scores):.4f}")
    print(f"   Best score: {max(scores):.4f}")
    # ========================================================================
    print_section("4. Generate Optimized Candidates (Iteration 2)")
    print("   Generating candidates based on performance trajectory...")
    iterate_req = {"run_id": run_id, "top_k": 5}
    response = requests.post(f"{BASE_URL}/opro/iterate", json=iterate_req)
    result = response.json()
    if not result.get("success"):
        print(f"❌ Failed to generate optimized candidates: {result}")
        return
    optimized_candidates = result["data"]["candidates"]
    print(f"✅ Generated {len(optimized_candidates)} optimized candidates:")
    for i, candidate in enumerate(optimized_candidates, 1):
        print(f"\n   [{i}] {candidate[:100]}...")
    # ========================================================================
    print_section("5. View Run Details")
    response = requests.get(f"{BASE_URL}/opro/run/{run_id}")
    result = response.json()
    if not result.get("success"):
        print(f"❌ Failed to get run details: {result}")
        return
    data = result["data"]
    print(f"✅ OPRO Run Details:")
    print(f"   Run ID: {data['run_id']}")
    print(f"   Task: {data['task_description']}")
    print(f"   Iteration: {data['iteration']}")
    print(f"   Status: {data['status']}")
    print(f"   Best Score: {data['best_score']:.4f}")
    print(f"\n   Best Instruction:")
    print(f"   {data['best_instruction'][:200]}...")
    print(f"\n   Top 5 Trajectory:")
    for i, item in enumerate(data['trajectory'][:5], 1):
        print(f"   [{i}] Score: {item['score']:.4f}")
        print(f"       {item['instruction'][:80]}...")
    # ========================================================================
    print_section("6. List All Runs")
    response = requests.get(f"{BASE_URL}/opro/runs")
    result = response.json()
    if result.get("success"):
        runs = result["data"]["runs"]
        print(f"✅ Total OPRO runs: {result['data']['total']}")
        for run in runs:
            print(f"\n   Run: {run['run_id']}")
            print(f"   Task: {run['task_description'][:50]}...")
            print(f"   Iteration: {run['iteration']}, Best Score: {run['best_score']:.4f}")
    print_section("✅ OPRO Workflow Test Complete!")
    print(f"\nRun ID: {run_id}")
    print("You can view details at:")
    print(f"  {BASE_URL}/opro/run/{run_id}")
 if __name__ == "__main__":
    print("=" * 60)
    print("  TRUE OPRO API Test")
    print("=" * 60)
    print(f"\nBase URL: {BASE_URL}")
    print("\nMake sure the API server is running:")
    print("  uvicorn _qwen_xinference_demo.api:app --host 127.0.0.1 --port 8010")
    print("\nStarting test in 3 seconds...")
    time.sleep(3)
    try:
        test_opro_workflow()
    except requests.exceptions.ConnectionError:
        print("\n❌ ERROR: Could not connect to API server")
        print("Please start the server first:")
        print("  uvicorn _qwen_xinference_demo.api:app --host 127.0.0.1 --port 8010")
    except Exception as e:
        print(f"\n❌ ERROR: {e}")
        import traceback
        traceback.print_exc()