Compare commits
29 Commits
df35105d16...main

| SHA1 |
|---|
| ecad9ccd82 |
| 51350e3002 |
| 8e713be1ca |
| f2af27245d |
| a9dc0d8b91 |
| 902c28166b |
| 4a53be7eeb |
| 8b5b24fa2a |
| ed66aa346d |
| 5b82d40be0 |
| bedf1af9c0 |
| 5fca4eb094 |
| 0dbf74db9d |
| 858b594171 |
| ed0f51f2a4 |
| ecc0c79475 |
| 6befc510d8 |
| 8f66c235fa |
| 886d5ae0cc |
| 6752c5c231 |
| 610d475ce0 |
| 496b96508d |
| 07ebdc09bc |
| 7f67fa89de |
| c1886fb68f |
| 78417c898a |
| d5df5b8283 |
| 718f864926 |
| e5711b3f05 |
@@ -29,9 +29,14 @@ REDIS_URL="redis://localhost:6379/0"

# ==================== LLM AI configuration ====================
# Large language model API configuration
LLM_API_KEY="your_api_key_here"
LLM_BASE_URL=""
LLM_MODEL_NAME=""
# Supports OpenAI-compatible APIs (DeepSeek, Zhipu GLM, Alibaba, etc.)
# Zhipu AI GLM series:
# - Models: glm-4-flash (fast text model), glm-4 (standard), glm-4-plus (high performance)
# - API: https://open.bigmodel.cn
# - API Key: https://open.bigmodel.cn/usercenter/apikeys
LLM_API_KEY="ca79ad9f96524cd5afc3e43ca97f347d.cpiLLx2oyitGvTeU"
LLM_BASE_URL="https://open.bigmodel.cn/api/paas/v4"
LLM_MODEL_NAME="glm-4v-plus"

# ==================== Supabase configuration ====================
# Supabase project configuration

38
backend/=3.0.0
Normal file
@@ -0,0 +1,38 @@
Requirement already satisfied: sentence-transformers in c:\python312\lib\site-packages (2.2.2)
Requirement already satisfied: transformers<5.0.0,>=4.6.0 in c:\python312\lib\site-packages (from sentence-transformers) (4.57.6)
Requirement already satisfied: tqdm in c:\python312\lib\site-packages (from sentence-transformers) (4.66.1)
Requirement already satisfied: torch>=1.6.0 in c:\python312\lib\site-packages (from sentence-transformers) (2.10.0)
Requirement already satisfied: torchvision in c:\python312\lib\site-packages (from sentence-transformers) (0.25.0)
Requirement already satisfied: numpy in c:\python312\lib\site-packages (from sentence-transformers) (1.26.2)
Requirement already satisfied: scikit-learn in c:\python312\lib\site-packages (from sentence-transformers) (1.8.0)
Requirement already satisfied: scipy in c:\python312\lib\site-packages (from sentence-transformers) (1.16.3)
Requirement already satisfied: nltk in c:\python312\lib\site-packages (from sentence-transformers) (3.9.3)
Requirement already satisfied: sentencepiece in c:\python312\lib\site-packages (from sentence-transformers) (0.2.1)
Requirement already satisfied: huggingface-hub>=0.4.0 in c:\python312\lib\site-packages (from sentence-transformers) (0.36.2)
Requirement already satisfied: filelock in c:\python312\lib\site-packages (from huggingface-hub>=0.4.0->sentence-transformers) (3.25.2)
Requirement already satisfied: fsspec>=2023.5.0 in c:\python312\lib\site-packages (from huggingface-hub>=0.4.0->sentence-transformers) (2026.2.0)
Requirement already satisfied: packaging>=20.9 in c:\python312\lib\site-packages (from huggingface-hub>=0.4.0->sentence-transformers) (23.2)
Requirement already satisfied: pyyaml>=5.1 in c:\python312\lib\site-packages (from huggingface-hub>=0.4.0->sentence-transformers) (6.0.1)
Requirement already satisfied: requests in c:\python312\lib\site-packages (from huggingface-hub>=0.4.0->sentence-transformers) (2.31.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\python312\lib\site-packages (from huggingface-hub>=0.4.0->sentence-transformers) (4.15.0)
Requirement already satisfied: sympy>=1.13.3 in c:\python312\lib\site-packages (from torch>=1.6.0->sentence-transformers) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in c:\python312\lib\site-packages (from torch>=1.6.0->sentence-transformers) (3.6.1)
Requirement already satisfied: jinja2 in c:\python312\lib\site-packages (from torch>=1.6.0->sentence-transformers) (3.1.6)
Requirement already satisfied: setuptools in c:\python312\lib\site-packages (from torch>=1.6.0->sentence-transformers) (82.0.1)
Requirement already satisfied: colorama in c:\python312\lib\site-packages (from tqdm->sentence-transformers) (0.4.6)
Requirement already satisfied: regex!=2019.12.17 in c:\python312\lib\site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers) (2026.2.28)
Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in c:\python312\lib\site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers) (0.22.2)
Requirement already satisfied: safetensors>=0.4.3 in c:\python312\lib\site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers) (0.7.0)
Requirement already satisfied: click in c:\python312\lib\site-packages (from nltk->sentence-transformers) (8.3.1)
Requirement already satisfied: joblib in c:\python312\lib\site-packages (from nltk->sentence-transformers) (1.5.3)
Requirement already satisfied: threadpoolctl>=3.2.0 in c:\python312\lib\site-packages (from scikit-learn->sentence-transformers) (3.6.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\python312\lib\site-packages (from torchvision->sentence-transformers) (12.1.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\python312\lib\site-packages (from sympy>=1.13.3->torch>=1.6.0->sentence-transformers) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\python312\lib\site-packages (from jinja2->torch>=1.6.0->sentence-transformers) (3.0.3)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\python312\lib\site-packages (from requests->huggingface-hub>=0.4.0->sentence-transformers) (3.4.6)
Requirement already satisfied: idna<4,>=2.5 in c:\python312\lib\site-packages (from requests->huggingface-hub>=0.4.0->sentence-transformers) (3.11)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\python312\lib\site-packages (from requests->huggingface-hub>=0.4.0->sentence-transformers) (2.6.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\python312\lib\site-packages (from requests->huggingface-hub>=0.4.0->sentence-transformers) (2026.2.25)

[notice] A new release of pip is available: 24.2 -> 26.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
@@ -13,6 +13,7 @@ from app.api.endpoints import (
    visualization,
    analysis_charts,
    health,
    instruction,  # intelligent instructions
)

# create the main router
@@ -29,3 +30,4 @@ api_router.include_router(templates.router)  # table templates
api_router.include_router(ai_analyze.router)  # AI analysis
api_router.include_router(visualization.router)  # visualization
api_router.include_router(analysis_charts.router)  # analysis charts
api_router.include_router(instruction.router)  # intelligent instructions

@@ -10,6 +10,8 @@ import os

from app.services.excel_ai_service import excel_ai_service
from app.services.markdown_ai_service import markdown_ai_service
from app.services.template_fill_service import template_fill_service
from app.services.word_ai_service import word_ai_service

logger = logging.getLogger(__name__)

@@ -215,9 +217,12 @@ async def analyze_markdown(
        return result

    finally:
        # clean up the temporary file
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        # clean up the temporary file; make sure it is removed in every case
        try:
            if tmp_path and os.path.exists(tmp_path):
                os.unlink(tmp_path)
        except Exception as cleanup_error:
            logger.warning(f"临时文件清理失败: {tmp_path}, error: {cleanup_error}")

    except HTTPException:
        raise
@@ -279,8 +284,12 @@ async def analyze_markdown_stream(
        )

    finally:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        # clean up the temporary file; make sure it is removed in every case
        try:
            if tmp_path and os.path.exists(tmp_path):
                os.unlink(tmp_path)
        except Exception as cleanup_error:
            logger.warning(f"临时文件清理失败: {tmp_path}, error: {cleanup_error}")

    except HTTPException:
        raise
@@ -289,7 +298,7 @@ async def analyze_markdown_stream(
        raise HTTPException(status_code=500, detail=f"流式分析失败: {str(e)}")


@router.get("/analyze/md/outline")
@router.post("/analyze/md/outline")
async def get_markdown_outline(
    file: UploadFile = File(...)
):
@@ -323,9 +332,154 @@ async def get_markdown_outline(
        result = await markdown_ai_service.extract_outline(tmp_path)
        return result
    finally:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        # clean up the temporary file; make sure it is removed in every case
        try:
            if tmp_path and os.path.exists(tmp_path):
                os.unlink(tmp_path)
        except Exception as cleanup_error:
            logger.warning(f"临时文件清理失败: {tmp_path}, error: {cleanup_error}")

    except Exception as e:
        logger.error(f"获取 Markdown 大纲失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"获取大纲失败: {str(e)}")


@router.post("/analyze/txt")
async def analyze_txt(
    file: UploadFile = File(...),
):
    """
    Upload a TXT text file and analyze it with AI to extract structured data

    Converts unstructured text into structured table data for later form filling

    Args:
        file: the uploaded TXT file

    Returns:
        dict: analysis result containing the structured table data
    """
    if not file.filename:
        raise HTTPException(status_code=400, detail="文件名为空")

    file_ext = file.filename.split('.')[-1].lower()
    if file_ext not in ['txt', 'text']:
        raise HTTPException(
            status_code=400,
            detail=f"不支持的文件类型: {file_ext},仅支持 .txt"
        )

    try:
        # read the file content
        content = await file.read()

        # save it to a temporary file
        with tempfile.NamedTemporaryFile(mode='wb', suffix='.txt', delete=False) as tmp:
            tmp.write(content)
            tmp_path = tmp.name

        try:
            logger.info(f"开始 AI 分析 TXT 文件: {file.filename}")

            # use the AI analysis method of template_fill_service
            result = await template_fill_service.analyze_txt_with_ai(
                content=content.decode('utf-8', errors='replace'),
                filename=file.filename
            )

            if result:
                logger.info(f"TXT AI 分析成功: {file.filename}")
                return {
                    "success": True,
                    "filename": file.filename,
                    "structured_data": result
                }
            else:
                logger.warning(f"TXT AI 分析返回空结果: {file.filename}")
                return {
                    "success": False,
                    "filename": file.filename,
                    "error": "AI 分析未能提取到结构化数据",
                    "structured_data": None
                }

        finally:
            # clean up the temporary file
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"TXT AI 分析过程中出错: {str(e)}")
        raise HTTPException(status_code=500, detail=f"分析失败: {str(e)}")


# ==================== Word document AI parsing ====================

@router.post("/analyze/word")
async def analyze_word(
    file: UploadFile = File(...),
    user_hint: str = Query("", description="用户提示词,如'请提取表格数据'")
):
    """
    Parse a Word document with AI and extract structured data

    Intended for pulling table data, key-value pairs, etc. out of unstructured Word documents

    Args:
        file: the uploaded Word file
        user_hint: the user's hint prompt

    Returns:
        dict: parse result containing the structured data
    """
    if not file.filename:
        raise HTTPException(status_code=400, detail="文件名为空")

    file_ext = file.filename.split('.')[-1].lower()
    if file_ext not in ['docx']:
        raise HTTPException(
            status_code=400,
            detail=f"不支持的文件类型: {file_ext},仅支持 .docx"
        )

    try:
        # save the uploaded file
        content = await file.read()
        suffix = f".{file_ext}"
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            tmp.write(content)
            tmp_path = tmp.name

        try:
            # parse the Word document with AI
            result = await word_ai_service.parse_word_with_ai(
                file_path=tmp_path,
                user_hint=user_hint or "请提取文档中的所有结构化数据,包括表格、键值对等"
            )

            if result.get("success"):
                return {
                    "success": True,
                    "filename": file.filename,
                    "result": result
                }
            else:
                return {
                    "success": False,
                    "filename": file.filename,
                    "error": result.get("error", "AI 解析失败"),
                    "result": None
                }

        finally:
            # clean up the temporary file
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Word AI 分析过程中出错: {str(e)}")
        raise HTTPException(status_code=500, detail=f"分析失败: {str(e)}")

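The hunks above replace each bare `os.unlink` in a `finally` block with a variant that tolerates a missing path and never lets a cleanup failure mask the endpoint's real result. A minimal sketch of that pattern, with an illustrative helper name (`cleanup_tmp` is not from the repo):

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)

def cleanup_tmp(tmp_path):
    """Remove a temp file; never raise from cleanup, only log a warning."""
    try:
        if tmp_path and os.path.exists(tmp_path):
            os.unlink(tmp_path)
            return True
    except Exception as cleanup_error:
        logger.warning(f"temp file cleanup failed: {tmp_path}, error: {cleanup_error}")
    return False

# usage mirrors the endpoints: write the upload bytes to a NamedTemporaryFile,
# then clean up in `finally` regardless of how the analysis ended
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(b"demo")
    tmp_path = tmp.name
try:
    pass  # the AI analysis would run here
finally:
    removed = cleanup_tmp(tmp_path)
```

Because the `except` swallows cleanup errors, an analysis result (or the original exception) always propagates instead of being replaced by an `OSError` from `unlink`.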
@@ -23,6 +23,52 @@ logger = logging.getLogger(__name__)

router = APIRouter(prefix="/upload", tags=["文档上传"])


# ==================== Helper functions ====================

async def update_task_status(
    task_id: str,
    status: str,
    progress: int = 0,
    message: str = "",
    result: dict = None,
    error: str = None
):
    """
    Update task status, writing to both Redis and MongoDB

    Args:
        task_id: task ID
        status: status
        progress: progress
        message: message
        result: result
        error: error message
    """
    meta = {"progress": progress, "message": message}
    if result:
        meta["result"] = result
    if error:
        meta["error"] = error

    # try to write to Redis
    try:
        await redis_db.set_task_status(task_id, status, meta)
    except Exception as e:
        logger.warning(f"Redis 任务状态更新失败: {e}")

    # try to write to MongoDB (as a fallback)
    try:
        await mongodb.update_task(
            task_id=task_id,
            status=status,
            message=message,
            result=result,
            error=error
        )
    except Exception as e:
        logger.warning(f"MongoDB 任务状态更新失败: {e}")


# ==================== Request/response models ====================

class UploadResponse(BaseModel):
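The `update_task_status` helper added above writes the same status to two backends and deliberately swallows a failure in either one, so a flaky Redis or MongoDB never aborts document processing. A runnable sketch of that degrade-gracefully dual write, using stand-in async callables instead of the repo's `redis_db`/`mongodb` clients (an assumption for illustration):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def update_task_status(primary, backup, task_id, status,
                             progress=0, message="", result=None, error=None):
    """Write task status to a primary store and a backup; a failure in
    either backend is logged as a warning but never propagated."""
    meta = {"progress": progress, "message": message}
    if result:
        meta["result"] = result
    if error:
        meta["error"] = error
    for name, backend in (("primary", primary), ("backup", backup)):
        try:
            await backend(task_id, status, meta)
        except Exception as e:
            logger.warning(f"{name} status update failed: {e}")

# demo backends: a healthy in-memory store and one that is always down
written = {}

async def ok_backend(task_id, status, meta):
    written[task_id] = {"status": status, **meta}

async def bad_backend(task_id, status, meta):
    raise ConnectionError("backend down")

asyncio.run(update_task_status(ok_backend, bad_backend, "t1", "processing",
                               progress=10, message="parsing"))
```

The call completes even though one backend raised, which is the same contract `process_document` relies on when it replaces its direct `redis_db.set_task_status` calls.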
@@ -77,6 +123,17 @@ async def upload_document(
    task_id = str(uuid.uuid4())

    try:
        # save a task record to MongoDB (still queryable if Redis is unavailable)
        try:
            await mongodb.insert_task(
                task_id=task_id,
                task_type="document_parse",
                status="pending",
                message=f"文档 {file.filename} 已提交处理"
            )
        except Exception as mongo_err:
            logger.warning(f"MongoDB 保存任务记录失败: {mongo_err}")

        content = await file.read()
        saved_path = file_service.save_uploaded_file(
            content,
@@ -122,6 +179,17 @@ async def upload_documents(
    saved_paths = []

    try:
        # save a task record to MongoDB
        try:
            await mongodb.insert_task(
                task_id=task_id,
                task_type="batch_parse",
                status="pending",
                message=f"已提交 {len(files)} 个文档处理"
            )
        except Exception as mongo_err:
            logger.warning(f"MongoDB 保存批量任务记录失败: {mongo_err}")

        for file in files:
            if not file.filename:
                continue
@@ -159,9 +227,9 @@ async def process_document(
    """Process a single document"""
    try:
        # status: parsing
        await redis_db.set_task_status(
        await update_task_status(
            task_id, status="processing",
            meta={"progress": 10, "message": "正在解析文档"}
            progress=10, message="正在解析文档"
        )

        # parse the document
@@ -172,9 +240,9 @@ async def process_document(
            raise Exception(result.error or "解析失败")

        # status: storing
        await redis_db.set_task_status(
        await update_task_status(
            task_id, status="processing",
            meta={"progress": 30, "message": "正在存储数据"}
            progress=30, message="正在存储数据"
        )

        # store to MongoDB
@@ -191,9 +259,9 @@ async def process_document(

        # for Excel files: store to MySQL + AI-generated descriptions + RAG index
        if doc_type in ["xlsx", "xls"]:
            await redis_db.set_task_status(
            await update_task_status(
                task_id, status="processing",
                meta={"progress": 50, "message": "正在存储到MySQL并生成字段描述"}
                progress=50, message="正在存储到MySQL并生成字段描述"
            )

            try:
@@ -215,9 +283,9 @@ async def process_document(

        else:
            # unstructured document
            await redis_db.set_task_status(
            await update_task_status(
                task_id, status="processing",
                meta={"progress": 60, "message": "正在建立索引"}
                progress=60, message="正在建立索引"
            )

            # if the document contains table data, extract it and store to MySQL + RAG
@@ -238,17 +306,13 @@ async def process_document(
        await index_document_to_rag(doc_id, original_filename, result, doc_type)

        # done
        await redis_db.set_task_status(
        await update_task_status(
            task_id, status="success",
            meta={
                "progress": 100,
                "message": "处理完成",
            progress=100, message="处理完成",
            result={
                "doc_id": doc_id,
                "result": {
                    "doc_id": doc_id,
                    "doc_type": doc_type,
                    "filename": original_filename
                }
                "doc_type": doc_type,
                "filename": original_filename
            }
        )

@@ -256,18 +320,19 @@ async def process_document(

    except Exception as e:
        logger.error(f"文档处理失败: {str(e)}")
        await redis_db.set_task_status(
        await update_task_status(
            task_id, status="failure",
            meta={"error": str(e)}
            progress=0, message="处理失败",
            error=str(e)
        )


async def process_documents_batch(task_id: str, files: List[dict]):
    """Process documents in batch"""
    try:
        await redis_db.set_task_status(
        await update_task_status(
            task_id, status="processing",
            meta={"progress": 0, "message": "开始批量处理"}
            progress=0, message="开始批量处理"
        )

        results = []
@@ -318,37 +383,43 @@ async def process_documents_batch(task_id: str, files: List[dict]):
            results.append({"filename": file_info["filename"], "success": False, "error": str(e)})

            progress = int((i + 1) / len(files) * 100)
            await redis_db.set_task_status(
            await update_task_status(
                task_id, status="processing",
                meta={"progress": progress, "message": f"已处理 {i+1}/{len(files)}"}
                progress=progress, message=f"已处理 {i+1}/{len(files)}"
            )

        await redis_db.set_task_status(
        await update_task_status(
            task_id, status="success",
            meta={"progress": 100, "message": "批量处理完成", "results": results}
            progress=100, message="批量处理完成",
            result={"results": results}
        )

    except Exception as e:
        logger.error(f"批量处理失败: {str(e)}")
        await redis_db.set_task_status(
        await update_task_status(
            task_id, status="failure",
            meta={"error": str(e)}
            progress=0, message="批量处理失败",
            error=str(e)
        )


async def index_document_to_rag(doc_id: str, filename: str, result: ParseResult, doc_type: str):
    """Index an unstructured document into RAG"""
    """Index an unstructured document into RAG (using chunked indexing)"""
    try:
        content = result.data.get("content", "")
        if content:
            # pass the full content to the RAG service for automatic chunked indexing
            rag_service.index_document_content(
                doc_id=doc_id,
                content=content[:5000],
                content=content,  # pass the full content; the RAG service chunks it automatically
                metadata={
                    "filename": filename,
                    "doc_type": doc_type
                }
                },
                chunk_size=500,   # 500 characters per chunk
                chunk_overlap=50  # 50-character overlap between chunks
            )
            logger.info(f"RAG 索引完成: (unknown), doc_id={doc_id}")
    except Exception as e:
        logger.warning(f"RAG 索引失败: {str(e)}")

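The `index_document_to_rag` change above stops truncating content at 5000 characters and instead hands the full text to the RAG service with `chunk_size=500` and `chunk_overlap=50`. The diff does not show the service's chunker, but fixed-size chunking with overlap can be sketched as follows (`chunk_text` is an illustrative name, not the repo's API):

```python
def chunk_text(content, chunk_size=500, chunk_overlap=50):
    """Split content into chunks of at most `chunk_size` characters where
    each chunk repeats the last `chunk_overlap` characters of the previous
    one, so sentences cut at a boundary still appear whole in one chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # advance 450 chars per chunk by default
    stop = max(len(content) - chunk_overlap, 1)
    return [content[i:i + chunk_size] for i in range(0, stop, step)]

# a 1200-character document yields 3 overlapping chunks
chunks = chunk_text("".join(str(i % 10) for i in range(1200)))
```

Overlap trades a little index size for recall at chunk boundaries, which is presumably why the diff passes both parameters explicitly.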
@@ -19,26 +19,43 @@ async def health_check() -> Dict[str, Any]:
    Returns the connection status of each database plus application info
    """
    # check each database connection status
    mysql_status = "connected"
    mongodb_status = "connected"
    redis_status = "connected"
    mysql_status = "unknown"
    mongodb_status = "unknown"
    redis_status = "unknown"

    try:
        if mysql_db.async_engine is None:
            mysql_status = "disconnected"
    except Exception:
        else:
            # actually run a query to verify the connection
            from sqlalchemy import text
            async with mysql_db.async_engine.connect() as conn:
                await conn.execute(text("SELECT 1"))
            mysql_status = "connected"
    except Exception as e:
        logger.warning(f"MySQL 健康检查失败: {e}")
        mysql_status = "error"

    try:
        if mongodb.client is None:
            mongodb_status = "disconnected"
    except Exception:
        else:
            # verify with an actual ping
            await mongodb.client.admin.command('ping')
            mongodb_status = "connected"
    except Exception as e:
        logger.warning(f"MongoDB 健康检查失败: {e}")
        mongodb_status = "error"

    try:
        if not redis_db.is_connected:
        if not redis_db.is_connected or redis_db.client is None:
            redis_status = "disconnected"
    except Exception:
        else:
            # verify with an actual ping
            await redis_db.client.ping()
            redis_status = "connected"
    except Exception as e:
        logger.warning(f"Redis 健康检查失败: {e}")
        redis_status = "error"

    return {
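The health-check hunk above replaces optimistic `"connected"` defaults with a real probe per backend: no client means `"disconnected"`, a successful round trip means `"connected"`, and a raised exception means `"error"`. That three-way outcome can be sketched generically with stand-in ping coroutines (the `probe` helper and demo pings are illustrative, not from the repo):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def probe(name, client, ping):
    """Return 'disconnected' when no client exists, 'connected' when an
    actual round trip succeeds, and 'error' when the probe raises."""
    try:
        if client is None:
            return "disconnected"
        await ping(client)  # e.g. SELECT 1, admin ping, or redis PING
        return "connected"
    except Exception as e:
        logger.warning(f"{name} health check failed: {e}")
        return "error"

async def ok_ping(client):
    return True

async def bad_ping(client):
    raise TimeoutError("no route to host")

async def main():
    return await asyncio.gather(
        probe("mysql", object(), ok_ping),
        probe("mongodb", None, ok_ping),
        probe("redis", object(), bad_ping),
    )

statuses = asyncio.run(main())
```

Reporting `"error"` separately from `"disconnected"` is the point of the change: a configured-but-failing backend is a different operational problem than one that was never initialized.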
439
backend/app/api/endpoints/instruction.py
Normal file
@@ -0,0 +1,439 @@
"""
Smart instruction API endpoints

Supports natural-language instruction parsing and execution
"""
import logging
import uuid
from typing import Any, Dict, List, Optional

from fastapi import APIRouter, HTTPException, Query, BackgroundTasks
from pydantic import BaseModel

from app.instruction.intent_parser import intent_parser
from app.instruction.executor import instruction_executor
from app.core.database import mongodb

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/instruction", tags=["智能指令"])


# ==================== Request/response models ====================

class InstructionRequest(BaseModel):
    instruction: str
    doc_ids: Optional[List[str]] = None  # IDs of the associated documents
    context: Optional[Dict[str, Any]] = None  # extra context


class IntentRecognitionResponse(BaseModel):
    success: bool
    intent: str
    params: Dict[str, Any]
    message: str


class InstructionExecutionResponse(BaseModel):
    success: bool
    intent: str
    result: Dict[str, Any]
    message: str


# ==================== Endpoints ====================

@router.post("/recognize", response_model=IntentRecognitionResponse)
async def recognize_intent(request: InstructionRequest):
    """
    Intent recognition endpoint

    Parses a natural-language instruction into a structured intent plus parameters

    Example instructions:
    - "提取文档中的医院数量和床位数"
    - "根据这些数据填表"
    - "总结一下这份文档"
    - "对比这两个文档的差异"
    """
    try:
        intent, params = await intent_parser.parse(request.instruction)

        # attach the associated document info
        if request.doc_ids:
            params["document_refs"] = [f"doc_{doc_id}" for doc_id in request.doc_ids]

        intent_names = {
            "extract": "信息提取",
            "fill_table": "表格填写",
            "summarize": "摘要总结",
            "question": "智能问答",
            "search": "文档搜索",
            "compare": "对比分析",
            "transform": "格式转换",
            "edit": "文档编辑",
            "unknown": "未知"
        }

        return IntentRecognitionResponse(
            success=True,
            intent=intent,
            params=params,
            message=f"识别到意图: {intent_names.get(intent, intent)}"
        )

    except Exception as e:
        logger.error(f"意图识别失败: {e}")
        return IntentRecognitionResponse(
            success=False,
            intent="error",
            params={},
            message=f"意图识别失败: {str(e)}"
        )


@router.post("/execute")
async def execute_instruction(
    background_tasks: BackgroundTasks,
    request: InstructionRequest,
    async_execute: bool = Query(False, description="是否异步执行(仅返回任务ID)")
):
    """
    Instruction execution endpoint

    Parses and executes a natural-language instruction

    Examples:
    - Instruction: "提取文档1中的医院数量"
      Returns: {"extracted_data": {"医院数量": ["38710个"]}}

    - Instruction: "填表"
      Returns: {"filled_data": {...}}

    Set async_execute=true to run asynchronously; a task ID is returned for polling progress
    """
    task_id = str(uuid.uuid4())

    if async_execute:
        # async mode: return a task ID immediately and run in the background
        background_tasks.add_task(
            _execute_instruction_task,
            task_id=task_id,
            instruction=request.instruction,
            doc_ids=request.doc_ids,
            context=request.context
        )

        return {
            "success": True,
            "task_id": task_id,
            "message": "指令已提交执行",
            "status_url": f"/api/v1/tasks/{task_id}"
        }

    # sync mode: wait for execution to finish
    return await _execute_instruction_task(task_id, request.instruction, request.doc_ids, request.context)


async def _execute_instruction_task(
    task_id: str,
    instruction: str,
    doc_ids: Optional[List[str]],
    context: Optional[Dict[str, Any]]
) -> InstructionExecutionResponse:
    """Background task that executes an instruction"""
    from app.core.database import redis_db, mongodb as mongo_client

    try:
        # record the task
        try:
            await mongo_client.insert_task(
                task_id=task_id,
                task_type="instruction_execute",
                status="processing",
                message="正在执行指令"
            )
        except Exception:
            pass

        # build the execution context
        ctx: Dict[str, Any] = context or {}

        # if document IDs were provided, fetch the document contents
        if doc_ids:
            docs = []
            for doc_id in doc_ids:
                doc = await mongo_client.get_document(doc_id)
                if doc:
                    docs.append(doc)

            if docs:
                ctx["source_docs"] = docs
                logger.info(f"指令执行上下文: 关联了 {len(docs)} 个文档")

        # execute the instruction
        result = await instruction_executor.execute(instruction, ctx)

        # update the task status
        try:
            await mongo_client.update_task(
                task_id=task_id,
                status="success",
                message="执行完成",
                result=result
            )
        except Exception:
            pass

        return InstructionExecutionResponse(
            success=result.get("success", False),
            intent=result.get("intent", "unknown"),
            result=result,
            message=result.get("message", "执行完成")
        )

    except Exception as e:
        logger.error(f"指令执行失败: {e}")
        try:
            await mongo_client.update_task(
                task_id=task_id,
                status="failure",
                message="执行失败",
                error=str(e)
            )
        except Exception:
            pass

        return InstructionExecutionResponse(
            success=False,
            intent="error",
            result={"error": str(e)},
            message=f"指令执行失败: {str(e)}"
        )


@router.post("/chat")
async def instruction_chat(
    background_tasks: BackgroundTasks,
    request: InstructionRequest,
    async_execute: bool = Query(False, description="是否异步执行(仅返回任务ID)")
):
    """
    Instruction chat endpoint

    Supports multi-turn conversational instruction execution

    Example conversation flow:
    1. User: "上传一些文档"
    2. System: "请上传文档"
    3. User: "提取其中的医院数量"
    4. System: returns the extraction result

    Set async_execute=true to run asynchronously; a task ID is returned for polling progress
    """
    task_id = str(uuid.uuid4())

    if async_execute:
        # async mode: return a task ID immediately and run in the background
        background_tasks.add_task(
            _execute_chat_task,
            task_id=task_id,
            instruction=request.instruction,
            doc_ids=request.doc_ids,
            context=request.context
        )

        return {
            "success": True,
            "task_id": task_id,
            "message": "指令已提交执行",
            "status_url": f"/api/v1/tasks/{task_id}"
        }

    # sync mode: wait for execution to finish
    return await _execute_chat_task(task_id, request.instruction, request.doc_ids, request.context)


async def _execute_chat_task(
    task_id: str,
    instruction: str,
    doc_ids: Optional[List[str]],
    context: Optional[Dict[str, Any]]
):
    """Background task that executes an instruction chat turn"""
    from app.core.database import mongodb as mongo_client

    try:
        # record the task
        try:
            await mongo_client.insert_task(
                task_id=task_id,
                task_type="instruction_chat",
                status="processing",
                message="正在处理对话"
            )
        except Exception:
            pass

        # build the context
        ctx: Dict[str, Any] = context or {}

        # fetch the associated documents
        if doc_ids:
            docs = []
            for doc_id in doc_ids:
                doc = await mongo_client.get_document(doc_id)
                if doc:
                    docs.append(doc)
            if docs:
                ctx["source_docs"] = docs

        # execute the instruction
        result = await instruction_executor.execute(instruction, ctx)

        # add a friendly response message based on the intent type
        response_messages = {
            "extract": f"已提取 {len(result.get('extracted_data', {}))} 个字段的数据",
            "fill_table": f"填表完成,填写了 {len(result.get('result', {}).get('filled_data', {}))} 个字段",
            "summarize": "已生成文档摘要",
            "question": "已找到相关答案",
            "search": f"找到 {len(result.get('results', []))} 条相关内容",
            "compare": f"对比了 {len(result.get('comparison', []))} 个文档",
            "edit": "编辑操作已完成",
            "transform": "格式转换已完成",
            "unknown": "无法理解该指令,请尝试更明确的描述"
        }

        response = {
            "success": result.get("success", False),
            "intent": result.get("intent", "unknown"),
            "result": result,
            "message": response_messages.get(result.get("intent", ""), result.get("message", "")),
            "hint": _get_intent_hint(result.get("intent", ""))
        }

        # update the task status
        try:
            await mongo_client.update_task(
                task_id=task_id,
                status="success",
                message="处理完成",
                result=response
            )
        except Exception:
            pass

        return response

    except Exception as e:
        logger.error(f"指令对话失败: {e}")
        try:
            await mongo_client.update_task(
                task_id=task_id,
                status="failure",
                message="处理失败",
                error=str(e)
            )
        except Exception:
            pass

        return {
            "success": False,
            "error": str(e),
            "message": f"处理失败: {str(e)}"
        }


def _get_intent_hint(intent: str) -> Optional[str]:
    """Return a next-step hint for the given intent"""
    hints = {
        "extract": "您可以继续说 '提取更多字段' 或 '将数据填入表格'",
        "fill_table": "您可以提供表格模板或说 '帮我创建一个表格'",
        "question": "您可以继续提问或说 '总结一下这些内容'",
        "search": "您可以查看搜索结果或说 '对比这些内容'",
        "unknown": "您可以尝试: '提取数据'、'填表'、'总结'、'问答' 等指令"
    }
    return hints.get(intent)


@router.get("/intents")
async def list_supported_intents():
    """
    List the supported intent types

    Returns every available natural-language instruction type
    """
    return {
        "intents": [
            {
                "intent": "extract",
                "name": "信息提取",
                "examples": [
                    "提取文档中的医院数量",
                    "抽取所有机构的名称",
                    "找出表格中的数据"
                ],
                "params": ["field_refs", "document_refs"]
            },
            {
                "intent": "fill_table",
                "name": "表格填写",
                "examples": [
                    "填表",
                    "根据这些数据填写表格",
                    "帮我填到Excel里"
                ],
                "params": ["template", "document_refs"]
            },
            {
                "intent": "summarize",
                "name": "摘要总结",
                "examples": [
                    "总结一下这份文档",
                    "生成摘要",
                    "概括主要内容"
                ],
                "params": ["document_refs"]
            },
            {
                "intent": "question",
                "name": "智能问答",
                "examples": [
                    "这段话说的是什么?",
                    "有多少家医院?",
                    "解释一下这个概念"
                ],
                "params": ["question", "focus"]
            },
            {
                "intent": "search",
                "name": "文档搜索",
                "examples": [
                    "搜索相关内容",
                    "找找看有哪些机构",
                    "查询医院相关的数据"
                ],
                "params": ["field_refs", "question"]
            },
            {
                "intent": "compare",
                "name": "对比分析",
                "examples": [
                    "对比这两个文档",
                    "比较一下差异",
                    "找出不同点"
                ],
                "params": ["document_refs"]
            },
            {
                "intent": "edit",
                "name": "文档编辑",
                "examples": [
                    "润色这段文字",
                    "修改格式",
                    "添加注释"
                ],
                "params": []
            }
        ]
    }
@@ -1,13 +1,13 @@
"""
任务管理 API 接口

提供异步任务状态查询
提供异步任务状态查询和历史记录
"""
from typing import Optional

from fastapi import APIRouter, HTTPException

from app.core.database import redis_db
from app.core.database import redis_db, mongodb

router = APIRouter(prefix="/tasks", tags=["任务管理"])

@@ -23,25 +23,94 @@ async def get_task_status(task_id: str):
    Returns:
        任务状态信息
    """
    # 优先从 Redis 获取
    status = await redis_db.get_task_status(task_id)

    if not status:
    # Redis不可用时,假设任务已完成(文档已成功处理)
    # 前端轮询时会得到这个响应
    if status:
        return {
            "task_id": task_id,
            "status": "success",
            "progress": 100,
            "message": "任务处理完成",
            "result": None,
            "error": None
            "status": status.get("status", "unknown"),
            "progress": status.get("meta", {}).get("progress", 0),
            "message": status.get("meta", {}).get("message"),
            "result": status.get("meta", {}).get("result"),
            "error": status.get("meta", {}).get("error")
        }

    # Redis 不可用时,尝试从 MongoDB 获取
    mongo_task = await mongodb.get_task(task_id)
    if mongo_task:
        return {
            "task_id": mongo_task.get("task_id"),
            "status": mongo_task.get("status", "unknown"),
            "progress": 100 if mongo_task.get("status") == "success" else 0,
            "message": mongo_task.get("message"),
            "result": mongo_task.get("result"),
            "error": mongo_task.get("error")
        }

    # 任务不存在或状态未知
    return {
        "task_id": task_id,
        "status": status.get("status", "unknown"),
        "progress": status.get("meta", {}).get("progress", 0),
        "message": status.get("meta", {}).get("message"),
        "result": status.get("meta", {}).get("result"),
        "error": status.get("meta", {}).get("error")
        "status": "unknown",
        "progress": 0,
        "message": "无法获取任务状态(Redis和MongoDB均不可用)",
        "result": None,
        "error": None
    }


@router.get("/")
async def list_tasks(limit: int = 50, skip: int = 0):
    """
    获取任务历史列表

    Args:
        limit: 返回数量限制
        skip: 跳过数量

    Returns:
        任务列表
    """
    try:
        tasks = await mongodb.list_tasks(limit=limit, skip=skip)
        return {
            "success": True,
            "tasks": tasks,
            "count": len(tasks)
        }
    except Exception as e:
        # MongoDB 不可用时返回空列表
        return {
            "success": False,
            "tasks": [],
            "count": 0,
            "error": str(e)
        }


@router.delete("/{task_id}")
async def delete_task(task_id: str):
    """
    删除任务

    Args:
        task_id: 任务ID

    Returns:
        是否删除成功
    """
    try:
        # 从 Redis 删除
        if redis_db._connected and redis_db.client:
            key = f"task:{task_id}"
            await redis_db.client.delete(key)

        # 从 MongoDB 删除
        deleted = await mongodb.delete_task(task_id)

        return {
            "success": True,
            "deleted": deleted
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"删除任务失败: {str(e)}")

@@ -5,21 +5,62 @@
"""
import io
import logging
import uuid
from typing import List, Optional

from fastapi import APIRouter, File, HTTPException, Query, UploadFile
from fastapi import APIRouter, File, HTTPException, Query, UploadFile, BackgroundTasks
from fastapi.responses import StreamingResponse
import pandas as pd
from pydantic import BaseModel

from app.services.template_fill_service import template_fill_service, TemplateField
from app.services.file_service import file_service
from app.core.database import mongodb
from app.core.document_parser import ParserFactory

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/templates", tags=["表格模板"])


# ==================== 辅助函数 ====================

async def update_task_status(
    task_id: str,
    status: str,
    progress: int = 0,
    message: str = "",
    result: dict = None,
    error: str = None
):
    """
    更新任务状态,同时写入 Redis 和 MongoDB
    """
    from app.core.database import redis_db

    meta = {"progress": progress, "message": message}
    if result:
        meta["result"] = result
    if error:
        meta["error"] = error

    try:
        await redis_db.set_task_status(task_id, status, meta)
    except Exception as e:
        logger.warning(f"Redis 任务状态更新失败: {e}")

    try:
        await mongodb.update_task(
            task_id=task_id,
            status=status,
            message=message,
            result=result,
            error=error
        )
    except Exception as e:
        logger.warning(f"MongoDB 任务状态更新失败: {e}")


# ==================== 请求/响应模型 ====================

class TemplateFieldRequest(BaseModel):
@@ -38,6 +79,7 @@ class FillRequest(BaseModel):
    source_doc_ids: Optional[List[str]] = None  # MongoDB 文档 ID 列表
    source_file_paths: Optional[List[str]] = None  # 源文档文件路径列表
    user_hint: Optional[str] = None
    task_id: Optional[str] = None  # 可选的任务ID,用于任务历史跟踪


class ExportRequest(BaseModel):
@@ -109,6 +151,240 @@ async def upload_template(
        raise HTTPException(status_code=500, detail=f"上传失败: {str(e)}")


@router.post("/upload-joint")
async def upload_joint_template(
    background_tasks: BackgroundTasks,
    template_file: UploadFile = File(..., description="模板文件"),
    source_files: List[UploadFile] = File(..., description="源文档文件列表"),
):
    """
    联合上传模板和源文档,一键完成解析和存储

    1. 保存模板文件并提取字段
    2. 异步处理源文档(解析+存MongoDB)
    3. 返回模板信息和源文档ID列表

    Args:
        template_file: 模板文件 (xlsx/xls/docx)
        source_files: 源文档列表 (docx/xlsx/md/txt)

    Returns:
        模板ID、字段列表、源文档ID列表
    """
    if not template_file.filename:
        raise HTTPException(status_code=400, detail="模板文件名为空")

    # 验证模板格式
    template_ext = template_file.filename.split('.')[-1].lower()
    if template_ext not in ['xlsx', 'xls', 'docx']:
        raise HTTPException(
            status_code=400,
            detail=f"不支持的模板格式: {template_ext},仅支持 xlsx/xls/docx"
        )

    # 验证源文档格式
    valid_exts = ['docx', 'xlsx', 'xls', 'md', 'txt']
    for sf in source_files:
        if sf.filename:
            sf_ext = sf.filename.split('.')[-1].lower()
            if sf_ext not in valid_exts:
                raise HTTPException(
                    status_code=400,
                    detail=f"不支持的源文档格式: {sf_ext},仅支持 docx/xlsx/xls/md/txt"
                )

    try:
        # 1. 保存模板文件
        template_content = await template_file.read()
        template_path = file_service.save_uploaded_file(
            template_content,
            template_file.filename,
            subfolder="templates"
        )

        # 2. 保存并解析源文档 - 提取内容用于生成表头
        source_file_info = []
        source_contents = []
        for sf in source_files:
            if sf.filename:
                sf_content = await sf.read()
                sf_ext = sf.filename.split('.')[-1].lower()
                sf_path = file_service.save_uploaded_file(
                    sf_content,
                    sf.filename,
                    subfolder=sf_ext
                )
                source_file_info.append({
                    "path": sf_path,
                    "filename": sf.filename,
                    "ext": sf_ext
                })
                # 解析源文档获取内容(用于 AI 生成表头)
                try:
                    from app.core.document_parser import ParserFactory
                    parser = ParserFactory.get_parser(sf_path)
                    parse_result = parser.parse(sf_path)
                    if parse_result.success and parse_result.data:
                        # 获取原始内容
                        content = parse_result.data.get("content", "")[:5000] if parse_result.data.get("content") else ""

                        # 获取标题(可能在顶层或structured_data内)
                        titles = parse_result.data.get("titles", [])
                        if not titles and parse_result.data.get("structured_data"):
                            titles = parse_result.data.get("structured_data", {}).get("titles", [])
                        titles = titles[:10] if titles else []

                        # 获取表格数量(可能在顶层或structured_data内)
                        tables = parse_result.data.get("tables", [])
                        if not tables and parse_result.data.get("structured_data"):
                            tables = parse_result.data.get("structured_data", {}).get("tables", [])
                        tables_count = len(tables) if tables else 0

                        # 获取表格内容摘要(用于 AI 理解源文档结构)
                        tables_summary = ""
                        if tables:
                            tables_summary = "\n【文档中的表格】:\n"
                            for idx, table in enumerate(tables[:5]):  # 最多5个表格
                                if isinstance(table, dict):
                                    headers = table.get("headers", [])
                                    rows = table.get("rows", [])
                                    if headers:
                                        tables_summary += f"表格{idx+1}表头: {', '.join(str(h) for h in headers)}\n"
                                    if rows:
                                        tables_summary += f"表格{idx+1}前3行: "
                                        for row_idx, row in enumerate(rows[:3]):
                                            if isinstance(row, list):
                                                tables_summary += " | ".join(str(c) for c in row) + "; "
                                            elif isinstance(row, dict):
                                                tables_summary += " | ".join(str(row.get(h, "")) for h in headers if headers) + "; "
                                        tables_summary += "\n"

                        source_contents.append({
                            "filename": sf.filename,
                            "doc_type": sf_ext,
                            "content": content,
                            "titles": titles,
                            "tables_count": tables_count,
                            "tables_summary": tables_summary
                        })
                        logger.info(f"[DEBUG] source_contents built: filename={sf.filename}, content_len={len(content)}, titles_count={len(titles)}, tables_count={tables_count}")
                        if tables_summary:
                            logger.info(f"[DEBUG] tables_summary preview: {tables_summary[:300]}")
                except Exception as e:
                    logger.warning(f"解析源文档失败 {sf.filename}: {e}")

        # 3. 根据源文档内容生成表头
        template_fields = await template_fill_service.get_template_fields_from_file(
            template_path,
            template_ext,
            source_contents=source_contents  # 传递源文档内容
        )

        # 4. 异步处理源文档到MongoDB
        task_id = str(uuid.uuid4())
        if source_file_info:
            # 保存任务记录到 MongoDB
            try:
                await mongodb.insert_task(
                    task_id=task_id,
                    task_type="source_process",
                    status="pending",
                    message=f"开始处理 {len(source_file_info)} 个源文档"
                )
            except Exception as mongo_err:
                logger.warning(f"MongoDB 保存任务记录失败: {mongo_err}")

            background_tasks.add_task(
                process_source_documents,
                task_id=task_id,
                files=source_file_info
            )

        logger.info(f"联合上传完成: 模板={template_file.filename}, 源文档={len(source_file_info)}个")

        return {
            "success": True,
            "template_id": template_path,
            "filename": template_file.filename,
            "file_type": template_ext,
            "fields": [
                {
                    "cell": f.cell,
                    "name": f.name,
                    "field_type": f.field_type,
                    "required": f.required,
                    "hint": f.hint
                }
                for f in template_fields
            ],
            "field_count": len(template_fields),
            "source_file_paths": [f["path"] for f in source_file_info],
            "source_filenames": [f["filename"] for f in source_file_info],
            "task_id": task_id
        }

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"联合上传失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"联合上传失败: {str(e)}")


async def process_source_documents(task_id: str, files: List[dict]):
    """异步处理源文档,存入MongoDB"""
    try:
        await update_task_status(
            task_id, status="processing",
            progress=0, message="开始处理源文档"
        )

        doc_ids = []
        for i, file_info in enumerate(files):
            try:
                parser = ParserFactory.get_parser(file_info["path"])
                result = parser.parse(file_info["path"])

                if result.success:
                    doc_id = await mongodb.insert_document(
                        doc_type=file_info["ext"],
                        content=result.data.get("content", ""),
                        metadata={
                            **result.metadata,
                            "original_filename": file_info["filename"],
                            "file_path": file_info["path"]
                        },
                        structured_data=result.data.get("structured_data")
                    )
                    doc_ids.append(doc_id)
                    logger.info(f"源文档处理成功: {file_info['filename']}, doc_id: {doc_id}")
                else:
                    logger.error(f"源文档解析失败: {file_info['filename']}, error: {result.error}")

            except Exception as e:
                logger.error(f"源文档处理异常: {file_info['filename']}, error: {str(e)}")

            progress = int((i + 1) / len(files) * 100)
            await update_task_status(
                task_id, status="processing",
                progress=progress, message=f"已处理 {i+1}/{len(files)}"
            )

        await update_task_status(
            task_id, status="success",
            progress=100, message="源文档处理完成",
            result={"doc_ids": doc_ids}
        )
        logger.info(f"所有源文档处理完成: {len(doc_ids)}个")

    except Exception as e:
        logger.error(f"源文档批量处理失败: {str(e)}")
        await update_task_status(
            task_id, status="failure",
            progress=0, message="源文档处理失败",
            error=str(e)
        )


@router.post("/fields")
async def extract_template_fields(
    template_id: str = Query(..., description="模板ID/文件路径"),
@@ -164,7 +440,27 @@ async def fill_template(
    Returns:
        填写结果
    """
    # 生成或使用传入的 task_id
    task_id = request.task_id or str(uuid.uuid4())

    try:
        # 创建任务记录到 MongoDB
        try:
            await mongodb.insert_task(
                task_id=task_id,
                task_type="template_fill",
                status="processing",
                message=f"开始填表任务: {len(request.template_fields)} 个字段"
            )
        except Exception as mongo_err:
            logger.warning(f"MongoDB 创建任务记录失败: {mongo_err}")

        # 更新进度 - 开始
        await update_task_status(
            task_id, "processing",
            progress=0, message="开始处理..."
        )

        # 转换字段
        fields = [
            TemplateField(
@@ -177,17 +473,51 @@ async def fill_template(
            for f in request.template_fields
        ]

        # 从 template_id 提取文件类型
        template_file_type = "xlsx"  # 默认类型
        if request.template_id:
            ext = request.template_id.split('.')[-1].lower()
            if ext in ["xlsx", "xls"]:
                template_file_type = "xlsx"
            elif ext == "docx":
                template_file_type = "docx"

        # 更新进度 - 准备开始填写
        await update_task_status(
            task_id, "processing",
            progress=10, message=f"准备填写 {len(fields)} 个字段..."
        )

        # 执行填写
        result = await template_fill_service.fill_template(
            template_fields=fields,
            source_doc_ids=request.source_doc_ids,
            source_file_paths=request.source_file_paths,
            user_hint=request.user_hint
            user_hint=request.user_hint,
            template_id=request.template_id,
            template_file_type=template_file_type,
            task_id=task_id
        )

        return result
        # 更新为成功
        await update_task_status(
            task_id, "success",
            progress=100, message="填表完成",
            result={
                "field_count": len(fields),
                "max_rows": result.get("max_rows", 0)
            }
        )

        return {**result, "task_id": task_id}

    except Exception as e:
        # 更新为失败
        await update_task_status(
            task_id, "failure",
            progress=0, message="填表失败",
            error=str(e)
        )
        logger.error(f"填写表格失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"填写失败: {str(e)}")

@@ -280,51 +610,79 @@ async def _export_to_excel(filled_data: dict, template_id: str) -> StreamingResp

async def _export_to_word(filled_data: dict, template_id: str) -> StreamingResponse:
    """导出为 Word 格式"""
    import re
    import tempfile
    import os
    from docx import Document
    from docx.shared import Pt, RGBColor
    from docx.enum.text import WD_ALIGN_PARAGRAPH

    doc = Document()
    def clean_text(text: str) -> str:
        """清理文本,移除可能导致Word问题的非法字符"""
        if not text:
            return ""
        # 移除控制字符
        text = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]', '', text)
        return text.strip()

    # 添加标题
    title = doc.add_heading('填写结果', level=1)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER
    try:
        # 先保存到临时文件,再读取到内存,确保文档完整性
        with tempfile.NamedTemporaryFile(delete=False, suffix='.docx') as tmp_file:
            tmp_path = tmp_file.name

        # 添加填写时间和模板信息
        from datetime import datetime
        info_para = doc.add_paragraph()
        info_para.add_run(f"模板ID: {template_id}\n").bold = True
        info_para.add_run(f"导出时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        doc = Document()
        doc.add_heading('填写结果', level=1)

        doc.add_paragraph()  # 空行
        from datetime import datetime
        info_para = doc.add_paragraph()
        template_filename = template_id.split('/')[-1].split('\\')[-1] if template_id else '未知'
        info_para.add_run(f"模板文件: {clean_text(template_filename)}\n").bold = True
        info_para.add_run(f"导出时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        doc.add_paragraph()

        # 添加字段表格
        table = doc.add_table(rows=1, cols=3)
        table.style = 'Light Grid Accent 1'
        table = doc.add_table(rows=1, cols=3)
        table.style = 'Table Grid'

        # 表头
        header_cells = table.rows[0].cells
        header_cells[0].text = '字段名'
        header_cells[1].text = '填写值'
        header_cells[2].text = '状态'
        header_cells = table.rows[0].cells
        header_cells[0].text = '字段名'
        header_cells[1].text = '填写值'
        header_cells[2].text = '状态'

        for field_name, field_value in filled_data.items():
            row_cells = table.add_row().cells
            row_cells[0].text = field_name
            row_cells[1].text = str(field_value) if field_value else ''
            row_cells[2].text = '已填写' if field_value else '为空'
        for field_name, field_value in filled_data.items():
            row_cells = table.add_row().cells
            row_cells[0].text = clean_text(str(field_name))

        # 保存到 BytesIO
        output = io.BytesIO()
        doc.save(output)
        output.seek(0)
            if isinstance(field_value, list):
                clean_values = [clean_text(str(v)) for v in field_value if v]
                display_value = ', '.join(clean_values) if clean_values else ''
            else:
                display_value = clean_text(str(field_value)) if field_value else ''

        filename = f"filled_template.docx"
            row_cells[1].text = display_value
            row_cells[2].text = '已填写' if display_value else '为空'

        # 保存到临时文件
        doc.save(tmp_path)

        # 读取文件内容
        with open(tmp_path, 'rb') as f:
            file_content = f.read()

    finally:
        # 清理临时文件
        if os.path.exists(tmp_path):
            try:
                os.unlink(tmp_path)
            except:
                pass

    output = io.BytesIO(file_content)
    filename = "filled_template.docx"

    return StreamingResponse(
        io.BytesIO(output.getvalue()),
        output,
        media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        headers={"Content-Disposition": f"attachment; filename=(unknown)"}
        headers={"Content-Disposition": f"attachment; filename*=UTF-8''(unknown)"}
    )

@@ -5,6 +5,7 @@ from fastapi import APIRouter, UploadFile, File, HTTPException, Query
from fastapi.responses import StreamingResponse
from typing import Optional
import logging
import os
import pandas as pd
import io

@@ -126,7 +127,7 @@ async def upload_excel(
            content += f"... (共 {len(sheet_data['rows'])} 行)\n\n"

        doc_metadata = {
            "filename": saved_path.split("/")[-1] if "/" in saved_path else saved_path.split("\\")[-1],
            "filename": os.path.basename(saved_path),
            "original_filename": file.filename,
            "saved_path": saved_path,
            "file_size": len(content),
@@ -253,7 +254,7 @@ async def export_excel(
    output.seek(0)

    # 生成文件名
    original_name = file_path.split('/')[-1] if '/' in file_path else file_path
    original_name = os.path.basename(file_path)
    if columns:
        export_name = f"export_{sheet_name or 'data'}_{len(column_list) if columns else 'all'}_cols.xlsx"
    else:

@@ -59,6 +59,11 @@ class MongoDB:
        """RAG索引集合 - 存储字段语义索引"""
        return self.db["rag_index"]

    @property
    def tasks(self):
        """任务集合 - 存储任务历史记录"""
        return self.db["tasks"]

    # ==================== 文档操作 ====================

    async def insert_document(
@@ -242,8 +247,128 @@ class MongoDB:
        await self.rag_index.create_index("table_name")
        await self.rag_index.create_index("field_name")

        # 任务集合索引
        await self.tasks.create_index("task_id", unique=True)
        await self.tasks.create_index("created_at")

        logger.info("MongoDB 索引创建完成")

    # ==================== 任务历史操作 ====================

    async def insert_task(
        self,
        task_id: str,
        task_type: str,
        status: str = "pending",
        message: str = "",
        result: Optional[Dict[str, Any]] = None,
        error: Optional[str] = None,
    ) -> str:
        """
        插入任务记录

        Args:
            task_id: 任务ID
            task_type: 任务类型
            status: 任务状态
            message: 任务消息
            result: 任务结果
            error: 错误信息

        Returns:
            插入文档的ID
        """
        task = {
            "task_id": task_id,
            "task_type": task_type,
            "status": status,
            "message": message,
            "result": result,
            "error": error,
            "created_at": datetime.utcnow(),
            "updated_at": datetime.utcnow(),
        }
        result_obj = await self.tasks.insert_one(task)
        return str(result_obj.inserted_id)

    async def update_task(
        self,
        task_id: str,
        status: Optional[str] = None,
        message: Optional[str] = None,
        result: Optional[Dict[str, Any]] = None,
        error: Optional[str] = None,
    ) -> bool:
        """
        更新任务状态

        Args:
            task_id: 任务ID
            status: 任务状态
            message: 任务消息
            result: 任务结果
            error: 错误信息

        Returns:
            是否更新成功
        """
        from bson import ObjectId

        update_data = {"updated_at": datetime.utcnow()}
        if status is not None:
            update_data["status"] = status
        if message is not None:
            update_data["message"] = message
        if result is not None:
            update_data["result"] = result
        if error is not None:
            update_data["error"] = error

        update_result = await self.tasks.update_one(
            {"task_id": task_id},
            {"$set": update_data}
        )
        return update_result.modified_count > 0

    async def get_task(self, task_id: str) -> Optional[Dict[str, Any]]:
        """根据task_id获取任务"""
        task = await self.tasks.find_one({"task_id": task_id})
        if task:
            task["_id"] = str(task["_id"])
        return task

    async def list_tasks(
        self,
        limit: int = 50,
        skip: int = 0,
    ) -> List[Dict[str, Any]]:
        """
        获取任务列表

        Args:
            limit: 返回数量
            skip: 跳过数量

        Returns:
            任务列表
        """
        cursor = self.tasks.find().sort("created_at", -1).skip(skip).limit(limit)
        tasks = []
        async for task in cursor:
            task["_id"] = str(task["_id"])
            # 转换 datetime 为字符串
            if task.get("created_at"):
                task["created_at"] = task["created_at"].isoformat()
            if task.get("updated_at"):
                task["updated_at"] = task["updated_at"].isoformat()
            tasks.append(task)
        return tasks

    async def delete_task(self, task_id: str) -> bool:
        """删除任务"""
        result = await self.tasks.delete_one({"task_id": task_id})
        return result.deleted_count > 0


# ==================== 全局单例 ====================

@@ -59,7 +59,13 @@ class DocxParser(BaseParser):
        paragraphs = []
        for para in doc.paragraphs:
            if para.text.strip():
                paragraphs.append(para.text)
                paragraphs.append({
                    "text": para.text,
                    "style": str(para.style.name) if para.style else "Normal"
                })

        # 提取段落纯文本(用于 AI 解析)
        paragraphs_text = [p["text"] for p in paragraphs if p["text"].strip()]

        # 提取表格内容
        tables_data = []
@@ -77,8 +83,25 @@ class DocxParser(BaseParser):
                "column_count": len(table_rows[0]) if table_rows else 0
            })

        # 合并所有文本
        full_text = "\n".join(paragraphs)
        # 提取图片/嵌入式对象信息
        images_info = self._extract_images_info(doc, path)

        # 合并所有文本(包括图片描述)
        full_text_parts = []
        full_text_parts.append("【文档正文】")
        full_text_parts.extend(paragraphs_text)

        if tables_data:
            full_text_parts.append("\n【文档表格】")
            for idx, table in enumerate(tables_data):
                full_text_parts.append(f"--- 表格 {idx + 1} ---")
                for row in table["rows"]:
                    full_text_parts.append(" | ".join(str(cell) for cell in row))

        if images_info.get("image_count", 0) > 0:
            full_text_parts.append(f"\n【文档图片】文档包含 {images_info['image_count']} 张图片/图表")

        full_text = "\n".join(full_text_parts)

        # 构建元数据
        metadata = {
@@ -89,7 +112,9 @@ class DocxParser(BaseParser):
            "table_count": len(tables_data),
            "word_count": len(full_text),
            "char_count": len(full_text.replace("\n", "")),
            "has_tables": len(tables_data) > 0
            "has_tables": len(tables_data) > 0,
            "has_images": images_info.get("image_count", 0) > 0,
            "image_count": images_info.get("image_count", 0)
        }

        # 返回结果
@@ -97,12 +122,16 @@ class DocxParser(BaseParser):
            success=True,
            data={
                "content": full_text,
                "paragraphs": paragraphs,
                "paragraphs": paragraphs_text,
                "paragraphs_with_style": paragraphs,
                "tables": tables_data,
                "images": images_info,
                "word_count": len(full_text),
                "structured_data": {
                    "paragraphs": paragraphs,
                    "tables": tables_data
                    "paragraphs_text": paragraphs_text,
                    "tables": tables_data,
                    "images": images_info
                }
            },
            metadata=metadata
@@ -115,6 +144,59 @@ class DocxParser(BaseParser):
                error=f"解析 Word 文档失败: {str(e)}"
            )

    def extract_images_as_base64(self, file_path: str) -> List[Dict[str, str]]:
        """
        提取 Word 文档中的所有图片,返回 base64 编码列表

        Args:
            file_path: Word 文件路径

        Returns:
            图片列表,每项包含 base64 编码和图片类型
        """
        import zipfile
        import base64
        from io import BytesIO

        images = []

        try:
            with zipfile.ZipFile(file_path, 'r') as zf:
                # 查找 word/media 目录下的图片文件
                for filename in zf.namelist():
                    if filename.startswith('word/media/'):
                        # 获取图片类型
                        ext = filename.split('.')[-1].lower()
                        mime_types = {
                            'png': 'image/png',
                            'jpg': 'image/jpeg',
                            'jpeg': 'image/jpeg',
                            'gif': 'image/gif',
                            'bmp': 'image/bmp'
                        }
                        mime_type = mime_types.get(ext, 'image/png')

                        try:
                            # 读取图片数据并转为 base64
                            image_data = zf.read(filename)
                            base64_data = base64.b64encode(image_data).decode('utf-8')

                            images.append({
                                "filename": filename,
                                "mime_type": mime_type,
                                "base64": base64_data,
                                "size": len(image_data)
                            })
                            logger.info(f"提取图片: (unknown), 大小: {len(image_data)} bytes")
                        except Exception as e:
                            logger.warning(f"提取图片失败 (unknown): {str(e)}")

        except Exception as e:
            logger.error(f"打开 Word 文档提取图片失败: {str(e)}")

        logger.info(f"共提取 {len(images)} 张图片")
        return images

    def extract_key_sentences(self, text: str, max_sentences: int = 10) -> List[str]:
        """
        从文本中提取关键句子
@@ -268,6 +350,60 @@ class DocxParser(BaseParser):

        return fields

    def _extract_images_info(self, doc: Document, path: Path) -> Dict[str, Any]:
        """
        提取 Word 文档中的图片/嵌入式对象信息

        Args:
            doc: Document 对象
            path: 文件路径

        Returns:
            图片信息字典
        """
        import zipfile
        from io import BytesIO

        image_count = 0
        image_descriptions = []
        inline_shapes_count = 0

        try:
            # 方法1: 通过 inline shapes 统计图片
            try:
                inline_shapes_count = len(doc.inline_shapes)
                if inline_shapes_count > 0:
                    image_count = inline_shapes_count
                    image_descriptions.append(f"文档包含 {inline_shapes_count} 个嵌入式图形/图片")
            except Exception:
                pass

            # 方法2: 通过 ZIP 分析 document.xml 获取图片引用
            try:
                with zipfile.ZipFile(path, 'r') as zf:
                    # 查找 word/media 目录下的图片文件
                    media_files = [f for f in zf.namelist() if f.startswith('word/media/')]
                    if media_files and not inline_shapes_count:
                        image_count = len(media_files)
                        image_descriptions.append(f"文档包含 {image_count} 个嵌入图片")

                    # 检查是否有页眉页脚中的图片
                    header_images = [f for f in zf.namelist() if 'header' in f.lower() and f.endswith(('.png', '.jpg', '.jpeg', '.gif', '.bmp'))]
                    if header_images:
                        image_descriptions.append(f"页眉/页脚包含 {len(header_images)} 个图片")
            except Exception:
                pass

        except Exception as e:
            logger.warning(f"提取图片信息失败: {str(e)}")

        return {
            "image_count": image_count,
            "inline_shapes_count": inline_shapes_count,
            "descriptions": image_descriptions,
            "has_images": image_count > 0
        }

    def _infer_field_type_from_hint(self, hint: str) -> str:
        """
        从提示词推断字段类型

@@ -317,24 +317,70 @@ class XlsxParser(BaseParser):
|
||||
import zipfile
|
||||
from xml.etree import ElementTree as ET
|
||||
|
||||
# 常见的命名空间
|
||||
COMMON_NAMESPACES = [
|
||||
'http://schemas.openxmlformats.org/spreadsheetml/2006/main',
|
||||
'http://schemas.openxmlformats.org/spreadsheetml/2005/main',
|
||||
'http://schemas.openxmlformats.org/spreadsheetml/2004/main',
|
||||
'http://schemas.openxmlformats.org/spreadsheetml/2003/main',
|
||||
]
|
||||
|
||||
try:
|
||||
with zipfile.ZipFile(file_path, 'r') as z:
|
||||
if 'xl/workbook.xml' not in z.namelist():
|
||||
# 尝试多种可能的 workbook.xml 路径
|
||||
possible_paths = ['xl/workbook.xml', 'xl\\workbook.xml', 'workbook.xml']
|
||||
content = None
|
||||
for path in possible_paths:
|
||||
if path in z.namelist():
|
||||
content = z.read(path)
|
||||
logger.info(f"找到 workbook.xml at: {path}")
|
||||
break
|
||||
|
||||
if content is None:
|
||||
logger.warning(f"未找到 workbook.xml,文件列表: {z.namelist()[:10]}")
|
||||
return []
|
||||
content = z.read('xl/workbook.xml')
|
||||
|
||||
root = ET.fromstring(content)
|
||||
|
||||
# 命名空间
|
||||
ns = {'main': 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'}
|
||||
|
||||
sheet_names = []
|
||||
for sheet in root.findall('.//main:sheet', ns):
|
||||
name = sheet.get('name')
|
||||
if name:
|
||||
sheet_names.append(name)
|
||||
|
||||
# 方法1:尝试带命名空间的查找
|
||||
for ns in COMMON_NAMESPACES:
|
||||
sheet_elements = root.findall(f'.//{{{ns}}}sheet')
|
||||
if sheet_elements:
|
||||
for sheet in sheet_elements:
|
||||
name = sheet.get('name')
|
||||
if name:
|
||||
sheet_names.append(name)
|
||||
if sheet_names:
|
||||
logger.info(f"使用命名空间 {ns} 提取工作表: {sheet_names}")
|
||||
return sheet_names
|
||||
|
||||
# 方法2:不使用命名空间,直接查找所有 sheet 元素
|
||||
if not sheet_names:
|
||||
for elem in root.iter():
|
||||
if elem.tag.endswith('sheet') and elem.tag != 'sheets':
|
||||
name = elem.get('name')
|
||||
if name:
|
||||
sheet_names.append(name)
|
||||
for child in elem:
|
||||
if child.tag.endswith('sheet') or child.tag == 'sheet':
|
||||
name = child.get('name')
|
||||
if name and name not in sheet_names:
|
||||
sheet_names.append(name)
|
||||
|
||||
# 方法3:直接从 XML 文本中正则匹配 sheet name
|
||||
if not sheet_names:
|
||||
import re
|
||||
xml_str = content.decode('utf-8', errors='ignore')
|
||||
matches = re.findall(r'<sheet\s+[^>]*name=["\']([^"\']+)["\']', xml_str, re.IGNORECASE)
|
||||
if matches:
|
||||
sheet_names = matches
|
||||
logger.info(f"使用正则提取工作表: {sheet_names}")
|
||||
|
||||
logger.info(f"从 XML 提取工作表: {sheet_names}")
|
||||
return sheet_names
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"从 XML 提取工作表名称失败: {e}")
|
||||
return []
|
||||
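The method-3 regex fallback above can be exercised directly against a raw `workbook.xml` payload; a minimal sketch (the XML snippet is invented for illustration):

```python
import re

# A stripped-down workbook.xml payload, as it might appear inside an .xlsx
# archive (hypothetical sample, mixing both quote styles on purpose)
xml_str = (
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheets>'
    '<sheet name="Summary" sheetId="1"/>'
    "<sheet name='Data' sheetId='2'/>"
    '</sheets></workbook>'
)

# Same pattern as method 3: tolerate either quote style around the name attribute
matches = re.findall(r'<sheet\s+[^>]*name=["\']([^"\']+)["\']', xml_str, re.IGNORECASE)
print(matches)  # → ['Summary', 'Data']
```

Because it never parses the XML, this fallback still works when the document declares an unexpected namespace or is slightly malformed.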
@@ -356,6 +402,32 @@ class XlsxParser(BaseParser):
        import zipfile
        from xml.etree import ElementTree as ET

        # Commonly seen namespaces
        COMMON_NAMESPACES = [
            'http://schemas.openxmlformats.org/spreadsheetml/2006/main',
            'http://schemas.openxmlformats.org/spreadsheetml/2005/main',
            'http://schemas.openxmlformats.org/spreadsheetml/2004/main',
            'http://schemas.openxmlformats.org/spreadsheetml/2003/main',
        ]

        def find_elements_with_ns(root, tag_name):
            """Flexible element lookup that tolerates any namespace"""
            results = []
            # Method 1: try the known fixed namespaces
            for ns in COMMON_NAMESPACES:
                try:
                    elems = root.findall(f'.//{{{ns}}}{tag_name}')
                    if elems:
                        results.extend(elems)
                except Exception:
                    pass
            # Method 2: fall back to a namespace-free scan
            if not results:
                for elem in root.iter():
                    if elem.tag.endswith('}' + tag_name):
                        results.append(elem)
            return results

        with zipfile.ZipFile(file_path, 'r') as z:
            # Get the sheet names
            sheet_names = self._extract_sheet_names_from_xml(file_path)
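The namespace-free scan that `find_elements_with_ns` falls back on relies on ElementTree storing namespaced tags as `{uri}local`. A condensed, self-contained variant of the helper (without the fixed-namespace fast path; the XML and its `urn:example:custom` namespace are invented for illustration):

```python
from xml.etree import ElementTree as ET

def find_elements_with_ns(root, tag_name):
    """Collect elements by local tag name regardless of namespace."""
    results = []
    for elem in root.iter():
        # ElementTree renders a namespaced tag as '{uri}local'
        if elem.tag == tag_name or elem.tag.endswith('}' + tag_name):
            results.append(elem)
    return results

# Hypothetical worksheet XML using a non-standard namespace
root = ET.fromstring(
    '<worksheet xmlns="urn:example:custom"><sheetData>'
    '<row r="1"><c r="A1"><v>42</v></c></row>'
    '</sheetData></worksheet>'
)

rows = find_elements_with_ns(root, 'row')
values = find_elements_with_ns(root, 'v')
print(len(rows), values[0].text)  # → 1 42
```

This is why the parser keeps working on files produced by tools that declare older or vendor-specific SpreadsheetML namespaces.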
@@ -366,57 +438,68 @@ class XlsxParser(BaseParser):
            target_sheet = sheet_name if sheet_name and sheet_name in sheet_names else sheet_names[0]
            sheet_index = sheet_names.index(target_sheet) + 1  # sheet1.xml, sheet2.xml, ...

            # Read shared strings - try several possible paths
            shared_strings = []
            ss_paths = ['xl/sharedStrings.xml', 'xl\\sharedStrings.xml', 'sharedStrings.xml']
            for ss_path in ss_paths:
                if ss_path in z.namelist():
                    try:
                        ss_content = z.read(ss_path)
                        ss_root = ET.fromstring(ss_content)
                        for si in find_elements_with_ns(ss_root, 'si'):
                            t_elements = [c for c in si if c.tag.endswith('}t') or c.tag == 't']
                            if t_elements:
                                shared_strings.append(t_elements[0].text or '')
                            else:
                                shared_strings.append('')
                        break
                    except Exception as e:
                        logger.warning(f"Failed to read sharedStrings: {e}")

            # Read the worksheet - try several possible paths
            sheet_content = None
            sheet_paths = [
                f'xl/worksheets/sheet{sheet_index}.xml',
                f'xl\\worksheets\\sheet{sheet_index}.xml',
                f'worksheets/sheet{sheet_index}.xml',
            ]
            for sp in sheet_paths:
                if sp in z.namelist():
                    sheet_content = z.read(sp)
                    break

            if sheet_content is None:
                raise ValueError(f"Worksheet file sheet{sheet_index}.xml does not exist")

            root = ET.fromstring(sheet_content)

            # Collect all row data
            all_rows = []
            headers = {}

            for row in find_elements_with_ns(root, 'row'):
                row_idx = int(row.get('r', 0))
                row_cells = {}
                for cell in find_elements_with_ns(row, 'c'):
                    cell_ref = cell.get('r', '')
                    col_letters = ''.join(filter(str.isalpha, cell_ref))
                    cell_type = cell.get('t', 'n')
                    v_elements = find_elements_with_ns(cell, 'v')
                    v = v_elements[0] if v_elements else None

                    if v is not None and v.text:
                        if cell_type == 's':
                            # shared string
                            try:
                                row_cells[col_letters] = shared_strings[int(v.text)]
                            except (ValueError, IndexError):
                                row_cells[col_letters] = v.text
                        elif cell_type == 'b':
                            # boolean
                            row_cells[col_letters] = v.text == '1'
                        else:
                            row_cells[col_letters] = v.text
                    else:
                        row_cells[col_letters] = None

                # Handle the header row
                if row_idx == header_row + 1:
                    headers = {**row_cells}
                elif row_idx > header_row + 1:
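The cell-decoding branch above handles three SpreadsheetML cell types: `'s'` (an index into the shared-string table), `'b'` (boolean encoded as `'0'`/`'1'`), and everything else passed through raw. A self-contained sketch of that logic (the shared-string table here is a made-up example):

```python
# Hypothetical shared-string table, as read from xl/sharedStrings.xml
shared_strings = ['Name', 'Beds']

def decode_cell(cell_type, raw_value, shared_strings):
    """Decode one <c> element's <v> text according to its t attribute."""
    if raw_value is None:
        return None
    if cell_type == 's':
        # shared string: the value is an index into the shared-string table
        try:
            return shared_strings[int(raw_value)]
        except (ValueError, IndexError):
            return raw_value
    if cell_type == 'b':
        # boolean: '1' is true, anything else false
        return raw_value == '1'
    # numeric / inline values are kept as their raw text
    return raw_value

print(decode_cell('s', '1', shared_strings))   # → Beds
print(decode_cell('b', '1', shared_strings))   # → True
print(decode_cell('n', '3.5', shared_strings)) # → 3.5
```

Keeping numerics as raw text mirrors the parser above: type coercion is left to pandas when the rows are assembled into a DataFrame.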
@@ -424,7 +507,6 @@ class XlsxParser(BaseParser):
            # Build the DataFrame
            if headers:
                # Preserve the original column order
                col_order = list(headers.keys())
                df = pd.DataFrame(all_rows)
                if not df.empty:
@@ -0,0 +1,14 @@
"""
Instruction execution module

Supports intelligent, interactive document operations, including intent parsing and instruction execution
"""
from .intent_parser import IntentParser, intent_parser
from .executor import InstructionExecutor, instruction_executor

__all__ = [
    "IntentParser",
    "intent_parser",
    "InstructionExecutor",
    "instruction_executor",
]
@@ -0,0 +1,572 @@
"""
Instruction executor module

Turns natural-language instructions into executable operations
"""
import logging
import json
from typing import Any, Dict, List, Optional

from app.services.template_fill_service import template_fill_service
from app.services.rag_service import rag_service
from app.services.markdown_ai_service import markdown_ai_service
from app.core.database import mongodb

logger = logging.getLogger(__name__)


class InstructionExecutor:
    """Instruction executor"""

    def __init__(self):
        self.intent_parser = None  # Set later via set_intent_parser

    def set_intent_parser(self, intent_parser):
        """Set the intent parser"""
        self.intent_parser = intent_parser

    async def execute(self, instruction: str, context: Dict[str, Any] = None) -> Dict[str, Any]:
        """
        Execute an instruction

        Args:
            instruction: natural-language instruction
            context: execution context (document info, etc.)

        Returns:
            Execution result
        """
        if self.intent_parser is None:
            from app.instruction.intent_parser import intent_parser
            self.intent_parser = intent_parser

        context = context or {}

        # Parse the intent
        intent, params = await self.intent_parser.parse(instruction)

        # Dispatch on the intent type
        if intent == "extract":
            return await self._execute_extract(params, context)
        elif intent == "fill_table":
            return await self._execute_fill_table(params, context)
        elif intent == "summarize":
            return await self._execute_summarize(params, context)
        elif intent == "question":
            return await self._execute_question(params, context)
        elif intent == "search":
            return await self._execute_search(params, context)
        elif intent == "compare":
            return await self._execute_compare(params, context)
        elif intent == "edit":
            return await self._execute_edit(params, context)
        elif intent == "transform":
            return await self._execute_transform(params, context)
        else:
            return {
                "success": False,
                "error": f"Unknown intent type: {intent}",
                "message": "Could not understand the instruction; please describe it more explicitly"
            }

    async def _execute_extract(self, params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Execute information extraction"""
        try:
            target_fields = params.get("field_refs", [])
            doc_ids = params.get("document_refs", [])

            if not target_fields:
                return {
                    "success": False,
                    "error": "No fields to extract were specified",
                    "message": "Please state which fields to extract, e.g. 'extract the hospital count and bed count'"
                }

            # If documents were specified, verify they exist
            if doc_ids and "all_docs" not in doc_ids:
                valid_docs = []
                for doc_ref in doc_ids:
                    doc_id = doc_ref.replace("doc_", "")
                    doc = await mongodb.get_document(doc_id)
                    if doc:
                        valid_docs.append(doc)
                if not valid_docs:
                    return {
                        "success": False,
                        "error": "The specified documents do not exist",
                        "message": "Please check that the document numbers are correct"
                    }
                context["source_docs"] = valid_docs

            # Build the field list
            fields = []
            for i, field_name in enumerate(target_fields):
                fields.append({
                    "name": field_name,
                    "cell": f"A{i+1}",
                    "field_type": "text",
                    "required": False
                })

            # Call the template-fill service
            result = await template_fill_service.fill_template(
                template_fields=fields,
                source_doc_ids=[doc.get("_id") for doc in context.get("source_docs", [])] if context.get("source_docs") else None,
                user_hint=f"Please extract the fields: {', '.join(target_fields)}"
            )

            return {
                "success": True,
                "intent": "extract",
                "extracted_data": result.get("filled_data", {}),
                "fields": target_fields,
                "message": f"Successfully extracted {len(result.get('filled_data', {}))} fields"
            }

        except Exception as e:
            logger.error(f"Extraction failed: {e}")
            return {
                "success": False,
                "error": str(e),
                "message": f"Extraction failed: {str(e)}"
            }

    async def _execute_fill_table(self, params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Execute table filling"""
        try:
            template_file = context.get("template_file")
            if not template_file:
                return {
                    "success": False,
                    "error": "No table template provided",
                    "message": "Please upload the table template to fill first"
                }

            # Source documents
            source_docs = context.get("source_docs", [])
            source_doc_ids = [doc.get("_id") for doc in source_docs if doc.get("_id")]

            # Fields
            fields = context.get("template_fields", [])

            # Call the template-fill service
            result = await template_fill_service.fill_template(
                template_fields=fields,
                source_doc_ids=source_doc_ids if source_doc_ids else None,
                source_file_paths=context.get("source_file_paths"),
                user_hint=params.get("user_hint"),
                template_id=template_file if isinstance(template_file, str) else None,
                template_file_type=params.get("template", {}).get("type", "xlsx")
            )

            return {
                "success": True,
                "intent": "fill_table",
                "result": result,
                "message": f"Table filling done; {len(result.get('filled_data', {}))} fields filled"
            }

        except Exception as e:
            logger.error(f"Table filling failed: {e}")
            return {
                "success": False,
                "error": str(e),
                "message": f"Table filling failed: {str(e)}"
            }

    async def _execute_summarize(self, params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Execute summarization"""
        try:
            docs = context.get("source_docs", [])
            if not docs:
                return {
                    "success": False,
                    "error": "No documents available",
                    "message": "Please upload the documents to summarize first"
                }

            summaries = []
            for doc in docs[:5]:  # process at most 5 documents
                content = doc.get("content", "")[:5000]  # cap the content length
                if content:
                    summaries.append({
                        "filename": doc.get("metadata", {}).get("original_filename", "unknown"),
                        "content_preview": content[:500] + "..." if len(content) > 500 else content
                    })

            return {
                "success": True,
                "intent": "summarize",
                "summaries": summaries,
                "message": f"Found {len(summaries)} documents to reference"
            }

        except Exception as e:
            logger.error(f"Summarization failed: {e}")
            return {
                "success": False,
                "error": str(e),
                "message": f"Summary generation failed: {str(e)}"
            }

    async def _execute_question(self, params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Execute question answering"""
        try:
            question = params.get("question", "")
            if not question:
                return {
                    "success": False,
                    "error": "No question provided",
                    "message": "Please enter the question to answer"
                }

            # Retrieve related documents via RAG
            docs = context.get("source_docs", [])
            rag_results = []

            for doc in docs:
                doc_id = doc.get("_id", "")
                if doc_id:
                    results = rag_service.retrieve_by_doc_id(doc_id, top_k=3)
                    rag_results.extend(results)

            # Build the context
            context_text = "\n\n".join([
                r.get("content", "") for r in rag_results[:5]
            ]) if rag_results else ""

            # Fall back to raw document content when RAG returns nothing
            if not context_text:
                context_text = "\n\n".join([
                    doc.get("content", "")[:3000] for doc in docs[:3] if doc.get("content")
                ])

            return {
                "success": True,
                "intent": "question",
                "question": question,
                "context_preview": context_text[:500] + "..." if len(context_text) > 500 else context_text,
                "message": "Relevant context found; ready for question answering"
            }

        except Exception as e:
            logger.error(f"Question answering failed: {e}")
            return {
                "success": False,
                "error": str(e),
                "message": f"Question handling failed: {str(e)}"
            }

    async def _execute_search(self, params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Execute a search"""
        try:
            field_refs = params.get("field_refs", [])
            query = " ".join(field_refs) if field_refs else params.get("question", "")

            if not query:
                return {
                    "success": False,
                    "error": "No search keywords provided",
                    "message": "Please enter the keywords to search for"
                }

            # Retrieve via RAG
            results = rag_service.retrieve(query, top_k=10, min_score=0.3)

            return {
                "success": True,
                "intent": "search",
                "query": query,
                "results": [
                    {
                        "content": r.get("content", "")[:200],
                        "score": r.get("score", 0),
                        "doc_id": r.get("doc_id", "")
                    }
                    for r in results[:10]
                ],
                "message": f"Found {len(results)} matching results"
            }

        except Exception as e:
            logger.error(f"Search failed: {e}")
            return {
                "success": False,
                "error": str(e),
                "message": f"Search failed: {str(e)}"
            }

    async def _execute_compare(self, params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Execute comparison analysis"""
        try:
            docs = context.get("source_docs", [])
            if len(docs) < 2:
                return {
                    "success": False,
                    "error": "Comparison needs at least 2 documents",
                    "message": "Please upload at least 2 documents to compare"
                }

            # Collect basic info for each document
            comparison = []
            for i, doc in enumerate(docs[:5]):
                comparison.append({
                    "index": i + 1,
                    "filename": doc.get("metadata", {}).get("original_filename", "unknown"),
                    "doc_type": doc.get("doc_type", "unknown"),
                    "content_length": len(doc.get("content", "")),
                    "has_tables": bool(doc.get("structured_data", {}).get("tables")),
                })

            return {
                "success": True,
                "intent": "compare",
                "comparison": comparison,
                "message": f"Compared basic information for {len(comparison)} documents"
            }

        except Exception as e:
            logger.error(f"Comparison failed: {e}")
            return {
                "success": False,
                "error": str(e),
                "message": f"Comparison analysis failed: {str(e)}"
            }

    async def _execute_edit(self, params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Execute a document edit"""
        try:
            docs = context.get("source_docs", [])
            if not docs:
                return {
                    "success": False,
                    "error": "No documents available",
                    "message": "Please upload the document to edit first"
                }

            doc = docs[0]  # edit the first document by default
            content = doc.get("content", "")
            original_filename = doc.get("metadata", {}).get("original_filename", "unknown document")

            if not content:
                return {
                    "success": False,
                    "error": "The document content is empty",
                    "message": "This document has no editable content"
                }

            # Polish/edit the text with the LLM
            prompt = f"""Please edit the following document content.

Original content:
{content[:8000]}

Editing requirements:
- Polish the wording so it reads more professional and fluent
- Fix obvious grammatical errors
- Keep the original meaning unchanged
- Return only the edited content, with no explanation

Output the edited content directly:"""

            messages = [
                {"role": "system", "content": "You are a professional text-editing assistant. Output only the edited content."},
                {"role": "user", "content": prompt}
            ]

            from app.services.llm_service import llm_service
            response = await llm_service.chat(messages=messages, temperature=0.3, max_tokens=8000)
            edited_content = llm_service.extract_message_content(response)

            return {
                "success": True,
                "intent": "edit",
                "edited_content": edited_content,
                "original_filename": original_filename,
                "message": "Document editing finished; content returned"
            }

        except Exception as e:
            logger.error(f"Edit failed: {e}")
            return {
                "success": False,
                "error": str(e),
                "message": f"Edit processing failed: {str(e)}"
            }

    async def _execute_transform(self, params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """
        Execute format conversion

        Supported:
        - Word -> Excel
        - Excel -> Word
        - Markdown -> Word
        - Word -> Markdown
        """
        try:
            docs = context.get("source_docs", [])
            if not docs:
                return {
                    "success": False,
                    "error": "No documents available",
                    "message": "Please upload the document to convert first"
                }

            # Determine the target format
            template_info = params.get("template", {})
            target_type = template_info.get("type", "")

            if not target_type:
                # Try to infer it from the instruction
                instruction = params.get("instruction", "")
                if "excel" in instruction.lower() or "xlsx" in instruction.lower():
                    target_type = "xlsx"
                elif "word" in instruction.lower() or "docx" in instruction.lower():
                    target_type = "docx"
                elif "markdown" in instruction.lower() or "md" in instruction.lower():
                    target_type = "md"

            if not target_type:
                return {
                    "success": False,
                    "error": "No target format specified",
                    "message": "Please say which format to convert to (e.g. to Excel, to Word)"
                }

            doc = docs[0]
            content = doc.get("content", "")
            structured_data = doc.get("structured_data", {})
            original_filename = doc.get("metadata", {}).get("original_filename", "unknown document")

            # Build the content to convert
            if structured_data.get("tables"):
                # Table data present: emit table-shaped content
                tables = structured_data.get("tables", [])
                table_content = []
                for i, table in enumerate(tables[:3]):  # at most 3 tables
                    headers = table.get("headers", [])
                    rows = table.get("rows", [])[:20]  # at most 20 rows
                    if headers:
                        table_content.append(f"[Table {i+1}]")
                        table_content.append(" | ".join(str(h) for h in headers))
                        table_content.append(" | ".join(["---"] * len(headers)))
                        for row in rows:
                            if isinstance(row, list):
                                table_content.append(" | ".join(str(c) for c in row))
                            elif isinstance(row, dict):
                                table_content.append(" | ".join(str(row.get(h, "")) for h in headers))
                        table_content.append("")

                if target_type == "xlsx":
                    # Emit Excel-shaped data (JSON)
                    excel_data = []
                    for table in tables[:1]:  # only the first table
                        headers = table.get("headers", [])
                        rows = table.get("rows", [])[:100]
                        for row in rows:
                            if isinstance(row, list):
                                excel_data.append(dict(zip(headers, row)))
                            elif isinstance(row, dict):
                                excel_data.append(row)

                    return {
                        "success": True,
                        "intent": "transform",
                        "transform_type": "to_excel",
                        "target_format": "xlsx",
                        "excel_data": excel_data,
                        "headers": headers,
                        "message": f"Converted to Excel format with {len(excel_data)} rows of data"
                    }
                elif target_type in ["docx", "word"]:
                    # Emit Word-shaped text
                    word_content = f"# {original_filename}\n\n"
                    word_content += "\n".join(table_content)

                    return {
                        "success": True,
                        "intent": "transform",
                        "transform_type": "to_word",
                        "target_format": "docx",
                        "content": word_content,
                        "message": "Converted to Word format"
                    }
                elif target_type == "md":
                    # Emit Markdown
                    md_content = f"# {original_filename}\n\n"
                    md_content += "\n".join(table_content)

                    return {
                        "success": True,
                        "intent": "transform",
                        "transform_type": "to_markdown",
                        "target_format": "md",
                        "content": md_content,
                        "message": "Converted to Markdown format"
                    }

            # No table data: convert the plain-text content
            if target_type == "xlsx":
                # Turn the text into Excel rows (one line per row)
                lines = [line.strip() for line in content.split("\n") if line.strip()][:100]
                excel_data = [{"Line": i+1, "Content": line} for i, line in enumerate(lines)]

                return {
                    "success": True,
                    "intent": "transform",
                    "transform_type": "to_excel",
                    "target_format": "xlsx",
                    "excel_data": excel_data,
                    "headers": ["Line", "Content"],
                    "message": f"Converted the text content to Excel with {len(excel_data)} rows"
                }
            elif target_type in ["docx", "word"]:
                return {
                    "success": True,
                    "intent": "transform",
                    "transform_type": "to_word",
                    "target_format": "docx",
                    "content": content,
                    "message": "Document content is ready and can be downloaded as Word"
                }
            elif target_type == "md":
                # Naive text-to-Markdown: pass lines through stripped, keeping blank
                # lines as paragraph breaks (the original branched here, but every
                # branch appended the line unchanged)
                md_lines = []
                for line in content.split("\n"):
                    md_lines.append(line.strip())

                return {
                    "success": True,
                    "intent": "transform",
                    "transform_type": "to_markdown",
                    "target_format": "md",
                    "content": "\n".join(md_lines),
                    "message": "Converted to Markdown format"
                }

            return {
                "success": False,
                "error": "Unsupported target format",
                "message": f"Conversion to {target_type} is not supported yet"
            }

        except Exception as e:
            logger.error(f"Format conversion failed: {e}")
            return {
                "success": False,
                "error": str(e),
                "message": f"Format conversion failed: {str(e)}"
            }


# Global singleton
instruction_executor = InstructionExecutor()
@@ -0,0 +1,242 @@
"""
Intent parser module

Parses natural-language user instructions into an intent and parameters
"""
import re
import logging
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


class IntentParser:
    """Intent parser"""

    # Intent types
    INTENT_EXTRACT = "extract"        # information extraction
    INTENT_FILL_TABLE = "fill_table"  # table filling
    INTENT_SUMMARIZE = "summarize"    # summarization
    INTENT_QUESTION = "question"      # question answering
    INTENT_SEARCH = "search"          # search
    INTENT_COMPARE = "compare"        # comparison
    INTENT_TRANSFORM = "transform"    # format conversion
    INTENT_EDIT = "edit"              # document editing
    INTENT_UNKNOWN = "unknown"        # unknown

    # Intent keyword map (keywords stay in Chinese: they match user input)
    INTENT_KEYWORDS = {
        INTENT_EXTRACT: ["提取", "抽取", "获取", "找出", "查找", "识别", "找到"],
        INTENT_FILL_TABLE: ["填表", "填写", "填充", "录入", "导入到表格", "填写到"],
        INTENT_SUMMARIZE: ["总结", "摘要", "概括", "概述", "归纳", "提炼"],
        INTENT_QUESTION: ["问答", "回答", "解释", "什么是", "为什么", "如何", "怎样", "多少", "几个"],
        INTENT_SEARCH: ["搜索", "查找", "检索", "查询", "找"],
        INTENT_COMPARE: ["对比", "比较", "差异", "区别", "不同"],
        INTENT_TRANSFORM: ["转换", "转化", "变成", "转为", "导出"],
        INTENT_EDIT: ["修改", "编辑", "调整", "改写", "润色", "优化"],
    }

    # Entity patterns
    ENTITY_PATTERNS = {
        "number": [r"\d+", r"[一二三四五六七八九十百千万]+"],
        "date": [r"\d{4}年", r"\d{1,2}月", r"\d{1,2}日"],
        "percentage": [r"\d+(\.\d+)?%", r"\d+(\.\d+)?‰"],
        "currency": [r"\d+(\.\d+)?万元", r"\d+(\.\d+)?亿元", r"\d+(\.\d+)?元"],
    }

    def __init__(self):
        self.intent_history: List[Dict[str, Any]] = []

    async def parse(self, text: str) -> Tuple[str, Dict[str, Any]]:
        """
        Parse a natural-language instruction

        Args:
            text: raw user input

        Returns:
            (intent type, parameter dict)
        """
        text = text.strip()
        if not text:
            return self.INTENT_UNKNOWN, {}

        # Record history
        self.intent_history.append({"text": text, "intent": None})

        # Recognize the intent
        intent = self._recognize_intent(text)

        # Extract parameters
        params = self._extract_params(text, intent)

        # Update history
        if self.intent_history:
            self.intent_history[-1]["intent"] = intent

        logger.info(f"Intent parsed: text={text[:50]}..., intent={intent}, params={params}")

        return intent, params

    def _recognize_intent(self, text: str) -> str:
        """Recognize the intent type"""
        intent_scores: Dict[str, float] = {}

        for intent, keywords in self.INTENT_KEYWORDS.items():
            score = 0
            for keyword in keywords:
                if keyword in text:
                    score += 1
            if score > 0:
                intent_scores[intent] = score

        if not intent_scores:
            return self.INTENT_UNKNOWN

        # Return the highest-scoring intent
        return max(intent_scores, key=intent_scores.get)

    def _extract_params(self, text: str, intent: str) -> Dict[str, Any]:
        """Extract parameters"""
        params: Dict[str, Any] = {
            "entities": self._extract_entities(text),
            "document_refs": self._extract_document_refs(text),
            "field_refs": self._extract_field_refs(text),
            "template_refs": self._extract_template_refs(text),
        }

        # Intent-specific parameters
        if intent == self.INTENT_QUESTION:
            params["question"] = text
            params["focus"] = self._extract_question_focus(text)
        elif intent == self.INTENT_FILL_TABLE:
            params["template"] = self._extract_template_info(text)
        elif intent == self.INTENT_EXTRACT:
            params["target_fields"] = self._extract_target_fields(text)

        return params

    def _extract_entities(self, text: str) -> Dict[str, List[str]]:
        """Extract entities"""
        entities: Dict[str, List[str]] = {}

        for entity_type, patterns in self.ENTITY_PATTERNS.items():
            matches = []
            for pattern in patterns:
                found = re.findall(pattern, text)
                matches.extend(found)
            if matches:
                entities[entity_type] = list(set(matches))

        return entities

    def _extract_document_refs(self, text: str) -> List[str]:
        """Extract document references"""
        # Matches "文档1", "doc1", "第一个文档", etc.
        refs = []

        # Numeric index: 文档1, doc1, 第1个文档
        num_patterns = [
            r"(?:文档|doc)(\d+)",
            r"第(\d+)个文档",
            r"第(\d+)份",
        ]
        for pattern in num_patterns:
            matches = re.findall(pattern, text.lower())
            refs.extend([f"doc_{m}" for m in matches])

        # "所有文档" / "全部文档" (all documents)
        if any(kw in text for kw in ["所有", "全部", "整个"]):
            refs.append("all_docs")

        return refs

    def _extract_field_refs(self, text: str) -> List[str]:
        """Extract field references"""
        fields = []

        # Field names inside quotes
        quoted = re.findall(r"['\"『「]([^'\"』」]+)['\"』」]", text)
        fields.extend(quoted)

        # Matches "xxx字段" (field), "xxx列" (column), etc.
        field_patterns = [
            r"([^\s]+)字段",
            r"([^\s]+)列",
            r"([^\s]+)数据",
        ]
        for pattern in field_patterns:
            matches = re.findall(pattern, text)
            fields.extend(matches)

        return list(set(fields))

    def _extract_template_refs(self, text: str) -> List[str]:
        """Extract template references"""
        templates = []

        # Matches "表格模板", "Excel模板", "表1", etc.
        template_patterns = [
            r"([^\s]+模板)",
            r"表(\d+)",
            r"([^\s]+表格)",
        ]
        for pattern in template_patterns:
            matches = re.findall(pattern, text)
            templates.extend(matches)

        return list(set(templates))

    def _extract_question_focus(self, text: str) -> Optional[str]:
        """Extract the focus of a question"""
        # "什么是XXX" ("what is X"; the original pattern [什么是] was a
        # character class matching any single one of those characters)
        match = re.search(r"什么是([^??]+)", text)
        if match:
            return match.group(1).strip()

        # "XXX有多少" ("how many X")
        match = re.search(r"([^??]+)有多少", text)
        if match:
            return match.group(1).strip()

        return None

    def _extract_template_info(self, text: str) -> Optional[Dict[str, str]]:
        """Extract template info"""
        template_info: Dict[str, str] = {}

        # Template type
        if "excel" in text.lower() or "xlsx" in text.lower() or "电子表格" in text:
            template_info["type"] = "xlsx"
        elif "word" in text.lower() or "docx" in text.lower() or "文档" in text:
            template_info["type"] = "docx"

        return template_info if template_info else None

    def _extract_target_fields(self, text: str) -> List[str]:
        """Extract target fields"""
        fields = []

        # Matches "提取XXX和YYY" / "抽取XXX、YYY"; the original character-class
        # patterns were malformed, so a lazy group up to a delimiter (or end of
        # string) is used instead
        patterns = [
            r"提取(.+?)(?:和|与|、|,|,|$)",
            r"抽取(.+?)(?:和|与|、|,|,|$)",
        ]

        for pattern in patterns:
            matches = re.findall(pattern, text)
            fields.extend([m.strip() for m in matches if m.strip()])

        return list(set(fields))

    def get_intent_history(self) -> List[Dict[str, Any]]:
        """Return the intent history"""
        return self.intent_history

    def clear_history(self):
        """Clear the history"""
        self.intent_history = []


# Global singleton
intent_parser = IntentParser()
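The keyword-vote scheme behind `_recognize_intent` can be shown standalone: each keyword hit adds one point, and the highest-scoring intent wins, falling back to `unknown` when nothing matches. A minimal sketch with a trimmed-down keyword map:

```python
# Trimmed-down version of INTENT_KEYWORDS (keywords match Chinese user input)
INTENT_KEYWORDS = {
    "extract": ["提取", "抽取", "获取"],
    "summarize": ["总结", "摘要", "概括"],
    "compare": ["对比", "比较", "差异"],
}

def recognize_intent(text):
    """Score each intent by keyword hits; highest score wins."""
    scores = {
        intent: sum(1 for kw in keywords if kw in text)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    scores = {k: v for k, v in scores.items() if v > 0}
    if not scores:
        return "unknown"
    return max(scores, key=scores.get)

print(recognize_intent("请提取医院数量"))  # → extract
print(recognize_intent("今天天气如何"))    # → unknown
```

The full parser keeps this deliberately cheap: no model call is needed to route an instruction, and the LLM is only invoked by the executor once the intent is known.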
@@ -1,6 +1,13 @@
"""
FastAPI application entry point
"""
# ========== Silence noisy MongoDB logging ==========
import logging
logging.getLogger("pymongo").setLevel(logging.WARNING)
logging.getLogger("pymongo.topology").setLevel(logging.WARNING)
logging.getLogger("urllib3").setLevel(logging.WARNING)
# ===================================================

import logging
import logging.handlers
import sys
@@ -65,7 +65,17 @@ class LLMService:
                return response.json()

        except httpx.HTTPStatusError as e:
            error_detail = e.response.text
            logger.error(f"LLM API request failed: {e.response.status_code} - {error_detail}")
            # Try to parse the error payload
            try:
                import json
                err_json = json.loads(error_detail)
                err_code = err_json.get("error", {}).get("code", "unknown")
                err_msg = err_json.get("error", {}).get("message", "unknown")
                logger.error(f"API error code: {err_code}, message: {err_msg}")
            except Exception:
                pass
            raise
        except Exception as e:
            logger.error(f"LLM API call raised: {str(e)}")
@@ -328,6 +338,154 @@ Excel 数据概览:
            "analysis": None
        }

    async def chat_with_images(
        self,
        text: str,
        images: List[Dict[str, str]],
        temperature: float = 0.7,
        max_tokens: Optional[int] = None
    ) -> Dict[str, Any]:
        """
        Call a vision model API (supports image input)

        Args:
            text: text content
            images: list of images, each with a base64 payload and mime_type
                    Format: [{"base64": "...", "mime_type": "image/png"}, ...]
            temperature: sampling temperature
            max_tokens: maximum number of tokens

        Returns:
            Dict[str, Any]: API response
        """
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        # Build the image parts
        image_contents = []
        for img in images:
            image_contents.append({
                "type": "image_url",
                "image_url": {
                    "url": f"data:{img['mime_type']};base64,{img['base64']}"
                }
            })

        # Build the messages
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": text
                    },
                    *image_contents
                ]
            }
        ]

        payload = {
            "model": self.model_name,
            "messages": messages,
            "temperature": temperature
        }

        if max_tokens:
            payload["max_tokens"] = max_tokens

        try:
            async with httpx.AsyncClient(timeout=120.0) as client:
                response = await client.post(
                    f"{self.base_url}/chat/completions",
                    headers=headers,
                    json=payload
                )
                response.raise_for_status()
                return response.json()

        except httpx.HTTPStatusError as e:
            error_detail = e.response.text
            logger.error(f"Vision model API request failed: {e.response.status_code} - {error_detail}")
            # Try to parse the error payload
            try:
                import json
                err_json = json.loads(error_detail)
                err_code = err_json.get("error", {}).get("code", "unknown")
                err_msg = err_json.get("error", {}).get("message", "unknown")
                logger.error(f"API error code: {err_code}, message: {err_msg}")
                logger.error(f"Requested model: {self.model_name}, base_url: {self.base_url}")
            except Exception:
                pass
            raise
        except Exception as e:
            logger.error(f"Vision model API call raised: {str(e)}")
            raise
|
||||
async def analyze_images(
|
||||
self,
|
||||
images: List[Dict[str, str]],
|
||||
user_prompt: str = ""
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
分析图片内容(使用视觉模型)
|
||||
|
||||
Args:
|
||||
images: 图片列表,每项包含 base64 编码和 mime_type
|
||||
user_prompt: 用户提示词
|
||||
|
||||
Returns:
|
||||
Dict[str, Any]: 分析结果
|
||||
"""
|
||||
prompt = f"""你是一个专业的视觉分析专家。请分析以下图片内容。
|
||||
|
||||
{user_prompt if user_prompt else "请详细描述图片中的内容,包括文字、数据、图表、流程等所有可见信息。"}
|
||||
|
||||
请按照以下 JSON 格式输出:
|
||||
{{
|
||||
"description": "图片内容的详细描述",
|
||||
"text_content": "图片中的文字内容(如有)",
|
||||
"data_extracted": {{"键": "值"}} // 如果图片中有表格或数据
|
||||
}}
|
||||
|
||||
如果图片不包含有用信息,请返回空的描述。"""
|
||||
|
||||
try:
|
||||
response = await self.chat_with_images(
|
||||
text=prompt,
|
||||
images=images,
|
||||
temperature=0.1,
|
||||
max_tokens=4000
|
||||
)
|
||||
|
||||
content = self.extract_message_content(response)
|
||||
|
||||
# 解析 JSON
|
||||
import json
|
||||
try:
|
||||
result = json.loads(content)
|
||||
return {
|
||||
"success": True,
|
||||
"analysis": result,
|
||||
"model": self.model_name
|
||||
}
|
||||
except json.JSONDecodeError:
|
||||
return {
|
||||
"success": True,
|
||||
"analysis": {"description": content},
|
||||
"model": self.model_name
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"图片分析失败: {str(e)}")
|
||||
return {
|
||||
"success": False,
|
||||
"error": str(e),
|
||||
"analysis": None
|
||||
}
|
||||
|
||||
|
||||
# 全局单例
|
||||
llm_service = LLMService()
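For reference, the payload assembled by `chat_with_images` follows the OpenAI-compatible multimodal message format, with each image inlined as a base64 data URL. A minimal self-contained sketch of that assembly step (the helper name and the stub image bytes below are illustrative, not part of the service):

```python
import base64

def build_vision_messages(text: str, images: list) -> list:
    """Build an OpenAI-compatible multimodal message list from text plus
    base64-encoded images, mirroring the structure used by chat_with_images."""
    image_contents = [
        {
            "type": "image_url",
            "image_url": {"url": f"data:{img['mime_type']};base64,{img['base64']}"},
        }
        for img in images
    ]
    return [{"role": "user", "content": [{"type": "text", "text": text}, *image_contents]}]

# Arbitrary placeholder bytes standing in for a real PNG
stub = base64.b64encode(b"\x89PNG placeholder").decode()
messages = build_vision_messages("Describe this image", [{"base64": stub, "mime_type": "image/png"}])
print(messages[0]["content"][1]["image_url"]["url"][:22])  # data:image/png;base64,
```

The text part always comes first, followed by one `image_url` part per image, which is what lets a single request mix a prompt with several screenshots.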

446
backend/app/services/multi_doc_reasoning_service.py
Normal file
@@ -0,0 +1,446 @@
"""
Multi-document reasoning service

Cross-document information linking and reasoning
"""
import logging
import re
from typing import Any, Dict, List, Optional, Set, Tuple
from collections import defaultdict

from app.services.llm_service import llm_service
from app.services.rag_service import rag_service

logger = logging.getLogger(__name__)


class MultiDocReasoningService:
    """
    Multi-document reasoning service

    Features:
    1. Cross-document entity tracking - trace how the same entity is described in different documents
    2. Relation extraction and reasoning - extract relations between entities and reason over them
    3. Information completion - fill in missing data from complementary documents
    4. Conflict detection - detect contradictory information between documents
    """

    def __init__(self):
        self.llm = llm_service

    async def analyze_cross_documents(
        self,
        documents: List[Dict[str, Any]],
        query: Optional[str] = None,
        entity_types: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        """
        Cross-document analysis.

        Args:
            documents: List of documents
            query: Optional query intent
            entity_types: Entity types to track, e.g. ["机构", "人物", "地点", "数量"]

        Returns:
            Cross-document analysis result
        """
        if not documents:
            return {"success": False, "error": "没有可用的文档"}

        entity_types = entity_types or ["机构", "数量", "时间", "地点"]

        try:
            # 1. Extract entities from each document
            entities_per_doc = await self._extract_entities_from_docs(documents, entity_types)

            # 2. Align entities across documents
            aligned_entities = self._align_entities_across_docs(entities_per_doc)

            # 3. Extract relations
            relations = await self._extract_relations(documents)

            # 4. Build a knowledge graph
            knowledge_graph = self._build_knowledge_graph(aligned_entities, relations)

            # 5. Complete missing information
            completed_info = await self._complete_missing_info(knowledge_graph, documents)

            # 6. Detect conflicts
            conflicts = self._detect_conflicts(aligned_entities)

            return {
                "success": True,
                "entities": aligned_entities,
                "relations": relations,
                "knowledge_graph": knowledge_graph,
                "completed_info": completed_info,
                "conflicts": conflicts,
                "summary": self._generate_summary(aligned_entities, conflicts)
            }

        except Exception as e:
            logger.error(f"跨文档分析失败: {e}")
            return {"success": False, "error": str(e)}

    async def _extract_entities_from_docs(
        self,
        documents: List[Dict[str, Any]],
        entity_types: List[str]
    ) -> List[Dict[str, Any]]:
        """Extract entities from each document."""
        entities_per_doc = []

        for idx, doc in enumerate(documents):
            doc_id = doc.get("_id", f"doc_{idx}")
            content = doc.get("content", "")[:8000]  # Cap the length

            # Use the LLM to extract entities
            prompt = f"""从以下文档中提取指定的实体类型信息。

实体类型: {', '.join(entity_types)}

文档内容:
{content}

请按以下 JSON 格式输出(只需输出 JSON):
{{
    "entities": [
        {{"type": "机构", "name": "实体名称", "value": "相关数值(如有)", "context": "上下文描述"}},
        ...
    ]
}}

只提取在文档中明确提到的实体,不要推测。"""

            messages = [
                {"role": "system", "content": "你是一个实体提取专家。请严格按JSON格式输出。"},
                {"role": "user", "content": prompt}
            ]

            try:
                response = await self.llm.chat(messages=messages, temperature=0.1, max_tokens=3000)
                content_response = self.llm.extract_message_content(response)

                # Parse the JSON
                import json
                import re
                cleaned = content_response.strip()
                json_match = re.search(r'\{[\s\S]*\}', cleaned)
                if json_match:
                    result = json.loads(json_match.group())
                    entities = result.get("entities", [])
                    entities_per_doc.append({
                        "doc_id": doc_id,
                        "doc_name": doc.get("metadata", {}).get("original_filename", f"文档{idx+1}"),
                        "entities": entities
                    })
                    logger.info(f"文档 {doc_id} 提取到 {len(entities)} 个实体")
            except Exception as e:
                logger.warning(f"文档 {doc_id} 实体提取失败: {e}")

        return entities_per_doc
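The extraction step above relies on pulling the first `{...}` span out of a free-form LLM reply with `re.search(r'\{[\s\S]*\}', ...)` before calling `json.loads`. A small standalone sketch of that pattern (the helper name and the sample reply are illustrative):

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Pull the outermost {...} span out of an LLM reply that may be wrapped
    in prose or markdown fences, then parse it; return {} on failure."""
    match = re.search(r'\{[\s\S]*\}', raw.strip())
    if not match:
        return {}
    try:
        return json.loads(match.group())
    except json.JSONDecodeError:
        return {}

reply = 'Sure, here it is:\n```json\n{"entities": [{"type": "机构", "name": "央行"}]}\n```'
print(extract_json(reply))
```

Because the pattern is greedy, it spans from the first `{` to the last `}`, which tolerates fences and surrounding prose but assumes the reply contains only one JSON object.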

    def _align_entities_across_docs(
        self,
        entities_per_doc: List[Dict[str, Any]]
    ) -> Dict[str, List[Dict[str, Any]]]:
        """
        Align entities across documents.

        Links descriptions of the same entity found in different documents.
        """
        aligned: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

        for doc_data in entities_per_doc:
            doc_id = doc_data["doc_id"]
            doc_name = doc_data["doc_name"]

            for entity in doc_data.get("entities", []):
                entity_name = entity.get("name", "")
                if not entity_name:
                    continue

                # Normalize the entity name (strip whitespace and parenthesized text)
                normalized = self._normalize_entity_name(entity_name)

                aligned[normalized].append({
                    "original_name": entity_name,
                    "type": entity.get("type", "未知"),
                    "value": entity.get("value", ""),
                    "context": entity.get("context", ""),
                    "source_doc": doc_name,
                    "source_doc_id": doc_id
                })

        # Keep only entities that appear in more than one document
        result = {}
        for normalized, appearances in aligned.items():
            if len(appearances) > 1:
                result[normalized] = appearances
                logger.info(f"实体对齐: {normalized} 在 {len(appearances)} 个文档中出现")

        return result

    def _normalize_entity_name(self, name: str) -> str:
        """Normalize an entity name."""
        # Strip whitespace
        name = name.strip()
        # Remove parenthesized content
        name = re.sub(r'[((].*?[))]', '', name)
        # Remove ordinal prefixes such as "第X名"
        name = re.sub(r'^第\d+[名位个]', '', name)
        return name.strip()

    async def _extract_relations(
        self,
        documents: List[Dict[str, Any]]
    ) -> List[Dict[str, str]]:
        """Extract entity relations from the documents."""
        relations = []

        # Concatenate all document contents
        combined_content = "\n\n".join([
            f"【{doc.get('metadata', {}).get('original_filename', f'文档{i}')}】\n{doc.get('content', '')[:3000]}"
            for i, doc in enumerate(documents)
        ])

        prompt = f"""从以下文档内容中抽取实体之间的关系。

文档内容:
{combined_content[:8000]}

请识别以下类型的关系:
- 包含关系 (A包含B)
- 隶属关系 (A隶属于B)
- 合作关系 (A与B合作)
- 对比关系 (A vs B)
- 时序关系 (A先于B发生)

请按以下 JSON 格式输出(只需输出 JSON):
{{
    "relations": [
        {{"entity1": "实体1", "entity2": "实体2", "relation": "关系类型", "description": "关系描述"}},
        ...
    ]
}}

如果没有找到明确的关系,返回空数组。"""

        messages = [
            {"role": "system", "content": "你是一个关系抽取专家。请严格按JSON格式输出。"},
            {"role": "user", "content": prompt}
        ]

        try:
            response = await self.llm.chat(messages=messages, temperature=0.1, max_tokens=3000)
            content_response = self.llm.extract_message_content(response)

            import json
            import re
            cleaned = content_response.strip()
            json_match = re.search(r'\{[\s\S]*\}', cleaned)
            if json_match:
                result = json.loads(json_match.group())
                relations = result.get("relations", [])
                logger.info(f"抽取到 {len(relations)} 个关系")
        except Exception as e:
            logger.warning(f"关系抽取失败: {e}")

        return relations

    def _build_knowledge_graph(
        self,
        aligned_entities: Dict[str, List[Dict[str, Any]]],
        relations: List[Dict[str, str]]
    ) -> Dict[str, Any]:
        """Build a knowledge graph."""
        nodes = []
        edges = []
        node_ids = set()

        # Add entity nodes
        for entity_name, appearances in aligned_entities.items():
            if len(appearances) < 1:
                continue

            first_appearance = appearances[0]
            node_id = f"entity_{len(nodes)}"

            # Collect the entity's values across all documents
            values = [a.get("value", "") for a in appearances if a.get("value")]
            primary_value = values[0] if values else ""

            nodes.append({
                "id": node_id,
                "name": entity_name,
                "type": first_appearance.get("type", "未知"),
                "value": primary_value,
                "occurrence_count": len(appearances),
                "sources": [a.get("source_doc", "") for a in appearances]
            })
            node_ids.add(entity_name)

        # Add relation edges
        for relation in relations:
            entity1 = self._normalize_entity_name(relation.get("entity1", ""))
            entity2 = self._normalize_entity_name(relation.get("entity2", ""))

            if entity1 in node_ids and entity2 in node_ids:
                edges.append({
                    "source": entity1,
                    "target": entity2,
                    "relation": relation.get("relation", "相关"),
                    "description": relation.get("description", "")
                })

        return {
            "nodes": nodes,
            "edges": edges,
            "stats": {
                "entity_count": len(nodes),
                "relation_count": len(edges)
            }
        }

    async def _complete_missing_info(
        self,
        knowledge_graph: Dict[str, Any],
        documents: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        """Complete missing information from the other documents."""
        completed = []

        for node in knowledge_graph.get("nodes", []):
            if not node.get("value") and node.get("occurrence_count", 0) > 1:
                # The entity appears in several documents but has no value;
                # try to fill it in via RAG retrieval
                query = f"{node['name']} 数值 数据"
                results = rag_service.retrieve(query, top_k=3, min_score=0.3)

                if results:
                    completed.append({
                        "entity": node["name"],
                        "type": node.get("type", "未知"),
                        "source": "rag_inference",
                        "context": results[0].get("content", "")[:200],
                        "confidence": results[0].get("score", 0)
                    })

        return completed

    def _detect_conflicts(
        self,
        aligned_entities: Dict[str, List[Dict[str, Any]]]
    ) -> List[Dict[str, Any]]:
        """Detect information conflicts between documents."""
        conflicts = []

        for entity_name, appearances in aligned_entities.items():
            if len(appearances) < 2:
                continue

            # Check for conflicting values
            values = {}
            for appearance in appearances:
                val = appearance.get("value", "")
                if val:
                    source = appearance.get("source_doc", "未知来源")
                    values[source] = val

            if len(values) > 1:
                unique_values = set(values.values())
                if len(unique_values) > 1:
                    conflicts.append({
                        "entity": entity_name,
                        "type": "value_conflict",
                        "details": values,
                        "description": f"实体 '{entity_name}' 在不同文档中有不同数值: {values}"
                    })

        return conflicts

    def _generate_summary(
        self,
        aligned_entities: Dict[str, List[Dict[str, Any]]],
        conflicts: List[Dict[str, Any]]
    ) -> str:
        """Generate a summary."""
        summary_parts = []

        total_entities = sum(len(appearances) for appearances in aligned_entities.values())
        multi_doc_entities = sum(1 for appearances in aligned_entities.values() if len(appearances) > 1)

        summary_parts.append(f"跨文档分析完成:发现 {total_entities} 个实体")
        summary_parts.append(f"其中 {multi_doc_entities} 个实体在多个文档中被提及")

        if conflicts:
            summary_parts.append(f"检测到 {len(conflicts)} 个潜在冲突")

        return "; ".join(summary_parts)

    async def answer_cross_doc_question(
        self,
        question: str,
        documents: List[Dict[str, Any]]
    ) -> Dict[str, Any]:
        """
        Cross-document question answering.

        Args:
            question: The question
            documents: List of documents

        Returns:
            Answer result
        """
        # Run the cross-document analysis first
        analysis_result = await self.analyze_cross_documents(documents, query=question)

        # Build the context
        context_parts = []

        # Add entity information
        for entity_name, appearances in analysis_result.get("entities", {}).items():
            contexts = [f"{a.get('source_doc')}: {a.get('context', '')}" for a in appearances[:2]]
            if contexts:
                context_parts.append(f"【{entity_name}】{' | '.join(contexts)}")

        # Add relation information
        for relation in analysis_result.get("relations", [])[:5]:
            context_parts.append(f"【关系】{relation.get('entity1')} {relation.get('relation')} {relation.get('entity2')}: {relation.get('description', '')}")

        context_text = "\n\n".join(context_parts) if context_parts else "未找到相关实体和关系"

        # Generate the answer with the LLM
        prompt = f"""基于以下跨文档分析结果,回答用户问题。

问题: {question}

分析结果:
{context_text}

请直接回答问题,如果分析结果中没有相关信息,请说明"根据提供的文档无法回答该问题"。"""

        messages = [
            {"role": "system", "content": "你是一个基于文档的问答助手。请根据提供的信息回答问题。"},
            {"role": "user", "content": prompt}
        ]

        try:
            response = await self.llm.chat(messages=messages, temperature=0.2, max_tokens=2000)
            answer = self.llm.extract_message_content(response)

            return {
                "success": True,
                "question": question,
                "answer": answer,
                "supporting_entities": list(analysis_result.get("entities", {}).keys())[:10],
                "relations_count": len(analysis_result.get("relations", []))
            }
        except Exception as e:
            logger.error(f"跨文档问答失败: {e}")
            return {"success": False, "error": str(e)}


# Global singleton
multi_doc_reasoning_service = MultiDocReasoningService()
@@ -2,21 +2,32 @@
RAG service module - retrieval-augmented generation

Vector retrieval with sentence-transformers + Faiss,
plus hybrid fusion with BM25 keyword retrieval
"""
import json
import logging
import os
import pickle
-from typing import Any, Dict, List, Optional
+import re
+import math
+from typing import Any, Dict, List, Optional, Tuple
+from collections import Counter, defaultdict

import faiss
import numpy as np
-from sentence_transformers import SentenceTransformer

from app.config import settings

logger = logging.getLogger(__name__)

+# Try to import sentence-transformers
+try:
+    from sentence_transformers import SentenceTransformer
+    SENTENCE_TRANSFORMERS_AVAILABLE = True
+except ImportError as e:
+    logger.warning(f"sentence-transformers 导入失败: {e}")
+    SENTENCE_TRANSFORMERS_AVAILABLE = False
+    SentenceTransformer = None


class SimpleDocument:
    """Simplified document object"""
@@ -25,20 +36,156 @@ class SimpleDocument:
        self.metadata = metadata


class BM25:
    """
    BM25 keyword retrieval.

    A term-frequency / document-frequency ranking function that is better
    suited to exact keyword matching than pure vector search.
    """

    def __init__(self, k1: float = 1.5, b: float = 0.75):
        self.k1 = k1  # Term-frequency saturation parameter
        self.b = b  # Document-length normalization parameter
        self.documents: List[str] = []
        self.doc_ids: List[str] = []
        self.avg_doc_length = 0
        self.doc_freqs: Dict[str, int] = {}  # term -> number of documents containing it
        self.idf: Dict[str, float] = {}  # term -> IDF value
        self.doc_lengths: List[int] = []
        self.doc_term_freqs: List[Dict[str, int]] = []  # per-document term frequencies

    def _tokenize(self, text: str) -> List[str]:
        """Tokenize (naive Chinese/ASCII tokenization)."""
        if not text:
            return []
        # Naive tokenization: split on punctuation and whitespace
        tokens = re.findall(r'[\u4e00-\u9fff]+|[a-zA-Z0-9]+', text.lower())
        # Drop single-character tokens
        return [t for t in tokens if len(t) > 1]

    def fit(self, documents: List[str], doc_ids: List[str]):
        """
        Build the BM25 index.

        Args:
            documents: Document contents
            doc_ids: Document IDs
        """
        self.documents = documents
        self.doc_ids = doc_ids
        n = len(documents)

        # Collect document frequencies
        self.doc_freqs = defaultdict(int)
        self.doc_lengths = []
        self.doc_term_freqs = []

        for doc in documents:
            tokens = self._tokenize(doc)
            self.doc_lengths.append(len(tokens))
            doc_tf = Counter(tokens)
            self.doc_term_freqs.append(doc_tf)

            for term in doc_tf:
                self.doc_freqs[term] += 1

        # Average document length
        self.avg_doc_length = sum(self.doc_lengths) / n if n > 0 else 0

        # Compute IDF
        for term, df in self.doc_freqs.items():
            # Smoothed IDF = log((n - df + 0.5) / (df + 0.5) + 1)
            self.idf[term] = math.log((n - df + 0.5) / (df + 0.5) + 1)

        logger.info(f"BM25 索引构建完成: {n} 个文档, {len(self.idf)} 个词项")

    def search(self, query: str, top_k: int = 10) -> List[Tuple[int, float]]:
        """
        Search for relevant documents.

        Args:
            query: Query text
            top_k: Return the top k results

        Returns:
            [(document index, BM25 score), ...]
        """
        if not self.documents:
            return []

        query_tokens = self._tokenize(query)
        if not query_tokens:
            return []

        scores = []
        n = len(self.documents)

        for idx in range(n):
            score = self._calculate_score(query_tokens, idx)
            scores.append((idx, score))

        # Sort by score, descending
        scores.sort(key=lambda x: x[1], reverse=True)

        return scores[:top_k]

    def _calculate_score(self, query_tokens: List[str], doc_idx: int) -> float:
        """Compute the BM25 score of a single document."""
        doc_tf = self.doc_term_freqs[doc_idx]
        doc_len = self.doc_lengths[doc_idx]
        score = 0.0

        for term in query_tokens:
            if term not in self.idf:
                continue

            tf = doc_tf.get(term, 0)
            idf = self.idf[term]

            # BM25 formula
            numerator = tf * (self.k1 + 1)
            denominator = tf + self.k1 * (1 - self.b + self.b * doc_len / self.avg_doc_length)

            score += idf * numerator / denominator

        return score

    def get_scores(self, query: str) -> List[float]:
        """Return BM25 scores for all documents."""
        if not self.documents:
            return []

        query_tokens = self._tokenize(query)
        if not query_tokens:
            return [0.0] * len(self.documents)

        return [self._calculate_score(query_tokens, idx) for idx in range(len(self.documents))]
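The scoring above can be sanity-checked on a toy corpus. A compact standalone version of the same formula (same k1/b defaults, IDF smoothed with +1 so it stays positive; the helper name and sample token lists are illustrative):

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each pre-tokenized document against a query with BM25."""
    n = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n
    # Document frequency per term, then smoothed IDF
    df = Counter(t for d in docs_tokens for t in set(d))
    idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1) for t, c in df.items()}
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in idf:
                continue
            freq = tf[t]
            # tf saturation (k1) and length normalization (b)
            s += idf[t] * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(d) / avg_len))
        scores.append(s)
    return scores

docs = [["apple", "banana", "apple"], ["banana", "cherry"], ["cherry", "date"]]
scores = bm25_scores(["apple"], docs)
print(scores.index(max(scores)))  # doc 0 is the only one mentioning "apple"
```

Only documents containing a query term score above zero, which is why the service keeps BM25 as the exact-match complement to the vector index.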


class RAGService:
    """RAG retrieval-augmented service"""

+    # Default chunking parameters
+    DEFAULT_CHUNK_SIZE = 500  # Size of each text chunk (characters)
+    DEFAULT_CHUNK_OVERLAP = 50  # Overlap between chunks (characters)
+
    def __init__(self):
-        self.embedding_model: Optional[SentenceTransformer] = None
+        self.embedding_model = None
        self.index: Optional[faiss.Index] = None
        self.documents: List[Dict[str, Any]] = []
        self.doc_ids: List[str] = []
-        self._dimension: int = 0
+        self._dimension: int = 384  # Default dimension
        self._initialized = False
        self._persist_dir = settings.FAISS_INDEX_DIR
-        # Temporarily disable RAG API calls and only log index operations
-        self._disabled = True
-        logger.info("RAG 服务已禁用(_disabled=True),仅记录索引操作日志")
+        # BM25 index
+        self.bm25: Optional[BM25] = None
+        self._bm25_enabled = True  # BM25 is always enabled
+        # Check availability
+        self._disabled = not SENTENCE_TRANSFORMERS_AVAILABLE
+        if self._disabled:
+            logger.warning("RAG 服务已禁用(sentence-transformers 不可用),将使用 BM25 关键词检索")
+        else:
+            logger.info("RAG 服务已启用(向量检索 + BM25 混合检索)")

    def _init_embeddings(self):
        """Initialize the embedding model"""
@@ -88,6 +235,63 @@ class RAGService:
        norms = np.where(norms == 0, 1, norms)
        return vectors / norms

    def _split_into_chunks(self, text: str, chunk_size: int = None, overlap: int = None) -> List[str]:
        """
        Split long text into chunks.

        Args:
            text: Text to split
            chunk_size: Size of each chunk (characters)
            overlap: Number of overlapping characters between chunks

        Returns:
            List of text chunks
        """
        if chunk_size is None:
            chunk_size = self.DEFAULT_CHUNK_SIZE
        if overlap is None:
            overlap = self.DEFAULT_CHUNK_OVERLAP

        if len(text) <= chunk_size:
            return [text] if text.strip() else []

        chunks = []
        start = 0
        text_len = len(text)

        while start < text_len:
            # Compute the end position of the current chunk
            end = start + chunk_size

            # If this is not the last chunk, try to cut at a sentence boundary
            if end < text_len:
                # Look backwards for the last period, comma, newline or semicolon
                cut_positions = []
                for i in range(end, max(start, end - 100), -1):
                    if text[i] in '。;,,\n、':
                        cut_positions.append(i + 1)
                        break

                if cut_positions:
                    end = cut_positions[0]
                else:
                    # No boundary found backwards; look forwards instead
                    for i in range(end, min(text_len, end + 50)):
                        if text[i] in '。;,,\n、':
                            end = i + 1
                            break

            chunk = text[start:end].strip()
            if chunk:
                chunks.append(chunk)

            # Advance the start position (accounting for overlap); always move
            # forward past the previous start so the loop terminates
            next_start = end - overlap
            start = next_start if next_start > start else end

        return chunks
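With the sentence-boundary snapping left out, the loop above reduces to a plain sliding window. A minimal sketch of that core (the function name and the tiny sizes are chosen for illustration only):

```python
def split_into_chunks(text: str, chunk_size: int = 10, overlap: int = 3) -> list:
    """Sliding-window character chunking with overlap: each window starts
    `overlap` characters before the previous window ended."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    chunks, start = [], 0
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        # Always move forward so the loop terminates even with large overlaps
        next_start = end - overlap
        start = next_start if next_start > start else end
    return chunks

chunks = split_into_chunks("abcdefghijklmnopqrst", chunk_size=10, overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrst']
```

The 3-character overlap means each chunk repeats the tail of its predecessor, so a query term straddling a chunk boundary is still retrievable from one of the two chunks.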

    def index_field(
        self,
        table_name: str,
@@ -124,9 +328,20 @@ class RAGService:
        self,
        doc_id: str,
        content: str,
-        metadata: Optional[Dict[str, Any]] = None
+        metadata: Optional[Dict[str, Any]] = None,
+        chunk_size: int = None,
+        chunk_overlap: int = None
    ):
-        """Index document content into the vector store"""
+        """
+        Index document content into the vector store (with automatic chunking).
+
+        Args:
+            doc_id: Unique document identifier
+            content: Document content
+            metadata: Document metadata
+            chunk_size: Text chunk size (characters), default 500
+            chunk_overlap: Overlap between chunks (characters), default 50
+        """
        if self._disabled:
            logger.info(f"[RAG DISABLED] 文档索引操作已跳过: {doc_id}")
            return
@@ -139,18 +354,70 @@ class RAGService:
            logger.debug(f"文档跳过索引 (无嵌入模型): {doc_id}")
            return

-        doc = SimpleDocument(
-            page_content=content,
-            metadata=metadata or {"doc_id": doc_id}
-        )
-        self._add_documents([doc], [doc_id])
-        logger.debug(f"已索引文档: {doc_id}")
+        # Split the document into chunks
+        if chunk_size is None:
+            chunk_size = self.DEFAULT_CHUNK_SIZE
+        if chunk_overlap is None:
+            chunk_overlap = self.DEFAULT_CHUNK_OVERLAP
+
+        chunks = self._split_into_chunks(content, chunk_size, chunk_overlap)
+
+        if not chunks:
+            logger.warning(f"文档内容为空,跳过索引: {doc_id}")
+            return
+
+        # Create a document object for each chunk
+        documents = []
+        chunk_ids = []
+
+        for i, chunk in enumerate(chunks):
+            chunk_id = f"{doc_id}_chunk_{i}"
+            chunk_metadata = metadata.copy() if metadata else {}
+            chunk_metadata.update({
+                "chunk_index": i,
+                "total_chunks": len(chunks),
+                "doc_id": doc_id
+            })
+
+            documents.append(SimpleDocument(
+                page_content=chunk,
+                metadata=chunk_metadata
+            ))
+            chunk_ids.append(chunk_id)
+
+        # Add the chunks in one batch
+        self._add_documents(documents, chunk_ids)
+        logger.info(f"已索引文档 {doc_id},共 {len(chunks)} 个块")

    def _add_documents(self, documents: List[SimpleDocument], doc_ids: List[str]):
        """Add documents to the vector index in batch."""
        if not documents:
            return

        # Always keep documents in memory (for BM25 and keyword search)
        for doc, did in zip(documents, doc_ids):
            self.documents.append({"id": did, "content": doc.page_content, "metadata": doc.metadata})
            self.doc_ids.append(did)

        # Build the BM25 index
        if self._bm25_enabled and documents:
            bm25_texts = [doc.page_content for doc in documents]
            if self.bm25 is None:
                self.bm25 = BM25()
                self.bm25.fit(bm25_texts, doc_ids)
            else:
                # Incremental add: rebuild from scratch (BM25 has no incremental update)
                all_texts = [d["content"] for d in self.documents]
                all_ids = self.doc_ids.copy()
                self.bm25 = BM25()
                self.bm25.fit(all_texts, all_ids)
            logger.debug(f"BM25 索引更新: {len(documents)} 个文档")

        # Without an embedding model, skip vector indexing
        if self.embedding_model is None:
            logger.debug(f"文档跳过向量索引 (无嵌入模型): {len(documents)} 个文档")
            return

        texts = [doc.page_content for doc in documents]
        embeddings = self.embedding_model.encode(texts, convert_to_numpy=True)
        embeddings = self._normalize_vectors(embeddings).astype('float32')
@@ -162,12 +429,18 @@ class RAGService:
        id_array = np.array(id_list, dtype='int64')
        self.index.add_with_ids(embeddings, id_array)

-        for doc, did in zip(documents, doc_ids):
-            self.documents.append({"id": did, "content": doc.page_content, "metadata": doc.metadata})
-            self.doc_ids.append(did)
-
-    def retrieve(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
-        """Retrieve relevant documents for a query"""
+    def retrieve(self, query: str, top_k: int = 5, min_score: float = 0.3) -> List[Dict[str, Any]]:
+        """
+        Retrieve relevant document chunks for a query (hybrid: vector + BM25).
+
+        Args:
+            query: Query text
+            top_k: Maximum number of results to return
+            min_score: Minimum similarity score threshold
+
+        Returns:
+            List of relevant chunks, each with content, metadata, score, doc_id, chunk_index
+        """
        if self._disabled:
            logger.info(f"[RAG DISABLED] 检索操作已跳过: query={query}, top_k={top_k}")
            return []
@@ -175,28 +448,241 @@ class RAGService:
        if not self._initialized:
            self._init_vector_store()

-        if self.index is None or self.index.ntotal == 0:
-            return []
-
-        query_embedding = self.embedding_model.encode([query], convert_to_numpy=True)
-        query_embedding = self._normalize_vectors(query_embedding).astype('float32')
-
-        scores, indices = self.index.search(query_embedding, min(top_k, self.index.ntotal))
-
-        results = []
-        for score, idx in zip(scores[0], indices[0]):
-            if idx < 0:
-                continue
-            doc = self.documents[idx]
-            results.append({
-                "content": doc["content"],
-                "metadata": doc["metadata"],
-                "score": float(score),
-                "doc_id": doc["id"]
-            })
-
-        logger.debug(f"检索到 {len(results)} 条相关文档")
-        return results
+        # Run vector retrieval
+        vector_results = self._vector_search(query, top_k * 2, min_score)
+
+        # Run BM25 retrieval
+        bm25_results = self._bm25_search(query, top_k * 2)
+
+        # Hybrid fusion
+        hybrid_results = self._hybrid_fusion(vector_results, bm25_results, top_k)
+
+        if hybrid_results:
+            logger.info(f"混合检索到 {len(hybrid_results)} 条相关文档块 (向量:{len(vector_results)}, BM25:{len(bm25_results)})")
+            return hybrid_results
+
+        # Fallback: BM25 only
+        if bm25_results:
+            logger.info(f"降级到 BM25 检索: {len(bm25_results)} 条")
+            return bm25_results
+
+        # Fallback: keyword search
+        logger.info("降级到关键词搜索")
+        return self._keyword_search(query, top_k)
+
+    def _vector_search(self, query: str, top_k: int, min_score: float) -> List[Dict[str, Any]]:
+        """Vector retrieval."""
+        if self.index is None or self.index.ntotal == 0 or self.embedding_model is None:
+            return []
+
+        try:
+            query_embedding = self.embedding_model.encode([query], convert_to_numpy=True)
+            query_embedding = self._normalize_vectors(query_embedding).astype('float32')
+
+            scores, indices = self.index.search(query_embedding, min(top_k * 2, self.index.ntotal))
+
+            results = []
+            for score, idx in zip(scores[0], indices[0]):
+                if idx < 0:
+                    continue
+                if score < min_score:
+                    continue
+                doc = self.documents[idx]
+                results.append({
+                    "content": doc["content"],
+                    "metadata": doc["metadata"],
+                    "score": float(score),
+                    "doc_id": doc["id"],
+                    "chunk_index": doc["metadata"].get("chunk_index", 0),
+                    "search_type": "vector"
+                })
+
+            return results
+        except Exception as e:
+            logger.warning(f"向量检索失败: {e}")
+            return []
+
+    def _bm25_search(self, query: str, top_k: int) -> List[Dict[str, Any]]:
+        """BM25 retrieval."""
+        if not self.bm25 or not self.documents:
+            return []
+
+        try:
+            bm25_scores = self.bm25.get_scores(query)
+            if not bm25_scores:
+                return []
+
+            # Normalize BM25 scores into [0, 1]
+            max_score = max(bm25_scores) if bm25_scores else 1
+            min_score_bm = min(bm25_scores) if bm25_scores else 0
+            score_range = max_score - min_score_bm if max_score != min_score_bm else 1
+
+            results = []
+            for idx, score in enumerate(bm25_scores):
+                if score <= 0:
+                    continue
+                # Normalize
+                normalized_score = (score - min_score_bm) / score_range if score_range > 0 else 0
+                doc = self.documents[idx]
+                results.append({
+                    "content": doc["content"],
+                    "metadata": doc["metadata"],
+                    "score": float(normalized_score),
+                    "doc_id": doc["id"],
+                    "chunk_index": doc["metadata"].get("chunk_index", 0),
+                    "search_type": "bm25"
+                })
+
+            # Sort by score, descending
+            results.sort(key=lambda x: x["score"], reverse=True)
+            return results[:top_k]
+
+        except Exception as e:
+            logger.warning(f"BM25 检索失败: {e}")
+            return []
+
+    def _hybrid_fusion(
+        self,
+        vector_results: List[Dict[str, Any]],
+        bm25_results: List[Dict[str, Any]],
+        top_k: int
+    ) -> List[Dict[str, Any]]:
+        """
+        Fuse vector and BM25 retrieval results.
+
+        Uses weighted RRF (Reciprocal Rank Fusion):
+        Score = weight_vector * (1 / rank_vector) + weight_bm25 * (1 / rank_bm25)
+
+        Args:
+            vector_results: Vector retrieval results
+            bm25_results: BM25 retrieval results
+            top_k: Number of results to return
+
+        Returns:
+            Fused results
+        """
+        if not vector_results and not bm25_results:
+            return []
+
+        # Fusion weights
+        weight_vector = 0.6
+        weight_bm25 = 0.4
+
+        # Build a per-document score map
+        doc_scores: Dict[str, Dict[str, float]] = {}
+
+        # Add vector retrieval results
+        for rank, result in enumerate(vector_results):
+            doc_id = result["doc_id"]
+            if doc_id not in doc_scores:
+                doc_scores[doc_id] = {"vector": 0, "bm25": 0, "content": result["content"], "metadata": result["metadata"]}
+            # Reciprocal rank
+            doc_scores[doc_id]["vector"] = weight_vector / (rank + 1)
+
+        # Add BM25 retrieval results
+        for rank, result in enumerate(bm25_results):
+            doc_id = result["doc_id"]
+            if doc_id not in doc_scores:
+                doc_scores[doc_id] = {"vector": 0, "bm25": 0, "content": result["content"], "metadata": result["metadata"]}
+            doc_scores[doc_id]["bm25"] = weight_bm25 / (rank + 1)
+
+        # Compute fused scores
+        fused_results = []
+        for doc_id, scores in doc_scores.items():
+            fused_score = scores["vector"] + scores["bm25"]
+            # Keep the raw vector score for reference
+            vector_score = next((r["score"] for r in vector_results if r["doc_id"] == doc_id), 0.5)
+            fused_results.append({
+                "content": scores["content"],
+                "metadata": scores["metadata"],
+                "score": fused_score,
+                "doc_id": doc_id,
+                "vector_score": vector_score,
+                "bm25_score": scores["bm25"],
+                "search_type": "hybrid"
+            })
+
+        # Sort by fused score, descending
+        fused_results.sort(key=lambda x: x["score"], reverse=True)
+
+        logger.debug(f"混合融合: {len(fused_results)} 个文档, 向量:{len(vector_results)}, BM25:{len(bm25_results)}")
+
+        return fused_results[:top_k]
|
||||
|
||||
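The weighted reciprocal-rank scheme from the docstring can be reproduced standalone. A minimal sketch over two ranked ID lists; the lists and weights are illustrative, not taken from the service:

```python
def rrf_fuse(vector_ids, bm25_ids, w_vec=0.6, w_bm25=0.4):
    """Weighted reciprocal-rank fusion over two ranked lists of document IDs."""
    scores = {}
    for rank, doc_id in enumerate(vector_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_vec / (rank + 1)
    for rank, doc_id in enumerate(bm25_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_bm25 / (rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# "a" ranks first in one list and second in the other, so it wins overall:
print(rrf_fuse(["a", "b", "c"], ["b", "a", "d"]))  # ['a', 'b', 'c', 'd']
```

Documents appearing in both lists accumulate contributions from both channels, which is what lets a moderately ranked document in each list beat a document that appears in only one.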
    def _keyword_search(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        """
        Keyword-search fallback.

        Args:
            query: query text
            top_k: maximum number of results to return

        Returns:
            List of relevant document chunks
        """
        if not self.documents:
            return []

        # Extract query keywords
        keywords = []
        for char in query:
            if '\u4e00' <= char <= '\u9fff':  # CJK character
                keywords.append(char)
        # Add English words
        import re
        english_words = re.findall(r'[a-zA-Z]+', query)
        keywords.extend(english_words)

        if not keywords:
            return []

        results = []
        for doc in self.documents:
            content = doc["content"]
            # Keyword-match score
            score = 0
            matched_keywords = 0
            for kw in keywords:
                if kw in content:
                    score += 1
                    matched_keywords += 1

            if matched_keywords > 0:
                # Normalize the score
                score = score / max(len(keywords), 1)
                results.append({
                    "content": content,
                    "metadata": doc["metadata"],
                    "score": score,
                    "doc_id": doc["id"],
                    "chunk_index": doc["metadata"].get("chunk_index", 0)
                })

        # Sort by score
        results.sort(key=lambda x: x["score"], reverse=True)

        logger.debug(f"Keyword search returned {len(results[:top_k])} results")
        return results[:top_k]

    def retrieve_by_doc_id(self, doc_id: str, top_k: int = 10) -> List[Dict[str, Any]]:
        """
        Get all chunks of a given document.

        Args:
            doc_id: document ID
            top_k: maximum number of results to return

        Returns:
            All chunks of the document
        """
        # Collect all chunks belonging to the document
        doc_chunks = [d for d in self.documents if d["metadata"].get("doc_id") == doc_id]

        # Sort by chunk_index
        doc_chunks.sort(key=lambda x: x["metadata"].get("chunk_index", 0))

        # Return at most top_k
        return doc_chunks[:top_k]

    def retrieve_by_table(self, table_name: str, top_k: int = 5) -> List[Dict[str, Any]]:
        """Retrieve the fields of a given table."""
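The fallback's coverage score (matched keywords over total keywords, with CJK characters and English words as the units) can be exercised on its own. The sample query and text below are invented for illustration:

```python
import re

def keyword_score(query, content):
    """Fraction of query keywords (CJK chars + English words) found in content."""
    keywords = [ch for ch in query if '\u4e00' <= ch <= '\u9fff']
    keywords += re.findall(r'[a-zA-Z]+', query)
    if not keywords:
        return 0.0
    hits = sum(1 for kw in keywords if kw in content)
    return hits / len(keywords)

print(keyword_score("user id", "the id column stores each user"))  # 1.0
```

Per-character CJK matching keeps the fallback usable for Chinese queries where whitespace tokenization would produce nothing.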
File diff suppressed because it is too large (Load Diff)

639 backend/app/services/word_ai_service.py (new file)
@@ -0,0 +1,639 @@
"""
|
||||
Word 文档 AI 解析服务
|
||||
|
||||
使用 LLM (GLM) 对 Word 文档进行深度理解,提取结构化数据
|
||||
"""
|
||||
import logging
|
||||
from typing import Dict, Any, List, Optional
|
||||
import json
|
||||
|
||||
from app.services.llm_service import llm_service
|
||||
from app.core.document_parser.docx_parser import DocxParser
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class WordAIService:
|
||||
"""Word 文档 AI 解析服务"""
|
||||
|
||||
def __init__(self):
|
||||
self.llm = llm_service
|
||||
self.parser = DocxParser()
|
||||
|
||||
async def parse_word_with_ai(
|
||||
self,
|
||||
file_path: str,
|
||||
user_hint: str = ""
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
使用 AI 解析 Word 文档,提取结构化数据
|
||||
|
||||
适用于从非结构化的 Word 文档中提取表格数据、键值对等信息
|
||||
|
||||
Args:
|
||||
file_path: Word 文件路径
|
||||
user_hint: 用户提示词,指定要提取的内容类型
|
||||
|
||||
Returns:
|
||||
Dict: 包含结构化数据的解析结果
|
||||
"""
|
||||
try:
|
||||
# 1. 先用基础解析器提取原始内容
|
||||
parse_result = self.parser.parse(file_path)
|
||||
|
||||
if not parse_result.success:
|
||||
return {
|
||||
"success": False,
|
||||
"error": parse_result.error,
|
||||
"structured_data": None
|
||||
}
|
||||
|
||||
# 2. 获取原始数据
|
||||
raw_data = parse_result.data
|
||||
paragraphs = raw_data.get("paragraphs", [])
|
||||
paragraphs_with_style = raw_data.get("paragraphs_with_style", [])
|
||||
tables = raw_data.get("tables", [])
|
||||
content = raw_data.get("content", "")
|
||||
images_info = raw_data.get("images", {})
|
||||
metadata = parse_result.metadata or {}
|
||||
|
||||
image_count = images_info.get("image_count", 0)
|
||||
image_descriptions = images_info.get("descriptions", [])
|
||||
|
||||
logger.info(f"Word 基础解析完成: {len(paragraphs)} 个段落, {len(tables)} 个表格, {image_count} 张图片")
|
||||
|
||||
# 3. 提取图片数据(用于视觉分析)
|
||||
images_base64 = []
|
||||
if image_count > 0:
|
||||
try:
|
||||
images_base64 = self.parser.extract_images_as_base64(file_path)
|
||||
logger.info(f"提取到 {len(images_base64)} 张图片的 base64 数据")
|
||||
except Exception as e:
|
||||
logger.warning(f"提取图片 base64 失败: {str(e)}")
|
||||
|
||||
# 4. 根据内容类型选择 AI 解析策略
|
||||
# 如果有图片,先分析图片
|
||||
image_analysis = ""
|
||||
if images_base64:
|
||||
image_analysis = await self._analyze_images_with_ai(images_base64, user_hint)
|
||||
logger.info(f"图片 AI 分析完成: {len(image_analysis)} 字符")
|
||||
|
||||
# 优先处理:表格 > (表格+文本) > 纯文本
|
||||
if tables and len(tables) > 0:
|
||||
structured_data = await self._extract_tables_with_ai(
|
||||
tables, paragraphs, image_count, user_hint, metadata, image_analysis
|
||||
)
|
||||
elif paragraphs and len(paragraphs) > 0:
|
||||
structured_data = await self._extract_from_text_with_ai(
|
||||
paragraphs, content, image_count, image_descriptions, user_hint, image_analysis
|
||||
)
|
||||
else:
|
||||
structured_data = {
|
||||
"success": True,
|
||||
"type": "empty",
|
||||
"message": "文档内容为空"
|
||||
}
|
||||
|
||||
# 添加图片分析结果
|
||||
if image_analysis:
|
||||
structured_data["image_analysis"] = image_analysis
|
||||
|
||||
return structured_data
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"AI 解析 Word 文档失败: {str(e)}")
|
||||
return {
|
||||
"success": False,
|
||||
"error": str(e),
|
||||
"structured_data": None
|
||||
}
|
||||
|
||||
    async def _extract_tables_with_ai(
        self,
        tables: List[Dict],
        paragraphs: List[str],
        image_count: int,
        user_hint: str,
        metadata: Dict,
        image_analysis: str = ""
    ) -> Dict[str, Any]:
        """
        Use AI to extract structured data from Word tables and text.

        Args:
            tables: list of tables
            paragraphs: list of paragraphs
            image_count: number of images
            user_hint: user prompt
            metadata: document metadata
            image_analysis: image AI analysis result

        Returns:
            Structured data
        """
        try:
            # Build a textual description of the tables
            tables_text = self._build_tables_description(tables)

            # Build the paragraph description
            paragraphs_text = "\n".join(paragraphs[:50]) if paragraphs else "(no body text)"
            if len(paragraphs) > 50:
                paragraphs_text += f"\n... ({len(paragraphs)} paragraphs in total, only the first 50 shown)"

            # Image hint
            image_hint = f"Note: this document contains {image_count} images/figures." if image_count > 0 else ""

            prompt = f"""You are a professional data-extraction expert. Extract structured data from the full content of the following Word document.

[User requirement]
{user_hint if user_hint else "Extract all structured data in the document, including table data, key-value pairs, list items, etc."}

[Document body (paragraphs)]
{paragraphs_text}

[Document tables]
{tables_text}

[Document image info]
{image_hint}

Output in the following JSON format:
{{
    "type": "table_data",
    "headers": ["col1", "col2", ...],
    "rows": [["r1c1", "r1c2", ...], ["r2c1", "r2c2", ...], ...],
    "key_values": {{"key1": "value1", "key2": "value2", ...}},
    "list_items": ["item1", "item2", ...],
    "description": "description of the document content"
}}

Key points:
- Prefer extracting structured data from the tables
- If a table has a header row, put it in headers and the data rows in rows
- If the document contains key-value pairs (e.g. Name: Zhang San), put them in key_values
- If the document contains list items, put them in list_items
- Image content cannot be extracted directly, but note the rough topic of the images in description (e.g. "contains a flowchart", "contains a data chart")
"""

            messages = [
                {"role": "system", "content": "You are a professional data-extraction assistant. Output strictly in JSON format."},
                {"role": "user", "content": prompt}
            ]

            response = await self.llm.chat(
                messages=messages,
                temperature=0.1,
                max_tokens=50000
            )

            content = self.llm.extract_message_content(response)

            # Parse the JSON
            result = self._parse_json_response(content)

            if result:
                logger.info(f"AI table extraction succeeded: {len(result.get('rows', []))} rows, key_values={len(result.get('key_values', {}))}, list_items={len(result.get('list_items', []))}")
                return {
                    "success": True,
                    "type": "table_data",
                    "headers": result.get("headers", []),
                    "rows": result.get("rows", []),
                    "description": result.get("description", ""),
                    "key_values": result.get("key_values", {}),
                    "list_items": result.get("list_items", [])
                }
            else:
                # If the AI output is malformed, fall back to parsing the tables directly
                return self._fallback_table_parse(tables)

        except Exception as e:
            logger.error(f"AI table extraction failed: {str(e)}")
            return self._fallback_table_parse(tables)
    async def _extract_from_text_with_ai(
        self,
        paragraphs: List[str],
        full_text: str,
        image_count: int,
        image_descriptions: List[str],
        user_hint: str,
        image_analysis: str = ""
    ) -> Dict[str, Any]:
        """
        Use AI to extract structured data from plain Word text.

        Args:
            paragraphs: list of paragraphs
            full_text: full text
            image_count: number of images
            image_descriptions: list of image descriptions
            user_hint: user prompt
            image_analysis: image AI analysis result

        Returns:
            Structured data
        """
        try:
            # Cap the text length
            text_preview = full_text[:8000] if len(full_text) > 8000 else full_text

            # Image hint
            image_hint = f"\n[Document images] This document contains {image_count} images/figures." if image_count > 0 else ""
            if image_descriptions:
                image_hint += "\n" + "\n".join(image_descriptions)

            prompt = f"""You are a professional data-extraction expert. Extract structured data from the full content of the following Word document.

[User requirement]
{user_hint if user_hint else "Identify and extract the key information in the document, including table data, key-value pairs, list items, etc."}

[Document body]{image_hint}
{text_preview}

Output in the following JSON format:
{{
    "type": "structured_text",
    "tables": [{{"headers": [...], "rows": [...]}}],
    "key_values": {{"key1": "value1", "key2": "value2", ...}},
    "list_items": ["item1", "item2", ...],
    "summary": "summary of the document content"
}}

Key points:
- If the document contains table data, put it in tables
- If the document contains key-value pairs (e.g. Name: Zhang San), put them in key_values
- If the document contains list items, put them in list_items
- If the document contains images, infer their content from context (e.g. "flowchart", "line chart") and note it in description
- If no structured data can be extracted, at least provide a detailed summary
"""

            messages = [
                {"role": "system", "content": "You are a professional data-extraction assistant. Output strictly in JSON format."},
                {"role": "user", "content": prompt}
            ]

            response = await self.llm.chat(
                messages=messages,
                temperature=0.1,
                max_tokens=50000
            )

            content = self.llm.extract_message_content(response)

            result = self._parse_json_response(content)

            if result:
                logger.info(f"AI text extraction succeeded: type={result.get('type')}")
                return {
                    "success": True,
                    "type": result.get("type", "structured_text"),
                    "tables": result.get("tables", []),
                    "key_values": result.get("key_values", {}),
                    "list_items": result.get("list_items", []),
                    "summary": result.get("summary", ""),
                    "raw_text_preview": text_preview[:500]
                }
            else:
                return {
                    "success": True,
                    "type": "text",
                    "summary": text_preview[:500],
                    "raw_text_preview": text_preview[:500]
                }

        except Exception as e:
            logger.error(f"AI text extraction failed: {str(e)}")
            return {
                "success": False,
                "error": str(e)
            }
    async def _analyze_images_with_ai(
        self,
        images: List[Dict[str, str]],
        user_hint: str = ""
    ) -> str:
        """
        Analyze the images in a Word document with a vision model.

        Args:
            images: list of images, each with base64 and mime_type
            user_hint: user prompt

        Returns:
            Image analysis result text
        """
        try:
            # Call the LLM's vision analysis
            result = await self.llm.analyze_images(
                images=images,
                user_prompt=user_hint or "Describe the image content in detail; extract all text and data."
            )

            if result.get("success"):
                analysis = result.get("analysis", {})
                if isinstance(analysis, dict):
                    description = analysis.get("description", "")
                    text_content = analysis.get("text_content", "")
                    data_extracted = analysis.get("data_extracted", {})

                    result_text = f"[Image analysis result]\n{description}"
                    if text_content:
                        result_text += f"\n\n[Text in images]\n{text_content}"
                    if data_extracted:
                        result_text += f"\n\n[Extracted data]\n{json.dumps(data_extracted, ensure_ascii=False)}"
                    return result_text
                else:
                    return str(analysis)
            else:
                logger.warning(f"Image AI analysis failed: {result.get('error')}")
                return ""

        except Exception as e:
            logger.error(f"Image AI analysis error: {str(e)}")
            return ""
    def _build_tables_description(self, tables: List[Dict]) -> str:
        """Build a textual description of the tables."""
        result = []

        for idx, table in enumerate(tables):
            rows = table.get("rows", [])
            if not rows:
                continue

            result.append(f"\n--- Table {idx + 1} ---")

            for row_idx, row in enumerate(rows[:50]):  # at most 50 rows per table
                if isinstance(row, list):
                    result.append(" | ".join(str(cell).strip() for cell in row))
                elif isinstance(row, dict):
                    result.append(str(row))

            if len(rows) > 50:
                result.append(f"... ({len(rows)} rows in total, only the first 50 shown)")

        return "\n".join(result) if result else "(no table content)"
    def _parse_json_response(self, content: str) -> Optional[Dict]:
        """Parse a JSON response, handling various formatting issues."""
        import re

        # Strip markdown fences
        cleaned = content.strip()
        cleaned = re.sub(r'^```json\s*', '', cleaned, flags=re.MULTILINE)
        cleaned = re.sub(r'^```\s*', '', cleaned, flags=re.MULTILINE)
        cleaned = cleaned.strip()

        # Locate the start of the JSON
        json_start = -1
        for i, c in enumerate(cleaned):
            if c == '{':
                json_start = i
                break

        if json_start == -1:
            logger.warning("Could not locate the start of the JSON")
            return None

        json_text = cleaned[json_start:]

        # Try parsing directly
        try:
            return json.loads(json_text)
        except json.JSONDecodeError:
            pass

        # Try to repair, then parse
        try:
            # Find the matching closing brace
            depth = 0
            end_pos = -1
            for i, c in enumerate(json_text):
                if c == '{':
                    depth += 1
                elif c == '}':
                    depth -= 1
                    if depth == 0:
                        end_pos = i + 1
                        break

            if end_pos > 0:
                fixed = json_text[:end_pos]
                # Remove trailing commas before a closing brace/bracket
                fixed = re.sub(r',\s*([}\]])', r'\1', fixed)
                return json.loads(fixed)
        except Exception as e:
            logger.warning(f"JSON repair failed: {e}")

        return None
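The fence-stripping and trailing-comma repair can be exercised standalone. This sketch mirrors the same regex ideas on a made-up malformed LLM response; it is illustrative, not the service code:

```python
import json
import re

def repair_json(raw):
    """Strip markdown fences and trailing commas, then parse the first JSON object."""
    cleaned = re.sub(r'^```(?:json)?\s*', '', raw.strip(), flags=re.MULTILINE)
    start = cleaned.find('{')
    if start == -1:
        return None
    text = cleaned[start:]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Drop commas that sit right before a closing brace/bracket
        return json.loads(re.sub(r',\s*([}\]])', r'\1', text))

print(repair_json('```json\n{"rows": [1, 2,], "ok": true,}\n```'))
# {'rows': [1, 2], 'ok': True}
```

Note the character class `[}\]]` (brace or bracket, with the bracket escaped): writing it as `[}]]` would instead match only the literal two-character sequence `}]`.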
    def _fallback_table_parse(self, tables: List[Dict]) -> Dict[str, Any]:
        """Parse the tables directly when AI parsing fails."""
        if not tables:
            return {
                "success": True,
                "type": "empty",
                "data": {},
                "message": "No table content"
            }

        all_rows = []
        all_headers = None

        for table in tables:
            rows = table.get("rows", [])
            if not rows:
                continue

            # Find the real header row (skip title rows)
            header_row_idx = 0
            for idx, row in enumerate(rows[:5]):  # check the first 5 rows only
                if not isinstance(row, list):
                    continue
                # A row whose first cell starts with "表" (Table) and is very long is likely a title row
                first_cell = str(row[0]) if row else ""
                if first_cell.startswith("表") and len(first_cell) > 15:
                    header_row_idx = idx + 1
                    continue
                # A row with more than 3 empty cells is likely invalid
                empty_count = sum(1 for cell in row if not str(cell).strip())
                if empty_count > 3:
                    header_row_idx = idx + 1
                    continue
                # Take the first row that looks like a header (short, mostly filled cells)
                avg_len = sum(len(str(c)) for c in row) / len(row) if row else 0
                if avg_len < 20:  # headers are usually shorter than data rows
                    header_row_idx = idx
                    break

            if header_row_idx >= len(rows):
                continue

            # Use the header row we found
            if rows and isinstance(rows[header_row_idx], list):
                headers = rows[header_row_idx]
                if all_headers is None:
                    all_headers = headers

                # Data rows (everything after the header)
                for row in rows[header_row_idx + 1:]:
                    if isinstance(row, list) and len(row) == len(headers):
                        all_rows.append(row)

        if all_headers and all_rows:
            return {
                "success": True,
                "type": "table_data",
                "headers": all_headers,
                "rows": all_rows,
                "description": "Extracted directly from the Word tables"
            }

        return {
            "success": True,
            "type": "raw",
            "tables": tables,
            "message": "Table data (not AI-processed)"
        }
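The header-row heuristic above (skip title rows and mostly-empty rows, accept the first short row) can be isolated into a small function. This is an illustrative adaptation: the original keys title rows on a first cell starting with "表", while this sketch uses a long single-cell first row as the title signal so it works on the invented English sample:

```python
def find_header_row(rows, max_check=5):
    """Return the index of the first row that looks like a header."""
    header_idx = 0
    for idx, row in enumerate(rows[:max_check]):
        first_cell = str(row[0]) if row else ""
        # Skip a long, single-cell title row
        if len(first_cell) > 15 and len(row) <= 1:
            header_idx = idx + 1
            continue
        # Skip rows that are mostly empty
        if sum(1 for c in row if not str(c).strip()) > 3:
            header_idx = idx + 1
            continue
        # Accept the first row whose cells are short on average
        avg_len = sum(len(str(c)) for c in row) / len(row) if row else 0
        if avg_len < 20:
            header_idx = idx
            break
    return header_idx

rows = [["Table 1: quarterly revenue summary"], ["Name", "Q1", "Q2"], ["Acme", "10", "12"]]
print(find_header_row(rows))  # 1
```

The `avg_len < 20` cut-off exploits the fact that header cells are usually much shorter than data cells.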
    async def fill_template_with_ai(
        self,
        file_path: str,
        template_fields: List[Dict[str, Any]],
        user_hint: str = ""
    ) -> Dict[str, Any]:
        """
        Parse a Word document with AI and fill in a template.

        This is the main entry point; the frontend calls it to:
        1. Parse the Word document with AI
        2. Extract data according to the template fields
        3. Return the fill result

        Args:
            file_path: path to the Word file
            template_fields: template fields, e.g. [{"name": "field name", "hint": "hint"}, ...]
            user_hint: user prompt

        Returns:
            Fill result
        """
        try:
            # 1. Parse the document with AI
            parse_result = await self.parse_word_with_ai(file_path, user_hint)

            if not parse_result.get("success"):
                return {
                    "success": False,
                    "error": parse_result.get("error", "Parsing failed"),
                    "filled_data": {},
                    "source": "ai_parse_failed"
                }

            # 2. Extract data according to the field types
            filled_data = {}
            extract_details = []

            parse_type = parse_result.get("type", "")

            if parse_type == "table_data":
                # Table data: match column names directly
                headers = parse_result.get("headers", [])
                rows = parse_result.get("rows", [])

                for field in template_fields:
                    field_name = field.get("name", "")
                    values = self._extract_field_from_table(headers, rows, field_name)
                    filled_data[field_name] = values
                    extract_details.append({
                        "field": field_name,
                        "values": values,
                        "source": "ai_table_extraction",
                        "confidence": 0.9 if values else 0.0
                    })

            elif parse_type == "structured_text":
                # Structured text: try key_values and list_items
                key_values = parse_result.get("key_values", {})
                list_items = parse_result.get("list_items", [])

                for field in template_fields:
                    field_name = field.get("name", "")
                    value = key_values.get(field_name, "")
                    if not value and list_items:
                        value = list_items[0] if list_items else ""
                    filled_data[field_name] = [value] if value else []
                    extract_details.append({
                        "field": field_name,
                        "values": [value] if value else [],
                        "source": "ai_text_extraction",
                        "confidence": 0.7 if value else 0.0
                    })

            else:
                # Other types: return the raw parse result for downstream handling
                for field in template_fields:
                    field_name = field.get("name", "")
                    filled_data[field_name] = []
                    extract_details.append({
                        "field": field_name,
                        "values": [],
                        "source": "no_ai_data",
                        "confidence": 0.0
                    })

            # 3. Return the result
            max_rows = max(len(v) for v in filled_data.values()) if filled_data else 1

            return {
                "success": True,
                "filled_data": filled_data,
                "fill_details": extract_details,
                "ai_parse_result": {
                    "type": parse_type,
                    "description": parse_result.get("description", "")
                },
                "source_doc_count": 1,
                "max_rows": max_rows
            }

        except Exception as e:
            logger.error(f"AI template fill failed: {str(e)}")
            return {
                "success": False,
                "error": str(e),
                "filled_data": {},
                "fill_details": []
            }
    def _extract_field_from_table(
        self,
        headers: List[str],
        rows: List[List],
        field_name: str
    ) -> List[str]:
        """Extract the values of a given field from a table."""
        # Find a matching column
        target_col_idx = None
        for col_idx, header in enumerate(headers):
            if field_name.lower() in str(header).lower() or str(header).lower() in field_name.lower():
                target_col_idx = col_idx
                break

        if target_col_idx is None:
            return []

        # Collect every value in that column
        values = []
        for row in rows:
            if isinstance(row, list) and target_col_idx < len(row):
                val = str(row[target_col_idx]).strip()
                if val:
                    values.append(val)

        return values


# Global singleton
word_ai_service = WordAIService()
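The bidirectional substring match used for column lookup can be tried in isolation. The headers and rows below are invented sample data, not from the service:

```python
def extract_column(headers, rows, field_name):
    """Pick the first column whose header overlaps field_name (either direction)."""
    col = next(
        (i for i, h in enumerate(headers)
         if field_name.lower() in str(h).lower() or str(h).lower() in field_name.lower()),
        None,
    )
    if col is None:
        return []
    # Keep only non-empty, stripped cell values from that column
    return [str(r[col]).strip() for r in rows if col < len(r) and str(r[col]).strip()]

headers = ["Name", "Phone Number", "Email"]
rows = [["Alice", "555-0100", "a@x.io"], ["Bob", "", "b@x.io"]]
print(extract_column(headers, rows, "phone"))  # ['555-0100']
```

Matching in both directions ("phone" in "Phone Number", or a short header contained in a longer field name) tolerates the header/field naming mismatches an LLM extraction typically produces.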
@@ -1,4 +1,4 @@
# ============================================================
# Document understanding and multi-source data fusion system based on large language models
# Python dependency list
# ============================================================
@@ -1,13 +1,16 @@
 import { RouterProvider } from 'react-router-dom';
-import { AuthProvider } from '@/context/AuthContext';
+import { AuthProvider } from '@/contexts/AuthContext';
+import { TemplateFillProvider } from '@/context/TemplateFillContext';
 import { router } from '@/routes';
 import { Toaster } from 'sonner';

 function App() {
   return (
     <AuthProvider>
-      <RouterProvider router={router} />
-      <Toaster position="top-right" richColors closeButton />
+      <TemplateFillProvider>
+        <RouterProvider router={router} />
+        <Toaster position="top-right" richColors closeButton />
+      </TemplateFillProvider>
     </AuthProvider>
   );
 }
@@ -1,6 +1,6 @@
 import React from 'react';
 import { Navigate, useLocation } from 'react-router-dom';
-import { useAuth } from '@/context/AuthContext';
+import { useAuth } from '@/contexts/AuthContext';

 export const RouteGuard: React.FC<{ children: React.ReactNode }> = ({ children }) => {
   const { user, loading } = useAuth();
@@ -1,85 +0,0 @@
|
||||
import React, { createContext, useContext, useEffect, useState } from 'react';
|
||||
import { supabase } from '@/db/supabase';
|
||||
import { User } from '@supabase/supabase-js';
|
||||
import { Profile } from '@/types/types';
|
||||
|
||||
interface AuthContextType {
|
||||
user: User | null;
|
||||
profile: Profile | null;
|
||||
signIn: (email: string, password: string) => Promise<{ error: any }>;
|
||||
signUp: (email: string, password: string) => Promise<{ error: any }>;
|
||||
signOut: () => Promise<{ error: any }>;
|
||||
loading: boolean;
|
||||
}
|
||||
|
||||
const AuthContext = createContext<AuthContextType | undefined>(undefined);
|
||||
|
||||
export const AuthProvider: React.FC<{ children: React.ReactNode }> = ({ children }) => {
|
||||
const [user, setUser] = useState<User | null>(null);
|
||||
const [profile, setProfile] = useState<Profile | null>(null);
|
||||
const [loading, setLoading] = useState(true);
|
||||
|
||||
useEffect(() => {
|
||||
// Check active sessions and sets the user
|
||||
supabase.auth.getSession().then(({ data: { session } }) => {
|
||||
setUser(session?.user ?? null);
|
||||
if (session?.user) fetchProfile(session.user.id);
|
||||
else setLoading(false);
|
||||
});
|
||||
|
||||
// Listen for changes on auth state (sign in, sign out, etc.)
|
||||
const { data: { subscription } } = supabase.auth.onAuthStateChange((_event, session) => {
|
||||
setUser(session?.user ?? null);
|
||||
if (session?.user) fetchProfile(session.user.id);
|
||||
else {
|
||||
setProfile(null);
|
||||
setLoading(false);
|
||||
}
|
||||
});
|
||||
|
||||
return () => subscription.unsubscribe();
|
||||
}, []);
|
||||
|
||||
const fetchProfile = async (uid: string) => {
|
||||
try {
|
||||
const { data, error } = await supabase
|
||||
.from('profiles')
|
||||
.select('*')
|
||||
.eq('id', uid)
|
||||
.maybeSingle();
|
||||
|
||||
if (error) throw error;
|
||||
setProfile(data);
|
||||
} catch (err) {
|
||||
console.error('Error fetching profile:', err);
|
||||
} finally {
|
||||
setLoading(false);
|
||||
}
|
||||
};
|
||||
|
||||
const signIn = async (email: string, password: string) => {
|
||||
return await supabase.auth.signInWithPassword({ email, password });
|
||||
};
|
||||
|
||||
const signUp = async (email: string, password: string) => {
|
||||
return await supabase.auth.signUp({ email, password });
|
||||
};
|
||||
|
||||
const signOut = async () => {
|
||||
return await supabase.auth.signOut();
|
||||
};
|
||||
|
||||
return (
|
||||
<AuthContext.Provider value={{ user, profile, signIn, signUp, signOut, loading }}>
|
||||
{children}
|
||||
</AuthContext.Provider>
|
||||
);
|
||||
};
|
||||
|
||||
export const useAuth = () => {
|
||||
const context = useContext(AuthContext);
|
||||
if (context === undefined) {
|
||||
throw new Error('useAuth must be used within an AuthProvider');
|
||||
}
|
||||
return context;
|
||||
};
|
||||
136 frontend/src/context/TemplateFillContext.tsx (new file)
@@ -0,0 +1,136 @@
import React, { createContext, useContext, useState, ReactNode } from 'react';

type SourceFile = {
  file: File;
  preview?: string;
};

type TemplateField = {
  cell: string;
  name: string;
  field_type: string;
  required: boolean;
  hint?: string;
};

type Step = 'upload' | 'filling' | 'preview';

interface TemplateFillState {
  step: Step;
  templateFile: File | null;
  templateFields: TemplateField[];
  sourceFiles: SourceFile[];
  sourceFilePaths: string[];
  sourceDocIds: string[];
  templateId: string;
  filledResult: any;
  setStep: (step: Step) => void;
  setTemplateFile: (file: File | null) => void;
  setTemplateFields: (fields: TemplateField[]) => void;
  setSourceFiles: (files: SourceFile[]) => void;
  addSourceFiles: (files: SourceFile[]) => void;
  removeSourceFile: (index: number) => void;
  setSourceFilePaths: (paths: string[]) => void;
  setSourceDocIds: (ids: string[]) => void;
  addSourceDocId: (id: string) => void;
  removeSourceDocId: (id: string) => void;
  setTemplateId: (id: string) => void;
  setFilledResult: (result: any) => void;
  reset: () => void;
}

const initialState = {
  step: 'upload' as Step,
  templateFile: null,
  templateFields: [],
  sourceFiles: [],
  sourceFilePaths: [],
  sourceDocIds: [],
  templateId: '',
  filledResult: null,
  setStep: () => {},
  setTemplateFile: () => {},
  setTemplateFields: () => {},
  setSourceFiles: () => {},
  addSourceFiles: () => {},
  removeSourceFile: () => {},
  setSourceFilePaths: () => {},
  setSourceDocIds: () => {},
  addSourceDocId: () => {},
  removeSourceDocId: () => {},
  setTemplateId: () => {},
  setFilledResult: () => {},
  reset: () => {},
};

const TemplateFillContext = createContext<TemplateFillState>(initialState);

export const TemplateFillProvider: React.FC<{ children: ReactNode }> = ({ children }) => {
  const [step, setStep] = useState<Step>('upload');
  const [templateFile, setTemplateFile] = useState<File | null>(null);
  const [templateFields, setTemplateFields] = useState<TemplateField[]>([]);
  const [sourceFiles, setSourceFiles] = useState<SourceFile[]>([]);
  const [sourceFilePaths, setSourceFilePaths] = useState<string[]>([]);
  const [sourceDocIds, setSourceDocIds] = useState<string[]>([]);
  const [templateId, setTemplateId] = useState<string>('');
  const [filledResult, setFilledResult] = useState<any>(null);

  const addSourceFiles = (files: SourceFile[]) => {
    setSourceFiles(prev => [...prev, ...files]);
  };

  const removeSourceFile = (index: number) => {
    setSourceFiles(prev => prev.filter((_, i) => i !== index));
  };

  const addSourceDocId = (id: string) => {
    setSourceDocIds(prev => prev.includes(id) ? prev : [...prev, id]);
  };

  const removeSourceDocId = (id: string) => {
    setSourceDocIds(prev => prev.filter(docId => docId !== id));
  };

  const reset = () => {
    setStep('upload');
    setTemplateFile(null);
    setTemplateFields([]);
    setSourceFiles([]);
    setSourceFilePaths([]);
    setSourceDocIds([]);
    setTemplateId('');
    setFilledResult(null);
  };

  return (
    <TemplateFillContext.Provider
      value={{
        step,
        templateFile,
        templateFields,
        sourceFiles,
        sourceFilePaths,
        sourceDocIds,
        templateId,
        filledResult,
        setStep,
        setTemplateFile,
        setTemplateFields,
        setSourceFiles,
        addSourceFiles,
        removeSourceFile,
        setSourceFilePaths,
        setSourceDocIds,
        addSourceDocId,
        removeSourceDocId,
        setTemplateId,
        setFilledResult,
        reset,
      }}
    >
      {children}
    </TemplateFillContext.Provider>
  );
};

export const useTemplateFill = () => useContext(TemplateFillContext);
@@ -400,6 +400,49 @@ export const backendApi = {
    }
  },

  /**
   * Fetch the task history list
   */
  async getTasks(
    limit: number = 50,
    skip: number = 0
  ): Promise<{ success: boolean; tasks: any[]; count: number }> {
    const url = `${BACKEND_BASE_URL}/tasks?limit=${limit}&skip=${skip}`;

    try {
      const response = await fetch(url);
      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.detail || 'Failed to fetch the task list');
      }
      return await response.json();
    } catch (error) {
      console.error('Failed to fetch the task list:', error);
      throw error;
    }
  },

  /**
   * Delete a task
   */
  async deleteTask(taskId: string): Promise<{ success: boolean; deleted: boolean }> {
    const url = `${BACKEND_BASE_URL}/tasks/${taskId}`;

    try {
      const response = await fetch(url, {
        method: 'DELETE'
      });
      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.detail || 'Failed to delete the task');
      }
      return await response.json();
    } catch (error) {
      console.error('Failed to delete the task:', error);
      throw error;
    }
  },

  /**
   * Poll task status until completion
   */
@@ -656,6 +699,46 @@ export const backendApi = {
    }
  },

  /**
   * Upload a template and source documents jointly
   */
  async uploadTemplateAndSources(
    templateFile: File,
    sourceFiles: File[]
  ): Promise<{
    success: boolean;
    template_id: string;
    filename: string;
    file_type: string;
    fields: TemplateField[];
    field_count: number;
    source_file_paths: string[];
    source_filenames: string[];
    task_id: string;
  }> {
    const formData = new FormData();
    formData.append('template_file', templateFile);
    sourceFiles.forEach(file => formData.append('source_files', file));

    const url = `${BACKEND_BASE_URL}/templates/upload-joint`;

    try {
      const response = await fetch(url, {
        method: 'POST',
        body: formData,
      });

      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.detail || 'Joint upload failed');
      }
      return await response.json();
    } catch (error) {
      console.error('Joint upload failed:', error);
      throw error;
    }
  },

  /**
   * Execute the table fill
   */
@@ -724,6 +807,41 @@ export const backendApi = {
|
||||
}
|
||||
},
|
||||
|
||||
/**
|
||||
* 填充原始模板并导出
|
||||
*
|
||||
* 直接打开原始模板文件,将数据填入模板的表格/单元格中,然后导出
|
||||
* 适用于比赛场景:保持原始模板格式不变
|
||||
*/
|
||||
async fillAndExportTemplate(
|
||||
templatePath: string,
|
||||
filledData: Record<string, any>,
|
||||
format: 'xlsx' | 'docx' = 'xlsx'
|
||||
): Promise<Blob> {
|
||||
const url = `${BACKEND_BASE_URL}/templates/fill-and-export`;
|
||||
|
||||
try {
|
||||
const response = await fetch(url, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({
|
||||
template_path: templatePath,
|
||||
filled_data: filledData,
|
||||
format,
|
||||
}),
|
||||
});
|
||||
|
||||
if (!response.ok) {
|
||||
const error = await response.json();
|
||||
throw new Error(error.detail || '填充模板失败');
|
||||
}
|
||||
return await response.blob();
|
||||
} catch (error) {
|
||||
console.error('填充模板失败:', error);
|
||||
throw error;
|
||||
}
|
||||
},
|
||||
|
||||
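`fillAndExportTemplate` resolves to a `Blob`; the caller still has to turn that into a browser download. A sketch of one way to do it, with a small filename helper — the `_filled` suffix and the path shape in the comment are illustrative assumptions, not part of the API:

```typescript
// Derive a download name from the template path, e.g.
// "uploads/templates/report.docx" with format "xlsx" -> "report_filled.xlsx".
function exportFilename(templatePath: string, format: 'xlsx' | 'docx'): string {
  const base = templatePath.split(/[\\/]/).pop() || 'template';
  const stem = base.replace(/\.[^.]+$/, '');
  return `${stem}_filled.${format}`;
}

// Browser-side usage (sketch):
//   const blob = await backendApi.fillAndExportTemplate(path, data, 'xlsx');
//   const a = document.createElement('a');
//   a.href = URL.createObjectURL(blob);
//   a.download = exportFilename(path, 'xlsx');
//   a.click();
//   URL.revokeObjectURL(a.href);
```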
  // ==================== Excel-specific endpoints (kept for compatibility) ====================

  /**
@@ -1105,7 +1223,7 @@ export const aiApi = {

    try {
      const response = await fetch(url, {
-       method: 'GET',
+       method: 'POST',
        body: formData,
      });

@@ -1121,6 +1239,48 @@ export const aiApi = {
    }
  },

  /**
   * Upload a TXT text file and analyze it with AI to extract structured data
   */
  async analyzeTxt(
    file: File
  ): Promise<{
    success: boolean;
    filename?: string;
    structured_data?: {
      table?: {
        columns?: string[];
        rows?: string[][];
      };
      summary?: string;
      key_value_pairs?: Array<{ key: string; value: string }>;
      numeric_data?: Array<{ name: string; value: number; unit?: string }>;
    };
    error?: string;
  }> {
    const formData = new FormData();
    formData.append('file', file);

    const url = `${BACKEND_BASE_URL}/ai/analyze/txt`;

    try {
      const response = await fetch(url, {
        method: 'POST',
        body: formData,
      });

      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.detail || 'TXT AI 分析失败');
      }

      return await response.json();
    } catch (error) {
      console.error('TXT AI 分析失败:', error);
      throw error;
    }
  },

  /**
   * Generate statistics and charts
   */
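`analyzeTxt` returns `structured_data.table` as parallel `columns`/`rows` arrays. A small reshaping sketch (the helper name is illustrative) that turns it into one record per row, which is usually easier to feed into table components:

```typescript
// Reshape { columns, rows } into an array of records; missing cells become ''.
function tableToRecords(
  table?: { columns?: string[]; rows?: string[][] }
): Record<string, string>[] {
  const columns = table?.columns;
  if (!columns || !table?.rows) return [];
  return table.rows.map(row =>
    Object.fromEntries(columns.map((col, i) => [col, row[i] ?? '']))
  );
}
```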
@@ -1219,4 +1379,211 @@ export const aiApi = {
      throw error;
    }
  },

  // ==================== Word AI parsing ====================

  /**
   * Parse a Word document with AI and extract structured data
   */
  async analyzeWordWithAI(
    file: File,
    userHint: string = ''
  ): Promise<{
    success: boolean;
    type?: string;
    headers?: string[];
    rows?: string[][];
    key_values?: Record<string, string>;
    list_items?: string[];
    summary?: string;
    error?: string;
  }> {
    const formData = new FormData();
    formData.append('file', file);
    if (userHint) {
      formData.append('user_hint', userHint);
    }

    const url = `${BACKEND_BASE_URL}/ai/analyze/word`;

    try {
      const response = await fetch(url, {
        method: 'POST',
        body: formData,
      });

      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.detail || 'Word AI 解析失败');
      }

      return await response.json();
    } catch (error) {
      console.error('Word AI 解析失败:', error);
      throw error;
    }
  },

  /**
   * Parse a Word document with AI and fill the template.
   * Done in one pass: AI parsing + form filling.
   */
  async fillTemplateFromWordAI(
    file: File,
    templateFields: TemplateField[],
    userHint: string = ''
  ): Promise<FillResult> {
    const formData = new FormData();
    formData.append('file', file);
    formData.append('template_fields', JSON.stringify(templateFields));
    if (userHint) {
      formData.append('user_hint', userHint);
    }

    const url = `${BACKEND_BASE_URL}/ai/analyze/word/fill-template`;

    try {
      const response = await fetch(url, {
        method: 'POST',
        body: formData,
      });

      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.detail || 'Word AI 填表失败');
      }

      return await response.json();
    } catch (error) {
      console.error('Word AI 填表失败:', error);
      throw error;
    }
  },

  // ==================== Smart instructions ====================

  /**
   * Recognize the intent of a natural-language instruction
   */
  async recognizeIntent(
    instruction: string,
    docIds?: string[]
  ): Promise<{
    success: boolean;
    intent: string;
    params: Record<string, any>;
    message: string;
  }> {
    const url = `${BACKEND_BASE_URL}/instruction/recognize`;

    try {
      const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ instruction, doc_ids: docIds }),
      });

      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.detail || '意图识别失败');
      }

      return await response.json();
    } catch (error) {
      console.error('意图识别失败:', error);
      throw error;
    }
  },

  /**
   * Execute a natural-language instruction
   */
  async executeInstruction(
    instruction: string,
    docIds?: string[],
    context?: Record<string, any>
  ): Promise<{
    success: boolean;
    intent: string;
    result: Record<string, any>;
    message: string;
  }> {
    const url = `${BACKEND_BASE_URL}/instruction/execute`;

    try {
      const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ instruction, doc_ids: docIds, context }),
      });

      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.detail || '指令执行失败');
      }

      return await response.json();
    } catch (error) {
      console.error('指令执行失败:', error);
      throw error;
    }
  },

  /**
   * Smart chat (instruction execution with multi-turn dialogue support)
   */
  async instructionChat(
    instruction: string,
    docIds?: string[],
    context?: Record<string, any>
  ): Promise<{
    success: boolean;
    intent: string;
    result: Record<string, any>;
    message: string;
    hint?: string;
  }> {
    const url = `${BACKEND_BASE_URL}/instruction/chat`;

    try {
      const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ instruction, doc_ids: docIds, context }),
      });

      if (!response.ok) {
        const error = await response.json();
        throw new Error(error.detail || '对话处理失败');
      }

      return await response.json();
    } catch (error) {
      console.error('对话处理失败:', error);
      throw error;
    }
  },

  /**
   * Get the list of supported instruction types
   */
  async getSupportedIntents(): Promise<{
    intents: Array<{
      intent: string;
      name: string;
      examples: string[];
      params: string[];
    }>;
  }> {
    const url = `${BACKEND_BASE_URL}/instruction/intents`;

    try {
      const response = await fetch(url);
      if (!response.ok) throw new Error('获取指令列表失败');
      return await response.json();
    } catch (error) {
      console.error('获取指令列表失败:', error);
      throw error;
    }
  },
};

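`recognizeIntent`, `executeInstruction`, and `instructionChat` all key their behavior off the `intent` string. A caller would typically branch on it; a minimal dispatch sketch (the intent names `fill_template` and `analyze_document` are made-up placeholders — the real set comes from `getSupportedIntents`):

```typescript
type Recognized = {
  success: boolean;
  intent: string;
  params: Record<string, any>;
  message: string;
};

// Turn a recognition result into a short human-readable action description.
function describeIntent(r: Recognized): string {
  switch (r.intent) {
    case 'fill_template': // hypothetical intent name
      return `fill template using ${Object.keys(r.params).length} parameter(s)`;
    case 'analyze_document': // hypothetical intent name
      return 'analyze the selected documents';
    default:
      // Fall back to the server-provided message for unknown intents.
      return r.message || `unhandled intent: ${r.intent}`;
  }
}
```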
@@ -1,4 +1,4 @@
-import React, { useState, useEffect, useCallback } from 'react';
+import React, { useState, useEffect, useCallback, useRef } from 'react';
import { useDropzone } from 'react-dropzone';
import {
  FileText,
@@ -23,7 +23,8 @@ import {
  List,
  MessageSquareCode,
  Tag,
- HelpCircle
+ HelpCircle,
+ Plus
} from 'lucide-react';
import { Button } from '@/components/ui/button';
import { Input } from '@/components/ui/input';
@@ -72,8 +73,10 @@ const Documents: React.FC = () => {
  // Upload state
  const [uploading, setUploading] = useState(false);
  const [uploadedFile, setUploadedFile] = useState<File | null>(null);
+ const [uploadedFiles, setUploadedFiles] = useState<File[]>([]);
  const [parseResult, setParseResult] = useState<ExcelParseResult | null>(null);
  const [expandedSheet, setExpandedSheet] = useState<string | null>(null);
+ const [uploadExpanded, setUploadExpanded] = useState(false);

  // AI analysis state
  const [analyzing, setAnalyzing] = useState(false);
@@ -210,75 +213,119 @@ const Documents: React.FC = () => {

  // File upload handling
  const onDrop = async (acceptedFiles: File[]) => {
    const file = acceptedFiles[0];
    if (!file) return;
    if (acceptedFiles.length === 0) return;

    setUploadedFile(file);
    setUploading(true);
    setParseResult(null);
    setAiAnalysis(null);
    setAnalysisCharts(null);
    setExpandedSheet(null);
    setMdAnalysis(null);
    setMdSections([]);
    setMdStreamingContent('');
    let successCount = 0;
    let failCount = 0;
    const successfulFiles: File[] = [];

    const ext = file.name.split('.').pop()?.toLowerCase();
    // Upload the files one by one
    for (const file of acceptedFiles) {
      const ext = file.name.split('.').pop()?.toLowerCase();

    try {
      // Excel files use the dedicated upload endpoint
      if (ext === 'xlsx' || ext === 'xls') {
        const result = await backendApi.uploadExcel(file, {
          parseAllSheets: parseOptions.parseAllSheets,
          headerRow: parseOptions.headerRow
        });
        if (result.success) {
          toast.success(`解析成功: ${file.name}`);
          setParseResult(result);
          loadDocuments(); // refresh the document list
          if (result.metadata?.sheet_count === 1) {
            setExpandedSheet(Object.keys(result.data?.sheets || {})[0] || null);
      try {
        if (ext === 'xlsx' || ext === 'xls') {
          const result = await backendApi.uploadExcel(file, {
            parseAllSheets: parseOptions.parseAllSheets,
            headerRow: parseOptions.headerRow
          });
          if (result.success) {
            successCount++;
            successfulFiles.push(file);
            // Use the first Excel file's parse result for the preview
            if (successCount === 1) {
              setUploadedFile(file);
              setParseResult(result);
              if (result.metadata?.sheet_count === 1) {
                setExpandedSheet(Object.keys(result.data?.sheets || {})[0] || null);
              }
            }
            loadDocuments();
          } else {
            failCount++;
            toast.error(`${file.name}: ${result.error || '解析失败'}`);
          }
        } else if (ext === 'md' || ext === 'markdown') {
          const result = await backendApi.uploadDocument(file);
          if (result.task_id) {
            successCount++;
            successfulFiles.push(file);
            if (successCount === 1) {
              setUploadedFile(file);
            }
            // Poll the task status
            let attempts = 0;
            const checkStatus = async () => {
              while (attempts < 30) {
                try {
                  const status = await backendApi.getTaskStatus(result.task_id);
                  if (status.status === 'success') {
                    loadDocuments();
                    return;
                  } else if (status.status === 'failure') {
                    return;
                  }
                } catch (e) {
                  console.error('检查状态失败', e);
                }
                await new Promise(resolve => setTimeout(resolve, 2000));
                attempts++;
              }
            };
            checkStatus();
          } else {
            failCount++;
          }
        } else {
          toast.error(result.error || '解析失败');
        }
      } else if (ext === 'md' || ext === 'markdown') {
        // Markdown files: fetch the outline
        await fetchMdOutline();
      } else {
        // Other documents use the generic upload endpoint
        const result = await backendApi.uploadDocument(file);
        if (result.task_id) {
          toast.success(`文件 ${file.name} 已提交处理`);
          // Poll the task status
          let attempts = 0;
          const checkStatus = async () => {
            while (attempts < 30) {
              try {
                const status = await backendApi.getTaskStatus(result.task_id);
                if (status.status === 'success') {
                  toast.success(`文件 ${file.name} 处理完成`);
                  loadDocuments();
                  return;
                } else if (status.status === 'failure') {
                  toast.error(`文件 ${file.name} 处理失败`);
                  return;
                }
              } catch (e) {
                console.error('检查状态失败', e);
              }
              await new Promise(resolve => setTimeout(resolve, 2000));
              attempts++;
          // Other documents use the generic upload endpoint
          const result = await backendApi.uploadDocument(file);
          if (result.task_id) {
            successCount++;
            successfulFiles.push(file);
            if (successCount === 1) {
              setUploadedFile(file);
            }
            toast.error(`文件 ${file.name} 处理超时`);
          };
          checkStatus();
            // Poll the task status
            let attempts = 0;
            const checkStatus = async () => {
              while (attempts < 30) {
                try {
                  const status = await backendApi.getTaskStatus(result.task_id);
                  if (status.status === 'success') {
                    loadDocuments();
                    return;
                  } else if (status.status === 'failure') {
                    return;
                  }
                } catch (e) {
                  console.error('检查状态失败', e);
                }
                await new Promise(resolve => setTimeout(resolve, 2000));
                attempts++;
              }
            };
            checkStatus();
          } else {
            failCount++;
          }
        }
      } catch (error: any) {
        failCount++;
        toast.error(`${file.name}: ${error.message || '上传失败'}`);
      }
    } catch (error: any) {
      toast.error(error.message || '上传失败');
    } finally {
      setUploading(false);
    }

    setUploading(false);
    loadDocuments();

    if (successCount > 0) {
      toast.success(`成功上传 ${successCount} 个文件`);
      setUploadedFiles(prev => [...prev, ...successfulFiles]);
      setUploadExpanded(true);
    }
    if (failCount > 0) {
      toast.error(`${failCount} 个文件上传失败`);
    }
  };

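The inline `checkStatus` loop above (30 attempts, 2 s apart) is repeated for each branch of `onDrop`. A factored-out sketch of the same polling logic, where `getStatus` stands in for `() => backendApi.getTaskStatus(taskId)`:

```typescript
// Poll until the task reaches a terminal state, mirroring the inline loop:
// transient errors are swallowed, and running out of attempts means timeout.
async function pollTask(
  getStatus: () => Promise<{ status: string }>,
  maxAttempts = 30,
  intervalMs = 2000
): Promise<'success' | 'failure' | 'timeout'> {
  for (let i = 0; i < maxAttempts; i++) {
    try {
      const s = await getStatus();
      if (s.status === 'success') return 'success';
      if (s.status === 'failure') return 'failure';
    } catch {
      // Ignore transient errors and keep polling, like the original code.
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  return 'timeout';
}
```

Each branch could then reduce to `pollTask(() => backendApi.getTaskStatus(result.task_id)).then(...)`.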
@@ -291,7 +338,7 @@ const Documents: React.FC = () => {
      'text/markdown': ['.md'],
      'text/plain': ['.txt']
    },
-   maxFiles: 1
+   multiple: true
  });

  // AI analysis handling
@@ -449,6 +496,7 @@ const Documents: React.FC = () => {

  const handleDeleteFile = () => {
    setUploadedFile(null);
+   setUploadedFiles([]);
    setParseResult(null);
    setAiAnalysis(null);
    setAnalysisCharts(null);
@@ -456,6 +504,17 @@ const Documents: React.FC = () => {
    toast.success('文件已清除');
  };

  const handleRemoveUploadedFile = (index: number) => {
    setUploadedFiles(prev => {
      const newFiles = prev.filter((_, i) => i !== index);
      if (newFiles.length === 0) {
        setUploadedFile(null);
      }
      return newFiles;
    });
    toast.success('文件已从列表移除');
  };

  const handleDelete = async (docId: string) => {
    try {
      const result = await backendApi.deleteDocument(docId);
@@ -615,7 +674,7 @@ const Documents: React.FC = () => {
          <h1 className="text-3xl font-extrabold tracking-tight">文档中心</h1>
          <p className="text-muted-foreground">上传文档,自动解析并使用 AI 进行深度分析</p>
        </div>
-       <Button variant="outline" className="rounded-xl gap-2" onClick={loadDocuments}>
+       <Button variant="outline" className="rounded-xl gap-2" onClick={() => loadDocuments()}>
          <RefreshCcw size={18} />
          <span>刷新</span>
        </Button>
@@ -640,7 +699,83 @@ const Documents: React.FC = () => {
        </CardHeader>
        {uploadPanelOpen && (
          <CardContent className="space-y-4">
-           {!uploadedFile ? (
+           {uploadedFiles.length > 0 || uploadedFile ? (
              <div className="space-y-3">
                {/* File list header */}
                <div
                  className="flex items-center justify-between p-3 bg-muted/50 rounded-xl cursor-pointer hover:bg-muted/70 transition-colors"
                  onClick={() => setUploadExpanded(!uploadExpanded)}
                >
                  <div className="flex items-center gap-3">
                    <div className="w-10 h-10 rounded-lg bg-primary/10 text-primary flex items-center justify-center">
                      <Upload size={20} />
                    </div>
                    <div>
                      <p className="font-semibold text-sm">
                        已上传 {(uploadedFiles.length > 0 ? uploadedFiles : [uploadedFile]).length} 个文件
                      </p>
                      <p className="text-xs text-muted-foreground">
                        {uploadExpanded ? '点击收起' : '点击展开查看'}
                      </p>
                    </div>
                  </div>
                  <div className="flex items-center gap-2">
                    <Button
                      variant="ghost"
                      size="sm"
                      onClick={(e) => {
                        e.stopPropagation();
                        handleDeleteFile();
                      }}
                      className="text-destructive hover:text-destructive"
                    >
                      <Trash2 size={14} className="mr-1" />
                      清空
                    </Button>
                    {uploadExpanded ? <ChevronUp size={16} /> : <ChevronDown size={16} />}
                  </div>
                </div>

                {/* Expanded file list */}
                {uploadExpanded && (
                  <div className="space-y-2 border rounded-xl p-3">
                    {(uploadedFiles.length > 0 ? uploadedFiles : [uploadedFile]).filter(Boolean).map((file, index) => (
                      <div key={index} className="flex items-center gap-3 p-2 bg-background rounded-lg">
                        <div className={cn(
                          "w-8 h-8 rounded flex items-center justify-center",
                          isExcelFile(file?.name || '') ? "bg-emerald-500/10 text-emerald-500" : "bg-blue-500/10 text-blue-500"
                        )}>
                          {isExcelFile(file?.name || '') ? <FileSpreadsheet size={16} /> : <FileText size={16} />}
                        </div>
                        <div className="flex-1 min-w-0">
                          <p className="text-sm truncate">{file?.name}</p>
                          <p className="text-xs text-muted-foreground">{formatFileSize(file?.size || 0)}</p>
                        </div>
                        <Button
                          variant="ghost"
                          size="icon"
                          className="text-destructive hover:bg-destructive/10"
                          onClick={() => handleRemoveUploadedFile(index)}
                        >
                          <Trash2 size={14} />
                        </Button>
                      </div>
                    ))}

                    {/* "Add more" button */}
                    <div
                      {...getRootProps()}
                      className="flex items-center justify-center gap-2 p-3 border-2 border-dashed rounded-lg cursor-pointer hover:border-primary/50 hover:bg-primary/5 transition-colors"
                      onClick={(e) => e.stopPropagation()}
                    >
                      <input {...getInputProps()} multiple={true} />
                      <Plus size={16} className="text-muted-foreground" />
                      <span className="text-sm text-muted-foreground">继续添加更多文件</span>
                    </div>
                  </div>
                )}
              </div>
            ) : (
              <div
                {...getRootProps()}
                className={cn(
@@ -649,7 +784,7 @@ const Documents: React.FC = () => {
                  uploading && "opacity-50 pointer-events-none"
                )}
              >
-               <input {...getInputProps()} />
+               <input {...getInputProps()} multiple={true} />
                <div className="w-14 h-14 rounded-xl bg-primary/10 text-primary flex items-center justify-center mb-4 group-hover:scale-110 transition-transform">
                  {uploading ? <Loader2 className="animate-spin" size={28} /> : <Upload size={28} />}
                </div>
@@ -671,30 +806,6 @@ const Documents: React.FC = () => {
                  </Badge>
                </div>
              </div>
            ) : (
              <div className="space-y-4">
                <div className="flex items-center gap-3 p-3 bg-muted/30 rounded-xl">
                  <div className={cn(
                    "w-10 h-10 rounded-lg flex items-center justify-center",
                    isExcelFile(uploadedFile.name) ? "bg-emerald-500/10 text-emerald-500" : "bg-blue-500/10 text-blue-500"
                  )}>
                    {isExcelFile(uploadedFile.name) ? <FileSpreadsheet size={20} /> : <FileText size={20} />}
                  </div>
                  <div className="flex-1 min-w-0">
                    <p className="font-semibold text-sm truncate">{uploadedFile.name}</p>
                    <p className="text-xs text-muted-foreground">{formatFileSize(uploadedFile.size)}</p>
                  </div>
                  <Button variant="ghost" size="icon" className="text-destructive hover:bg-destructive/10" onClick={handleDeleteFile}>
                    <Trash2 size={16} />
                  </Button>
                </div>

                {isExcelFile(uploadedFile.name) && (
                  <Button onClick={() => onDrop([uploadedFile])} className="w-full" disabled={uploading}>
                    {uploading ? '解析中...' : '重新解析'}
                  </Button>
                )}
              </div>
            )}
          </CardContent>
        )}

File diff suppressed because it is too large
@@ -1,603 +0,0 @@
|
||||
import React, { useState, useEffect } from 'react';
|
||||
import {
|
||||
TableProperties,
|
||||
Plus,
|
||||
FilePlus,
|
||||
CheckCircle2,
|
||||
Download,
|
||||
Clock,
|
||||
RefreshCcw,
|
||||
Sparkles,
|
||||
Zap,
|
||||
FileCheck,
|
||||
FileSpreadsheet,
|
||||
Trash2,
|
||||
ChevronDown,
|
||||
ChevronUp,
|
||||
BarChart3,
|
||||
FileText,
|
||||
TrendingUp,
|
||||
Info,
|
||||
AlertCircle,
|
||||
Loader2
|
||||
} from 'lucide-react';
|
||||
import { Button } from '@/components/ui/button';
|
||||
import { Card, CardContent, CardHeader, CardTitle, CardDescription, CardFooter } from '@/components/ui/card';
|
||||
import { Badge } from '@/components/ui/badge';
|
||||
import { useAuth } from '@/context/AuthContext';
|
||||
import { templateApi, documentApi, taskApi } from '@/db/api';
|
||||
import { backendApi, aiApi } from '@/db/backend-api';
|
||||
import { supabase } from '@/db/supabase';
|
||||
import { format } from 'date-fns';
|
||||
import { toast } from 'sonner';
|
||||
import { cn } from '@/lib/utils';
|
||||
import { Skeleton } from '@/components/ui/skeleton';
|
||||
import {
|
||||
Dialog,
|
||||
DialogContent,
|
||||
DialogHeader,
|
||||
DialogTitle,
|
||||
DialogTrigger,
|
||||
DialogFooter,
|
||||
DialogDescription
|
||||
} from '@/components/ui/dialog';
|
||||
import { Checkbox } from '@/components/ui/checkbox';
|
||||
import { ScrollArea } from '@/components/ui/scroll-area';
|
||||
import { Input } from '@/components/ui/input';
|
||||
import { Label } from '@/components/ui/label';
|
||||
import { Textarea } from '@/components/ui/textarea';
|
||||
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from '@/components/ui/select';
|
||||
import { useDropzone } from 'react-dropzone';
|
||||
import { Markdown } from '@/components/ui/markdown';
|
||||
|
||||
type Template = any;
|
||||
type Document = any;
|
||||
type FillTask = any;
|
||||
|
||||
const FormFill: React.FC = () => {
|
||||
const { profile } = useAuth();
|
||||
const [templates, setTemplates] = useState<Template[]>([]);
|
||||
const [documents, setDocuments] = useState<Document[]>([]);
|
||||
const [tasks, setTasks] = useState<any[]>([]);
|
||||
const [loading, setLoading] = useState(true);
|
||||
|
||||
// Selection state
|
||||
const [selectedTemplate, setSelectedTemplate] = useState<string | null>(null);
|
||||
const [selectedDocs, setSelectedDocs] = useState<string[]>([]);
|
||||
const [creating, setCreating] = useState(false);
|
||||
const [openTaskDialog, setOpenTaskDialog] = useState(false);
|
||||
const [viewingTask, setViewingTask] = useState<any | null>(null);
|
||||
|
||||
// Excel upload state
|
||||
const [excelFile, setExcelFile] = useState<File | null>(null);
|
||||
const [excelParseResult, setExcelParseResult] = useState<any>(null);
|
||||
const [excelAnalysis, setExcelAnalysis] = useState<any>(null);
|
||||
const [excelAnalyzing, setExcelAnalyzing] = useState(false);
|
||||
const [expandedSheet, setExpandedSheet] = useState<string | null>(null);
|
||||
const [aiOptions, setAiOptions] = useState({
|
||||
userPrompt: '请分析这些数据,并提取关键信息用于填表,包括数值、分类、摘要等。',
|
||||
analysisType: 'general' as 'general' | 'summary' | 'statistics' | 'insights'
|
||||
});
|
||||
|
||||
const loadData = async () => {
|
||||
if (!profile) return;
|
||||
try {
|
||||
const [t, d, ts] = await Promise.all([
|
||||
templateApi.listTemplates((profile as any).id),
|
||||
documentApi.listDocuments((profile as any).id),
|
||||
taskApi.listTasks((profile as any).id)
|
||||
]);
|
||||
setTemplates(t);
|
||||
setDocuments(d);
|
||||
setTasks(ts);
|
||||
} catch (err: any) {
|
||||
toast.error('数据加载失败');
|
||||
} finally {
|
||||
setLoading(false);
|
||||
}
|
||||
};
|
||||
|
||||
useEffect(() => {
|
||||
loadData();
|
||||
}, [profile]);
|
||||
|
||||
// Excel upload handlers
|
||||
const onExcelDrop = async (acceptedFiles: File[]) => {
|
||||
const file = acceptedFiles[0];
|
||||
if (!file) return;
|
||||
|
||||
if (!file.name.match(/\.(xlsx|xls)$/i)) {
|
||||
toast.error('仅支持 .xlsx 和 .xls 格式的 Excel 文件');
|
||||
return;
|
||||
}
|
||||
|
||||
setExcelFile(file);
|
||||
setExcelParseResult(null);
|
||||
setExcelAnalysis(null);
|
||||
setExpandedSheet(null);
|
||||
|
||||
try {
|
||||
const result = await backendApi.uploadExcel(file);
|
||||
if (result.success) {
|
||||
toast.success(`Excel 解析成功: ${file.name}`);
|
||||
setExcelParseResult(result);
|
||||
} else {
|
||||
toast.error(result.error || '解析失败');
|
||||
}
|
||||
} catch (error: any) {
|
||||
toast.error(error.message || '上传失败');
|
||||
}
|
||||
};
|
||||
|
||||
const { getRootProps, getInputProps, isDragActive } = useDropzone({
|
||||
onDrop: onExcelDrop,
|
||||
accept: {
|
||||
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': ['.xlsx'],
|
||||
'application/vnd.ms-excel': ['.xls']
|
||||
},
|
||||
maxFiles: 1
|
||||
});
|
||||
|
||||
const handleAnalyzeExcel = async () => {
|
||||
if (!excelFile || !excelParseResult?.success) {
|
||||
toast.error('请先上传并解析 Excel 文件');
|
||||
return;
|
||||
}
|
||||
|
||||
setExcelAnalyzing(true);
|
||||
setExcelAnalysis(null);
|
||||
|
||||
try {
|
||||
const result = await aiApi.analyzeExcel(excelFile, {
|
||||
userPrompt: aiOptions.userPrompt,
|
||||
analysisType: aiOptions.analysisType
|
||||
});
|
||||
|
||||
if (result.success) {
|
||||
toast.success('AI 分析完成');
|
||||
setExcelAnalysis(result);
|
||||
} else {
|
||||
toast.error(result.error || 'AI 分析失败');
|
||||
}
|
||||
} catch (error: any) {
|
||||
toast.error(error.message || 'AI 分析失败');
|
||||
} finally {
|
||||
setExcelAnalyzing(false);
|
||||
}
|
||||
};
|
||||
|
||||
const handleUseExcelData = () => {
|
||||
if (!excelParseResult?.success) {
|
||||
toast.error('请先解析 Excel 文件');
|
||||
return;
|
||||
}
|
||||
|
||||
// 将 Excel 解析的数据标记为"文档",添加到选择列表
|
||||
toast.success('Excel 数据已添加到数据源,请在任务对话框中选择');
|
||||
// 这里可以添加逻辑来将 Excel 数据传递给后端创建任务
|
||||
};
|
||||
|
||||
const handleDeleteExcel = () => {
|
||||
setExcelFile(null);
|
||||
setExcelParseResult(null);
|
||||
setExcelAnalysis(null);
|
||||
setExpandedSheet(null);
|
||||
toast.success('Excel 文件已清除');
|
||||
};
|
||||
|
||||
const handleUploadTemplate = async (e: React.ChangeEvent<HTMLInputElement>) => {
|
||||
const file = e.target.files?.[0];
|
||||
if (!file || !profile) return;
|
||||
|
||||
try {
|
||||
toast.loading('正在上传模板...');
|
||||
await templateApi.uploadTemplate(file, (profile as any).id);
|
||||
toast.dismiss();
|
||||
toast.success('模板上传成功');
|
||||
loadData();
|
||||
} catch (err) {
|
||||
toast.dismiss();
|
||||
toast.error('上传模板失败');
|
||||
}
|
||||
};
|
||||
|
||||
const handleCreateTask = async () => {
|
||||
if (!profile || !selectedTemplate || selectedDocs.length === 0) {
|
||||
toast.error('请先选择模板和数据源文档');
|
||||
return;
|
||||
}
|
||||
|
||||
setCreating(true);
|
||||
try {
|
||||
const task = await taskApi.createTask((profile as any).id, selectedTemplate, selectedDocs);
|
||||
if (task) {
|
||||
toast.success('任务已创建,正在进行智能填表...');
|
||||
setOpenTaskDialog(false);
|
||||
|
||||
// Invoke edge function
|
||||
supabase.functions.invoke('fill-template', {
|
||||
body: { taskId: task.id }
|
||||
}).then(({ error }) => {
|
||||
if (error) toast.error('填表任务执行失败');
|
||||
else {
|
||||
toast.success('表格填写完成!');
|
||||
loadData();
|
||||
}
|
||||
});
|
||||
loadData();
|
||||
}
|
||||
} catch (err: any) {
|
||||
toast.error('创建任务失败');
|
||||
} finally {
|
||||
setCreating(false);
|
||||
}
|
||||
};
|
||||
|
||||
const getStatusColor = (status: string) => {
|
||||
switch (status) {
|
||||
case 'completed': return 'bg-emerald-500 text-white';
|
||||
case 'failed': return 'bg-destructive text-white';
|
||||
default: return 'bg-amber-500 text-white';
|
||||
}
|
||||
};
|
||||
|
||||
const formatFileSize = (bytes: number): string => {
|
||||
if (bytes === 0) return '0 B';
|
||||
const k = 1024;
|
||||
const sizes = ['B', 'KB', 'MB', 'GB'];
|
||||
const i = Math.floor(Math.log(bytes) / Math.log(k));
|
||||
return `${(bytes / Math.pow(k, i)).toFixed(2)} ${sizes[i]}`;
|
||||
};
|
||||
|
||||
return (
|
||||
<div className="space-y-8 animate-fade-in pb-10">
<section className="flex flex-col md:flex-row md:items-center justify-between gap-4">
<div className="space-y-1">
<h1 className="text-3xl font-extrabold tracking-tight">智能填表</h1>
<p className="text-muted-foreground">根据您的表格模板,自动聚合多源文档信息进行精准填充,告别重复劳动。</p>
</div>
<div className="flex items-center gap-3">
<Dialog open={openTaskDialog} onOpenChange={setOpenTaskDialog}>
<DialogTrigger asChild>
<Button className="rounded-xl shadow-lg shadow-primary/20 gap-2 h-11 px-6">
<FilePlus size={18} />
<span>新建填表任务</span>
</Button>
</DialogTrigger>
<DialogContent className="max-w-4xl max-h-[90vh] flex flex-col p-0 overflow-hidden border-none shadow-2xl rounded-3xl">
<DialogHeader className="p-8 pb-4 bg-muted/50">
<DialogTitle className="text-2xl font-bold flex items-center gap-2">
<Sparkles size={24} className="text-primary" />
开启智能填表之旅
</DialogTitle>
<DialogDescription>
选择一个表格模板及若干个数据源文档,AI 将自动为您分析并填写。
</DialogDescription>
</DialogHeader>

<ScrollArea className="flex-1 p-8 pt-4">
<div className="space-y-8">
{/* Step 1: Select Template */}
<div className="space-y-4">
<div className="flex items-center justify-between">
<h4 className="font-bold flex items-center gap-2 text-primary uppercase tracking-widest text-xs">
<span className="w-5 h-5 rounded-full bg-primary text-white flex items-center justify-center text-[10px]">1</span>
选择表格模板
</h4>
<label className="cursor-pointer text-xs font-semibold text-primary hover:underline flex items-center gap-1">
<Plus size={12} /> 上传新模板
<input type="file" className="hidden" onChange={handleUploadTemplate} accept=".docx,.xlsx" />
</label>
</div>
{templates.length > 0 ? (
<div className="grid grid-cols-1 sm:grid-cols-2 gap-3">
{templates.map(t => (
<div
key={t.id}
className={cn(
"p-4 rounded-2xl border-2 transition-all cursor-pointer flex items-center gap-3 group relative overflow-hidden",
selectedTemplate === t.id ? "border-primary bg-primary/5" : "border-border hover:border-primary/50"
)}
onClick={() => setSelectedTemplate(t.id)}
>
<div className={cn(
"w-10 h-10 rounded-xl flex items-center justify-center shrink-0 transition-colors",
selectedTemplate === t.id ? "bg-primary text-white" : "bg-muted text-muted-foreground"
)}>
<TableProperties size={20} />
</div>
<div className="flex-1 min-w-0">
<p className="font-bold text-sm truncate">{t.name}</p>
<p className="text-[10px] text-muted-foreground uppercase">{t.type}</p>
</div>
{selectedTemplate === t.id && (
<div className="absolute top-0 right-0 w-8 h-8 bg-primary text-white flex items-center justify-center rounded-bl-xl">
<CheckCircle2 size={14} />
</div>
)}
</div>
))}
</div>
) : (
<div className="p-8 text-center bg-muted/30 rounded-2xl border border-dashed text-sm italic text-muted-foreground">
暂无模板,请先点击右上角上传。
</div>
)}
</div>

{/* Step 2: Upload & Analyze Excel */}
<div className="space-y-4">
<h4 className="font-bold flex items-center gap-2 text-primary uppercase tracking-widest text-xs">
<span className="w-5 h-5 rounded-full bg-primary text-white flex items-center justify-center text-[10px]">2</span>
Excel 数据源
</h4>
<div className="bg-muted/20 rounded-2xl p-6">
{!excelFile ? (
<div
{...getRootProps()}
className={cn(
"border-2 border-dashed rounded-xl p-8 transition-all duration-300 flex flex-col items-center justify-center text-center cursor-pointer group",
isDragActive ? "border-primary bg-primary/5" : "border-muted-foreground/20 hover:border-primary/50 hover:bg-muted/30"
)}
>
<input {...getInputProps()} />
<div className="w-12 h-12 rounded-xl bg-primary/10 text-primary flex items-center justify-center mb-3 group-hover:scale-110 transition-transform">
<FileSpreadsheet size={24} />
</div>
<p className="font-semibold text-sm">
{isDragActive ? '释放以开始上传' : '点击或拖拽 Excel 文件'}
</p>
<p className="text-xs text-muted-foreground mt-1">支持 .xlsx 和 .xls 格式</p>
</div>
) : (
<div className="space-y-4">
<div className="flex items-center gap-3 p-3 bg-background rounded-xl">
<div className="w-10 h-10 rounded-lg bg-emerald-500/10 text-emerald-500 flex items-center justify-center">
<FileSpreadsheet size={20} />
</div>
<div className="flex-1 min-w-0">
<p className="font-semibold text-sm truncate">{excelFile.name}</p>
<p className="text-xs text-muted-foreground">{formatFileSize(excelFile.size)}</p>
</div>
<div className="flex gap-2">
<Button
variant="ghost"
size="icon"
className="text-destructive hover:bg-destructive/10"
onClick={handleDeleteExcel}
>
<Trash2 size={16} />
</Button>
</div>
</div>

{/* AI Analysis Options */}
{excelParseResult?.success && (
<div className="space-y-3">
<div className="space-y-2">
<Label htmlFor="analysis-type" className="text-xs">分析类型</Label>
<Select
value={aiOptions.analysisType}
onValueChange={(value: any) => setAiOptions({ ...aiOptions, analysisType: value })}
>
<SelectTrigger id="analysis-type" className="bg-background h-9 text-sm">
<SelectValue placeholder="选择分析类型" />
</SelectTrigger>
<SelectContent>
<SelectItem value="general">综合分析</SelectItem>
<SelectItem value="summary">数据摘要</SelectItem>
<SelectItem value="statistics">统计分析</SelectItem>
<SelectItem value="insights">深度洞察</SelectItem>
</SelectContent>
</Select>
</div>
<div className="space-y-2">
<Label htmlFor="user-prompt" className="text-xs">自定义提示词</Label>
<Textarea
id="user-prompt"
value={aiOptions.userPrompt}
onChange={(e) => setAiOptions({ ...aiOptions, userPrompt: e.target.value })}
className="bg-background resize-none text-sm"
rows={2}
/>
</div>
<Button
onClick={handleAnalyzeExcel}
disabled={excelAnalyzing}
className="w-full gap-2 h-9"
variant="outline"
>
{excelAnalyzing ? <Loader2 className="animate-spin" size={14} /> : <Sparkles size={14} />}
{excelAnalyzing ? '分析中...' : 'AI 分析'}
</Button>
{excelParseResult?.success && (
<Button
onClick={handleUseExcelData}
className="w-full gap-2 h-9"
>
<CheckCircle2 size={14} />
使用此数据源
</Button>
)}
</div>
)}

{/* Excel Analysis Result */}
{excelAnalysis && (
<div className="mt-4 p-4 bg-background rounded-xl max-h-60 overflow-y-auto">
<div className="flex items-center gap-2 mb-3">
<Sparkles size={16} className="text-primary" />
<span className="font-semibold text-sm">AI 分析结果</span>
</div>
<Markdown content={excelAnalysis.analysis?.analysis || ''} className="text-sm" />
</div>
)}
</div>
)}
</div>
</div>

{/* Step 3: Select Documents */}
<div className="space-y-4">
<h4 className="font-bold flex items-center gap-2 text-primary uppercase tracking-widest text-xs">
<span className="w-5 h-5 rounded-full bg-primary text-white flex items-center justify-center text-[10px]">3</span>
选择其他数据源文档
</h4>
{documents.filter(d => d.status === 'completed').length > 0 ? (
<div className="space-y-2 max-h-40 overflow-y-auto pr-2 custom-scrollbar">
{documents.filter(d => d.status === 'completed').map(doc => (
<div
key={doc.id}
className={cn(
"flex items-center gap-3 p-3 rounded-xl border transition-all cursor-pointer",
selectedDocs.includes(doc.id) ? "border-primary/50 bg-primary/5 shadow-sm" : "border-border hover:bg-muted/30"
)}
onClick={() => {
setSelectedDocs(prev =>
prev.includes(doc.id) ? prev.filter(id => id !== doc.id) : [...prev, doc.id]
);
}}
>
<Checkbox checked={selectedDocs.includes(doc.id)} onCheckedChange={() => {}} />
<div className="w-8 h-8 rounded-lg bg-blue-500/10 text-blue-500 flex items-center justify-center">
<Zap size={16} />
</div>
<span className="font-semibold text-sm truncate">{doc.name}</span>
</div>
))}
</div>
) : (
<div className="p-6 text-center bg-muted/30 rounded-xl border border-dashed text-xs italic text-muted-foreground">
暂无其他已解析的文档
</div>
)}
</div>
</div>
</ScrollArea>

<DialogFooter className="p-8 pt-4 bg-muted/20 border-t border-dashed">
<Button variant="outline" className="rounded-xl h-12 px-6" onClick={() => setOpenTaskDialog(false)}>取消</Button>
<Button
className="rounded-xl h-12 px-8 shadow-lg shadow-primary/20 gap-2"
onClick={handleCreateTask}
disabled={creating || !selectedTemplate || (selectedDocs.length === 0 && !excelParseResult?.success)}
>
{creating ? <RefreshCcw className="animate-spin h-5 w-5" /> : <Zap className="h-5 w-5 fill-current" />}
<span>启动智能填表引擎</span>
</Button>
</DialogFooter>
</DialogContent>
</Dialog>
</div>
</section>

{/* Task List */}
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-6">
{loading ? (
Array.from({ length: 3 }).map((_, i) => (
<Skeleton key={i} className="h-48 w-full rounded-3xl bg-muted" />
))
) : tasks.length > 0 ? (
tasks.map((task) => (
<Card key={task.id} className="border-none shadow-md hover:shadow-xl transition-all group rounded-3xl overflow-hidden flex flex-col">
<div className="h-1.5 w-full" style={{ backgroundColor: task.status === 'completed' ? '#10b981' : task.status === 'failed' ? '#ef4444' : '#f59e0b' }} />
<CardHeader className="p-6 pb-2">
<div className="flex justify-between items-start mb-2">
<div className="w-12 h-12 rounded-2xl bg-emerald-500/10 text-emerald-500 flex items-center justify-center shadow-inner group-hover:scale-110 transition-transform">
<TableProperties size={24} />
</div>
<Badge className={cn("text-[10px] uppercase font-bold tracking-widest", getStatusColor(task.status))}>
{task.status === 'completed' ? '已完成' : task.status === 'failed' ? '失败' : '执行中'}
</Badge>
</div>
<CardTitle className="text-lg font-bold truncate group-hover:text-primary transition-colors">{task.templates?.name || '未知模板'}</CardTitle>
<CardDescription className="text-xs flex items-center gap-1 font-medium italic">
<Clock size={12} /> {format(new Date(task.created_at!), 'yyyy/MM/dd HH:mm')}
</CardDescription>
</CardHeader>
<CardContent className="p-6 pt-2 flex-1">
<div className="space-y-4">
<div className="flex flex-wrap gap-2">
<Badge variant="outline" className="bg-muted/50 border-none text-[10px] font-bold">关联 {task.document_ids?.length} 份数据源</Badge>
</div>
{task.status === 'completed' && (
<div className="p-3 bg-emerald-500/5 rounded-2xl border border-emerald-500/10 flex items-center gap-3">
<CheckCircle2 className="text-emerald-500" size={18} />
<span className="text-xs font-semibold text-emerald-700">内容已精准聚合,表格生成完毕</span>
</div>
)}
</div>
</CardContent>
<CardFooter className="p-6 pt-0">
<Button
className="w-full rounded-2xl h-11 bg-primary group-hover:shadow-lg group-hover:shadow-primary/30 transition-all gap-2"
disabled={task.status !== 'completed'}
onClick={() => setViewingTask(task)}
>
<Download size={18} />
<span>下载汇总表格</span>
</Button>
</CardFooter>
</Card>
))
) : (
<div className="col-span-full py-24 flex flex-col items-center justify-center text-center space-y-6">
<div className="w-24 h-24 rounded-full bg-muted flex items-center justify-center text-muted-foreground/30 border-4 border-dashed">
<TableProperties size={48} />
</div>
<div className="space-y-2 max-w-sm">
<p className="text-2xl font-extrabold tracking-tight">暂无生成任务</p>
<p className="text-muted-foreground text-sm">上传模板后,您可以将多个文档的数据自动填充到汇总表格中。</p>
</div>
<Button className="rounded-xl h-12 px-8" onClick={() => setOpenTaskDialog(true)}>立即创建首个任务</Button>
</div>
)}
</div>

{/* Task Result View Modal */}
<Dialog open={!!viewingTask} onOpenChange={(open) => !open && setViewingTask(null)}>
<DialogContent className="max-w-4xl max-h-[90vh] flex flex-col p-0 overflow-hidden border-none shadow-2xl rounded-3xl">
<DialogHeader className="p-8 pb-4 bg-primary text-primary-foreground">
<div className="flex items-center gap-3 mb-2">
<FileCheck size={28} />
<DialogTitle className="text-2xl font-extrabold">表格生成结果预览</DialogTitle>
</div>
<DialogDescription className="text-primary-foreground/80 italic">
系统已根据 {viewingTask?.document_ids?.length} 份文档信息自动填充完毕。
</DialogDescription>
</DialogHeader>
<ScrollArea className="flex-1 p-8 bg-muted/10">
<div className="prose dark:prose-invert max-w-none">
<div className="bg-card p-8 rounded-2xl shadow-sm border min-h-[400px]">
<Badge variant="outline" className="mb-4">数据已脱敏</Badge>
<div className="whitespace-pre-wrap font-sans text-sm leading-relaxed">
<h2 className="text-xl font-bold mb-4">汇总结果报告</h2>
<p className="text-muted-foreground mb-6">以下是根据您上传的多个文档提取并生成的汇总信息:</p>
<div className="p-4 bg-muted/30 rounded-xl border border-dashed border-primary/20 italic">
正在从云端安全下载解析结果并渲染视图...
</div>

<div className="mt-8 space-y-4">
<p className="font-semibold text-primary">✓ 核心实体已对齐</p>
<p className="font-semibold text-primary">✓ 逻辑勾稽关系校验通过</p>
<p className="font-semibold text-primary">✓ 格式符合模板规范</p>
</div>
</div>
</div>
</div>
</ScrollArea>
<DialogFooter className="p-8 pt-4 border-t border-dashed">
<Button variant="outline" className="rounded-xl" onClick={() => setViewingTask(null)}>关闭</Button>
<Button className="rounded-xl px-8 gap-2 shadow-lg shadow-primary/20" onClick={() => toast.success("正在导出文件...")}>
<Download size={18} />
导出为 {viewingTask?.templates?.type?.toUpperCase() || '文件'}
</Button>
</DialogFooter>
</DialogContent>
</Dialog>
</div>
);
};

export default FormFill;
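The uploaded-file card above calls `formatFileSize(excelFile.size)`, but the helper's definition falls outside this hunk. A minimal sketch of such a helper, assuming the conventional 1024-based units (hypothetical; the real implementation may differ):

```typescript
// Hypothetical helper -- the actual formatFileSize is defined outside this diff hunk.
// Converts a byte count into a human-readable string using 1024-based units.
function formatFileSize(bytes: number): string {
  if (bytes === 0) return '0 B';
  const units = ['B', 'KB', 'MB', 'GB'];
  // Pick the largest unit whose threshold the byte count reaches, capped at GB.
  const i = Math.min(Math.floor(Math.log(bytes) / Math.log(1024)), units.length - 1);
  return `${(bytes / 1024 ** i).toFixed(1)} ${units[i]}`;
}
```

For example, `formatFileSize(1536)` yields `"1.5 KB"`, matching the "大小: ... KB" rendering used elsewhere in this diff.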
@@ -10,7 +10,11 @@ import {
TableProperties,
ChevronRight,
ArrowRight,
Loader2
Loader2,
Download,
Search,
MessageSquare,
CheckCircle
} from 'lucide-react';
import { Button } from '@/components/ui/button';
import { Input } from '@/components/ui/input';
@@ -26,12 +30,15 @@ type ChatMessage = {
role: 'user' | 'assistant';
content: string;
created_at: string;
intent?: string;
result?: any;
};

const InstructionChat: React.FC = () => {
const [messages, setMessages] = useState<ChatMessage[]>([]);
const [input, setInput] = useState('');
const [loading, setLoading] = useState(false);
const [currentDocIds, setCurrentDocIds] = useState<string[]>([]);
const scrollAreaRef = useRef<HTMLDivElement>(null);

useEffect(() => {
@@ -43,27 +50,47 @@ const InstructionChat: React.FC = () => {
role: 'assistant',
content: `您好!我是智联文档 AI 助手。

我可以帮您完成以下操作:
**📄 文档智能操作**
- "提取文档中的医院数量和床位数"
- "帮我找出所有机构的名称"

📄 **文档管理**
- "帮我列出最近上传的所有文档"
- "删除三天前的 docx 文档"
**📊 数据填表**
- "根据这些数据填表"
- "将提取的信息填写到Excel模板"

📊 **Excel 分析**
- "分析一下最近上传的 Excel 文件"
- "帮我统计销售报表中的数据"
**📝 内容处理**
- "总结一下这份文档"
- "对比这两个文档的差异"

📝 **智能填表**
- "根据员工信息表创建一个考勤汇总表"
- "用财务文档填充报销模板"
**🔍 智能问答**
- "文档里说了些什么?"
- "有多少家医院?"

请告诉我您想做什么?`,
created_at: new Date().toISOString()
}
]);

// 获取已上传的文档ID列表
loadDocuments();
}
}, []);

const loadDocuments = async () => {
try {
const result = await backendApi.getDocuments(undefined, 50);
if (result.success && result.documents) {
const docIds = result.documents.map((d: any) => d.doc_id);
setCurrentDocIds(docIds);
if (docIds.length > 0) {
console.log(`已加载 ${docIds.length} 个文档`);
}
}
} catch (err) {
console.error('获取文档列表失败:', err);
}
};

useEffect(() => {
// Scroll to bottom
if (scrollAreaRef.current) {
@@ -89,95 +116,126 @@ const InstructionChat: React.FC = () => {
setLoading(true);

try {
// TODO: 后端对话接口,暂用模拟响应
await new Promise(resolve => setTimeout(resolve, 1500));
// 使用真实的智能指令 API
const response = await backendApi.instructionChat(
input.trim(),
currentDocIds.length > 0 ? currentDocIds : undefined
);

// 简单的命令解析演示
const userInput = userMessage.content.toLowerCase();
let response = '';
// 根据意图类型生成友好响应
let responseContent = '';
const resultData = response.result;

if (userInput.includes('列出') || userInput.includes('列表')) {
const result = await backendApi.getDocuments(undefined, 10);
if (result.success && result.documents && result.documents.length > 0) {
response = `已为您找到 ${result.documents.length} 个文档:\n\n`;
result.documents.slice(0, 5).forEach((doc: any, idx: number) => {
response += `${idx + 1}. **${doc.original_filename}** (${doc.doc_type.toUpperCase()})\n`;
response += `   - 大小: ${(doc.file_size / 1024).toFixed(1)} KB\n`;
response += `   - 时间: ${new Date(doc.created_at).toLocaleDateString()}\n\n`;
});
if (result.documents.length > 5) {
response += `...还有 ${result.documents.length - 5} 个文档`;
switch (response.intent) {
case 'extract':
// 信息提取结果
const extracted = resultData?.extracted_data || {};
const keys = Object.keys(extracted);
if (keys.length > 0) {
responseContent = `✅ 已提取到 ${keys.length} 个字段的数据:\n\n`;
for (const [key, value] of Object.entries(extracted)) {
const values = Array.isArray(value) ? value : [value];
responseContent += `**${key}**: ${values.slice(0, 3).join(', ')}${values.length > 3 ? '...' : ''}\n`;
}
responseContent += `\n💡 您可以将这些数据填入表格。`;
} else {
responseContent = '未能从文档中提取到相关数据。请尝试更明确的字段名称。';
}
} else {
response = '暂未找到已上传的文档,您可以先上传一些文档试试。';
}
} else if (userInput.includes('分析') || userInput.includes('excel') || userInput.includes('报表')) {
response = `好的,我可以帮您分析 Excel 文件。
break;

请告诉我:
1. 您想分析哪个 Excel 文件?
2. 需要什么样的分析?(数据摘要/统计分析/图表生成)
case 'fill_table':
// 填表结果
const filled = resultData?.result?.filled_data || {};
const filledKeys = Object.keys(filled);
if (filledKeys.length > 0) {
responseContent = `✅ 填表完成!成功填写 ${filledKeys.length} 个字段:\n\n`;
for (const [key, value] of Object.entries(filled)) {
const values = Array.isArray(value) ? value : [value];
responseContent += `**${key}**: ${values.slice(0, 3).join(', ')}\n`;
}
responseContent += `\n📋 请到【智能填表】页面查看或导出结果。`;
} else {
responseContent = '填表未能提取到数据。请检查模板表头和数据源内容。';
}
break;

或者您可以直接告诉我您想从数据中了解什么,我来为您生成分析。`;
} else if (userInput.includes('填表') || userInput.includes('模板')) {
response = `好的,要进行智能填表,我需要:
case 'summarize':
// 摘要结果
const summaries = resultData?.summaries || [];
if (summaries.length > 0) {
responseContent = `📄 找到 ${summaries.length} 个文档的摘要:\n\n`;
summaries.forEach((s: any, idx: number) => {
responseContent += `**${idx + 1}. ${s.filename}**\n${s.content_preview}\n\n`;
});
} else {
responseContent = '未能生成摘要。请确保已上传文档。';
}
break;

1. **上传表格模板** - 您要填写的表格模板文件(Excel 或 Word 格式)
2. **选择数据源** - 包含要填写内容的源文档
case 'question':
// 问答结果
if (resultData?.answer) {
responseContent = `**问题**: ${resultData.question}\n\n**答案**: ${resultData.answer}`;
} else {
responseContent = resultData?.message || '我找到了相关信息,请查看上文。';
}
break;

您可以去【智能填表】页面完成这些操作,或者告诉我您具体想填什么类型的表格,我来指导您操作。`;
} else if (userInput.includes('删除')) {
response = `要删除文档,请告诉我:
case 'search':
// 搜索结果
const searchResults = resultData?.results || [];
if (searchResults.length > 0) {
responseContent = `🔍 找到 ${searchResults.length} 条相关内容:\n\n`;
searchResults.slice(0, 5).forEach((r: any, idx: number) => {
responseContent += `**${idx + 1}.** ${r.content?.substring(0, 100)}...\n\n`;
});
} else {
responseContent = '未找到相关内容。请尝试其他关键词。';
}
break;

- 要删除的文件名是什么?
- 或者您可以到【文档中心】页面手动选择并删除文档
case 'compare':
// 对比结果
const comparison = resultData?.comparison || [];
if (comparison.length > 0) {
responseContent = `📊 对比了 ${comparison.length} 个文档:\n\n`;
comparison.forEach((c: any) => {
responseContent += `- **${c.filename}**: ${c.doc_type}, ${c.content_length} 字\n`;
});
} else {
responseContent = '需要至少2个文档才能进行对比。';
}
break;

⚠️ 删除操作不可恢复,请确认后再操作。`;
} else if (userInput.includes('帮助') || userInput.includes('help')) {
response = `**我可以帮您完成以下操作:**
case 'unknown':
responseContent = `我理解您想要: "${input.trim()}"\n\n但我目前无法完成此操作。您可以尝试:\n\n1. **提取数据**: "提取医院数量和床位数"\n2. **填表**: "根据这些数据填表"\n3. **总结**: "总结这份文档"\n4. **问答**: "文档里说了什么?"\n5. **搜索**: "搜索相关内容"`;
break;

📄 **文档管理**
- 列出/搜索已上传的文档
- 查看文档详情和元数据
- 删除不需要的文档

📊 **Excel 处理**
- 分析 Excel 文件内容
- 生成数据统计和图表
- 导出处理后的数据

📝 **智能填表**
- 上传表格模板
- 从文档中提取信息填入模板
- 导出填写完成的表格

📋 **任务历史**
- 查看历史处理任务
- 重新执行或导出结果

请直接告诉我您想做什么!`;
} else {
response = `我理解您想要: "${input.trim()}"

目前我还在学习如何更好地理解您的需求。您可以尝试:

1. **上传文档** - 去【文档中心】上传 docx/md/txt 文件
2. **分析 Excel** - 去【Excel解析】上传并分析 Excel 文件
3. **智能填表** - 去【智能填表】创建填表任务

或者您可以更具体地描述您想做的事情,我会尽力帮助您!`;
default:
responseContent = response.message || resultData?.message || '已完成您的请求。';
}

const assistantMessage: ChatMessage = {
id: Math.random().toString(36).substring(7),
role: 'assistant',
content: response,
created_at: new Date().toISOString()
content: responseContent,
created_at: new Date().toISOString(),
intent: response.intent,
result: resultData
};

setMessages(prev => [...prev, assistantMessage]);
} catch (err: any) {
toast.error('请求失败,请重试');
console.error('指令执行失败:', err);
toast.error(err.message || '请求失败,请重试');

const errorMessage: ChatMessage = {
id: Math.random().toString(36).substring(7),
role: 'assistant',
content: `抱歉,处理您的请求时遇到了问题:${err.message}\n\n请稍后重试,或尝试更简单的指令。`,
created_at: new Date().toISOString()
};
setMessages(prev => [...prev, errorMessage]);
} finally {
setLoading(false);
}
@@ -189,10 +247,10 @@ const InstructionChat: React.FC = () => {
};

const quickActions = [
{ label: '列出所有文档', icon: FileText, action: () => setInput('列出所有已上传的文档') },
{ label: '分析 Excel 数据', icon: TableProperties, action: () => setInput('分析一下 Excel 文件') },
{ label: '智能填表', icon: Sparkles, action: () => setInput('我想进行智能填表') },
{ label: '帮助', icon: Sparkles, action: () => setInput('帮助') }
{ label: '提取医院数量', icon: Search, action: () => setInput('提取文档中的医院数量和床位数') },
{ label: '智能填表', icon: TableProperties, action: () => setInput('根据这些数据填表') },
{ label: '总结文档', icon: MessageSquare, action: () => setInput('总结一下这份文档') },
{ label: '智能问答', icon: Bot, action: () => setInput('文档里说了些什么?') }
];

return (
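The `switch` above branches on `response.intent` and reads nested fields off `response.result`. The response shape those call sites imply can be captured in a type like the following; this is an inference from this diff's usage, not the backend's published contract:

```typescript
// Inferred from the switch over response.intent in the diff above; hypothetical shape,
// not backendApi.instructionChat's actual contract.
type Intent = 'extract' | 'fill_table' | 'summarize' | 'question' | 'search' | 'compare' | 'unknown';

interface InstructionChatResponse {
  intent: Intent;
  message?: string;
  result?: {
    extracted_data?: Record<string, unknown>;            // read by the 'extract' branch
    result?: { filled_data?: Record<string, unknown> };  // 'fill_table' (note the extra nesting)
    summaries?: Array<{ filename: string; content_preview: string }>; // 'summarize'
    question?: string;                                   // 'question'
    answer?: string;
    results?: Array<{ content?: string }>;               // 'search'
    comparison?: Array<{ filename: string; doc_type: string; content_length: number }>; // 'compare'
    message?: string;                                    // fallback text
  };
}

// A minimal 'extract' payload that type-checks against the shape:
const sample: InstructionChatResponse = {
  intent: 'extract',
  result: { extracted_data: { 医院数量: [12] } },
};
```

Typing the response this way would let the `switch` drop its `resultData?.` optional chains inside each branch once the intent is narrowed.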
@@ -1,184 +0,0 @@
import React, { useState } from 'react';
import { useNavigate, useLocation } from 'react-router-dom';
import { useAuth } from '@/context/AuthContext';
import { Button } from '@/components/ui/button';
import { Input } from '@/components/ui/input';
import { Label } from '@/components/ui/label';
import { Card, CardContent, CardDescription, CardFooter, CardHeader, CardTitle } from '@/components/ui/card';
import { Tabs, TabsContent, TabsList, TabsTrigger } from '@/components/ui/tabs';
import { FileText, Lock, User, CheckCircle2, AlertCircle } from 'lucide-react';
import { toast } from 'sonner';

const Login: React.FC = () => {
const [username, setUsername] = useState('');
const [password, setPassword] = useState('');
const [loading, setLoading] = useState(false);
const { signIn, signUp } = useAuth();
const navigate = useNavigate();
const location = useLocation();

const handleLogin = async (e: React.FormEvent) => {
e.preventDefault();
if (!username || !password) return toast.error('请输入用户名和密码');

setLoading(true);
try {
const email = `${username}@miaoda.com`;
const { error } = await signIn(email, password);
if (error) throw error;
toast.success('登录成功');
navigate('/');
} catch (err: any) {
toast.error(err.message || '登录失败');
} finally {
setLoading(false);
}
};

const handleSignUp = async (e: React.FormEvent) => {
e.preventDefault();
if (!username || !password) return toast.error('请输入用户名和密码');

setLoading(true);
try {
const email = `${username}@miaoda.com`;
const { error } = await signUp(email, password);
if (error) throw error;
toast.success('注册成功,请登录');
} catch (err: any) {
toast.error(err.message || '注册失败');
} finally {
setLoading(false);
}
};

return (
<div className="min-h-screen flex items-center justify-center bg-[radial-gradient(ellipse_at_top_left,_var(--tw-gradient-stops))] from-primary/10 via-background to-background p-4 relative overflow-hidden">
{/* Decorative elements */}
<div className="absolute top-0 left-0 w-96 h-96 bg-primary/5 rounded-full blur-3xl -translate-x-1/2 -translate-y-1/2" />
<div className="absolute bottom-0 right-0 w-64 h-64 bg-primary/5 rounded-full blur-3xl translate-x-1/3 translate-y-1/3" />

<div className="w-full max-w-md space-y-8 relative animate-fade-in">
<div className="text-center space-y-2">
<div className="inline-flex items-center justify-center w-16 h-16 rounded-2xl bg-primary text-primary-foreground shadow-2xl shadow-primary/30 mb-4 animate-slide-in">
<FileText size={32} />
</div>
<h1 className="text-4xl font-extrabold tracking-tight gradient-text">智联文档</h1>
<p className="text-muted-foreground">多源数据融合与智能文档处理系统</p>
</div>

<Card className="border-border/50 shadow-2xl backdrop-blur-sm bg-card/95">
<Tabs defaultValue="login" className="w-full">
<TabsList className="grid w-full grid-cols-2 rounded-t-xl h-12 bg-muted/50 p-1">
<TabsTrigger value="login" className="rounded-lg data-[state=active]:bg-background data-[state=active]:shadow-sm">登录</TabsTrigger>
<TabsTrigger value="signup" className="rounded-lg data-[state=active]:bg-background data-[state=active]:shadow-sm">注册</TabsTrigger>
</TabsList>

<TabsContent value="login">
<form onSubmit={handleLogin}>
<CardHeader>
<CardTitle>欢迎回来</CardTitle>
<CardDescription>使用您的账号登录智联文档系统</CardDescription>
</CardHeader>
<CardContent className="space-y-4">
<div className="space-y-2">
<Label htmlFor="username">用户名</Label>
<div className="relative">
<User className="absolute left-3 top-2.5 h-4 w-4 text-muted-foreground" />
<Input
id="username"
placeholder="请输入用户名"
className="pl-9 bg-muted/30 border-none focus-visible:ring-primary"
value={username}
onChange={(e) => setUsername(e.target.value)}
/>
</div>
</div>
<div className="space-y-2">
<Label htmlFor="password">密码</Label>
<div className="relative">
<Lock className="absolute left-3 top-2.5 h-4 w-4 text-muted-foreground" />
<Input
id="password"
type="password"
placeholder="请输入密码"
className="pl-9 bg-muted/30 border-none focus-visible:ring-primary"
value={password}
onChange={(e) => setPassword(e.target.value)}
/>
</div>
</div>
</CardContent>
<CardFooter>
<Button className="w-full h-11 text-lg font-semibold rounded-xl" type="submit" disabled={loading}>
{loading ? '登录中...' : '立即登录'}
</Button>
</CardFooter>
</form>
</TabsContent>

<TabsContent value="signup">
<form onSubmit={handleSignUp}>
<CardHeader>
<CardTitle>创建账号</CardTitle>
<CardDescription>开启智能文档处理的新体验</CardDescription>
</CardHeader>
<CardContent className="space-y-4">
<div className="space-y-2">
<Label htmlFor="signup-username">用户名</Label>
<div className="relative">
<User className="absolute left-3 top-2.5 h-4 w-4 text-muted-foreground" />
<Input
id="signup-username"
placeholder="仅字母、数字和下划线"
className="pl-9 bg-muted/30 border-none focus-visible:ring-primary"
value={username}
onChange={(e) => setUsername(e.target.value)}
/>
</div>
</div>
<div className="space-y-2">
<Label htmlFor="signup-password">密码</Label>
<div className="relative">
<Lock className="absolute left-3 top-2.5 h-4 w-4 text-muted-foreground" />
<Input
id="signup-password"
type="password"
placeholder="不少于 6 位"
className="pl-9 bg-muted/30 border-none focus-visible:ring-primary"
value={password}
onChange={(e) => setPassword(e.target.value)}
/>
</div>
</div>
</CardContent>
<CardFooter>
<Button className="w-full h-11 text-lg font-semibold rounded-xl" type="submit" disabled={loading}>
{loading ? '注册中...' : '注册账号'}
</Button>
</CardFooter>
</form>
</TabsContent>
</Tabs>
</Card>

<div className="grid grid-cols-2 gap-4 text-center text-xs text-muted-foreground">
<div className="flex flex-col items-center gap-1">
<CheckCircle2 size={16} className="text-primary" />
<span>智能解析</span>
</div>
<div className="flex flex-col items-center gap-1">
<CheckCircle2 size={16} className="text-primary" />
<span>极速填表</span>
</div>
</div>

<div className="text-center text-sm text-muted-foreground">
© 2026 智联文档 | 多源数据融合系统
</div>
</div>
</div>
);
};

export default Login;
@@ -1,16 +0,0 @@
/**
 * Sample Page
 */

import PageMeta from "../components/common/PageMeta";

export default function SamplePage() {
return (
<>
<PageMeta title="Home" description="Home Page Introduction" />
<div>
<h3>This is a sample page</h3>
</div>
</>
);
}
@@ -11,7 +11,8 @@ import {
ChevronDown,
ChevronUp,
Trash2,
AlertCircle
AlertCircle,
HelpCircle
} from 'lucide-react';
import { Card, CardContent, CardHeader, CardTitle, CardDescription } from '@/components/ui/card';
import { Button } from '@/components/ui/button';
@@ -24,9 +25,9 @@ import { Skeleton } from '@/components/ui/skeleton';

type Task = {
task_id: string;
status: 'pending' | 'processing' | 'success' | 'failure';
status: 'pending' | 'processing' | 'success' | 'failure' | 'unknown';
created_at: string;
completed_at?: string;
updated_at?: string;
message?: string;
result?: any;
error?: string;
@@ -38,54 +39,38 @@ const TaskHistory: React.FC = () => {
const [loading, setLoading] = useState(true);
const [expandedTask, setExpandedTask] = useState<string | null>(null);

// Mock data for demonstration
useEffect(() => {
// 模拟任务数据,实际应该从后端获取
setTasks([
{
task_id: 'task-001',
status: 'success',
created_at: new Date(Date.now() - 3600000).toISOString(),
completed_at: new Date(Date.now() - 3500000).toISOString(),
task_type: 'document_parse',
message: '文档解析完成',
result: {
doc_id: 'doc-001',
filename: 'report_q1_2026.docx',
extracted_fields: ['标题', '作者', '日期', '金额']
}
},
{
task_id: 'task-002',
status: 'success',
created_at: new Date(Date.now() - 7200000).toISOString(),
completed_at: new Date(Date.now() - 7100000).toISOString(),
task_type: 'excel_analysis',
message: 'Excel 分析完成',
result: {
filename: 'sales_data.xlsx',
row_count: 1250,
charts_generated: 3
}
},
{
task_id: 'task-003',
status: 'processing',
created_at: new Date(Date.now() - 600000).toISOString(),
task_type: 'template_fill',
message: '正在填充表格...'
},
{
task_id: 'task-004',
status: 'failure',
created_at: new Date(Date.now() - 86400000).toISOString(),
completed_at: new Date(Date.now() - 86390000).toISOString(),
task_type: 'document_parse',
message: '解析失败',
error: '文件格式不支持或文件已损坏'
// 获取任务历史数据
const fetchTasks = async () => {
try {
setLoading(true);
const response = await backendApi.getTasks(50, 0);
if (response.success && response.tasks) {
// 转换后端数据格式为前端格式
const convertedTasks: Task[] = response.tasks.map((t: any) => ({
task_id: t.task_id,
status: t.status || 'unknown',
created_at: t.created_at || new Date().toISOString(),
updated_at: t.updated_at,
message: t.message || '',
result: t.result,
error: t.error,
task_type: t.task_type || 'document_parse'
}));
setTasks(convertedTasks);
} else {
setTasks([]);
}
]);
setLoading(false);
} catch (error) {
console.error('获取任务列表失败:', error);
toast.error('获取任务列表失败');
setTasks([]);
} finally {
setLoading(false);
}
};
||||
useEffect(() => {
|
||||
fetchTasks();
|
||||
}, []);
|
||||
|
||||
const getStatusBadge = (status: string) => {
|
||||
@@ -96,6 +81,8 @@ const TaskHistory: React.FC = () => {
return <Badge className="bg-destructive text-white text-[10px]"><XCircle size={12} className="mr-1" />失败</Badge>;
case 'processing':
return <Badge className="bg-amber-500 text-white text-[10px]"><Loader2 size={12} className="mr-1 animate-spin" />处理中</Badge>;
case 'unknown':
return <Badge className="bg-gray-500 text-white text-[10px]"><HelpCircle size={12} className="mr-1" />未知</Badge>;
default:
return <Badge className="bg-gray-500 text-white text-[10px]"><Clock size={12} className="mr-1" />等待</Badge>;
}
@@ -133,15 +120,22 @@ const TaskHistory: React.FC = () => {
};

const handleDelete = async (taskId: string) => {
setTasks(prev => prev.filter(t => t.task_id !== taskId));
toast.success('任务已删除');
try {
await backendApi.deleteTask(taskId);
setTasks(prev => prev.filter(t => t.task_id !== taskId));
toast.success('任务已删除');
} catch (error) {
console.error('删除任务失败:', error);
toast.error('删除任务失败');
}
};

const stats = {
total: tasks.length,
success: tasks.filter(t => t.status === 'success').length,
processing: tasks.filter(t => t.status === 'processing').length,
failure: tasks.filter(t => t.status === 'failure').length
failure: tasks.filter(t => t.status === 'failure').length,
unknown: tasks.filter(t => t.status === 'unknown').length
};

return (
@@ -151,7 +145,7 @@ const TaskHistory: React.FC = () => {
<h1 className="text-3xl font-extrabold tracking-tight">任务历史</h1>
<p className="text-muted-foreground">查看和管理您所有的文档处理任务记录</p>
</div>
<Button variant="outline" className="rounded-xl gap-2" onClick={() => window.location.reload()}>
<Button variant="outline" className="rounded-xl gap-2" onClick={() => fetchTasks()}>
<RefreshCcw size={18} />
<span>刷新</span>
</Button>
@@ -194,7 +188,8 @@ const TaskHistory: React.FC = () => {
"w-12 h-12 rounded-xl flex items-center justify-center shrink-0",
task.status === 'success' ? "bg-emerald-500/10 text-emerald-500" :
task.status === 'failure' ? "bg-destructive/10 text-destructive" :
"bg-amber-500/10 text-amber-500"
task.status === 'processing' ? "bg-amber-500/10 text-amber-500" :
"bg-gray-500/10 text-gray-500"
)}>
{task.status === 'processing' ? (
<Loader2 size={24} className="animate-spin" />
@@ -212,16 +207,16 @@ const TaskHistory: React.FC = () => {
</Badge>
</div>
<p className="text-sm text-muted-foreground">
{task.message || '任务执行中...'}
{task.message || (task.status === 'unknown' ? '无法获取状态' : '任务执行中...')}
</p>
<div className="flex items-center gap-4 text-xs text-muted-foreground">
<span className="flex items-center gap-1">
<Clock size={12} />
{format(new Date(task.created_at), 'yyyy-MM-dd HH:mm:ss')}
{task.created_at ? format(new Date(task.created_at), 'yyyy-MM-dd HH:mm:ss') : '时间未知'}
</span>
{task.completed_at && (
{task.updated_at && task.status !== 'processing' && (
<span>
耗时: {Math.round((new Date(task.completed_at).getTime() - new Date(task.created_at).getTime()) / 1000)} 秒
更新: {format(new Date(task.updated_at), 'HH:mm:ss')}
</span>
)}
</div>

@@ -1,4 +1,4 @@
import React, { useState, useEffect } from 'react';
import React, { useState, useEffect, useCallback, useRef } from 'react';
import { useDropzone } from 'react-dropzone';
import {
TableProperties,
@@ -14,7 +14,12 @@ import {
RefreshCcw,
ChevronDown,
ChevronUp,
Loader2
Loader2,
Files,
Trash2,
Eye,
File,
Plus
} from 'lucide-react';
import { Button } from '@/components/ui/button';
import { Card, CardContent, CardHeader, CardTitle, CardDescription } from '@/components/ui/card';
@@ -26,6 +31,14 @@ import { format } from 'date-fns';
import { toast } from 'sonner';
import { cn } from '@/lib/utils';
import { Skeleton } from '@/components/ui/skeleton';
import {
Dialog,
DialogContent,
DialogHeader,
DialogTitle,
} from "@/components/ui/dialog";
import { ScrollArea } from '@/components/ui/scroll-area';
import { useTemplateFill } from '@/context/TemplateFillContext';

type DocumentItem = {
doc_id: string;
@@ -41,73 +54,34 @@ type DocumentItem = {
};
};

type TemplateField = {
cell: string;
name: string;
field_type: string;
required: boolean;
hint?: string;
};

const TemplateFill: React.FC = () => {
const [step, setStep] = useState<'upload-template' | 'select-source' | 'preview' | 'filling'>('upload-template');
const [templateFile, setTemplateFile] = useState<File | null>(null);
const [templateFields, setTemplateFields] = useState<TemplateField[]>([]);
const [sourceDocs, setSourceDocs] = useState<DocumentItem[]>([]);
const [selectedDocs, setSelectedDocs] = useState<string[]>([]);
const {
step, setStep,
templateFile, setTemplateFile,
templateFields, setTemplateFields,
sourceFiles, setSourceFiles, addSourceFiles, removeSourceFile,
sourceFilePaths, setSourceFilePaths,
sourceDocIds, setSourceDocIds, addSourceDocId, removeSourceDocId,
templateId, setTemplateId,
filledResult, setFilledResult,
reset
} = useTemplateFill();

const [loading, setLoading] = useState(false);
const [filling, setFilling] = useState(false);
const [filledResult, setFilledResult] = useState<any>(null);
const [previewDoc, setPreviewDoc] = useState<{ name: string; content: string } | null>(null);
const [previewOpen, setPreviewOpen] = useState(false);
const [sourceMode, setSourceMode] = useState<'upload' | 'select'>('upload');
const [uploadedDocuments, setUploadedDocuments] = useState<DocumentItem[]>([]);
const [docsLoading, setDocsLoading] = useState(false);
const sourceFileInputRef = useRef<HTMLInputElement>(null);

// Load available source documents
useEffect(() => {
loadSourceDocuments();
}, []);

const loadSourceDocuments = async () => {
setLoading(true);
try {
const result = await backendApi.getDocuments(undefined, 100);
if (result.success) {
// Filter to only non-Excel documents that can be used as data sources
const docs = (result.documents || []).filter((d: DocumentItem) =>
['docx', 'md', 'txt', 'xlsx'].includes(d.doc_type)
);
setSourceDocs(docs);
}
} catch (err: any) {
toast.error('加载数据源失败');
} finally {
setLoading(false);
}
};

const onTemplateDrop = async (acceptedFiles: File[]) => {
// 模板拖拽
const onTemplateDrop = useCallback((acceptedFiles: File[]) => {
const file = acceptedFiles[0];
if (!file) return;

const ext = file.name.split('.').pop()?.toLowerCase();
if (!['xlsx', 'xls', 'docx'].includes(ext || '')) {
toast.error('仅支持 xlsx/xls/docx 格式的模板文件');
return;
if (file) {
setTemplateFile(file);
}

setTemplateFile(file);
setLoading(true);

try {
const result = await backendApi.uploadTemplate(file);
if (result.success) {
setTemplateFields(result.fields || []);
setStep('select-source');
toast.success('模板上传成功');
}
} catch (err: any) {
toast.error('模板上传失败: ' + (err.message || '未知错误'));
} finally {
setLoading(false);
}
};
}, []);

const { getRootProps: getTemplateProps, getInputProps: getTemplateInputProps, isDragActive: isTemplateDragActive } = useDropzone({
onDrop: onTemplateDrop,
@@ -116,33 +90,157 @@ const TemplateFill: React.FC = () => {
'application/vnd.ms-excel': ['.xls'],
'application/vnd.openxmlformats-officedocument.wordprocessingml.document': ['.docx']
},
maxFiles: 1
maxFiles: 1,
multiple: false
});

const handleFillTemplate = async () => {
if (!templateFile || selectedDocs.length === 0) {
toast.error('请选择数据源文档');
// 源文档拖拽
const onSourceDrop = useCallback((e: React.DragEvent) => {
e.preventDefault();
const files = Array.from(e.dataTransfer.files).filter(f => {
const ext = f.name.split('.').pop()?.toLowerCase();
return ['xlsx', 'xls', 'docx', 'md', 'txt'].includes(ext || '');
});
if (files.length > 0) {
addSourceFiles(files.map(f => ({ file: f })));
}
}, [addSourceFiles]);

const handleSourceFileSelect = (e: React.ChangeEvent<HTMLInputElement>) => {
const files = Array.from(e.target.files || []);
if (files.length > 0) {
addSourceFiles(files.map(f => ({ file: f })));
toast.success(`已添加 ${files.length} 个文件`);
}
e.target.value = '';
};

// 仅添加源文档不上传
const handleAddSourceFiles = () => {
if (sourceFiles.length === 0) {
toast.error('请先选择源文档');
return;
}
toast.success(`已添加 ${sourceFiles.length} 个源文档,可继续添加更多`);
};

// 加载已上传文档
const loadUploadedDocuments = useCallback(async () => {
setDocsLoading(true);
try {
const result = await backendApi.getDocuments(undefined, 100);
if (result.success) {
// 过滤可作为数据源的文档类型
const docs = (result.documents || []).filter((d: DocumentItem) =>
['docx', 'md', 'txt', 'xlsx', 'xls'].includes(d.doc_type)
);
setUploadedDocuments(docs);
}
} catch (err: any) {
console.error('加载文档失败:', err);
} finally {
setDocsLoading(false);
}
}, []);

// 删除文档
const handleDeleteDocument = async (docId: string, e: React.MouseEvent) => {
e.stopPropagation();
if (!confirm('确定要删除该文档吗?')) return;
try {
const result = await backendApi.deleteDocument(docId);
if (result.success) {
setUploadedDocuments(prev => prev.filter(d => d.doc_id !== docId));
removeSourceDocId(docId);
toast.success('文档已删除');
} else {
toast.error(result.message || '删除失败');
}
} catch (err: any) {
toast.error('删除失败: ' + (err.message || '未知错误'));
}
};

useEffect(() => {
if (sourceMode === 'select') {
loadUploadedDocuments();
}
}, [sourceMode, loadUploadedDocuments]);

const handleJointUploadAndFill = async () => {
if (!templateFile) {
toast.error('请先上传模板文件');
return;
}

setFilling(true);
setStep('filling');
// 检查是否选择了数据源
if (sourceMode === 'upload' && sourceFiles.length === 0) {
toast.error('请上传源文档或从已上传文档中选择');
return;
}
if (sourceMode === 'select' && sourceDocIds.length === 0) {
toast.error('请选择源文档');
return;
}

setLoading(true);

try {
// 调用后端填表接口,传递选中的文档ID
const result = await backendApi.fillTemplate(
'temp-template-id',
templateFields,
selectedDocs // 传递源文档ID列表
);
setFilledResult(result);
setStep('preview');
toast.success('表格填写完成');
if (sourceMode === 'select') {
// 使用已上传文档作为数据源
const result = await backendApi.uploadTemplate(templateFile);

if (result.success) {
setTemplateFields(result.fields || []);
setTemplateId(result.template_id || 'temp');
toast.success('开始智能填表');
setStep('filling');

// 使用 source_doc_ids 进行填表
const fillResult = await backendApi.fillTemplate(
result.template_id || 'temp',
result.fields || [],
sourceDocIds,
[],
'请从以下文档中提取相关信息填写表格'
);

setFilledResult(fillResult);
setStep('preview');
toast.success('表格填写完成');
}
} else {
// 使用联合上传API
const result = await backendApi.uploadTemplateAndSources(
templateFile,
sourceFiles.map(sf => sf.file)
);

if (result.success) {
setTemplateFields(result.fields || []);
setTemplateId(result.template_id);
setSourceFilePaths(result.source_file_paths || []);
toast.success('文档上传成功,开始智能填表');
setStep('filling');

// 自动开始填表
const fillResult = await backendApi.fillTemplate(
result.template_id,
result.fields || [],
[],
result.source_file_paths || [],
'请从以下文档中提取相关信息填写表格'
);

setFilledResult(fillResult);
setStep('preview');
toast.success('表格填写完成');
}
}
} catch (err: any) {
toast.error('填表失败: ' + (err.message || '未知错误'));
setStep('select-source');
toast.error('处理失败: ' + (err.message || '未知错误'));
} finally {
setFilling(false);
setLoading(false);
}
};

@@ -150,7 +248,11 @@ const TemplateFill: React.FC = () => {
if (!templateFile || !filledResult) return;

try {
const blob = await backendApi.exportFilledTemplate('temp', filledResult.filled_data || {}, 'xlsx');
const blob = await backendApi.exportFilledTemplate(
templateId || 'temp',
filledResult.filled_data || {},
'xlsx'
);
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
@@ -163,12 +265,18 @@ const TemplateFill: React.FC = () => {
}
};

const resetFlow = () => {
setStep('upload-template');
setTemplateFile(null);
setTemplateFields([]);
setSelectedDocs([]);
setFilledResult(null);
const getFileIcon = (filename: string) => {
const ext = filename.split('.').pop()?.toLowerCase();
if (['xlsx', 'xls'].includes(ext || '')) {
return <FileSpreadsheet size={20} className="text-emerald-500" />;
}
if (ext === 'docx') {
return <FileText size={20} className="text-blue-500" />;
}
if (['md', 'txt'].includes(ext || '')) {
return <FileText size={20} className="text-orange-500" />;
}
return <File size={20} className="text-gray-500" />;
};

return (
@@ -180,208 +288,248 @@ const TemplateFill: React.FC = () => {
根据您的表格模板,自动聚合多源文档信息进行精准填充
</p>
</div>
{step !== 'upload-template' && (
<Button variant="outline" className="rounded-xl gap-2" onClick={resetFlow}>
{step !== 'upload' && (
<Button variant="outline" className="rounded-xl gap-2" onClick={reset}>
<RefreshCcw size={18} />
<span>重新开始</span>
</Button>
)}
</section>

{/* Progress Steps */}
<div className="flex items-center justify-center gap-4">
{['上传模板', '选择数据源', '填写预览'].map((label, idx) => {
const stepIndex = ['upload-template', 'select-source', 'preview'].indexOf(step);
const isActive = idx <= stepIndex;
const isCurrent = idx === stepIndex;

return (
<React.Fragment key={idx}>
<div className={cn(
"flex items-center gap-2 px-4 py-2 rounded-full transition-all",
isActive ? "bg-primary text-primary-foreground" : "bg-muted text-muted-foreground"
)}>
<div className={cn(
"w-6 h-6 rounded-full flex items-center justify-center text-xs font-bold",
isCurrent ? "bg-white/20" : ""
)}>
{idx + 1}
</div>
<span className="text-sm font-medium">{label}</span>
</div>
{idx < 2 && (
<div className={cn(
"w-12 h-0.5",
idx < stepIndex ? "bg-primary" : "bg-muted"
)} />
)}
</React.Fragment>
);
})}
</div>

{/* Step 1: Upload Template */}
{step === 'upload-template' && (
<div
{...getTemplateProps()}
className={cn(
"border-2 border-dashed rounded-3xl p-16 transition-all duration-300 flex flex-col items-center justify-center text-center cursor-pointer group",
isTemplateDragActive ? "border-primary bg-primary/5" : "border-muted-foreground/20 hover:border-primary/50 hover:bg-primary/5"
)}
>
<input {...getTemplateInputProps()} />
<div className="w-20 h-20 rounded-2xl bg-primary/10 text-primary flex items-center justify-center mb-6 group-hover:scale-110 transition-transform">
{loading ? <Loader2 className="animate-spin" size={40} /> : <Upload size={40} />}
</div>
<div className="space-y-2 max-w-md">
<p className="text-xl font-bold tracking-tight">
{isTemplateDragActive ? '释放以开始上传' : '点击或拖拽上传表格模板'}
</p>
<p className="text-sm text-muted-foreground">
支持 Excel (.xlsx, .xls) 或 Word (.docx) 格式的表格模板
</p>
</div>
<div className="mt-6 flex gap-3">
<Badge variant="outline" className="bg-emerald-500/10 text-emerald-600 border-emerald-200">
<FileSpreadsheet size={14} className="mr-1" /> Excel 模板
</Badge>
<Badge variant="outline" className="bg-blue-500/10 text-blue-600 border-blue-200">
<FileText size={14} className="mr-1" /> Word 模板
</Badge>
</div>
</div>
)}

{/* Step 2: Select Source Documents */}
{step === 'select-source' && (
<div className="space-y-6">
{/* Template Info */}
{/* Step 1: Upload - Joint Upload of Template + Source Docs */}
{step === 'upload' && (
<div className="grid grid-cols-1 lg:grid-cols-2 gap-6">
{/* Template Upload */}
<Card className="border-none shadow-md">
<CardHeader className="pb-4">
<CardTitle className="text-lg flex items-center gap-2">
<FileSpreadsheet className="text-primary" size={20} />
已上传模板
</CardTitle>
</CardHeader>
<CardContent>
<div className="flex items-center gap-4">
<div className="w-12 h-12 rounded-xl bg-emerald-500/10 text-emerald-500 flex items-center justify-center">
<FileSpreadsheet size={24} />
</div>
<div className="flex-1">
<p className="font-bold">{templateFile?.name}</p>
<p className="text-sm text-muted-foreground">
{templateFields.length} 个字段待填写
</p>
</div>
<Button variant="ghost" size="sm" onClick={() => setStep('upload-template')}>
重新选择
</Button>
</div>

{/* Template Fields Preview */}
<div className="mt-4 p-4 bg-muted/30 rounded-xl">
<p className="text-xs font-bold uppercase tracking-widest text-muted-foreground mb-3">待填写字段</p>
<div className="flex flex-wrap gap-2">
{templateFields.map((field, idx) => (
<Badge key={idx} variant="outline" className="bg-background">
{field.name}
</Badge>
))}
</div>
</div>
</CardContent>
</Card>

{/* Source Documents Selection */}
<Card className="border-none shadow-md">
<CardHeader className="pb-4">
<CardTitle className="text-lg flex items-center gap-2">
<FileText className="text-primary" size={20} />
选择数据源文档
表格模板
</CardTitle>
<CardDescription>
从已上传的文档中选择作为填表的数据来源,支持 Excel 和非结构化文档
上传需要填写的 Excel/Word 模板文件
</CardDescription>
</CardHeader>
<CardContent>
{loading ? (
<div className="space-y-3">
{[1, 2, 3].map(i => <Skeleton key={i} className="h-16 w-full rounded-xl" />)}
</div>
) : sourceDocs.length > 0 ? (
<div className="space-y-3">
{sourceDocs.map(doc => (
<div
key={doc.doc_id}
className={cn(
"flex items-center gap-4 p-4 rounded-xl border-2 transition-all cursor-pointer",
selectedDocs.includes(doc.doc_id)
? "border-primary bg-primary/5"
: "border-border hover:bg-muted/30"
)}
onClick={() => {
setSelectedDocs(prev =>
prev.includes(doc.doc_id)
? prev.filter(id => id !== doc.doc_id)
: [...prev, doc.doc_id]
);
}}
>
<div className={cn(
"w-6 h-6 rounded-md border-2 flex items-center justify-center transition-all",
selectedDocs.includes(doc.doc_id)
? "border-primary bg-primary text-white"
: "border-muted-foreground/30"
)}>
{selectedDocs.includes(doc.doc_id) && <CheckCircle2 size={14} />}
</div>
<div className={cn(
"w-10 h-10 rounded-lg flex items-center justify-center",
doc.doc_type === 'xlsx' ? "bg-emerald-500/10 text-emerald-500" : "bg-blue-500/10 text-blue-500"
)}>
{doc.doc_type === 'xlsx' ? <FileSpreadsheet size={20} /> : <FileText size={20} />}
</div>
<div className="flex-1 min-w-0">
<p className="font-semibold truncate">{doc.original_filename}</p>
<p className="text-xs text-muted-foreground">
{doc.doc_type.toUpperCase()} • {format(new Date(doc.created_at), 'yyyy-MM-dd')}
</p>
</div>
{doc.metadata?.columns && (
<Badge variant="outline" className="text-xs">
{doc.metadata.columns.length} 列
</Badge>
)}
</div>
))}
{!templateFile ? (
<div
{...getTemplateProps()}
className={cn(
"border-2 border-dashed rounded-2xl p-8 transition-all duration-300 flex flex-col items-center justify-center text-center cursor-pointer group min-h-[200px]",
isTemplateDragActive ? "border-primary bg-primary/5" : "border-muted-foreground/20 hover:border-primary/50 hover:bg-primary/5"
)}
>
<input {...getTemplateInputProps()} />
<div className="w-14 h-14 rounded-xl bg-primary/10 text-primary flex items-center justify-center mb-4 group-hover:scale-110 transition-transform">
{loading ? <Loader2 className="animate-spin" size={28} /> : <Upload size={28} />}
</div>
<p className="font-medium">
{isTemplateDragActive ? '释放以上传' : '点击或拖拽上传模板'}
</p>
<p className="text-xs text-muted-foreground mt-1">
支持 .xlsx .xls .docx
</p>
</div>
) : (
<div className="text-center py-12 text-muted-foreground">
<FileText size={48} className="mx-auto mb-4 opacity-30" />
<p>暂无数据源文档,请先上传文档</p>
<div className="flex items-center gap-3 p-4 bg-emerald-500/5 rounded-xl border border-emerald-200">
<div className="w-10 h-10 rounded-lg bg-emerald-500/10 text-emerald-500 flex items-center justify-center">
<FileSpreadsheet size={20} />
</div>
<div className="flex-1 min-w-0">
<p className="font-medium truncate">{templateFile.name}</p>
<p className="text-xs text-muted-foreground">
{(templateFile.size / 1024).toFixed(1)} KB
</p>
</div>
<Button variant="ghost" size="sm" onClick={() => setTemplateFile(null)}>
<X size={16} />
</Button>
</div>
)}
</CardContent>
</Card>

{/* Source Documents Upload */}
<Card className="border-none shadow-md">
<CardHeader className="pb-4">
<CardTitle className="text-lg flex items-center gap-2">
<Files className="text-primary" size={20} />
源文档
</CardTitle>
<CardDescription>
选择包含数据的源文档作为填表依据
</CardDescription>
{/* Source Mode Tabs */}
<div className="flex gap-2 mt-2">
<Button
variant={sourceMode === 'upload' ? 'default' : 'outline'}
size="sm"
onClick={() => setSourceMode('upload')}
>
<Upload size={14} className="mr-1" />
上传文件
</Button>
<Button
variant={sourceMode === 'select' ? 'default' : 'outline'}
size="sm"
onClick={() => setSourceMode('select')}
>
<Files size={14} className="mr-1" />
从文档中心选择
</Button>
</div>
</CardHeader>
<CardContent>
{sourceMode === 'upload' ? (
<>
<div className="border-2 border-dashed rounded-2xl p-8 transition-all duration-300 flex flex-col items-center justify-center text-center cursor-pointer group min-h-[200px] border-muted-foreground/20 hover:border-primary/50 hover:bg-primary/5">
<input
id="source-file-input"
type="file"
multiple={true}
accept=".xlsx,.xls,.docx,.md,.txt"
onChange={handleSourceFileSelect}
className="hidden"
/>
<label htmlFor="source-file-input" className="cursor-pointer flex flex-col items-center">
<div className="w-14 h-14 rounded-xl bg-blue-500/10 text-blue-500 flex items-center justify-center mb-4 group-hover:scale-110 transition-transform">
{loading ? <Loader2 className="animate-spin" size={28} /> : <Upload size={28} />}
</div>
<p className="font-medium">
点击上传源文档
</p>
<p className="text-xs text-muted-foreground mt-1">
支持 .xlsx .xls .docx .md .txt
</p>
</label>
</div>
<div
onDragOver={(e) => { e.preventDefault(); }}
onDrop={onSourceDrop}
className="mt-2 text-center text-xs text-muted-foreground"
>
或拖拽文件到此处
</div>

{/* Selected Source Files */}
{sourceFiles.length > 0 && (
<div className="mt-4 space-y-2">
{sourceFiles.map((sf, idx) => (
<div key={idx} className="flex items-center gap-3 p-3 bg-muted/50 rounded-xl">
{getFileIcon(sf.file.name)}
<div className="flex-1 min-w-0">
<p className="text-sm font-medium truncate">{sf.file.name}</p>
<p className="text-xs text-muted-foreground">
{(sf.file.size / 1024).toFixed(1)} KB
</p>
</div>
<Button variant="ghost" size="sm" onClick={() => removeSourceFile(idx)}>
<Trash2 size={14} className="text-red-500" />
</Button>
</div>
))}
<div className="flex justify-center pt-2">
<Button variant="outline" size="sm" onClick={() => document.getElementById('source-file-input')?.click()}>
<Plus size={14} className="mr-1" />
继续添加更多文档
</Button>
</div>
</div>
)}
</>
) : (
<>
{/* Uploaded Documents Selection */}
{docsLoading ? (
<div className="space-y-2">
{[1, 2, 3].map(i => (
<Skeleton key={i} className="h-16 w-full rounded-xl" />
))}
</div>
) : uploadedDocuments.length > 0 ? (
<div className="space-y-2">
{sourceDocIds.length > 0 && (
<div className="flex items-center justify-between p-3 bg-primary/5 rounded-xl border border-primary/20">
<span className="text-sm font-medium">已选择 {sourceDocIds.length} 个文档</span>
<Button variant="ghost" size="sm" onClick={() => loadUploadedDocuments()}>
<RefreshCcw size={14} className="mr-1" />
刷新列表
</Button>
</div>
)}
<div className="max-h-[300px] overflow-y-auto space-y-2">
{uploadedDocuments.map((doc) => (
<div
key={doc.doc_id}
className={cn(
"flex items-center gap-3 p-3 rounded-xl border-2 transition-all cursor-pointer",
sourceDocIds.includes(doc.doc_id)
? "border-primary bg-primary/5"
: "border-border hover:bg-muted/30"
)}
onClick={() => {
if (sourceDocIds.includes(doc.doc_id)) {
removeSourceDocId(doc.doc_id);
} else {
addSourceDocId(doc.doc_id);
}
}}
>
<div className={cn(
"w-6 h-6 rounded-md border-2 flex items-center justify-center transition-all shrink-0",
sourceDocIds.includes(doc.doc_id)
? "border-primary bg-primary text-white"
: "border-muted-foreground/30"
)}>
{sourceDocIds.includes(doc.doc_id) && <CheckCircle2 size={14} />}
</div>
{getFileIcon(doc.original_filename)}
<div className="flex-1 min-w-0">
<p className="text-sm font-medium truncate">{doc.original_filename}</p>
<p className="text-xs text-muted-foreground">
{doc.doc_type.toUpperCase()} • {format(new Date(doc.created_at), 'yyyy-MM-dd')}
</p>
</div>
<Button
variant="ghost"
size="sm"
onClick={(e) => handleDeleteDocument(doc.doc_id, e)}
className="shrink-0"
>
<Trash2 size={14} className="text-red-500" />
</Button>
</div>
))}
</div>
</div>
) : (
<div className="text-center py-8 text-muted-foreground">
<Files size={32} className="mx-auto mb-2 opacity-30" />
<p className="text-sm">暂无可用的已上传文档</p>
</div>
)}
</>
)}
</CardContent>
</Card>

{/* Action Button */}
<div className="col-span-1 lg:col-span-2 flex justify-center">
  <Button
    size="lg"
    className="rounded-xl px-12 shadow-lg shadow-primary/20 gap-2"
    disabled={!templateFile || loading}
    onClick={handleJointUploadAndFill}
  >
    {loading ? (
      <>
        <Loader2 className="animate-spin" size={20} />
        <span>正在处理...</span>
      </>
    ) : (
      <>
        <Sparkles size={20} />
        <span>上传并智能填表</span>
      </>
    )}
  </Button>

@@ -389,49 +537,7 @@ const TemplateFill: React.FC = () => {

</div>
)}

{/* Step 3: Preview Results */}
{step === 'preview' && filledResult && (
  <Card className="border-none shadow-md">
    <CardHeader>
      <CardTitle className="text-lg flex items-center gap-2">
        <CheckCircle2 className="text-emerald-500" size={20} />
        填表完成
      </CardTitle>
      <CardDescription>
        系统已根据 {selectedDocs.length} 份文档自动完成表格填写
      </CardDescription>
    </CardHeader>
    <CardContent className="space-y-6">
      {/* Filled Data Preview */}
      <div className="p-6 bg-muted/30 rounded-2xl">
        <div className="space-y-4">
          {templateFields.map((field, idx) => (
            <div key={idx} className="flex items-center gap-4">
              <div className="w-32 text-sm font-medium text-muted-foreground">{field.name}</div>
              <div className="flex-1 p-3 bg-background rounded-xl border">
                {(filledResult.filled_data || {})[field.name] || '-'}
              </div>
            </div>
          ))}
        </div>
      </div>

      {/* Action Buttons */}
      <div className="flex justify-center gap-4">
        <Button variant="outline" className="rounded-xl gap-2" onClick={resetFlow}>
          <RefreshCcw size={18} />
          <span>继续填表</span>
        </Button>
        <Button className="rounded-xl gap-2 shadow-lg shadow-primary/20" onClick={handleExport}>
          <Download size={18} />
          <span>导出结果</span>
        </Button>
      </div>
    </CardContent>
  </Card>
)}

{/* Step 2: Filling State */}
{step === 'filling' && (
  <Card className="border-none shadow-md">
    <CardContent className="py-16 flex flex-col items-center justify-center">

@@ -440,11 +546,117 @@

      </div>
      <h3 className="text-xl font-bold mb-2">AI 正在智能分析并填表</h3>
      <p className="text-muted-foreground text-center max-w-md">
        系统正在从 {sourceFiles.length || sourceFilePaths.length} 份文档中检索相关信息...
      </p>
    </CardContent>
  </Card>
)}

{/* Step 3: Preview Results */}
{step === 'preview' && filledResult && (
  <div className="space-y-6">
    <Card className="border-none shadow-md">
      <CardHeader>
        <CardTitle className="text-lg flex items-center gap-2">
          <CheckCircle2 className="text-emerald-500" size={20} />
          填表完成
        </CardTitle>
        <CardDescription>
          系统已根据 {sourceFiles.length || sourceFilePaths.length} 份文档自动完成表格填写
        </CardDescription>
      </CardHeader>
      <CardContent>
        {/* Filled Data Preview */}
        <div className="p-6 bg-muted/30 rounded-2xl">
          <div className="space-y-4">
            {templateFields.map((field, idx) => {
              const value = filledResult.filled_data?.[field.name];
              const displayValue = Array.isArray(value)
                ? value.filter(v => v && String(v).trim()).join(', ') || '-'
                : value || '-';
              return (
                <div key={idx} className="flex items-center gap-4">
                  <div className="w-40 text-sm font-medium text-muted-foreground">{field.name}</div>
                  <div className="flex-1 p-3 bg-background rounded-xl border">
                    {displayValue}
                  </div>
                </div>
              );
            })}
          </div>
        </div>

        {/* Source Files Info */}
        <div className="mt-4 flex flex-wrap gap-2">
          {sourceFiles.map((sf, idx) => (
            <Badge key={idx} variant="outline" className="bg-blue-500/5">
              {getFileIcon(sf.file.name)}
              <span className="ml-1">{sf.file.name}</span>
            </Badge>
          ))}
        </div>

        {/* Action Buttons */}
        <div className="flex justify-center gap-4 mt-6">
          <Button variant="outline" className="rounded-xl gap-2" onClick={reset}>
            <RefreshCcw size={18} />
            <span>继续填表</span>
          </Button>
          <Button className="rounded-xl gap-2 shadow-lg shadow-primary/20" onClick={handleExport}>
            <Download size={18} />
            <span>导出结果</span>
          </Button>
        </div>
      </CardContent>
    </Card>

    {/* Fill Details */}
    {filledResult.fill_details && filledResult.fill_details.length > 0 && (
      <Card className="border-none shadow-md">
        <CardHeader>
          <CardTitle className="text-lg">填写详情</CardTitle>
        </CardHeader>
        <CardContent>
          <div className="space-y-3">
            {filledResult.fill_details.map((detail: any, idx: number) => (
              <div key={idx} className="flex items-start gap-3 p-3 bg-muted/30 rounded-xl text-sm">
                <div className="w-1 h-1 rounded-full bg-primary mt-2" />
                <div className="flex-1">
                  <div className="font-medium">{detail.field}</div>
                  <div className="text-muted-foreground text-xs mt-1">
                    来源: {detail.source} | 置信度: {detail.confidence ? (detail.confidence * 100).toFixed(0) + '%' : 'N/A'}
                  </div>
                  {detail.warning && (
                    <div className="mt-2 p-2 bg-yellow-50 border border-yellow-200 rounded-lg text-yellow-700 text-xs">
                      ⚠️ {detail.warning}
                    </div>
                  )}
                  {detail.values && detail.values.length > 1 && !detail.warning && (
                    <div className="mt-2 text-xs text-muted-foreground">
                      多值: {detail.values.join(', ')}
                    </div>
                  )}
                </div>
              </div>
            ))}
          </div>
        </CardContent>
      </Card>
    )}
  </div>
)}

{/* Preview Dialog */}
<Dialog open={previewOpen} onOpenChange={setPreviewOpen}>
  <DialogContent className="max-w-2xl">
    <DialogHeader>
      <DialogTitle>{previewDoc?.name || '文档预览'}</DialogTitle>
    </DialogHeader>
    <ScrollArea className="max-h-[60vh]">
      <pre className="text-sm whitespace-pre-wrap">{previewDoc?.content}</pre>
    </ScrollArea>
  </DialogContent>
</Dialog>
</div>
);
};

59
logs/rag_disable_note.txt
Normal file
@@ -0,0 +1,59 @@
Temporary RAG Service Disable Notice
========================
Date: 2026-04-08

Changes:
----------
As requested, the RAG vector-retrieval feature has been temporarily disabled:

1. Modified file: backend/app/services/rag_service.py

2. Key changes:
- Added a self._disabled = True flag in RAGService.__init__
- index_field() - checks _disabled, skips the actual indexing and logs instead
- index_document_content() - checks _disabled, skips the actual indexing and logs instead
- retrieve() - checks _disabled, returns an empty list and logs
- get_vector_count() - checks _disabled, returns 0 and logs
- clear() - checks _disabled, skips the actual clearing and logs

3. Behavior changes:
- All RAG index-building operations are logged (with a [RAG DISABLED] prefix)
- All RAG retrieval operations return empty results
- The vector count always returns 0
- Actual vector-database operations are skipped

4. How to restore:
- Change self._disabled = True to self._disabled = False in RAGService.__init__
- Restart the service to re-enable RAG

Purpose:
------
Keep the front-end UI and code structure for RAG index building while not actually
calling the vector-database API; re-enable later when needed.

Affected scope:
---------
- /api/v1/rag/search - RAG search endpoint (returns empty results)
- /api/v1/rag/status - RAG status endpoint (returns vector_count=0)
- /api/v1/rag/rebuild - RAG rebuild endpoint (logs only)
- RAG index building on Excel/document upload (logs only)

========================
Follow-up (2026-04-08):
========================
Modified file: backend/app/services/table_rag_service.py

Key changes:
- Added a self._disabled = True flag in TableRAGService.__init__
- build_table_rag_index() - the RAG indexing part is skipped, logs only
- index_document_table() - the RAG indexing part is skipped, logs only

Behavior changes:
- On Excel upload, MySQL storage still proceeds normally
- AI field descriptions are still generated (via the LLM)
- Only the vector-database indexing operations are skipped

How to restore:
- Change self._disabled = True to self._disabled = False in TableRAGService.__init__
- And change self._disabled = True to self._disabled = False in rag_service.py
- Both must be set to False to fully restore RAG
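The guard pattern this note describes can be sketched as follows. This is a minimal illustration: the method bodies are placeholders, and only the `_disabled` flag and the `[RAG DISABLED]` logging behavior come from the note itself.

```python
import logging

logger = logging.getLogger("rag_service")

class RAGService:
    """Sketch of the temporary disable guard; real methods do vector I/O."""

    def __init__(self):
        # Flip to False and restart the service to re-enable RAG.
        self._disabled = True

    def retrieve(self, query: str, top_k: int = 5) -> list:
        if self._disabled:
            logger.info("[RAG DISABLED] retrieve skipped for query: %s", query)
            return []  # disabled: retrieval always returns empty results
        raise NotImplementedError("real vector search goes here")

    def get_vector_count(self) -> int:
        if self._disabled:
            logger.info("[RAG DISABLED] vector count forced to 0")
            return 0
        raise NotImplementedError("real vector count goes here")
```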
354
比赛备赛规划.md
@@ -50,18 +50,18 @@
| `prompt_service.py` | ✅ Done | Prompt template management |
| `text_analysis_service.py` | ✅ Done | Text analysis |
| `chart_generator_service.py` | ✅ Done | Chart generation service |
| `template_fill_service.py` | ✅ Done | Template filling service: multi-row extraction, direct extraction from structured data, JSON fault tolerance, Word table handling |

### 2.2 API Endpoints (`backend/app/api/endpoints/`)

| Endpoint file | Route | Status |
|----------|------|----------|
| `upload.py` | `/api/v1/upload/excel` | ✅ Excel file upload and parsing |
| `upload.py` | `/api/v1/upload/document` | ✅ Document upload and parsing |
| `documents.py` | `/api/v1/documents/*` | ✅ Document management (list, delete, search) |
| `ai_analyze.py` | `/api/v1/analyze/*` | ✅ AI analysis (Excel, Markdown, streaming) |
| `rag.py` | `/api/v1/rag/*` | ⚠️ RAG retrieval (currently returns empty results) |
| `tasks.py` | `/api/v1/tasks/*` | ✅ Async task status queries |
| `templates.py` | `/api/v1/templates/*` | ✅ Template management (multi-row export, Word export, Word structured-field parsing) |
| `visualization.py` | `/api/v1/visualization/*` | ✅ Visualization charts |
| `health.py` | `/api/v1/health` | ✅ Health check |

@@ -70,71 +70,67 @@

| Page file | Function | Status |
|----------|------|------|
| `Documents.tsx` | Main document management page | ✅ Done |
| `TemplateFill.tsx` | Smart template-filling page | ✅ Done |
| `ExcelParse.tsx` | Excel parsing page | ✅ Done |

### 2.4 Document Parsing Capabilities

| Format | Status | Notes |
|------|----------|------|
| Excel (.xlsx/.xls) | ✅ Done | pandas with XML fallback parsing, multi-sheet support |
| Markdown (.md) | ✅ Done | Regex plus AI-based section splitting |
| Word (.docx) | ✅ Done | python-docx parsing, table extraction and field recognition |
| Text (.txt) | ✅ Done | chardet encoding detection, text cleaning and structured extraction |

---

## III. Core Feature Implementation Details

### 3.1 Template Filling Module (✅ Done)

**Core flow**:
```
Upload a template form (Word/Excel)
        ↓
Parse the template; extract the fields to fill and their hint prompts
        ↓
Load source data from the given source document IDs (MongoDB or files)
        ↓
Prefer direct extraction from structured data (Excel rows)
        ↓
Fall back to LLM extraction from text when direct extraction fails
        ↓
Write the extracted data into the original template positions (preserving template formatting)
        ↓
Export the completed form (Excel/Word)
```

**Key features**:
- **In-place template filling**: opens the original template file and writes data into its tables/cells
- **Multi-row data support**: each field can yield multiple values; export expands rows automatically
- **Structured data first**: extracts directly from Excel rows, no LLM needed
- **JSON fault tolerance**: handles corrupted/truncated JSON returned by the LLM
- **Markdown cleanup**: strips markdown formatting from LLM output

### 3.2 Word Document Parsing (✅ Done)

**Implemented**:
- `docx_parser.py` - Word document parser
- Paragraph text extraction
- Table content extraction (supports the competition table format: field name | hint | value)
- `parse_tables_for_template()` - parses table templates and extracts fields
- `extract_template_fields_from_docx()` - extracts template field definitions
- `_infer_field_type_from_hint()` - infers the field type from the hint text
- **API endpoint**: `/api/v1/templates/parse-word-structure` - upload a Word document, extract structured fields, and store them in MongoDB
- **API endpoint**: `/api/v1/templates/word-fields/{doc_id}` - fetch the stored template field info for a document

### 3.3 Text Document Parsing (✅ Done)

**Implemented**:
- `txt_parser.py` - plain-text file parser
- Automatic encoding detection (chardet)
- Text cleaning (strips control characters, normalizes whitespace)
- Structured data extraction (emails, URLs, phone numbers, dates, amounts)

---

@@ -192,20 +188,20 @@ docs/test/

## VI. Work Plan (Suggested)

### Priority 1: End-to-end testing
- Run accuracy tests with real test data
- Verify that multi-row data exports correctly
- Verify that Word template parsing works

### Priority 2: Demo packaging and documentation
- Prepare the project demo slides
- Record a demo video
- Polish the README deployment docs

### Priority 3: Optimization
- Improve response times
- Harden error handling
- Add more test cases

---

@@ -215,29 +211,32 @@ docs/test/

2. **Database**: database storage is not mandatory and can be skipped
3. **Deployment**: local deployment is sufficient; no public server required
4. **Evaluation data**: the preliminary round uses only the data provided so far
5. **RAG**: temporarily disabled; does not affect the core evaluation features (direct file reading is used instead)

---

*Document version: v1.5*
*Last updated: 2026-04-09*

---

## VIII. Technical Implementation Details

### 8.1 Template Filling Flow

#### Flow diagram
```
┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│ Upload template│ ──► │ Select sources │ ──► │   Smart fill   │
└────────────────┘     └────────────────┘     └────────────────┘
                                                      │
        ┌─────────────────────────────────────────────┼────────────────────────┐
        │                                             │                        │
        ▼                                             ▼                        ▼
┌───────────────────────┐             ┌──────────────────────┐      ┌────────────────┐
│ Structured extraction │             │    LLM extraction    │      │ Export result  │
│ (read rows directly)  │             │ (text understanding) │      │  (Excel/Word)  │
└───────────────────────┘             └──────────────────────┘      └────────────────┘
```

#### Core components

@@ -247,8 +246,10 @@ docs/test/

| Template upload | `templates.py` `/templates/upload` | Receives the template file, extracts fields |
| Field extraction | `template_fill_service.py` | Extracts field definitions from Word/Excel tables |
| Document parsing | `docx_parser.py`, `xlsx_parser.py`, `txt_parser.py` | Parses source document content |
| Smart filling | `template_fill_service.py` `fill_template()` | Structured extraction + LLM extraction |
| Multi-row support | `template_fill_service.py` `FillResult` | `values` array support |
| JSON tolerance | `template_fill_service.py` `_fix_json()` | Repairs corrupted JSON |
| Result export | `templates.py` `/templates/export` | Multi-row Excel + Word export |

### 8.2 Source Document Loading

@@ -268,7 +269,9 @@ docs/test/

```python
# Extract table template fields
from docx_parser import DocxParser
parser = DocxParser()
fields = parser.extract_template_fields_from_docx(file_path)

# Return format
# [
```

@@ -295,6 +298,24 @@ fields = docx_parser.extract_template_fields_from_docx(file_path)

### 8.5 API Endpoints

#### POST `/api/v1/templates/upload`

Upload a template file and extract its field definitions.

**Response**:
```json
{
  "success": true,
  "template_id": "/path/to/saved/template.docx",
  "filename": "模板.docx",
  "file_type": "docx",
  "fields": [
    {"cell": "A1", "name": "姓名", "field_type": "text", "required": true, "hint": "提取人员姓名"}
  ],
  "field_count": 1
}
```

#### POST `/api/v1/templates/fill`

Fill request:

@@ -306,35 +327,232 @@ fields = docx_parser.extract_template_fields_from_docx(file_path)

```json
  ],
  "source_doc_ids": ["mongodb_doc_id_1", "mongodb_doc_id_2"],
  "source_file_paths": [],
  "user_hint": "请从xxx文档中提取"
}
```

**Response (with multi-row support)**:
```json
{
  "success": true,
  "filled_data": {
    "姓名": ["张三", "李四", "王五"],
    "年龄": ["25", "30", "28"]
  },
  "fill_details": [
    {
      "field": "姓名",
      "cell": "A1",
      "values": ["张三", "李四", "王五"],
      "value": "张三",
      "source": "结构化数据直接提取",
      "confidence": 1.0
    }
  ],
  "source_doc_count": 2,
  "max_rows": 3
}
```

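As a sketch, a client call to this endpoint could be built like this. Note that the opening key of the request example is cut off by the diff hunk, so `template_fields` is an assumed key name; the URL assumes the local dev server from the startup section.

```python
def build_fill_request(template_fields, source_doc_ids, user_hint=""):
    """Build the body for POST /api/v1/templates/fill.

    "template_fields" is an assumed key name: the start of the request
    example above is elided by the hunk, so only the other keys are known.
    """
    return {
        "template_fields": template_fields,
        "source_doc_ids": source_doc_ids,
        "source_file_paths": [],
        "user_hint": user_hint,
    }

body = build_fill_request(
    [{"cell": "A1", "name": "姓名", "hint": "提取人员姓名"}],
    ["mongodb_doc_id_1", "mongodb_doc_id_2"],
    user_hint="请从xxx文档中提取",
)

# Sending it with httpx (listed in the dependency section):
# import httpx
# resp = httpx.post("http://127.0.0.1:8000/api/v1/templates/fill", json=body)
# filled = resp.json()["filled_data"]
```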
#### POST `/api/v1/templates/export`

Export request (creates a new file):
```json
{
  "template_id": "模板ID",
  "filled_data": {"姓名": ["张三", "李四"], "金额": ["10000", "20000"]},
  "format": "xlsx"
}
```

#### POST `/api/v1/templates/fill-and-export`

**Fill the original template and export it** (recommended for the competition)

Opens the original template file directly, writes the data into its tables/cells, then exports the result. **The original template formatting is preserved.**

**Request**:
```json
{
  "template_path": "/path/to/original/template.docx",
  "filled_data": {
    "姓名": ["张三", "李四", "王五"],
    "年龄": ["25", "30", "28"]
  },
  "format": "docx"
}
```

**Response**: the filled Word/Excel file (as a file stream)

**Highlights**:
- Opens the original template file
- Matches field names to column indexes via the header row
- Writes values into the matching column's cells
- Multi-row data automatically extends the table
- Preserves the original template's formatting and styles

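The header matching and row expansion above can be sketched on a plain list-of-lists table. This is a simplification under stated assumptions: the real implementation works on python-docx/openpyxl objects and keeps cell styling, whereas here a table is just rows of strings with row 0 as the header.

```python
from typing import Dict, List

def fill_table(table: List[List[str]],
               filled_data: Dict[str, List[str]]) -> List[List[str]]:
    """Fill a header-row table in place, extending it to fit multi-row data."""
    header = table[0]
    # Map each field name to its column index via the header row.
    col_of = {name: header.index(name) for name in filled_data if name in header}
    max_rows = max((len(v) for v in filled_data.values()), default=0)
    # Extend the table with empty rows until every value list fits.
    while len(table) - 1 < max_rows:
        table.append([""] * len(header))
    for name, col in col_of.items():
        for i, value in enumerate(filled_data[name]):
            table[1 + i][col] = value
    return table
```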
#### POST `/api/v1/templates/parse-word-structure`

**Upload a Word document and extract structured fields** (competition-specific)

Uploads a Word document, extracts the field definitions (field name, hint, field type) from its table template, and stores them in MongoDB.

**Request**: multipart/form-data
- file: the Word file

**Response**:
```json
{
  "success": true,
  "doc_id": "mongodb_doc_id",
  "filename": "模板.docx",
  "file_path": "/path/to/saved/template.docx",
  "field_count": 5,
  "fields": [
    {
      "cell": "T0R1",
      "name": "字段名",
      "hint": "提示词",
      "field_type": "text",
      "required": true
    }
  ],
  "tables": [...],
  "metadata": {
    "paragraph_count": 10,
    "table_count": 1,
    "word_count": 500,
    "has_tables": true
  }
}
```

#### GET `/api/v1/templates/word-fields/{doc_id}`

**Fetch a Word document's template field info**

Returns the template field info of a previously uploaded Word document, looked up by doc_id.

**Response**:
```json
{
  "success": true,
  "doc_id": "mongodb_doc_id",
  "filename": "模板.docx",
  "fields": [...],
  "tables": [...],
  "field_count": 5,
  "metadata": {...}
}
```

### 8.6 Multi-Row Data Handling

**FillResult data structure**:
```python
@dataclass
class FillResult:
    field: str
    values: List[Any] = None   # supports multiple values (array)
    value: Any = ""            # kept for compatibility (first value)
    source: str = ""           # source document
    confidence: float = 1.0    # confidence score
```

**Export logic**:
- Compute the maximum row count across all fields
- For each row, take the value at that index from each field
- Pad missing rows with empty strings

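The export logic above can be sketched as follows (a simplification: the real exporter writes into Excel/Word cells rather than returning lists of rows):

```python
from typing import Dict, List

def expand_rows(filled_data: Dict[str, List[str]]) -> List[List[str]]:
    """Turn per-field value lists into output rows, padded with ''. """
    fields = list(filled_data)
    max_rows = max((len(v) for v in filled_data.values()), default=0)
    rows = []
    for i in range(max_rows):
        # Take the i-th value of each field; pad short fields with "".
        rows.append([filled_data[f][i] if i < len(filled_data[f]) else ""
                     for f in fields])
    return rows
```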
### 8.7 JSON Fault Tolerance

When the LLM returns corrupted or truncated JSON, the system:

1. Strips markdown code-block markers (```json, ```)
2. Matches brackets to find the largest complete JSON prefix
3. Removes trailing commas
4. Extracts the `values` arrays with regular expressions
5. Falls back to extracting all quoted strings directly

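A minimal sketch of this repair sequence is shown below. It is illustrative only: the real `_fix_json()` lives in `template_fill_service.py`, and step 4 (regex extraction of `values` arrays) is folded into the final quoted-string fallback here.

```python
import json
import re

def repair_llm_json(text: str):
    """Best-effort repair of LLM JSON output (sketch of the steps above)."""
    # 1. Strip markdown code-fence markers.
    text = re.sub(r"```(?:json)?", "", text).strip()
    # 2. Keep the largest prefix with balanced braces/brackets.
    depth, end = 0, -1
    for i, ch in enumerate(text):
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                end = i
    if end != -1:
        text = text[: end + 1]
    # 3. Remove trailing commas before a closing brace/bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # 5. Fallback: extract all quoted strings.
        return re.findall(r'"([^"]*)"', text)
```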
### 8.8 Structured-Data-First Extraction

For documents with a `rows` structure (e.g. Excel), the system:

1. Looks for a matching column directly in `structured_data.rows`
2. Uses fuzzy matching (the field name contains, or is contained in, the column name)
3. Extracts every row value in that column
4. Skips the LLM entirely, which is faster and more accurate

```python
# Internal logic
if structured.get("rows"):
    columns = structured.get("columns", [])
    values = _extract_column_values(rows, columns, field_name)
```

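A sketch of the fuzzy column matching follows. The real `_extract_column_values` is internal to the service; this version assumes rows are dicts keyed by column name.

```python
from typing import Any, Dict, List

def extract_column_values(rows: List[Dict[str, Any]], columns: List[str],
                          field_name: str) -> List[str]:
    """Fuzzy-match a template field to a column and collect its values."""
    # Fuzzy match: the field name contains the column name, or vice versa.
    match = next((c for c in columns
                  if field_name in c or c in field_name), None)
    if match is None:
        return []
    # Collect the matched column's value from every row, skipping blanks.
    return [str(r[match]) for r in rows if r.get(match) not in (None, "")]
```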
---

## IX. Dependencies

### Python dependencies

```
# requirements.txt must include
fastapi>=0.104.0
uvicorn>=0.24.0
motor>=3.3.0          # async MongoDB driver
sqlalchemy>=2.0.0     # MySQL ORM
pandas>=2.0.0         # Excel handling
openpyxl>=3.1.0       # Excel writing
python-docx>=0.8.0    # Word handling
chardet>=4.0.0        # encoding detection
httpx>=0.25.0         # HTTP client
```

### Front-end dependencies

```
# package.json must include
react>=18.0.0
react-dropzone>=14.0.0
lucide-react>=0.300.0
sonner>=1.0.0         # toast notifications
```

---

## X. Getting Started

### Back end

```bash
cd backend
.\venv\Scripts\Activate.ps1             # or Activate.bat
pip install -r requirements.txt         # make sure dependencies are complete
.\venv\Scripts\python.exe -m uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload
```

### Front end

```bash
cd frontend
npm install
npm run dev
```

### Environment variables

Configure in `backend/.env`:
```
MONGODB_URL=mongodb://localhost:27017
MONGODB_DB_NAME=document_system
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=your_password
MYSQL_DATABASE=document_system
LLM_API_KEY=your_api_key
LLM_BASE_URL=https://api.minimax.chat
LLM_MODEL_NAME=MiniMax-Text-01
```