Compare commits

12 Commits

Author SHA1 Message Date
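8f6d8a43d3 feat(layout): rename the application to 表易智融
- Change the application name in the MainLayout component from "智联文档" to "表易智融"
- Show the new application name in the sidebar title
- Unify the name across all related screens

fix(assistant): update the AI assistant introduction text

- Update the AI assistant's self-introduction on the Assistant page
- Replace the application name in the welcome message with "表易智融"

docs(dashboard): update the application name on the welcome page

- Change the application name shown in the Dashboard welcome message
- Keep the user interface consistent

refactor(instruction-chat): update the assistant name on the instruction chat page

- Update the AI assistant introduction text on the InstructionChat page
- Keep the brand name consistent across the whole application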
2026-05-05 15:02:45 +08:00
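6ec45b73ad Update LLM configuration and improve file path management
- Switch the LLM service from Zhipu AI to DeepSeek
- Update the API key, base URL, and model name settings
- Improve the file path configuration notes, documenting how paths differ between local development and Docker deployment
- Fix the log directory path to use settings.BASE_DIR for cross-platform consistency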
2026-04-21 21:20:14 +08:00
73f1c2804f Rename the project title to 智联文档
- Change the project title from "FilesReadSystem" to "智联文档"
- Keep the structure of the existing project introduction section
2026-04-21 20:48:18 +08:00
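74d40f91c5 Add project architecture diagram and program flowchart
- Add a project architecture diagram in Mermaid syntax showing how the frontend, backend, and data-layer components relate
- Add a program flowchart covering the full flow of document upload, parsing, storage, vectorization, and asynchronous processing
- Label the diagrams in both Chinese and English so the overall system design is easier to follow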
2026-04-21 20:47:28 +08:00
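d2e3c2db3e Add Docker deployment support and environment variable configuration
Add a complete Docker deployment setup, including:
- Create the .env.example environment variable template
- Add docker-compose.yml to orchestrate the full stack
- Create a Dockerfile for the frontend and one for the backend
- Add nginx.conf to configure the frontend reverse proxy
- Document the Docker deployment steps in README.md
- Integrate the Celery task queue for asynchronous processing
- Configure connections to multiple database services (MongoDB, MySQL, Redis)
- Implement health checks and service dependency management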
2026-04-21 20:39:12 +08:00
dj
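be302839ee feat: add document-to-PDF conversion
- Backend: add a PDF conversion service that converts Word (docx), Excel (xlsx), plain text (txt), and Markdown (md) files to PDF
- Use the reportlab library with Chinese font support (simhei.ttf)
- Add FastAPI endpoints: POST /api/v1/pdf/convert for a single file, POST /api/v1/pdf/convert/batch for batch conversion
- Frontend: add a PdfConverter page with drag-and-drop upload, conversion progress display, and batch download
- Conversion pipeline: every format is first converted to Markdown, then Markdown is converted to PDF, to keep the output consistent
- DOCX parsing reads the XML directly with zipfile to avoid python-docx compatibility problems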
2026-04-20 00:00:30 +08:00
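
Reviewer note: the last two bullets of this commit (read DOCX XML through zipfile, then go through Markdown) can be illustrated with the standard library alone. The following is a minimal sketch, not the commit's actual service code: the function names `docx_paragraphs` and `paragraphs_to_markdown` are hypothetical, and it only extracts paragraph text, ignoring styles and table structure.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the text of each non-empty paragraph in a .docx file (hypothetical helper)."""
    # A .docx file is a ZIP archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Paragraph text is split across <w:t> runs; join the runs back together.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def paragraphs_to_markdown(paragraphs: list[str]) -> str:
    """Join paragraphs with blank lines, the simplest Markdown representation."""
    return "\n\n".join(paragraphs)
```

Routing every format through one intermediate representation means only a single Markdown-to-PDF renderer has to handle fonts and layout, which is presumably why the commit describes the pipeline this way.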
dj
581e2b0ae0 Add system architecture diagram 2026-04-16 23:11:44 +08:00
dj
975ebf536b Add system architecture diagram 2026-04-16 23:08:21 +08:00
dj
38b0c7e62e Merge branch 'main' of https://gitea.kronecker.cc/OurCodesAreAllRight/FilesReadSystem 2026-04-16 20:00:51 +08:00
dj
8e46e635f1 Change RAG logging to info level 2026-04-16 19:59:56 +08:00
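c2f50d3bd8 Support reading documents from the database for AI analysis
Add a doc_id parameter for reading document content from the database while keeping file upload,
so callers can switch between the two modes. Update the Markdown, TXT, and Word analysis endpoints
with logic to fetch the document from the database, and update the frontend API calls to match.

BREAKING CHANGE: the analysis endpoints now accept either a file upload or a database document ID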
2026-04-16 19:43:43 +08:00
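
Reviewer note: the "file or doc_id" rule introduced by this commit can be isolated from FastAPI as a small pure function. This is a sketch under assumptions: `AnalysisSource` and `resolve_source` are hypothetical names, not part of the repository. It mirrors the order visible in the endpoint diff further down, where `doc_id` is checked first and the upload is only used when no `doc_id` is given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisSource:
    mode: str       # "database" or "upload"
    reference: str  # the doc_id, or the uploaded file's name


def resolve_source(upload_filename: Optional[str], doc_id: Optional[str]) -> AnalysisSource:
    """Choose the input mode; doc_id takes precedence when both are supplied."""
    if doc_id:
        return AnalysisSource(mode="database", reference=doc_id)
    if upload_filename:
        return AnalysisSource(mode="upload", reference=upload_filename)
    # Neither input was provided: the endpoints answer this case with HTTP 400.
    raise ValueError("provide either a file or a document ID")
```

Keeping the decision in one helper would avoid repeating the same branching in each of the Excel, Markdown, TXT, and Word endpoints.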
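2adf9aef60 Add AI analysis for TXT and Word files with chart generation support
- Add the txt_ai_service service for structured data extraction and chart generation from TXT files
- Add chart generation to Word analysis by extending the word_ai_service.generate_charts method
- Frontend: add TXT and Word AI analysis screens supporting the structured and charts analysis modes
- Backend API: add an analysis_type parameter to select the analysis type
- Improve result display so structured data and chart results are shown separately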
2026-04-16 10:02:18 +08:00
35 changed files with 3930 additions and 244 deletions

View File

@@ -0,0 +1,7 @@
{
"permissions": {
"allow": [
"WebSearch"
]
}
}

.env.example (new file, 35 lines changed)

@@ -0,0 +1,35 @@
# ============================================================
# FilesReadSystem environment variable template
# Copy this file to .env and fill in real values
# ============================================================
# ==================== Application ====================
DEBUG=false
# ==================== MongoDB ====================
MONGO_ROOT_USER=admin
MONGO_ROOT_PASSWORD=your_mongo_password
MONGODB_DB_NAME=document_system
# ==================== MySQL ====================
MYSQL_PASSWORD=your_mysql_password
MYSQL_DATABASE=document
# ==================== Redis ====================
REDIS_PASSWORD=your_redis_password
# ==================== LLM AI ====================
LLM_API_KEY=your_llm_api_key
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL_NAME=deepseek-chat
# ==================== Supabase ====================
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_KEY=your_service_key
# ==================== Embedding / RAG ====================
EMBEDDING_MODEL=all-MiniLM-L6-v2
# ==================== Frontend ====================
VITE_APP_ID=your_app_id

README.md (175 lines changed)

@@ -1,4 +1,4 @@
-# FilesReadSystem
+# 智联文档
 ## 项目介绍 / Project Introduction
@@ -26,37 +26,79 @@ A document understanding and multi-source data fusion system based on Large Lang
## 项目架构 / Project Architecture
(Removed: the ASCII-art architecture diagram. It showed User Interface (React + TypeScript + shadcn/ui); FastAPI Backend with Upload API (/documents), RAG Search (/rag/search), Natural Language (/instruction/execute), AI Analyze (/ai/analyze), Template Fill (/templates), and Visualization (/visualization); MongoDB (Documents), MySQL (Structured), Redis (Cache/Queue); and FAISS (Vector Index).)
(Added: the two Mermaid diagrams below.)
```mermaid
flowchart TB
    subgraph UI["用户界面 / User Interface"]
        Frontend["React + TypeScript + shadcn/ui"]
    end
    subgraph Backend["FastAPI 后端 / Backend"]
        Upload["上传 API<br/>/upload"]
        Documents["文档管理<br/>/documents"]
        RAG["RAG 检索<br/>/rag/search"]
        AI["AI 分析<br/>/ai/analyze"]
        Template["模板填充<br/>/templates/fill"]
        Instruction["自然语言指令<br/>/instruction/execute"]
        Visual["可视化<br/>/visualization"]
    end
    subgraph Data["数据层 / Data Layer"]
        MongoDB["MongoDB<br/>文档存储"]
        MySQL["MySQL<br/>结构化数据"]
        Redis["Redis<br/>缓存/队列"]
        FAISS["FAISS<br/>向量索引"]
    end
    UI --> Backend
    Backend --> MongoDB
    Backend --> MySQL
    Backend --> Redis
    MongoDB --> FAISS
```
---
## 程序流程 / Program Flow
```mermaid
flowchart TD
    Start([用户上传文档<br/>User Uploads Document]) --> Parse{解析文档格式<br/>Parse Document Format}
    Parse -->|Excel| ParseXlsx["解析 Excel<br/>Parse XLSX"]
    Parse -->|Word| ParseDocx["解析 Word<br/>Parse DOCX"]
    Parse -->|Markdown| ParseMd["解析 Markdown<br/>Parse Markdown"]
    Parse -->|Text| ParseTxt["解析文本<br/>Parse Text"]
    ParseXlsx --> Store1[(存储到<br/>MongoDB)]
    ParseDocx --> Store1
    ParseMd --> Store1
    ParseTxt --> Store1
    Store1 --> Embed["Embedding 向量化<br/>Create Embeddings"]
    Embed --> Index[(索引到<br/>FAISS)]
    Index --> TaskCreated{创建任务<br/>Create Task}
    TaskCreated -->|同步| ProcessSync["同步处理<br/>Sync Process"]
    TaskCreated -->|异步| QueueTask["加入任务队列<br/>Queue to Celery"]
    ProcessSync --> ReturnResult["返回结果<br/>Return Result"]
    QueueTask --> CeleryWorker["Celery Worker<br/>异步处理"]
    CeleryWorker --> LLM["调用 LLM<br/>Call LLM API"]
    LLM --> StoreResult["存储结果<br/>Store Result"]
    StoreResult --> ReturnAsync["返回任务ID<br/>Return Task ID"]
    ReturnResult --> End([完成<br/>Complete])
    ReturnAsync --> Poll{轮询任务状态<br/>Poll Task Status}
    Poll -->|进行中| Poll
    Poll -->|完成| GetResult["获取结果<br/>Get Result"]
    GetResult --> End
    style Start fill:#e1f5fe
    style End fill:#c8e6c9
    style LLM fill:#fff3e0
    style CeleryWorker fill:#fff3e0
```
---
@@ -233,6 +275,77 @@ pnpm dev
--- ---
## Docker 部署 / Docker Deployment
### 快速启动 / Quick Start
```bash
# 1. Copy the environment variable template and edit it
cp .env.example .env
# Edit .env and fill in your actual settings
# 2. Start all services
docker compose up -d
# 3. View logs
docker compose logs -f
# 4. Check service status
docker compose ps
# 5. Rebuild and redeploy
docker compose up -d --build
```
### 服务说明 / Services
| Service | Port | Description |
|:---|:---|:---|
| frontend | 80 | React frontend (Nginx) |
| backend | 8000 | FastAPI backend |
| mongodb | 27017 | MongoDB database |
| mysql | 3306 | MySQL database |
| redis | 6379 | Redis cache/queue |
### 环境变量 / Environment Variables
Create a `.env` file based on `.env.example`:
```bash
# Database settings
MONGO_ROOT_USER=admin
MONGO_ROOT_PASSWORD=your_password
MONGODB_DB_NAME=document_system
MYSQL_PASSWORD=your_password
MYSQL_DATABASE=document
REDIS_PASSWORD=your_password
# LLM settings
LLM_API_KEY=your_api_key
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL_NAME=deepseek-chat
# Supabase settings
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_KEY=your_service_key
```
### 验证部署 / Verify Deployment
```bash
# Check the status of all services
docker compose ps
# Open the frontend
curl http://localhost
# Check backend health
curl http://localhost:8000/health
```
---
## 许可证 / License ## 许可证 / License
ISC ISC
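
Reviewer note: the asynchronous branch of the program flow in this README (queue the task, return a task ID, poll until complete) implies a client-side polling loop. A minimal sketch follows; the `state` key and the values `"done"` and `"failed"` are hypothetical, since the diff does not show the task status schema, and `get_status` stands in for whatever call fetches the task status.

```python
import time
from typing import Callable


def poll_task(get_status: Callable[[], dict],
              interval: float = 1.0,
              timeout: float = 60.0) -> dict:
    """Call get_status() repeatedly until the task reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # "done" and "failed" are assumed terminal states for this sketch.
        if status.get("state") in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        time.sleep(interval)
```

A bounded timeout matters here because the Celery configuration in this change set allows tasks to run for up to an hour.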

View File

@@ -34,9 +34,9 @@ REDIS_URL="redis://localhost:6379/0"
# - 模型: glm-4-flash (快速文本模型), glm-4 (标准), glm-4-plus (高性能) # - 模型: glm-4-flash (快速文本模型), glm-4 (标准), glm-4-plus (高性能)
# - API: https://open.bigmodel.cn # - API: https://open.bigmodel.cn
# - API Key: https://open.bigmodel.cn/usercenter/apikeys # - API Key: https://open.bigmodel.cn/usercenter/apikeys
LLM_API_KEY="ca79ad9f96524cd5afc3e43ca97f347d.cpiLLx2oyitGvTeU" LLM_API_KEY="your_llm_api_key_here"
LLM_BASE_URL="https://open.bigmodel.cn/api/paas/v4" LLM_BASE_URL="https://api.deepseek.com"
LLM_MODEL_NAME="glm-4v-plus" LLM_MODEL_NAME="deepseek-chat"
# ==================== Supabase 配置 ==================== # ==================== Supabase 配置 ====================
# Supabase 项目配置 # Supabase 项目配置
@@ -45,10 +45,14 @@ SUPABASE_ANON_KEY="your_supabase_anon_key_here"
SUPABASE_SERVICE_KEY="your_supabase_service_key_here" SUPABASE_SERVICE_KEY="your_supabase_service_key_here"
# ==================== 文件路径配置 ==================== # ==================== 文件路径配置 ====================
# 上传文件存储目录 (相对于项目根目录) # 上传文件存储目录
# 本地开发: ./data/uploads
# Docker部署: /app/data/uploads
UPLOAD_DIR="./data/uploads" UPLOAD_DIR="./data/uploads"
# Faiss 向量数据库持久化目录 (LangChain + Faiss 实现) # Faiss 向量数据库持久化目录
# 本地开发: ./data/faiss
# Docker部署: /app/data/faiss
FAISS_INDEX_DIR="./data/faiss" FAISS_INDEX_DIR="./data/faiss"
# ==================== RAG 配置 ==================== # ==================== RAG 配置 ====================

backend/=4.0.0 (new file, 7 lines changed)

@@ -0,0 +1,7 @@
Collecting reportlab
Using cached reportlab-4.4.10-py3-none-any.whl.metadata (1.7 kB)
Requirement already satisfied: pillow>=9.0.0 in d:\code\filesreadsystem\backend\venv\lib\site-packages (from reportlab) (12.1.1)
Requirement already satisfied: charset-normalizer in d:\code\filesreadsystem\backend\venv\lib\site-packages (from reportlab) (3.4.6)
Using cached reportlab-4.4.10-py3-none-any.whl (2.0 MB)
Installing collected packages: reportlab
Successfully installed reportlab-4.4.10

backend/Dockerfile (new file, 40 lines changed)

@@ -0,0 +1,40 @@
# ============================================================
# FilesReadSystem Backend Docker Image
# ============================================================
FROM python:3.12-slim
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Install system dependencies (for FAISS, Pillow, tesseract, etc.)
# NOTE: libgl1-mesa-glx may be unavailable on newer Debian base images; libgl1 is the usual replacement.
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    libgl1-mesa-glx \
    libglib2.0-0 \
    tesseract-ocr \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy the dependency file first, then install (uses the Docker layer cache)
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code
COPY app/ ./app/
# Create the data directories
RUN mkdir -p /app/data/uploads /app/data/faiss /app/data/logs
# Expose the port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD python -c "import httpx; httpx.get('http://localhost:8000/health')" || exit 1
# Start command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
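
Reviewer note on the HEALTHCHECK: `httpx.get` returns a response object for 4xx and 5xx statuses without raising (unless `raise_for_status()` is called), so the probe above succeeds whenever the server answers at all. A stricter probe that treats only 2xx as healthy can be sketched with the standard library. This is an illustrative sketch, not the project's code; `is_healthy` is a hypothetical name, and the `opener` parameter exists only so the function can be exercised without a running server.

```python
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(url: str,
               timeout: float = 5.0,
               opener: Callable = urllib.request.urlopen) -> bool:
    """Return True only when the endpoint answers with a 2xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection failures and timeouts land here, as do HTTP error
        # statuses: urlopen raises HTTPError (a URLError subclass) for 4xx/5xx.
        return False
```

A health check script would call `is_healthy("http://localhost:8000/health")` and exit with status 0 or 1 accordingly.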

View File

@@ -15,6 +15,7 @@ from app.api.endpoints import (
health, health,
instruction, # 智能指令 instruction, # 智能指令
conversation, # 对话历史 conversation, # 对话历史
pdf_converter, # PDF转换
) )
# 创建主路由 # 创建主路由
@@ -33,3 +34,4 @@ api_router.include_router(visualization.router) # 可视化
api_router.include_router(analysis_charts.router) # 分析图表 api_router.include_router(analysis_charts.router) # 分析图表
api_router.include_router(instruction.router) # 智能指令 api_router.include_router(instruction.router) # 智能指令
api_router.include_router(conversation.router) # 对话历史 api_router.include_router(conversation.router) # 对话历史
api_router.include_router(pdf_converter.router) # PDF转换

View File

@@ -1,7 +1,7 @@
""" """
AI 分析 API 接口 AI 分析 API 接口
""" """
from fastapi import APIRouter, UploadFile, File, HTTPException, Query, Body from fastapi import APIRouter, UploadFile, File, HTTPException, Query, Body, Form
from fastapi.responses import StreamingResponse from fastapi.responses import StreamingResponse
from typing import Optional from typing import Optional
import logging import logging
@@ -12,6 +12,7 @@ from app.services.excel_ai_service import excel_ai_service
from app.services.markdown_ai_service import markdown_ai_service from app.services.markdown_ai_service import markdown_ai_service
from app.services.template_fill_service import template_fill_service from app.services.template_fill_service import template_fill_service
from app.services.word_ai_service import word_ai_service from app.services.word_ai_service import word_ai_service
from app.services.txt_ai_service import txt_ai_service
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -20,7 +21,8 @@ router = APIRouter(prefix="/ai", tags=["AI 分析"])
@router.post("/analyze/excel") @router.post("/analyze/excel")
async def analyze_excel( async def analyze_excel(
file: UploadFile = File(...), file: Optional[UploadFile] = File(None),
doc_id: Optional[str] = Form(None, description="文档ID从数据库读取"),
user_prompt: str = Query("", description="用户自定义提示词"), user_prompt: str = Query("", description="用户自定义提示词"),
analysis_type: str = Query("general", description="分析类型: general, summary, statistics, insights"), analysis_type: str = Query("general", description="分析类型: general, summary, statistics, insights"),
parse_all_sheets: bool = Query(False, description="是否分析所有工作表") parse_all_sheets: bool = Query(False, description="是否分析所有工作表")
@@ -29,7 +31,8 @@ async def analyze_excel(
上传并使用 AI 分析 Excel 文件 上传并使用 AI 分析 Excel 文件
Args: Args:
file: 上传的 Excel 文件 file: 上传的 Excel 文件(与 doc_id 二选一)
doc_id: 文档ID从数据库读取
user_prompt: 用户自定义提示词 user_prompt: 用户自定义提示词
analysis_type: 分析类型 analysis_type: 分析类型
parse_all_sheets: 是否分析所有工作表 parse_all_sheets: 是否分析所有工作表
@@ -37,7 +40,57 @@ async def analyze_excel(
Returns: Returns:
dict: 分析结果,包含 Excel 数据和 AI 分析结果 dict: 分析结果,包含 Excel 数据和 AI 分析结果
""" """
# 检查文件类型 filename = None
# 从数据库读取模式
if doc_id:
try:
from app.core.database.mongodb import mongodb
doc = await mongodb.get_document(doc_id)
if not doc:
raise HTTPException(status_code=404, detail=f"文档不存在: {doc_id}")
filename = doc.get("metadata", {}).get("original_filename", "unknown.xlsx")
file_ext = filename.split('.')[-1].lower()
if file_ext not in ['xlsx', 'xls']:
raise HTTPException(status_code=400, detail=f"文档类型不是 Excel: {file_ext}")
file_path = doc.get("metadata", {}).get("file_path")
if not file_path:
raise HTTPException(status_code=400, detail="文档没有存储文件路径,请重新上传")
# 使用文件路径进行 AI 分析
if parse_all_sheets:
result = await excel_ai_service.batch_analyze_sheets_from_path(
file_path=file_path,
filename=filename,
user_prompt=user_prompt,
analysis_type=analysis_type
)
else:
result = await excel_ai_service.analyze_excel_file_from_path(
file_path=file_path,
filename=filename,
user_prompt=user_prompt,
analysis_type=analysis_type
)
if result.get("success"):
return result
else:
return result
except HTTPException:
raise
except Exception as e:
logger.error(f"从数据库读取 Excel 文档失败: {str(e)}")
raise HTTPException(status_code=500, detail=f"读取文档失败: {str(e)}")
# 文件上传模式
if not file:
raise HTTPException(status_code=400, detail="请提供文件或文档ID")
if not file.filename: if not file.filename:
raise HTTPException(status_code=400, detail="文件名为空") raise HTTPException(status_code=400, detail="文件名为空")
@@ -60,7 +113,11 @@ async def analyze_excel(
# 读取文件内容 # 读取文件内容
content = await file.read() content = await file.read()
logger.info(f"开始分析文件: {file.filename}, 分析类型: {analysis_type}") # 验证文件内容不为空
if not content:
raise HTTPException(status_code=400, detail="文件内容为空,请确保文件已正确上传")
logger.info(f"开始分析文件: {file.filename}, 分析类型: {analysis_type}, 文件大小: {len(content)} bytes")
# 调用 AI 分析服务 # 调用 AI 分析服务
if parse_all_sheets: if parse_all_sheets:
@@ -153,8 +210,9 @@ async def analyze_text(
@router.post("/analyze/md") @router.post("/analyze/md")
async def analyze_markdown( async def analyze_markdown(
file: UploadFile = File(...), file: Optional[UploadFile] = File(None),
analysis_type: str = Query("summary", description="分析类型: summary, outline, key_points, questions, tags, qa, statistics, section"), doc_id: Optional[str] = Form(None, description="文档ID从数据库读取"),
analysis_type: str = Query("summary", description="分析类型: summary, outline, key_points, questions, tags, qa, statistics, section, charts"),
user_prompt: str = Query("", description="用户自定义提示词"), user_prompt: str = Query("", description="用户自定义提示词"),
section_number: Optional[str] = Query(None, description="指定章节编号,如 '''(一)'") section_number: Optional[str] = Query(None, description="指定章节编号,如 '''(一)'")
): ):
@@ -162,7 +220,8 @@ async def analyze_markdown(
上传并使用 AI 分析 Markdown 文件 上传并使用 AI 分析 Markdown 文件
Args: Args:
file: 上传的 Markdown 文件 file: 上传的 Markdown 文件(与 doc_id 二选一)
doc_id: 文档ID从数据库读取
analysis_type: 分析类型 analysis_type: 分析类型
user_prompt: 用户自定义提示词 user_prompt: 用户自定义提示词
section_number: 指定分析的章节编号 section_number: 指定分析的章节编号
@@ -170,16 +229,8 @@ async def analyze_markdown(
Returns: Returns:
dict: 分析结果 dict: 分析结果
""" """
# 检查文件类型 filename = None
if not file.filename: tmp_path = None
raise HTTPException(status_code=400, detail="文件名为空")
file_ext = file.filename.split('.')[-1].lower()
if file_ext not in ['md', 'markdown']:
raise HTTPException(
status_code=400,
detail=f"不支持的文件类型: {file_ext},仅支持 .md 和 .markdown"
)
# 验证分析类型 # 验证分析类型
supported_types = markdown_ai_service.get_supported_analysis_types() supported_types = markdown_ai_service.get_supported_analysis_types()
@@ -189,46 +240,96 @@ async def analyze_markdown(
detail=f"不支持的分析类型: {analysis_type},支持的类型: {', '.join(supported_types)}" detail=f"不支持的分析类型: {analysis_type},支持的类型: {', '.join(supported_types)}"
) )
try: if doc_id:
# 读取文件内容 # 从数据库读取文档
content = await file.read()
# 保存到临时文件
with tempfile.NamedTemporaryFile(mode='wb', suffix='.md', delete=False) as tmp:
tmp.write(content)
tmp_path = tmp.name
try: try:
logger.info(f"开始分析 Markdown 文件: {file.filename}, 分析类型: {analysis_type}, 章节: {section_number}") from app.core.database.mongodb import mongodb
doc = await mongodb.get_document(doc_id)
if not doc:
raise HTTPException(status_code=404, detail=f"文档不存在: {doc_id}")
# 调用 AI 分析服务 filename = doc.get("metadata", {}).get("original_filename", "unknown.md")
result = await markdown_ai_service.analyze_markdown( file_ext = filename.split('.')[-1].lower()
file_path=tmp_path,
analysis_type=analysis_type, if file_ext not in ['md', 'markdown']:
user_prompt=user_prompt, raise HTTPException(status_code=400, detail=f"文档类型不是 Markdown: {file_ext}")
section_number=section_number
content = doc.get("content") or ""
if not content:
raise HTTPException(status_code=400, detail="文档内容为空")
# 保存到临时文件
with tempfile.NamedTemporaryFile(mode='wb', suffix='.md', delete=False) as tmp:
tmp.write(content.encode('utf-8'))
tmp_path = tmp.name
logger.info(f"从数据库加载 Markdown 文档: {filename}, 长度: {len(content)}")
except HTTPException:
raise
except Exception as e:
logger.error(f"从数据库读取 Markdown 文档失败: {str(e)}")
raise HTTPException(status_code=500, detail=f"读取文档失败: {str(e)}")
else:
# 文件上传模式
if not file:
raise HTTPException(status_code=400, detail="请提供文件或文档ID")
if not file.filename:
raise HTTPException(status_code=400, detail="文件名为空")
file_ext = file.filename.split('.')[-1].lower()
if file_ext not in ['md', 'markdown']:
raise HTTPException(
status_code=400,
detail=f"不支持的文件类型: {file_ext},仅支持 .md 和 .markdown"
) )
logger.info(f"Markdown 分析完成: {file.filename}, 成功: {result['success']}") try:
# 读取文件内容
content = await file.read()
if not result['success']: # 保存到临时文件
raise HTTPException(status_code=500, detail=result.get('error', '分析失败')) with tempfile.NamedTemporaryFile(mode='wb', suffix='.md', delete=False) as tmp:
tmp.write(content)
tmp_path = tmp.name
return result filename = file.filename
finally: except Exception as e:
# 清理临时文件,确保在所有情况下都能清理 logger.error(f"读取 Markdown 文件失败: {str(e)}")
try: raise HTTPException(status_code=500, detail=f"读取文件失败: {str(e)}")
if tmp_path and os.path.exists(tmp_path):
os.unlink(tmp_path) try:
except Exception as cleanup_error: logger.info(f"开始分析 Markdown 文件: {filename}, 分析类型: {analysis_type}, 章节: {section_number}")
logger.warning(f"临时文件清理失败: {tmp_path}, error: {cleanup_error}")
# 调用 AI 分析服务
result = await markdown_ai_service.analyze_markdown(
file_path=tmp_path,
analysis_type=analysis_type,
user_prompt=user_prompt,
section_number=section_number
)
logger.info(f"Markdown 分析完成: {filename}, 成功: {result['success']}")
if not result['success']:
raise HTTPException(status_code=500, detail=result.get('error', '分析失败'))
return result
except HTTPException: except HTTPException:
raise raise
except Exception as e: except Exception as e:
logger.error(f"Markdown AI 分析过程中出错: {str(e)}") logger.error(f"Markdown AI 分析过程中出错: {str(e)}")
raise HTTPException(status_code=500, detail=f"分析失败: {str(e)}") raise HTTPException(status_code=500, detail=f"分析失败: {str(e)}")
finally:
# 清理临时文件
if tmp_path and os.path.exists(tmp_path):
try:
os.unlink(tmp_path)
except Exception as cleanup_error:
logger.warning(f"临时文件清理失败: {tmp_path}, error: {cleanup_error}")
@router.post("/analyze/md/stream") @router.post("/analyze/md/stream")
@@ -346,67 +447,100 @@ async def get_markdown_outline(
@router.post("/analyze/txt") @router.post("/analyze/txt")
async def analyze_txt( async def analyze_txt(
file: UploadFile = File(...), file: Optional[UploadFile] = File(None),
doc_id: Optional[str] = Form(None, description="文档ID从数据库读取"),
analysis_type: str = Query("structured", description="分析类型: structured, charts")
): ):
""" """
上传并使用 AI 分析 TXT 文本文件,提取结构化数据 上传并使用 AI 分析 TXT 文本文件,提取结构化数据或生成图表
将非结构化文本转换为结构化表格数据,便于后续填表使用 将非结构化文本转换为结构化表格数据,便于后续填表使用
当 analysis_type=charts 时,可生成可视化图表
Args: Args:
file: 上传的 TXT 文件 file: 上传的 TXT 文件(与 doc_id 二选一)
doc_id: 文档ID从数据库读取
analysis_type: 分析类型 - "structured"(默认,提取结构化数据)或 "charts"(生成图表)
Returns: Returns:
dict: 分析结果,包含结构化表格数据 dict: 分析结果,包含结构化表格数据或图表数据
""" """
if not file.filename: filename = None
raise HTTPException(status_code=400, detail="文件名为空") text_content = None
file_ext = file.filename.split('.')[-1].lower()
if file_ext not in ['txt', 'text']:
raise HTTPException(
status_code=400,
detail=f"不支持的文件类型: {file_ext},仅支持 .txt"
)
try:
# 读取文件内容
content = await file.read()
# 保存到临时文件
with tempfile.NamedTemporaryFile(mode='wb', suffix='.txt', delete=False) as tmp:
tmp.write(content)
tmp_path = tmp.name
if doc_id:
# 从数据库读取文档
try: try:
logger.info(f"开始 AI 分析 TXT 文件: {file.filename}") from app.core.database.mongodb import mongodb
doc = await mongodb.get_document(doc_id)
if not doc:
raise HTTPException(status_code=404, detail=f"文档不存在: {doc_id}")
# 使用 template_fill_service 的 AI 分析方法 filename = doc.get("metadata", {}).get("original_filename", "unknown.txt")
result = await template_fill_service.analyze_txt_with_ai( file_ext = filename.split('.')[-1].lower()
content=content.decode('utf-8', errors='replace'),
filename=file.filename if file_ext not in ['txt', 'text']:
raise HTTPException(status_code=400, detail=f"文档类型不是 TXT: {file_ext}")
# 使用数据库中的 content
text_content = doc.get("content") or ""
if not text_content:
raise HTTPException(status_code=400, detail="文档内容为空")
logger.info(f"从数据库加载 TXT 文档: {filename}, 长度: {len(text_content)}")
except HTTPException:
raise
except Exception as e:
logger.error(f"从数据库读取 TXT 文档失败: {str(e)}")
raise HTTPException(status_code=500, detail=f"读取文档失败: {str(e)}")
else:
# 文件上传模式
if not file:
raise HTTPException(status_code=400, detail="请提供文件或文档ID")
if not file.filename:
raise HTTPException(status_code=400, detail="文件名为空")
file_ext = file.filename.split('.')[-1].lower()
if file_ext not in ['txt', 'text']:
raise HTTPException(
status_code=400,
detail=f"不支持的文件类型: {file_ext},仅支持 .txt"
) )
if result: # 读取文件内容
logger.info(f"TXT AI 分析成功: {file.filename}") content = await file.read()
return { text_content = content.decode('utf-8', errors='replace')
"success": True, filename = file.filename
"filename": file.filename,
"structured_data": result
}
else:
logger.warning(f"TXT AI 分析返回空结果: {file.filename}")
return {
"success": False,
"filename": file.filename,
"error": "AI 分析未能提取到结构化数据",
"structured_data": None
}
finally: try:
# 清理临时文件 logger.info(f"开始 AI 分析 TXT 文件: {filename}, analysis_type={analysis_type}")
if os.path.exists(tmp_path):
os.unlink(tmp_path) # 使用 txt_ai_service 的 AI 分析方法
result = await txt_ai_service.analyze_txt_with_ai(
content=text_content,
filename=filename,
analysis_type=analysis_type
)
if result:
logger.info(f"TXT AI 分析成功: {filename}")
return {
"success": result.get("success", True),
"filename": filename,
"analysis_type": analysis_type,
"result": result
}
else:
logger.warning(f"TXT AI 分析返回空结果: {filename}")
return {
"success": False,
"filename": filename,
"error": "AI 分析未能提取到结构化数据",
"result": None
}
except HTTPException: except HTTPException:
raise raise
@@ -419,21 +553,90 @@ async def analyze_txt(
@router.post("/analyze/word") @router.post("/analyze/word")
async def analyze_word( async def analyze_word(
file: UploadFile = File(...), file: Optional[UploadFile] = File(None),
user_hint: str = Query("", description="用户提示词,如'请提取表格数据'") doc_id: Optional[str] = Form(None, description="文档ID从数据库读取"),
user_hint: str = Form("", description="用户提示词,如'请提取表格数据'"),
analysis_type: str = Query("structured", description="分析类型: structured, charts")
): ):
""" """
使用 AI 解析 Word 文档,提取结构化数据 使用 AI 解析 Word 文档,提取结构化数据或生成图表
适用于从非结构化的 Word 文档中提取表格数据、键值对等信息 适用于从非结构化的 Word 文档中提取表格数据、键值对等信息
当 analysis_type=charts 时,可生成可视化图表
Args: Args:
file: 上传的 Word 文件 file: 上传的 Word 文件(与 doc_id 二选一)
doc_id: 文档ID从数据库读取
user_hint: 用户提示词 user_hint: 用户提示词
analysis_type: 分析类型 - "structured"(默认,提取结构化数据)或 "charts"(生成图表)
Returns: Returns:
dict: 包含结构化数据的解析结果 dict: 包含结构化数据的解析结果或图表数据
""" """
# 获取文件名和扩展名
filename = None
file_ext = None
if doc_id:
# 从数据库读取文档
try:
from app.core.database.mongodb import mongodb
doc = await mongodb.get_document(doc_id)
if not doc:
raise HTTPException(status_code=404, detail=f"文档不存在: {doc_id}")
filename = doc.get("metadata", {}).get("original_filename", "unknown.docx")
file_ext = filename.split('.')[-1].lower()
if file_ext not in ['docx']:
raise HTTPException(status_code=400, detail=f"文档类型不是 Word: {file_ext}")
# 使用数据库中的 content 进行分析
content = doc.get("content", "") or ""
structured_data = doc.get("structured_data") or {}
tables = structured_data.get("tables", [])
# 调用 AI 分析服务,传入数据库内容
if analysis_type == "charts":
result = await word_ai_service.generate_charts_from_db(
content=content,
tables=tables,
filename=filename,
user_hint=user_hint
)
else:
result = await word_ai_service.parse_word_with_ai_from_db(
content=content,
tables=tables,
filename=filename,
user_hint=user_hint or "请提取文档中的所有结构化数据,包括表格、键值对等"
)
if result.get("success"):
return {
"success": True,
"filename": filename,
"analysis_type": analysis_type,
"result": result
}
else:
return {
"success": False,
"filename": filename,
"error": result.get("error", "AI 解析失败"),
"result": None
}
except HTTPException:
raise
except Exception as e:
logger.error(f"从数据库读取 Word 文档失败: {str(e)}")
raise HTTPException(status_code=500, detail=f"读取文档失败: {str(e)}")
# 文件上传模式
if not file:
raise HTTPException(status_code=400, detail="请提供文件或文档ID")
if not file.filename: if not file.filename:
raise HTTPException(status_code=400, detail="文件名为空") raise HTTPException(status_code=400, detail="文件名为空")
@@ -453,16 +656,25 @@ async def analyze_word(
tmp_path = tmp.name tmp_path = tmp.name
try: try:
# 使用 AI 解析 Word 文档 # 根据 analysis_type 选择处理方式
result = await word_ai_service.parse_word_with_ai( if analysis_type == "charts":
file_path=tmp_path, # 生成图表
user_hint=user_hint or "请提取文档中的所有结构化数据,包括表格、键值对等" result = await word_ai_service.generate_charts(
) file_path=tmp_path,
user_hint=user_hint
)
else:
# 提取结构化数据
result = await word_ai_service.parse_word_with_ai(
file_path=tmp_path,
user_hint=user_hint or "请提取文档中的所有结构化数据,包括表格、键值对等"
)
if result.get("success"): if result.get("success"):
return { return {
"success": True, "success": True,
"filename": file.filename, "filename": file.filename,
"analysis_type": analysis_type,
"result": result "result": result
} }
else: else:

View File

@@ -405,7 +405,7 @@ async def process_documents_batch(task_id: str, files: List[dict]):
if content and len(content) > 50: if content and len(content) > 50:
await index_document_to_rag(doc_id, filename, result, file_info["ext"]) await index_document_to_rag(doc_id, filename, result, file_info["ext"])
return {"index": index, "filename": filename, "doc_id": doc_id, "success": True} return {"index": index, "filename": filename, "doc_id": doc_id, "file_path": file_info["path"], "success": True}
except Exception as e: except Exception as e:
logger.error(f"处理文件 {filename} 失败: {e}") logger.error(f"处理文件 {filename} 失败: {e}")

View File

@@ -0,0 +1,208 @@
"""
PDF conversion API endpoints.
Convert Word, Excel, TXT, and Markdown files to PDF.
"""
import logging
import uuid
from typing import Optional
from fastapi import APIRouter, UploadFile, File, Form, HTTPException
from fastapi.responses import StreamingResponse
from app.services.pdf_converter_service import pdf_converter_service
from app.services.file_service import file_service
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/pdf", tags=["PDF转换"])
# Temporary store for converted PDFs (key: download_id, value: (pdf_content, original_filename))
_pdf_cache: dict = {}
# ==================== Request/response models ====================
class ConvertResponse:
    """Conversion response."""
    def __init__(self, success: bool, message: str = "", filename: str = ""):
        self.success = success
        self.message = message
        self.filename = filename
# ==================== Endpoints ====================
@router.post("/convert")
async def convert_to_pdf(
    file: UploadFile = File(...),
):
    """
    Convert an uploaded file to PDF.
    Supported formats: docx, xlsx, txt, md
    Args:
        file: the uploaded file
    Returns:
        PDF file stream
    """
    try:
        # Check the file format
        filename = file.filename or "document"
        file_ext = filename.rsplit('.', 1)[-1].lower() if '.' in filename else ''
        if file_ext not in pdf_converter_service.supported_formats:
            raise HTTPException(
                status_code=400,
                detail=f"不支持的格式: {file_ext},支持的格式: {', '.join(pdf_converter_service.supported_formats)}"
            )
        # Read the file content
        content = await file.read()
        if not content:
            raise HTTPException(status_code=400, detail="文件内容为空")
        logger.info(f"开始转换文件: {filename} ({file_ext})")
        # Convert to PDF
        pdf_content, error = await pdf_converter_service.convert_to_pdf(
            file_content=content,
            source_format=file_ext,
            filename=filename.rsplit('.', 1)[0] if '.' in filename else filename
        )
        if error:
            raise HTTPException(status_code=500, detail=error)
        # Return the PDF stream directly (the download name is fixed to converted.pdf)
        return StreamingResponse(
            iter([pdf_content]),
            media_type="application/pdf",
            headers={
                "Content-Disposition": "attachment; filename*=UTF-8''converted.pdf"
            }
        )
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"PDF转换失败: {e}")
        raise HTTPException(status_code=500, detail=f"转换失败: {str(e)}")
@router.get("/download/{download_id}")
async def download_pdf(download_id: str):
    """
    Download a PDF by download ID (supports interception by IDM).
    """
    # NOTE: nothing in this module writes to _pdf_cache, so unless another
    # module fills it, this lookup always returns 404.
    if download_id not in _pdf_cache:
        raise HTTPException(status_code=404, detail="下载链接已过期或不存在")
    pdf_content, filename = _pdf_cache.pop(download_id)  # remove the entry once downloaded
    # Use RFC 5987 encoding to support Chinese filenames
    from starlette.responses import StreamingResponse
    import urllib.parse
    # Percent-encode the Chinese filename
    encoded_filename = urllib.parse.quote(f"{filename}.pdf")
    return StreamingResponse(
        iter([pdf_content]),
        media_type="application/pdf",
        headers={
            "Content-Disposition": f"attachment; filename*=UTF-8''{encoded_filename}"
        }
    )
@router.get("/formats")
async def get_supported_formats():
    """
    Get the supported source file formats.
    Returns:
        List of supported formats
    """
    return {
        "success": True,
        "formats": pdf_converter_service.get_supported_formats()
    }
@router.post("/convert/batch")
async def batch_convert_to_pdf(
files: list[UploadFile] = File(...),
):
"""
批量将多个文件转换为 PDF
注意: 批量转换会返回多个 PDF 文件打包的 zip
Args:
files: 上传的文件列表
Returns:
ZIP 压缩包包含所有PDF
"""
try:
import io
import zipfile
results = []
errors = []
for file in files:
try:
filename = file.filename or "document"
file_ext = filename.rsplit('.', 1)[-1].lower() if '.' in filename else ''
if file_ext not in pdf_converter_service.supported_formats:
errors.append(f"{filename}: 不支持的格式")
continue
content = await file.read()
pdf_content, error = await pdf_converter_service.convert_to_pdf(
file_content=content,
source_format=file_ext,
filename=filename.rsplit('.', 1)[0] if '.' in filename else filename
)
if error:
errors.append(f"{filename}: {error}")
else:
results.append((filename, pdf_content))
except Exception as e:
errors.append(f"{file.filename}: {str(e)}")
if not results:
raise HTTPException(
status_code=400,
detail=f"没有可转换的文件。错误: {'; '.join(errors)}"
)
# 创建 ZIP 包
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
for original_name, pdf_content in results:
pdf_name = f"{original_name.rsplit('.', 1)[0] if '.' in original_name else original_name}.pdf"
zip_file.writestr(pdf_name, pdf_content)
zip_buffer.seek(0)
return StreamingResponse(
iter([zip_buffer.getvalue()]),
media_type="application/zip",
headers={
"Content-Disposition": "attachment; filename*=UTF-8''converted_pdfs.zip"
}
)
except HTTPException:
raise
except Exception as e:
logger.error(f"批量PDF转换失败: {e}")
raise HTTPException(status_code=500, detail=f"批量转换失败: {str(e)}")
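In the batch endpoint, two uploads such as `a.docx` and `a.txt` both map to `a.pdf`, so the archive can end up with duplicate entry names. A minimal sketch of in-memory ZIP packing with a simple rename rule is shown here; `build_zip` and the `_1`, `_2` suffix scheme are assumptions for illustration, not the endpoint's actual behavior.

```python
import io
import zipfile


def build_zip(entries: list[tuple[str, bytes]]) -> bytes:
    """Pack (name, data) pairs into an in-memory ZIP, renaming repeated names."""
    seen: dict[str, int] = {}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries:
            count = seen.get(name, 0)
            seen[name] = count + 1
            if count:
                # Second and later occurrences get a numeric suffix before the extension.
                stem, dot, ext = name.rpartition(".")
                name = f"{stem}_{count}{dot}{ext}" if dot else f"{name}_{count}"
            zf.writestr(name, data)
    return buffer.getvalue()


archive = build_zip([("a.pdf", b"one"), ("a.pdf", b"two"), ("b.pdf", b"three")])
names = zipfile.ZipFile(io.BytesIO(archive)).namelist()
print(names)
```

The sketch does not guard against a generated name colliding with a later explicit one; a production version would need to check that as well.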

backend/app/celery_app.py (new file, 27 lines)

@@ -0,0 +1,27 @@
# ============================================================
# Celery 应用配置
# ============================================================
from celery import Celery
# 优先使用环境变量,否则使用默认值
import os
CELERY_BROKER_URL = os.getenv("CELERY_BROKER_URL", "redis://localhost:6379/1")
CELERY_RESULT_BACKEND = os.getenv("CELERY_RESULT_BACKEND", "redis://localhost:6379/2")
celery_app = Celery(
"filesread",
broker=CELERY_BROKER_URL,
backend=CELERY_RESULT_BACKEND,
)
celery_app.conf.update(
task_serializer="json",
accept_content=["json"],
result_serializer="json",
timezone="Asia/Shanghai",
enable_utc=True,
task_track_started=True,
task_time_limit=3600, # 1小时超时
worker_prefetch_multiplier=1,
)
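The defaults in this file point at a local Redis instance, and the environment variables take precedence when set. A minimal standard-library sketch of that override behavior follows; the `redis_url` helper and the `redis` hostname are assumptions chosen for illustration.

```python
import os


def redis_url(var: str, default: str) -> str:
    """Return the URL from the environment, falling back to the local default."""
    return os.getenv(var, default)


# Simulate a container deployment where the broker runs on a host named "redis".
# The hostname is an assumption for this sketch.
os.environ["CELERY_BROKER_URL"] = "redis://redis:6379/1"
os.environ.pop("CELERY_RESULT_BACKEND", None)

broker = redis_url("CELERY_BROKER_URL", "redis://localhost:6379/1")
backend = redis_url("CELERY_RESULT_BACKEND", "redis://localhost:6379/2")
print(broker, backend)
```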


@@ -91,11 +91,15 @@ class DocxParser(BaseParser):
 table_rows.append(row_data)
 if table_rows:
+# Use the first row as headers and the remaining rows as data
+headers = table_rows[0] if table_rows else []
+data_rows = table_rows[1:] if len(table_rows) > 1 else []
 tables_data.append({
 "table_index": i,
-"rows": table_rows,
-"row_count": len(table_rows),
-"column_count": len(table_rows[0]) if table_rows else 0
+"headers": headers, # added headers field
+"rows": data_rows, # data rows (excluding the header row)
+"row_count": len(data_rows),
+"column_count": len(headers) if headers else 0
 })
 # Extract image / embedded object information
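This hunk changes the table payload so that `rows` no longer includes the header row. A minimal standalone sketch of the new shape is given below; `split_table` is a hypothetical helper name, and the field names mirror those in the hunk.

```python
from typing import Any


def split_table(table_rows: list[list[str]], index: int) -> dict[str, Any]:
    """Treat the first row as headers and the remaining rows as data."""
    headers = table_rows[0] if table_rows else []
    data_rows = table_rows[1:]  # slicing an empty or one-row list yields []
    return {
        "table_index": index,
        "headers": headers,
        "rows": data_rows,
        "row_count": len(data_rows),
        "column_count": len(headers),
    }


table = split_table([["name", "qty"], ["apple", "3"], ["pear", "5"]], 0)
print(table)
```

Consumers that previously read the header from `rows[0]` would need to read `headers` instead, since `row_count` now counts data rows only.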


@@ -34,8 +34,8 @@ def setup_logging():
 # Root logger configuration
 log_level = logging.DEBUG if settings.DEBUG else logging.INFO
-# Log directory
-log_dir = Path("data/logs")
+# Log directory (use settings.BASE_DIR for cross-platform consistency)
+log_dir = settings.BASE_DIR / "data" / "logs"
 log_dir.mkdir(parents=True, exist_ok=True)
 # Log file path


@@ -223,6 +223,177 @@ class ExcelAIService:
}
}
async def analyze_excel_file_from_path(
self,
file_path: str,
filename: str,
user_prompt: str = "",
analysis_type: str = "general",
parse_options: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
从文件路径分析 Excel 文件(用于从数据库加载的文档)
Args:
file_path: Excel 文件路径
filename: 文件名
user_prompt: 用户自定义提示词
analysis_type: 分析类型
parse_options: 解析选项
Returns:
Dict[str, Any]: 分析结果
"""
# 1. 解析 Excel 文件
excel_data = None
parse_result_metadata = None
try:
parse_options = parse_options or {}
parse_result = self.parser.parse(file_path, **parse_options)
if not parse_result.success:
return {
"success": False,
"error": parse_result.error,
"analysis": None
}
excel_data = parse_result.data
parse_result_metadata = parse_result.metadata
logger.info(f"Excel 解析成功: {parse_result_metadata}")
except Exception as e:
logger.error(f"Excel 解析失败: {str(e)}")
return {
"success": False,
"error": f"Excel 解析失败: {str(e)}",
"analysis": None
}
# 2. 调用 LLM 进行分析
try:
if user_prompt and user_prompt.strip():
llm_result = await self.llm_service.analyze_with_template(
excel_data,
user_prompt
)
else:
llm_result = await self.llm_service.analyze_excel_data(
excel_data,
user_prompt,
analysis_type
)
logger.info(f"AI 分析完成: {llm_result['success']}")
return {
"success": True,
"excel": {
"data": excel_data,
"metadata": parse_result_metadata,
"saved_path": file_path
},
"analysis": llm_result
}
except Exception as e:
logger.error(f"AI 分析失败: {str(e)}")
return {
"success": False,
"error": f"AI 分析失败: {str(e)}",
"excel": {
"data": excel_data,
"metadata": parse_result_metadata
},
"analysis": None
}
async def batch_analyze_sheets_from_path(
self,
file_path: str,
filename: str,
user_prompt: str = "",
analysis_type: str = "general"
) -> Dict[str, Any]:
"""
从文件路径批量分析 Excel 文件的所有工作表(用于从数据库加载的文档)
Args:
file_path: Excel 文件路径
filename: 文件名
user_prompt: 用户自定义提示词
analysis_type: 分析类型
Returns:
Dict[str, Any]: 分析结果
"""
# 1. 解析所有工作表
try:
parse_result = self.parser.parse_all_sheets(file_path)
if not parse_result.success:
return {
"success": False,
"error": parse_result.error,
"analysis": None
}
sheets_data = parse_result.data.get("sheets", {})
logger.info(f"Excel 解析成功,共 {len(sheets_data)} 个工作表")
except Exception as e:
logger.error(f"Excel 解析失败: {str(e)}")
return {
"success": False,
"error": f"Excel 解析失败: {str(e)}",
"analysis": None
}
# 2. 批量分析每个工作表
sheet_analyses = {}
errors = {}
for sheet_name, sheet_data in sheets_data.items():
try:
if user_prompt and user_prompt.strip():
llm_result = await self.llm_service.analyze_with_template(
sheet_data,
user_prompt
)
else:
llm_result = await self.llm_service.analyze_excel_data(
sheet_data,
user_prompt,
analysis_type
)
sheet_analyses[sheet_name] = llm_result
if not llm_result["success"]:
errors[sheet_name] = llm_result.get("error", "未知错误")
logger.info(f"工作表 '{sheet_name}' 分析完成")
except Exception as e:
logger.error(f"工作表 '{sheet_name}' 分析失败: {str(e)}")
errors[sheet_name] = str(e)
# 3. 组合结果
return {
"success": len(errors) == 0,
"excel": {
"sheets": sheets_data,
"metadata": parse_result.metadata,
"saved_path": file_path
},
"analysis": {
"sheets": sheet_analyses,
"total_sheets": len(sheets_data),
# Count against all sheets: a sheet that raised is in errors but not in sheet_analyses
"successful": len(sheets_data) - len(errors),
"errors": errors
}
}
def get_supported_analysis_types(self) -> List[str]:
"""Get the supported analysis types"""
return [


@@ -54,15 +54,21 @@ class LLMService:
 # Add any other parameters
 payload.update(kwargs)
+import time
+_start_time = time.time()
+logger.info(f"🤖 [LLM] 正在调用 DeepSeek API... 模型: {self.model_name}")
 try:
-async with httpx.AsyncClient(timeout=60.0) as client:
+async with httpx.AsyncClient(timeout=120.0) as client:
 response = await client.post(
 f"{self.base_url}/chat/completions",
 headers=headers,
 json=payload
 )
 response.raise_for_status()
-return response.json()
+result = response.json()
+_elapsed = time.time() - _start_time
+logger.info(f"✅ [LLM] DeepSeek API 响应成功 | 模型: {self.model_name} | 耗时: {_elapsed:.2f}s | Token: {result.get('usage', {}).get('total_tokens', 'N/A')}")
+return result
 except httpx.HTTPStatusError as e:
 error_detail = e.response.text
@@ -78,7 +84,7 @@ class LLMService:
 pass
 raise
 except Exception as e:
-logger.error(f"LLM API 调用异常: {str(e)}")
+logger.error(f"LLM API 调用异常: {repr(e)} - {str(e)}")
 raise
 def extract_message_content(self, response: Dict[str, Any]) -> str:
@@ -133,6 +139,9 @@ class LLMService:
 payload.update(kwargs)
+import time
+_start_time = time.time()
+logger.info(f"🤖 [LLM] 正在调用 DeepSeek API (流式) | 模型: {self.model_name}")
 try:
 async with httpx.AsyncClient(timeout=120.0) as client:
 async with client.stream(
@@ -141,10 +150,13 @@ class LLMService:
 headers=headers,
 json=payload
 ) as response:
+_elapsed = time.time() - _start_time
+logger.info(f"✅ [LLM] DeepSeek API 流式响应开始 | 模型: {self.model_name} | 耗时: {_elapsed:.2f}s")
 async for line in response.aiter_lines():
 if line.startswith("data: "):
 data = line[6:] # Remove "data: " prefix
 if data == "[DONE]":
+logger.info(f"✅ [LLM] DeepSeek API 流式响应完成")
 break
 try:
 import json as json_module
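The timing logic added in this hunk is repeated for the streaming and non-streaming paths. One way to share it is a small context manager; the sketch below assumes a hypothetical `timed` helper and a stand-in `fake_llm_call` (no network access), and uses `time.perf_counter()`, which is monotonic, rather than `time.time()`.

```python
import asyncio
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("llm_timing_sketch")


@contextmanager
def timed(label: str, sink: list):
    """Record the wall-clock duration of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs on success and on exceptions, so failed calls are timed too.
        elapsed = time.perf_counter() - start
        sink.append(elapsed)
        logger.info("%s took %.2fs", label, elapsed)


async def fake_llm_call() -> dict:
    # Stand-in for the HTTP request made by the real service.
    await asyncio.sleep(0.01)
    return {"usage": {"total_tokens": 42}}


async def main() -> tuple[dict, list]:
    timings: list = []
    with timed("chat", timings):
        result = await fake_llm_call()
    return result, timings


result, timings = asyncio.run(main())
```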


@@ -0,0 +1,403 @@
"""
PDF conversion service
Converts Word (docx), Excel (xlsx), Txt, and Markdown (md) files to PDF.
Strategy: convert every format to Markdown first, then render the Markdown to PDF.
"""
import io
import logging
import platform
from pathlib import Path
from typing import List, Tuple
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.enums import TA_LEFT, TA_CENTER, TA_JUSTIFY
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
logger = logging.getLogger(__name__)
class PDFConverterService:
"""PDF 转换服务"""
def __init__(self):
self.supported_formats = ["docx", "xlsx", "txt", "md"]
self._font_name = None
self._styles = None
self._page_width = None
self._page_height = None
self._setup_fonts()
def _setup_fonts(self):
"""设置字体"""
try:
self._page_width, self._page_height = A4
# 查找中文字体
font_path = self._find_chinese_font()
if font_path:
try:
font = TTFont('ChineseFont', font_path)
pdfmetrics.registerFont(font)
from reportlab.pdfbase.pdfmetrics import registerFontFamily
registerFontFamily('ChineseFont', normal='ChineseFont')
self._font_name = 'ChineseFont'
logger.info(f"成功注册中文字体: {font_path}")
except Exception as e:
logger.warning(f"字体注册失败: {e}, 使用Helvetica")
self._font_name = 'Helvetica'
else:
self._font_name = 'Helvetica'
logger.warning("未找到中文字体,使用 Helvetica(不支持中文)")
# 创建样式
styles = getSampleStyleSheet()
styles.add(ParagraphStyle(
name='ChineseTitle',
fontName=self._font_name,
fontSize=16,
leading=22,
alignment=TA_CENTER,
spaceAfter=12,
))
styles.add(ParagraphStyle(
name='ChineseHeading',
fontName=self._font_name,
fontSize=14,
leading=20,
spaceBefore=10,
spaceAfter=8,
))
styles.add(ParagraphStyle(
name='ChineseBody',
fontName=self._font_name,
fontSize=10,
leading=14,
alignment=TA_JUSTIFY,
spaceAfter=6,
))
styles.add(ParagraphStyle(
name='ChineseCode',
fontName='Courier',
fontSize=9,
leading=12,
))
self._styles = styles
logger.info("PDF服务初始化完成")
except Exception as e:
logger.error(f"PDF服务初始化失败: {e}")
raise
def _find_chinese_font(self) -> str:
"""查找中文字体"""
system = platform.system()
if system == "Windows":
fonts = [
"C:/Windows/Fonts/simhei.ttf",
"C:/Windows/Fonts/simsun.ttc",
"C:/Windows/Fonts/msyh.ttc",
"C:/Windows/Fonts/simsun.ttf",
]
elif system == "Darwin":
fonts = [
"/System/Library/Fonts/STHeiti Light.ttc",
"/System/Library/Fonts/PingFang.ttc",
"/Library/Fonts/Arial Unicode.ttf",
]
else:
fonts = [
"/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
"/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
]
for font in fonts:
if Path(font).exists():
return font
return None
def _sanitize_text(self, text: str) -> str:
"""清理文本"""
if not text:
return ""
return text.replace('\x00', '')
async def convert_to_pdf(
self,
file_content: bytes,
source_format: str,
filename: str = "document"
) -> Tuple[bytes, str]:
"""将文档转换为 PDF"""
try:
if source_format.lower() not in self.supported_formats:
return b"", f"不支持的格式: {source_format}"
# Step 1: convert to Markdown
markdown_content, error = await self._convert_to_markdown(file_content, source_format, filename)
if error:
return b"", error
# Step 2: render the Markdown to PDF
return await self._convert_markdown_to_pdf(markdown_content, filename)
except Exception as e:
logger.error(f"PDF转换失败: {e}")
import traceback
logger.error(f"详细错误: {traceback.format_exc()}")
return b"", f"转换失败: {str(e)}"
async def _convert_to_markdown(
self,
file_content: bytes,
source_format: str,
filename: str
) -> Tuple[str, str]:
"""将各种格式转换为 Markdown"""
converters = {
"docx": self._convert_docx_to_markdown,
"xlsx": self._convert_xlsx_to_markdown,
"txt": self._convert_txt_to_markdown,
"md": self._convert_md_to_markdown,
}
return await converters[source_format.lower()](file_content, filename)
async def _convert_txt_to_markdown(self, file_content: bytes, filename: str) -> Tuple[str, str]:
"""Txt 转 Markdown"""
try:
text = self._decode_content(file_content)
text = self._sanitize_text(text)
return f"# {filename}\n\n{text}", ""
except Exception as e:
logger.error(f"Txt转Markdown失败: {e}")
return "", f"文本文件处理失败: {str(e)}"
async def _convert_md_to_markdown(self, file_content: bytes, filename: str) -> Tuple[str, str]:
"""Markdown input: decode, sanitize, and prepend the filename as a top-level heading"""
try:
content = self._decode_content(file_content)
content = self._sanitize_text(content)
return f"# {filename}\n\n{content}", ""
except Exception as e:
logger.error(f"Markdown处理失败: {e}")
return "", f"Markdown处理失败: {str(e)}"
async def _convert_docx_to_markdown(self, file_content: bytes, filename: str) -> Tuple[str, str]:
"""Word 转 Markdown - 使用 zipfile 直接解析,更加健壮"""
try:
import zipfile
import re
lines = [f"# {filename}", ""]
# Parse the DOCX directly with zipfile to avoid python-docx's strict validation
try:
with zipfile.ZipFile(io.BytesIO(file_content), 'r') as zf:
# 读取主文档内容
xml_content = zf.read('word/document.xml').decode('utf-8')
except zipfile.BadZipFile:
return "", "文件不是有效的 DOCX 格式"
except KeyError:
return "", "DOCX 文件损坏:找不到 document.xml"
# 简单的 XML 解析 - 提取文本段落
# 移除 XML 标签,提取纯文本
xml_content = re.sub(r'<w:br[^>]*>', '\n', xml_content)
xml_content = re.sub(r'</w:p>', '\n', xml_content)
xml_content = re.sub(r'<[^>]+>', '', xml_content)
xml_content = re.sub(r'\n\s*\n', '\n\n', xml_content)
# Decode XML entities (&amp; last, so already-escaped text is not decoded twice)
xml_content = xml_content.replace('&lt;', '<')
xml_content = xml_content.replace('&gt;', '>')
xml_content = xml_content.replace('&quot;', '"')
xml_content = xml_content.replace('&apos;', "'")
xml_content = xml_content.replace('&#39;', "'")
xml_content = xml_content.replace('&amp;', '&')
# 清理空白
lines_text = [line.strip() for line in xml_content.split('\n') if line.strip()]
# 生成 Markdown
for text in lines_text[:500]: # 限制最多500行
if text:
lines.append(text)
return '\n'.join(lines), ""
except Exception as e:
logger.error(f"Word转Markdown失败: {e}")
import traceback
logger.error(traceback.format_exc())
return "", f"Word文档处理失败: {str(e)}"
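The regex approach above strips tags from `word/document.xml` and then decodes entities by hand. An alternative sketch using the standard library's XML parser is shown here, assuming the usual WordprocessingML namespace; `docx_paragraphs` and `make_sample_docx` are hypothetical names, and the sample archive contains only the one part needed for the demonstration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(data: bytes) -> list[str]:
    """Return the non-empty paragraph texts from a DOCX byte string."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Concatenate every text run (<w:t>) inside the paragraph.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")).strip()
        if text:
            paragraphs.append(text)
    return paragraphs


def make_sample_docx() -> bytes:
    """Build a minimal archive holding only word/document.xml (demo fixture)."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello &amp; welcome</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>第二段</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buffer.getvalue()


paragraphs = docx_paragraphs(make_sample_docx())
print(paragraphs)
```

The parser resolves entities such as `&amp;` itself, so no manual replacement step is needed in this variant.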
async def _convert_xlsx_to_markdown(self, file_content: bytes, filename: str) -> Tuple[str, str]:
"""Excel 转 Markdown"""
try:
import openpyxl
wb = openpyxl.load_workbook(io.BytesIO(file_content))
lines = [f"# {filename} - Excel数据", ""]
for sheet_name in wb.sheetnames[:10]:
ws = wb[sheet_name]
lines.append(f"## 工作表: {sheet_name}")
lines.append("")
for row_idx, row in enumerate(ws.iter_rows(max_row=50, values_only=True)):
row_data = [str(cell) if cell is not None else "" for cell in row]
if not any(row_data):
continue
lines.append("| " + " | ".join(row_data) + " |")
if row_idx == 0:
lines.append("| " + " | ".join(["---"] * len(row_data)) + " |")
lines.append("")
return "\n".join(lines), ""
except Exception as e:
logger.error(f"Excel转Markdown失败: {e}")
return "", f"Excel处理失败: {str(e)}"
async def _convert_markdown_to_pdf(self, markdown_content: str, filename: str) -> Tuple[bytes, str]:
"""Markdown 转 PDF"""
try:
logger.info(f"Markdown转PDF开始 - filename={filename}, 字体={self._font_name}")
logger.info(f"styles['ChineseTitle'].fontName={self._styles['ChineseTitle'].fontName}")
buffer = io.BytesIO()
story = []
safe_filename = self._sanitize_text(filename)
logger.info(f"safe_filename={repr(safe_filename[:50])}")
story.append(Paragraph(text=safe_filename, style=self._styles['ChineseTitle']))
story.append(Spacer(1, 12))
in_code = False
for line in markdown_content.split('\n'):
line = line.strip()
if line.startswith('```'):
in_code = not in_code
story.append(Spacer(1, 6))
continue
if in_code:
story.append(Paragraph(text=self._sanitize_text(line), style=self._styles['ChineseCode']))
continue
if not line:
story.append(Spacer(1, 6))
continue
# 标题处理
if line.startswith('# '):
story.append(Paragraph(text=self._sanitize_text(line[2:]), style=self._styles['ChineseHeading']))
elif line.startswith('## '):
story.append(Paragraph(text=self._sanitize_text(line[3:]), style=self._styles['ChineseHeading']))
elif line.startswith('### '):
story.append(Paragraph(text=self._sanitize_text(line[4:]), style=self._styles['ChineseHeading']))
elif line.startswith('#### '):
story.append(Paragraph(text=self._sanitize_text(line[5:]), style=self._styles['ChineseHeading']))
elif line.startswith('- ') or line.startswith('* '):
story.append(Paragraph(text="• " + self._sanitize_text(line[2:]), style=self._styles['ChineseBody']))
# 表格处理
elif line.startswith('|'):
# 跳过 markdown 表格分隔符
if set(line.replace('|', '').replace('-', '').replace(':', '').replace(' ', '')) == set():
continue
# 解析并创建表格
table_lines = []
for _ in range(50): # 最多50行
if line.startswith('|'):
row = [cell.strip() for cell in line.split('|')[1:-1]]
if not any(row) or set(''.join(row).replace('-', '').replace(':', '').replace(' ', '')) == set():
break
table_lines.append(row)
try:
line = next(markdown_content.split('\n').__iter__()).strip()
except StopIteration:
break
else:
break
if table_lines:
# 创建表格
t = Table(table_lines, colWidths=[100] * len(table_lines[0]))
t.setStyle(TableStyle([
('FONTNAME', (0, 0), (-1, -1), self._font_name),
('FONTSIZE', (0, 0), (-1, -1), 9),
('GRID', (0, 0), (-1, -1), 0.5, '#999999'),
('BACKGROUND', (0, 0), (-1, 0), '#4472C4'),
('TEXTCOLOR', (0, 0), (-1, 0), '#FFFFFF'),
]))
story.append(t)
story.append(Spacer(1, 6))
else:
story.append(Paragraph(text=self._sanitize_text(line), style=self._styles['ChineseBody']))
logger.info(f"准备构建PDF,story长度={len(story)}")
pdf_doc = SimpleDocTemplate(
buffer,
pagesize=(self._page_width, self._page_height),
rightMargin=72,
leftMargin=72,
topMargin=72,
bottomMargin=72
)
logger.info("调用pdf_doc.build()")
pdf_doc.build(story)
logger.info("pdf_doc.build()完成")
result = buffer.getvalue()
buffer.close()
return result, ""
except Exception as e:
logger.error(f"Markdown转PDF失败: {e}")
import traceback
logger.error(f"详细错误: {traceback.format_exc()}")
return b"", f"Markdown转PDF失败: {str(e)}"
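In the table branch above, `next(markdown_content.split('\n').__iter__())` builds a fresh iterator on every call, so it always yields the first line of the document instead of the line after the current row. A minimal sketch of grouping consecutive `|` rows in a single pass follows; `group_table_rows` is a hypothetical helper, and the cell-splitting rule copies the `split('|')[1:-1]` convention used in the source.

```python
def group_table_rows(lines: list[str]) -> list[list[list[str]]]:
    """Collect consecutive `|`-prefixed lines into tables."""
    tables, current = [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith("|"):
            cells = [cell.strip() for cell in line.split("|")[1:-1]]
            # Skip separator rows such as | --- | :-: | and rows with only empty cells.
            if all(set(cell) <= set("-: ") for cell in cells):
                continue
            current.append(cells)
        elif current:
            # A non-table line closes the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


sample = ["# Title", "| h1 | h2 |", "| --- | --- |", "| 1 | 2 |", "", "text", "| x |"]
tables = group_table_rows(sample)
print(tables)
```

Each inner list could then be passed to a single `Table(...)`, so the header styling applies once per table rather than once per row.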
def _decode_content(self, file_content: bytes) -> str:
"""解码文件内容"""
encodings = ['utf-8', 'gbk', 'gb2312', 'gb18030', 'latin-1']
for enc in encodings:
try:
return file_content.decode(enc)
except (UnicodeDecodeError, LookupError):
continue
return file_content.decode('utf-8', errors='replace')
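Two details of the fallback chain above are worth noting: `latin-1` accepts any byte sequence, so the final `errors='replace'` line is never reached, and `gb18030` covers the characters of both `gbk` and `gb2312`. A shorter chain is sketched here; `decode_bytes` and its returned label are assumptions for illustration.

```python
def decode_bytes(data: bytes, encodings=("utf-8", "gb18030")) -> tuple[str, str]:
    """Try each encoding in order; fall back to UTF-8 with replacement characters."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Reached only when every candidate fails; undecodable bytes become U+FFFD.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


text, used = decode_bytes("中文".encode("gbk"))
print(text, used)
```

Returning the encoding that succeeded makes it possible to log lossy decodes instead of silently producing mojibake.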
def get_supported_formats(self) -> List[str]:
"""获取支持的格式"""
return self.supported_formats
# 全局单例
pdf_converter_service = PDFConverterService()


@@ -669,7 +669,7 @@ class RAGService:
 # Sort by fused score, descending
 fused_results.sort(key=lambda x: x["score"], reverse=True)
-logger.debug(f"混合融合: {len(fused_results)} 个文档, 向量:{len(vector_results)}, BM25:{len(bm25_results)}")
+logger.info(f"RRF 混合融合: {len(fused_results)} 个文档参与融合, 向量检索命中:{len(vector_results)}, BM25命中:{len(bm25_results)}")
 return fused_results[:top_k]
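The new log message names RRF (reciprocal rank fusion), but the fusion code itself lies outside this hunk. For orientation, a generic sketch of RRF follows; the function name `rrf_fuse`, the constant `k=60`, and the result shape are assumptions and may differ from what `RAGService` does.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_k: int = 3) -> list[dict]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = [{"doc_id": d, "score": s} for d, s in scores.items()]
    # Same ordering step as in the hunk: highest fused score first.
    fused.sort(key=lambda x: x["score"], reverse=True)
    return fused[:top_k]


vector_results = ["a", "b", "c"]
bm25_results = ["b", "d", "a"]
fused = rrf_fuse([vector_results, bm25_results])
print(fused)
```

Documents found by both retrievers (`a` and `b` here) accumulate two terms and therefore outrank documents found by only one.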


@@ -0,0 +1,353 @@
"""
TXT 文档 AI 分析服务
使用 LLM 对 TXT 文本文件进行深度分析,提取结构化数据并生成可视化图表
"""
import logging
import re
from typing import Any, Dict, List, Optional
from app.services.llm_service import llm_service
from app.services.visualization_service import visualization_service
from app.core.document_parser.txt_parser import TxtParser
logger = logging.getLogger(__name__)
class TxtAIService:
"""TXT 文档 AI 分析服务"""
def __init__(self):
self.parser = TxtParser()
self.llm = llm_service
async def analyze_txt_with_ai(
self,
content: str,
filename: str = "",
user_hint: str = "",
analysis_type: str = "structured"
) -> Dict[str, Any]:
"""
使用 AI 解析 TXT 文本文件
Args:
content: 文本内容
filename: 文件名(可选)
user_hint: 用户提示词
analysis_type: 分析类型 - "structured"(默认,提取结构化数据)或 "charts"(生成图表)
Returns:
Dict: 包含结构化数据的分析结果
"""
try:
if not content or not content.strip():
return {
"success": False,
"error": "文档内容为空"
}
# 根据分析类型选择处理方式
if analysis_type == "charts":
return await self.generate_charts(content, filename, user_hint)
# 默认:提取结构化数据
return await self._extract_structured_data(content, filename, user_hint)
except Exception as e:
logger.error(f"TXT AI 分析失败: {str(e)}")
return {
"success": False,
"error": str(e)
}
async def _extract_structured_data(
self,
content: str,
filename: str = "",
user_hint: str = ""
) -> Dict[str, Any]:
"""
从文本中提取结构化数据
Args:
content: 文本内容
filename: 文件名
user_hint: 用户提示词
Returns:
结构化数据
"""
try:
# 截断内容避免超出 token 限制
max_content_len = 8000
text_preview = content[:max_content_len] if len(content) > max_content_len else content
prompt = f"""你是一个专业的数据提取专家。请从以下文本中提取结构化数据。
【用户需求】
{user_hint if user_hint else "请提取文档中的所有结构化数据,包括表格数据、键值对、列表项等。"}
【文档内容】({"前" + str(max_content_len) + "字符,仅显示部分" if len(content) > max_content_len else "全文"})
{text_preview}
请按照以下 JSON 格式输出:
{{
"type": "structured_text",
"tables": [{{"headers": [...], "rows": [...]}}],
"key_values": {{"键1": "值1", "键2": "值2", ...}},
"list_items": ["项1", "项2", ...],
"summary": "文档内容摘要"
}}
重点:
- 如果文档包含表格数据(制表符、空格对齐等),提取到 tables 中
- 如果文档包含键值对(如 名称: 张三),提取到 key_values 中
- 如果文档包含列表项,提取到 list_items 中
- 如果无法提取到结构化数据,至少提供一个详细的摘要
"""
messages = [
{"role": "system", "content": "你是一个专业的数据提取助手。请严格按JSON格式输出。"},
{"role": "user", "content": prompt}
]
response = await self.llm.chat(
messages=messages,
temperature=0.1,
max_tokens=8000
)
content_text = self.llm.extract_message_content(response)
result = self._parse_json_response(content_text)
if result:
logger.info(f"TXT 结构化数据提取成功: type={result.get('type')}")
return {
"success": True,
"type": result.get("type", "structured_text"),
"tables": result.get("tables", []),
"key_values": result.get("key_values", {}),
"list_items": result.get("list_items", []),
"summary": result.get("summary", "")
}
else:
return {
"success": True,
"type": "text",
"summary": text_preview[:500],
"raw_text_preview": text_preview[:500]
}
except Exception as e:
logger.error(f"TXT 结构化数据提取失败: {str(e)}")
return {
"success": False,
"error": str(e)
}
async def generate_charts(
self,
content: str,
filename: str = "",
user_hint: str = ""
) -> Dict[str, Any]:
"""
从文本中提取数据并生成可视化图表
Args:
content: 文本内容
filename: 文件名
user_hint: 用户提示词
Returns:
包含图表数据和统计信息的结果
"""
try:
# 截断内容避免超出 token 限制
max_content_len = 8000
text_preview = content[:max_content_len] if len(content) > max_content_len else content
# 使用 LLM 提取可用于图表的数据
prompt = f"""你是一个专业的数据可视化助手。请从以下文本中提取可用于可视化的数据。
文档标题:{filename}
文档内容:
{text_preview}
请完成以下任务:
1. 识别文本中的表格数据(制表符分隔、空格对齐的表格等)
2. 识别文本中的关键统计数据(百分比、数量、趋势等)
3. 识别可用于比较的分类数据
请用 JSON 格式返回以下结构的数据(如果没有表格数据,返回空结构):
{{
"tables": [
{{
"description": "表格的描述",
"columns": ["列名1", "列名2", ...],
"rows": [
["值1", "值2", ...],
["值1", "值2", ...]
]
}}
],
"key_statistics": [
{{
"name": "指标名称",
"value": "数值",
"trend": "增长/下降/持平",
"description": "指标说明"
}}
],
"chart_suggestions": [
{{
"chart_type": "bar/line/pie",
"title": "图表标题",
"data_source": "数据来源说明"
}}
]
}}
如果没有表格数据,返回空结构:{{"tables": [], "key_statistics": [], "chart_suggestions": []}}
请确保返回的是合法的 JSON 格式。"""
messages = [
{"role": "system", "content": "你是一个专业的数据可视化助手,擅长从文本中提取数据并生成图表。"},
{"role": "user", "content": prompt}
]
response = await self.llm.chat(
messages=messages,
temperature=0.1,
max_tokens=8000
)
content_text = self.llm.extract_message_content(response)
chart_data = self._parse_json_response(content_text)
if not chart_data:
return {
"success": False,
"error": "无法从文本中提取有效的数据结构"
}
# 检查是否有表格数据
tables = chart_data.get("tables", [])
key_statistics = chart_data.get("key_statistics", [])
if not tables:
return {
"success": False,
"error": "文档中没有可用于图表的表格数据",
"key_statistics": key_statistics,
"chart_suggestions": chart_data.get("chart_suggestions", [])
}
# 使用第一个表格生成图表
first_table = tables[0]
columns = first_table.get("columns", [])
rows = first_table.get("rows", [])
if not columns or not rows:
return {
"success": False,
"error": "表格数据为空"
}
# 转换为 visualization_service 需要的格式
viz_data = {
"columns": columns,
"rows": rows
}
# 生成可视化图表
logger.info(f"开始生成图表,列数: {len(columns)}, 行数: {len(rows)}")
vis_result = visualization_service.analyze_and_visualize(viz_data)
if vis_result.get("success"):
return {
"success": True,
"charts": vis_result.get("charts", {}),
"statistics": vis_result.get("statistics", {}),
"distributions": vis_result.get("distributions", {}),
"row_count": vis_result.get("row_count", 0),
"column_count": vis_result.get("column_count", 0),
"key_statistics": key_statistics,
"chart_suggestions": chart_data.get("chart_suggestions", []),
"table_description": first_table.get("description", "")
}
else:
return {
"success": False,
"error": vis_result.get("error", "可视化生成失败"),
"key_statistics": key_statistics
}
except Exception as e:
logger.error(f"TXT 图表生成失败: {str(e)}")
return {
"success": False,
"error": str(e)
}
def _parse_json_response(self, content: str) -> Optional[Dict]:
"""解析 JSON 响应,处理各种格式问题"""
if not content:
return None
import json
# 清理 markdown 标记
cleaned = content.strip()
cleaned = re.sub(r'^```json\s*', '', cleaned, flags=re.MULTILINE)
cleaned = re.sub(r'^```\s*', '', cleaned, flags=re.MULTILINE)
cleaned = cleaned.strip()
# 找到 JSON 开始位置
json_start = -1
for i, c in enumerate(cleaned):
if c == '{':
json_start = i
break
if json_start == -1:
logger.warning("无法找到 JSON 开始位置")
return None
json_text = cleaned[json_start:]
# 尝试直接解析
try:
return json.loads(json_text)
except json.JSONDecodeError:
pass
# 尝试修复并解析
try:
# 找到闭合括号
depth = 0
end_pos = -1
for i, c in enumerate(json_text):
if c == '{':
depth += 1
elif c == '}':
depth -= 1
if depth == 0:
end_pos = i + 1
break
if end_pos > 0:
fixed = json_text[:end_pos]
# 移除末尾逗号
fixed = re.sub(r',\s*([}\]])', r'\1', fixed)
return json.loads(fixed)
except Exception as e:
logger.warning(f"JSON 修复失败: {e}")
return None
# 全局单例
txt_ai_service = TxtAIService()
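The brace counting in `_parse_json_response` does not distinguish braces inside string values from structural ones, so a summary containing `}` can end the scan early. A string-aware variant is sketched here; `extract_json_object` is a hypothetical standalone function, not a drop-in replacement for the method.

```python
import json
import re
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first balanced {...} object in `text`, or None."""
    start = text.find("{")
    if start == -1:
        return None
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        c = text[i]
        if in_string:
            # Inside a string literal only an unescaped quote ends the string.
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                candidate = text[start:i + 1]
                # Drop trailing commas before a closing brace or bracket.
                candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
                try:
                    return json.loads(candidate)
                except json.JSONDecodeError:
                    return None
    return None


reply = 'Here is the result: {"summary": "uses } and {", "items": [1, 2,],} Thanks.'
parsed = extract_json_object(reply)
print(parsed)
```

The trailing-comma cleanup is still a plain regex and could alter a string value that happens to contain `,}`; that trade-off is accepted in this sketch.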


@@ -53,7 +53,11 @@ class VisualizationService:
 }
 # Convert to a DataFrame
-df = pd.DataFrame(rows, columns=columns)
+# Drop rows whose length does not match the number of columns
+valid_rows = [row for row in rows if len(row) == len(columns)]
+if len(valid_rows) < len(rows):
+logger.warning(f"过滤了 {len(rows) - len(valid_rows)} 行无效数据(列数不匹配)")
+df = pd.DataFrame(valid_rows, columns=columns)
 # Classify columns by type
 numeric_columns = df.select_dtypes(include=[np.number]).columns.tolist()
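Tables extracted by an LLM may contain ragged rows, which is the case this hunk guards against before building the DataFrame. The filtering step on its own, without pandas, can be sketched as follows; `filter_rows` is a hypothetical helper that also returns the number of dropped rows.

```python
def filter_rows(columns: list[str], rows: list[list]) -> tuple[list[list], int]:
    """Keep rows whose length matches the header; return them with the drop count."""
    valid = [row for row in rows if len(row) == len(columns)]
    return valid, len(rows) - len(valid)


columns = ["region", "sales"]
rows = [["north", 10], ["south"], ["east", 7, "extra"], ["west", 4]]
valid_rows, dropped = filter_rows(columns, rows)
print(valid_rows, dropped)
```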
@@ -141,18 +145,18 @@ class VisualizationService:
 charts = {}
 # 1. Histograms for numeric columns
-charts["histograms"] = []
+charts["numeric_charts"] = []
 for col in numeric_columns[:5]: # at most 5 numeric columns
 chart_data = self._create_histogram(df[col], col)
 if chart_data:
-charts["histograms"].append(chart_data)
+charts["numeric_charts"].append(chart_data)
 # 2. Bar charts for categorical columns
-charts["bar_charts"] = []
+charts["categorical_charts"] = []
 for col in categorical_columns[:5]: # at most 5 categorical columns
 chart_data = self._create_bar_chart(df[col], col)
 if chart_data:
-charts["bar_charts"].append(chart_data)
+charts["categorical_charts"].append(chart_data)
 # 3. Box plots for numeric columns
 charts["box_plots"] = []


@@ -8,6 +8,7 @@ from typing import Dict, Any, List, Optional
 import json
 from app.services.llm_service import llm_service
+from app.services.visualization_service import visualization_service
 from app.core.document_parser.docx_parser import DocxParser
 logger = logging.getLogger(__name__)
@@ -183,7 +184,7 @@ class WordAIService:
 response = await self.llm.chat(
 messages=messages,
 temperature=0.1,
-max_tokens=50000
+max_tokens=8000
 )
 content = self.llm.extract_message_content(response)
@@ -275,7 +276,7 @@ class WordAIService:
 response = await self.llm.chat(
 messages=messages,
 temperature=0.1,
-max_tokens=50000
+max_tokens=8000
 )
 content = self.llm.extract_message_content(response)
@@ -634,6 +635,281 @@ class WordAIService:
 return values
async def generate_charts(
self,
file_path: str,
user_hint: str = ""
) -> Dict[str, Any]:
"""
使用 AI 解析 Word 文档并生成可视化图表
从 Word 文档中提取表格数据,然后生成统计图表
Args:
file_path: Word 文件路径
user_hint: 用户提示词,指定要提取的内容类型
Returns:
Dict: 包含图表数据和统计信息的结果
"""
try:
# 1. 先用基础解析器提取原始内容
parse_result = self.parser.parse(file_path)
if not parse_result.success:
return {
"success": False,
"error": parse_result.error,
"structured_data": None
}
# 2. 获取原始数据
raw_data = parse_result.data
paragraphs = raw_data.get("paragraphs", [])
tables = raw_data.get("tables", [])
content = raw_data.get("content", "")
logger.info(f"Word 基础解析完成: {len(paragraphs)} 个段落, {len(tables)} 个表格")
# 3. 优先处理表格数据
if tables and len(tables) > 0:
structured_data = await self._extract_tables_with_ai(
tables, paragraphs, 0, user_hint, parse_result.metadata
)
elif paragraphs and len(paragraphs) > 0:
structured_data = await self._extract_from_text_with_ai(
paragraphs, content, 0, [], user_hint
)
else:
return {
"success": False,
"error": "文档内容为空",
"structured_data": None
}
# 4. 检查是否有表格数据用于可视化
if not structured_data.get("success"):
return {
"success": False,
"error": structured_data.get("error", "解析失败"),
"structured_data": None
}
parse_type = structured_data.get("type", "")
# 5. 提取可用于图表的数据
chart_data = None
if parse_type == "table_data":
headers = structured_data.get("headers", [])
rows = structured_data.get("rows", [])
if headers and rows:
chart_data = {
"columns": headers,
"rows": rows
}
elif parse_type == "structured_text":
tables = structured_data.get("tables", [])
if tables and len(tables) > 0:
first_table = tables[0]
headers = first_table.get("headers", [])
rows = first_table.get("rows", [])
if headers and rows:
chart_data = {
"columns": headers,
"rows": rows
}
# 6. 生成可视化图表
if chart_data:
logger.info(f"开始生成图表,列数: {len(chart_data['columns'])}, 行数: {len(chart_data['rows'])}")
vis_result = visualization_service.analyze_and_visualize(chart_data)
if vis_result.get("success"):
return {
"success": True,
"charts": vis_result.get("charts", {}),
"statistics": vis_result.get("statistics", {}),
"distributions": vis_result.get("distributions", {}),
"structured_data": structured_data,
"row_count": vis_result.get("row_count", 0),
"column_count": vis_result.get("column_count", 0)
}
else:
return {
"success": False,
"error": vis_result.get("error", "可视化生成失败"),
"structured_data": structured_data
}
else:
return {
"success": False,
"error": "文档中没有可用于图表的表格数据",
"structured_data": structured_data
}
except Exception as e:
logger.error(f"Word 文档图表生成失败: {str(e)}")
return {
"success": False,
"error": str(e),
"structured_data": None
}
    async def parse_word_with_ai_from_db(
        self,
        content: str,
        tables: List[Dict],
        filename: str = "",
        user_hint: str = ""
    ) -> Dict[str, Any]:
        """
        Use AI to parse Word document content read from the database and extract structured data

        Args:
            content: Document text content
            tables: List of table data
            filename: File name
            user_hint: User-supplied hint
        Returns:
            Dict: Parse result containing the structured data
        """
        try:
            # Split the content into paragraphs
            paragraphs = [p.strip() for p in content.split('\n') if p.strip()]
            logger.info(f"从数据库解析 Word: {len(paragraphs)} 个段落, {len(tables)} 个表格")
            # Prefer table data when it is present
            if tables and len(tables) > 0:
                structured_data = await self._extract_tables_with_ai(
                    tables, paragraphs, 0, user_hint, {"filename": filename}
                )
            elif paragraphs and len(paragraphs) > 0:
                structured_data = await self._extract_from_text_with_ai(
                    paragraphs, content, 0, [], user_hint
                )
            else:
                structured_data = {
                    "success": True,
                    "type": "empty",
                    "message": "文档内容为空"
                }
            return structured_data
        except Exception as e:
            logger.error(f"从数据库解析 Word 文档失败: {str(e)}")
            return {
                "success": False,
                "error": str(e)
            }
    async def generate_charts_from_db(
        self,
        content: str,
        tables: List[Dict],
        filename: str = "",
        user_hint: str = ""
    ) -> Dict[str, Any]:
        """
        Use AI to parse a Word document read from the database and generate visualization charts

        Args:
            content: Document text content
            tables: List of table data
            filename: File name
            user_hint: User-supplied hint
        Returns:
            Dict: Result containing chart data and statistics
        """
        try:
            # Split the content into paragraphs
            paragraphs = [p.strip() for p in content.split('\n') if p.strip()]
            logger.info(f"从数据库生成 Word 图表: {len(paragraphs)} 个段落, {len(tables)} 个表格")
            # Prefer table data when it is present
            if tables and len(tables) > 0:
                structured_data = await self._extract_tables_with_ai(
                    tables, paragraphs, 0, user_hint, {"filename": filename}
                )
            elif paragraphs and len(paragraphs) > 0:
                structured_data = await self._extract_from_text_with_ai(
                    paragraphs, content, 0, [], user_hint
                )
            else:
                return {
                    "success": False,
                    "error": "文档内容为空"
                }
            # Extract the data that can be charted
            chart_data = None
            logger.info(f"准备提取图表数据structured_data type: {structured_data.get('type')}, keys: {list(structured_data.keys())}")
            if structured_data.get("type") == "table_data":
                headers = structured_data.get("headers", [])
                rows = structured_data.get("rows", [])
                logger.info(f"table_data类型: headers数量={len(headers)}, rows数量={len(rows)}")
                if headers and rows:
                    chart_data = {
                        "columns": headers,
                        "rows": rows
                    }
            elif structured_data.get("type") == "structured_text":
                tables_data = structured_data.get("tables", [])
                logger.info(f"structured_text类型: tables数量={len(tables_data)}")
                if tables_data and len(tables_data) > 0:
                    first_table = tables_data[0]
                    headers = first_table.get("headers", [])
                    rows = first_table.get("rows", [])
                    logger.info(f"第一个表格: headers={headers[:5]}, rows数量={len(rows)}")
                    if headers and rows:
                        chart_data = {
                            "columns": headers,
                            "rows": rows
                        }
            else:
                logger.warning(f"无法识别的structured_data类型: {structured_data.get('type')}")
            # Generate the visualization charts
            if chart_data:
                logger.info(f"开始生成图表,列数: {len(chart_data['columns'])}, 行数: {len(chart_data['rows'])}")
                vis_result = visualization_service.analyze_and_visualize(chart_data)
                if vis_result.get("success"):
                    return {
                        "success": True,
                        "charts": vis_result.get("charts", {}),
                        "statistics": vis_result.get("statistics", {}),
                        "distributions": vis_result.get("distributions", {}),
                        "structured_data": structured_data,
                        "row_count": vis_result.get("row_count", 0),
                        "column_count": vis_result.get("column_count", 0)
                    }
                else:
                    return {
                        "success": False,
                        "error": vis_result.get("error", "可视化生成失败"),
                        "structured_data": structured_data
                    }
            else:
                return {
                    "success": False,
                    "error": "文档中没有可用于图表的表格数据",
                    "structured_data": structured_data
                }
        except Exception as e:
            logger.error(f"从数据库生成 Word 图表失败: {str(e)}")
            return {
                "success": False,
                "error": str(e)
            }
# Global singleton
word_ai_service = WordAIService()
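Both chart methods above repeat the same step: turn the AI's `structured_data` into a `{"columns", "rows"}` payload for the visualization service. The following is a minimal sketch of that step as a shared helper. The name `extract_chart_data` is hypothetical and does not appear in this change; the sketch also assumes that `parse_type` in the file-based method is simply `structured_data.get("type")`.

```python
from typing import Any, Dict, Optional


def extract_chart_data(structured_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return {"columns": ..., "rows": ...} or None."""
    data_type = structured_data.get("type")
    if data_type == "table_data":
        # Headers and rows sit at the top level of the result.
        source = structured_data
    elif data_type == "structured_text":
        # Only the first extracted table is charted, as in the methods above.
        tables = structured_data.get("tables") or []
        if not tables:
            return None
        source = tables[0]
    else:
        # Unknown or "empty" result types yield no chart data.
        return None

    headers = source.get("headers", [])
    rows = source.get("rows", [])
    if headers and rows:
        return {"columns": headers, "rows": rows}
    return None
```

With a helper like this, each method would only need to call it once and then branch on whether the result is `None`.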


@@ -42,6 +42,9 @@ chardet==5.2.0
 Pillow>=10.0.0
 pytesseract>=0.3.10
+# ==================== PDF generation ====================
+reportlab>=4.0.0
 # ==================== AI / LLM ====================
 httpx==0.25.2

203
docker-compose.yml Normal file

@@ -0,0 +1,203 @@
# ============================================================
# FilesReadSystem Docker Compose
# Full-stack AI document understanding and data fusion system
# ============================================================
version: "3.8"
services:
  # ==================== Database services ====================
  mongodb:
    image: mongo:7.0
    container_name: filesread_mongodb
    restart: unless-stopped
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: ${MONGO_ROOT_USER:-admin}
      MONGO_INITDB_ROOT_PASSWORD: ${MONGO_ROOT_PASSWORD:-20060825fhy}
      MONGO_INITDB_DATABASE: ${MONGODB_DB_NAME:-document_system}
    volumes:
      - mongodb_data:/data/db
    networks:
      - filesread_network
    healthcheck:
      test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')", "--quiet"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
  mysql:
    image: mysql:8.0
    container_name: filesread_mysql
    restart: unless-stopped
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: ${MYSQL_PASSWORD:-123456}
      MYSQL_DATABASE: ${MYSQL_DATABASE:-document}
    volumes:
      - mysql_data:/var/lib/mysql
    networks:
      - filesread_network
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-p${MYSQL_PASSWORD:-123456}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
  redis:
    image: redis:7-alpine
    container_name: filesread_redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    networks:
      - filesread_network
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD:-}
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
  # ==================== Application services ====================
  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: filesread_backend
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      # Application settings
      APP_NAME: FilesReadSystem
      DEBUG: ${DEBUG:-false}
      API_V1_STR: /api/v1
      # MongoDB settings (uses the docker-compose service name as host)
      MONGODB_URL: mongodb://${MONGO_ROOT_USER:-admin}:${MONGO_ROOT_PASSWORD:-20060825fhy}@mongodb:27017/admin
      MONGODB_DB_NAME: ${MONGODB_DB_NAME:-document_system}
      # MySQL settings
      MYSQL_HOST: mysql
      MYSQL_PORT: 3306
      MYSQL_USER: root
      MYSQL_PASSWORD: ${MYSQL_PASSWORD:-123456}
      MYSQL_DATABASE: ${MYSQL_DATABASE:-document}
      MYSQL_CHARSET: utf8mb4
      # Redis settings
      REDIS_URL: redis://:${REDIS_PASSWORD:-}@redis:6379/0
      # LLM / AI settings
      LLM_API_KEY: ${LLM_API_KEY}
      LLM_BASE_URL: ${LLM_BASE_URL:-https://api.deepseek.com}
      LLM_MODEL_NAME: ${LLM_MODEL_NAME:-deepseek-chat}
      # Supabase settings
      SUPABASE_URL: ${SUPABASE_URL}
      SUPABASE_ANON_KEY: ${SUPABASE_ANON_KEY}
      SUPABASE_SERVICE_KEY: ${SUPABASE_SERVICE_KEY}
      # Embedding / RAG settings
      EMBEDDING_MODEL: ${EMBEDDING_MODEL:-all-MiniLM-L6-v2}
      FAISS_INDEX_DIR: /app/data/faiss
      # File path settings
      UPLOAD_DIR: /app/data/uploads
      MAX_UPLOAD_SIZE: 104857600
      # Celery settings
      CELERY_BROKER_URL: redis://:${REDIS_PASSWORD:-}@redis:6379/1
      CELERY_RESULT_BACKEND: redis://:${REDIS_PASSWORD:-}@redis:6379/2
    volumes:
      - backend_data:/app/data
    networks:
      - filesread_network
    depends_on:
      mongodb:
        condition: service_healthy
      mysql:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "python", "-c", "import httpx; httpx.get('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
  celery_worker:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: filesread_celery
    restart: unless-stopped
    command: celery -A app.celery_app worker --loglevel=info --prefetch-multiplier=1
    environment:
      # Celery settings
      CELERY_BROKER_URL: redis://:${REDIS_PASSWORD:-}@redis:6379/1
      CELERY_RESULT_BACKEND: redis://:${REDIS_PASSWORD:-}@redis:6379/2
      # Reuse the backend's database settings
      MONGODB_URL: mongodb://${MONGO_ROOT_USER:-admin}:${MONGO_ROOT_PASSWORD:-20060825fhy}@mongodb:27017/admin
      MONGODB_DB_NAME: ${MONGODB_DB_NAME:-document_system}
      MYSQL_HOST: mysql
      MYSQL_PORT: 3306
      MYSQL_USER: root
      MYSQL_PASSWORD: ${MYSQL_PASSWORD:-123456}
      MYSQL_DATABASE: ${MYSQL_DATABASE:-document}
      REDIS_URL: redis://:${REDIS_PASSWORD:-}@redis:6379/0
      # LLM settings
      LLM_API_KEY: ${LLM_API_KEY}
      LLM_BASE_URL: ${LLM_BASE_URL:-https://api.deepseek.com}
      LLM_MODEL_NAME: ${LLM_MODEL_NAME:-deepseek-chat}
      # Embedding settings
      EMBEDDING_MODEL: ${EMBEDDING_MODEL:-all-MiniLM-L6-v2}
      FAISS_INDEX_DIR: /app/data/faiss
    volumes:
      - backend_data:/app/data
    networks:
      - filesread_network
    depends_on:
      - redis
      - mongodb
      - mysql
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
    container_name: filesread_frontend
    restart: unless-stopped
    ports:
      - "80:80"
    environment:
      VITE_APP_ID: ${VITE_APP_ID:-}
      VITE_SUPABASE_URL: ${SUPABASE_URL}
      VITE_SUPABASE_ANON_KEY: ${SUPABASE_ANON_KEY}
      VITE_BACKEND_API_URL: /api/v1
    networks:
      - filesread_network
    depends_on:
      - backend
networks:
  filesread_network:
    driver: bridge
volumes:
  mongodb_data:
  mysql_data:
  redis_data:
  backend_data:
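The compose file relies heavily on `${VAR:-default}` interpolation, which falls back to the default when the variable is unset or empty, while a plain `${VAR}` resolves to an empty string when unset. The sketch below imitates just those two forms in Python to show what values such as `REDIS_URL` resolve to. The function name `expand` is illustrative only, and the sketch does not cover the other interpolation forms Compose supports.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}; other Compose forms are not handled.
_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def expand(template: str, env: Mapping[str, str]) -> str:
    """Illustrative re-implementation of ${VAR} / ${VAR:-default} substitution."""

    def replace(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name, "")
        # ":-" falls back when the variable is unset OR set to an empty string.
        if value == "" and default is not None:
            return default
        return value

    return _PATTERN.sub(replace, template)
```

For example, with no `REDIS_PASSWORD` set, `expand("redis://:${REDIS_PASSWORD:-}@redis:6379/0", {})` yields `redis://:@redis:6379/0`, that is, a URL with an empty password.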

169
docs/architecture.drawio Normal file

@@ -0,0 +1,169 @@
<mxfile host="app.diagrams.net" modified="2026-04-16T14:00:00.000Z" agent="Claude" version="24.0.0">
<diagram name="系统架构图" id="architecture">
<mxGraphModel dx="1200" dy="800" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="1920" pageHeight="1080" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<!-- 用户访问层 -->
<mxCell id="layer1" value="用户访问层" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=16;fontStyle=1;fontColor=#1a1a2e;" vertex="1" parent="1">
<mxGeometry x="800" y="20" width="120" height="30" as="geometry" />
</mxCell>
<mxCell id="browser" value="浏览器&#xa;(Browser)" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#e3f2fd;strokeColor=#1976d2;fontColor=#0d47a1;" vertex="1" parent="1">
<mxGeometry x="860" y="60" width="120" height="50" as="geometry" />
</mxCell>
<!-- 前端展示层 -->
<mxCell id="layer2" value="前端展示层" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=16;fontStyle=1;fontColor=#1a1a2e;" vertex="1" parent="1">
<mxGeometry x="800" y="140" width="120" height="30" as="geometry" />
</mxCell>
<mxCell id="frontend_box" value="" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#f3e5f5;strokeColor=#7b1fa2;strokeWidth=2;" vertex="1" parent="1">
<mxGeometry x="200" y="180" width="1520" height="140" as="geometry" />
</mxCell>
<mxCell id="frontend_title" value="React 18 + TypeScript + Vite + shadcn/ui" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=14;fontStyle=1;fontColor=#4a148c;" vertex="1" parent="1">
<mxGeometry x="760" y="185" width="280" height="25" as="geometry" />
</mxCell>
<mxCell id="dashboard" value="Dashboard&#xa;首页仪表盘" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ce93d8;strokeColor=#8e24aa;fontColor=#fff;" vertex="1" parent="1">
<mxGeometry x="240" y="220" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="documents" value="Documents&#xa;文档管理" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ce93d8;strokeColor=#8e24aa;fontColor=#fff;" vertex="1" parent="1">
<mxGeometry x="400" y="220" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="template" value="TemplateFill&#xa;智能填表" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ce93d8;strokeColor=#8e24aa;fontColor=#fff;" vertex="1" parent="1">
<mxGeometry x="560" y="220" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="instruction" value="Instruction&#xa;指令助手" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ce93d8;strokeColor=#8e24aa;fontColor=#fff;" vertex="1" parent="1">
<mxGeometry x="720" y="220" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="taskhistory" value="TaskHistory&#xa;任务历史" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ce93d8;strokeColor=#8e24aa;fontColor=#fff;" vertex="1" parent="1">
<mxGeometry x="880" y="220" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="frontend_libs" value="Recharts + Lucide Icons + React Router" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=11;fontColor=#6a1b9a;" vertex="1" parent="1">
<mxGeometry x="1040" y="250" width="280" height="25" as="geometry" />
</mxCell>
<!-- 连接线:浏览器到前端 -->
<mxCell id="conn1" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeColor=#1976d2;strokeWidth=2;" edge="1" parent="1" source="browser" target="frontend_box">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<!-- 后端服务层 -->
<mxCell id="layer3" value="后端服务层" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=16;fontStyle=1;fontColor=#1a1a2e;" vertex="1" parent="1">
<mxGeometry x="800" y="350" width="120" height="30" as="geometry" />
</mxCell>
<mxCell id="backend_box" value="" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#e8f5e9;strokeColor=#388e3c;strokeWidth=2;" vertex="1" parent="1">
<mxGeometry x="200" y="390" width="1520" height="180" as="geometry" />
</mxCell>
<mxCell id="backend_title" value="FastAPI + Uvicorn + Celery" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=14;fontStyle=1;fontColor=#1b5e20;" vertex="1" parent="1">
<mxGeometry x="800" y="395" width="200" height="25" as="geometry" />
</mxCell>
<mxCell id="upload" value="文档上传&#xa;/upload/*" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#81c784;strokeColor=#2e7d32;fontColor=#1b5e20;" vertex="1" parent="1">
<mxGeometry x="240" y="430" width="140" height="60" as="geometry" />
</mxCell>
<mxCell id="ai" value="AI分析&#xa;/ai/*" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#81c784;strokeColor=#2e7d32;fontColor=#1b5e20;" vertex="1" parent="1">
<mxGeometry x="420" y="430" width="140" height="60" as="geometry" />
</mxCell>
<mxCell id="rag" value="RAG检索&#xa;/rag/*" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#81c784;strokeColor=#2e7d32;fontColor=#1b5e20;" vertex="1" parent="1">
<mxGeometry x="600" y="430" width="140" height="60" as="geometry" />
</mxCell>
<mxCell id="template_api" value="模板填充&#xa;/templates/*" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#81c784;strokeColor=#2e7d32;fontColor=#1b5e20;" vertex="1" parent="1">
<mxGeometry x="780" y="430" width="140" height="60" as="geometry" />
</mxCell>
<mxCell id="instruction_api" value="指令解析&#xa;/instruction/*" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#81c784;strokeColor=#2e7d32;fontColor=#1b5e20;" vertex="1" parent="1">
<mxGeometry x="960" y="430" width="140" height="60" as="geometry" />
</mxCell>
<mxCell id="visualization" value="可视化&#xa;/visualization/*" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#81c784;strokeColor=#2e7d32;fontColor=#1b5e20;" vertex="1" parent="1">
<mxGeometry x="1140" y="430" width="140" height="60" as="geometry" />
</mxCell>
<mxCell id="celery" value="Celery&#xa;任务调度" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#a5d6a7;strokeColor=#2e7d32;fontColor=#1b5e20;" vertex="1" parent="1">
<mxGeometry x="1320" y="430" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="logging" value="监控日志" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#a5d6a7;strokeColor=#2e7d32;fontColor=#1b5e20;" vertex="1" parent="1">
<mxGeometry x="1480" y="430" width="100" height="60" as="geometry" />
</mxCell>
<!-- 连接线:前端到后端 -->
<mxCell id="conn2" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeColor=#388e3c;strokeWidth=2;dashed=1;dashPattern=8 8;" edge="1" parent="1" source="frontend_box" target="backend_box">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<!-- AI服务层 -->
<mxCell id="layer4" value="AI服务层" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=16;fontStyle=1;fontColor=#1a1a2e;" vertex="1" parent="1">
<mxGeometry x="800" y="600" width="120" height="30" as="geometry" />
</mxCell>
<mxCell id="ai_box" value="" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#fff3e0;strokeColor=#f57c00;strokeWidth=2;" vertex="1" parent="1">
<mxGeometry x="300" y="640" width="1320" height="120" as="geometry" />
</mxCell>
<mxCell id="llm_title" value="LLMService - 大模型服务" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=14;fontStyle=1;fontColor=#e65100;" vertex="1" parent="1">
<mxGeometry x="820" y="645" width="200" height="25" as="geometry" />
</mxCell>
<mxCell id="minimax" value="MiniMax-Text-01" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#ffcc80;strokeColor=#ef6c00;fontColor=#e65100;" vertex="1" parent="1">
<mxGeometry x="400" y="680" width="150" height="50" as="geometry" />
</mxCell>
<mxCell id="deepseek" value="DeepSeek-chat" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#ffcc80;strokeColor=#ef6c00;fontColor=#e65100;" vertex="1" parent="1">
<mxGeometry x="600" y="680" width="150" height="50" as="geometry" />
</mxCell>
<mxCell id="excel_ai" value="ExcelAIService" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ffe0b2;strokeColor=#f57c00;fontColor=#e65100;" vertex="1" parent="1">
<mxGeometry x="820" y="680" width="130" height="50" as="geometry" />
</mxCell>
<mxCell id="word_ai" value="WordAIService" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ffe0b2;strokeColor=#f57c00;fontColor=#e65100;" vertex="1" parent="1">
<mxGeometry x="980" y="680" width="130" height="50" as="geometry" />
</mxCell>
<mxCell id="md_ai" value="MarkdownAIService" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ffe0b2;strokeColor=#f57c00;fontColor=#e65100;" vertex="1" parent="1">
<mxGeometry x="1140" y="680" width="130" height="50" as="geometry" />
</mxCell>
<mxCell id="txt_ai" value="TxtAIService" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ffe0b2;strokeColor=#f57c00;fontColor=#e65100;" vertex="1" parent="1">
<mxGeometry x="1300" y="680" width="130" height="50" as="geometry" />
</mxCell>
<!-- 连接线后端到AI -->
<mxCell id="conn3" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeColor=#f57c00;strokeWidth=2;dashed=1;dashPattern=8 8;" edge="1" parent="1" source="backend_box" target="ai_box">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<!-- 数据存储层 -->
<mxCell id="layer5" value="数据存储层" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=16;fontStyle=1;fontColor=#1a1a2e;" vertex="1" parent="1">
<mxGeometry x="800" y="790" width="120" height="30" as="geometry" />
</mxCell>
<mxCell id="mongodb" value="MongoDB&#xa;文档数据库&#xa;&#xa;• 原始文档内容&#xa;• 元数据信息&#xa;• 文档标签&#xa;• 处理状态" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#e0e0e0;strokeColor=#616161;fontColor=#212121;align=left;spacingLeft=10;" vertex="1" parent="1">
<mxGeometry x="240" y="830" width="200" height="160" as="geometry" />
</mxCell>
<mxCell id="mysql" value="MySQL&#xa;关系数据库&#xa;&#xa;• Excel表格数据&#xa;• 结构化数据&#xa;• 字段描述&#xa;• RAG索引" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#e0e0e0;strokeColor=#616161;fontColor=#212121;align=left;spacingLeft=10;" vertex="1" parent="1">
<mxGeometry x="520" y="830" width="200" height="160" as="geometry" />
</mxCell>
<mxCell id="redis" value="Redis&#xa;缓存/队列&#xa;&#xa;• 会话缓存&#xa;• 任务队列&#xa;• Celery broker&#xa;• 临时数据" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#e0e0e0;strokeColor=#616161;fontColor=#212121;align=left;spacingLeft=10;" vertex="1" parent="1">
<mxGeometry x="800" y="830" width="200" height="160" as="geometry" />
</mxCell>
<mxCell id="faiss" value="FAISS&#xa;向量数据库&#xa;&#xa;• 文档向量索引&#xa;• 语义相似度&#xa;• RAG检索&#xa;• sentence-transformers" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#e0e0e0;strokeColor=#616161;fontColor=#212121;align=left;spacingLeft=10;" vertex="1" parent="1">
<mxGeometry x="1080" y="830" width="240" height="160" as="geometry" />
</mxCell>
<!-- 连接线AI到存储 -->
<mxCell id="conn4" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeColor=#616161;strokeWidth=2;dashed=1;dashPattern=8 8;" edge="1" parent="1" source="ai_box" target="mongodb">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="conn5" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeColor=#616161;strokeWidth=2;dashed=1;dashPattern=8 8;" edge="1" parent="1" source="ai_box" target="mysql">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="conn6" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeColor=#616161;strokeWidth=2;dashed=1;dashPattern=8 8;" edge="1" parent="1" source="ai_box" target="redis">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="conn7" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeColor=#616161;strokeWidth=2;dashed=1;dashPattern=8 8;" edge="1" parent="1" source="ai_box" target="faiss">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<!-- 标注 -->
<mxCell id="arrow1" value="HTTP/HTTPS&#xa;WebSocket" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=10;fontColor=#1976d2;" vertex="1" parent="1">
<mxGeometry x="1020" y="130" width="80" height="30" as="geometry" />
</mxCell>
<mxCell id="arrow2" value="API调用" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=10;fontColor=#388e3c;" vertex="1" parent="1">
<mxGeometry x="1020" y="570" width="60" height="20" as="geometry" />
</mxCell>
<mxCell id="arrow3" value="数据读写" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=10;fontColor=#616161;" vertex="1" parent="1">
<mxGeometry x="1020" y="770" width="60" height="20" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>

36
frontend/Dockerfile Normal file

@@ -0,0 +1,36 @@
# ============================================================
# FilesReadSystem Frontend - React + Vite
# Multi-stage build: Node build -> Nginx runtime
# ============================================================
# === Stage 1: build ===
FROM node:20-alpine AS builder
WORKDIR /app
# Copy the package manifest and lockfile
COPY package.json pnpm-lock.yaml* ./
# Install pnpm, then install the dependencies
RUN npm install -g pnpm && \
    pnpm install --frozen-lockfile
# Copy the source code
COPY . .
# Build the production bundle
RUN pnpm build
# === Stage 2: runtime ===
FROM nginx:alpine
# Copy the nginx configuration
COPY nginx.conf /etc/nginx/conf.d/default.conf
# Copy the build output
COPY --from=builder /app/dist /usr/share/nginx/html
# Expose the port
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

47
frontend/nginx.conf Normal file

@@ -0,0 +1,47 @@
# ============================================================
# FilesReadSystem Nginx configuration
# Reverse-proxies API requests to the backend
# ============================================================
server {
    listen 80;
    server_name localhost;
    # Frontend static files
    root /usr/share/nginx/html;
    index index.html;
    # SPA support - every request falls back to index.html
    location / {
        try_files $uri $uri/ /index.html;
    }
    # Static asset caching
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
    # Reverse proxy for the API
    location /api/ {
        proxy_pass http://backend:8000/api/;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
    # Proxy for uploaded files
    location /uploads/ {
        proxy_pass http://backend:8000/uploads/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        client_max_body_size 100M;
    }
}
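To make the routing in this configuration easier to reason about, here is a small Python model of how nginx would pick a location for a request path. It assumes nginx's documented matching order: the longest matching prefix location is remembered, regex locations are then tried, and the first matching regex wins. The function `upstream_url` and the `PROXY_RULES` table are illustrative names, not part of this change.

```python
import re
from typing import Optional

# Prefix locations with a proxy_pass URI: the matched prefix is replaced by the target.
PROXY_RULES = {
    "/api/": "http://backend:8000/api/",
    "/uploads/": "http://backend:8000/uploads/",
}

# The case-insensitive regex location for static assets.
STATIC_ASSET = re.compile(
    r"\.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$", re.IGNORECASE
)


def upstream_url(request_path: str) -> Optional[str]:
    """Return the backend URL nginx would proxy to, or None if served locally."""
    # A matching regex location takes precedence over plain prefix locations.
    if STATIC_ASSET.search(request_path):
        return None
    for prefix, target in PROXY_RULES.items():
        if request_path.startswith(prefix):
            return target + request_path[len(prefix):]
    # Everything else is handled by "location /" (static files / SPA fallback).
    return None
```

One consequence worth checking against the real deployment: under this model a path such as `/uploads/chart.png` is handled by the static-asset location instead of being proxied to the backend.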


@@ -8,7 +8,8 @@ import {
   Menu,
   ChevronRight,
   Sparkles,
-  Clock
+  Clock,
+  FileDown
 } from 'lucide-react';
 import { Button } from '@/components/ui/button';
 import { cn } from '@/lib/utils';
@@ -19,6 +20,7 @@ const navItems = [
   { name: '文档中心', path: '/documents', icon: FileText },
   { name: '智能填表', path: '/form-fill', icon: TableProperties },
   { name: '智能助手', path: '/assistant', icon: MessageSquareCode },
+  { name: '文档转PDF', path: '/pdf-converter', icon: FileDown },
   { name: '任务历史', path: '/task-history', icon: Clock },
 ];
@@ -32,7 +34,7 @@ const MainLayout: React.FC = () => {
<FileText size={24} /> <FileText size={24} />
</div> </div>
<div className="flex flex-col"> <div className="flex flex-col">
<span className="font-bold text-lg tracking-tight text-sidebar-foreground"></span> <span className="font-bold text-lg tracking-tight text-sidebar-foreground"></span>
<span className="text-xs text-muted-foreground"></span> <span className="text-xs text-muted-foreground"></span>
</div> </div>
</div> </div>
@@ -66,7 +68,7 @@ const MainLayout: React.FC = () => {
<Sparkles size={20} className="text-primary" /> <Sparkles size={20} className="text-primary" />
</div> </div>
<div className="flex flex-col overflow-hidden"> <div className="flex flex-col overflow-hidden">
<span className="font-semibold text-sm truncate"></span> <span className="font-semibold text-sm truncate"></span>
<span className="text-[10px] uppercase tracking-wider text-muted-foreground"></span> <span className="text-[10px] uppercase tracking-wider text-muted-foreground"></span>
</div> </div>
</div> </div>


@@ -250,6 +250,98 @@ export interface AIExcelAnalyzeResult {
error?: string;
}
// ==================== Word/TXT AI analysis types ====================
export type WordAnalysisType = 'structured' | 'charts';
export type TxtAnalysisType = 'structured' | 'charts';
export interface WordAIStructuredResult {
success: boolean;
result?: {
success?: boolean;
type?: string;
headers?: string[];
rows?: string[][];
key_values?: Record<string, string>;
list_items?: string[];
summary?: string;
error?: string;
};
error?: string;
}
export interface WordAIChartsResult {
success: boolean;
result?: {
success?: boolean;
charts?: {
histograms?: Array<any>;
bar_charts?: Array<any>;
box_plots?: Array<any>;
correlation?: any;
};
statistics?: {
numeric?: Record<string, any>;
categorical?: Record<string, any>;
};
distributions?: Record<string, any>;
row_count?: number;
column_count?: number;
error?: string;
};
error?: string;
}
export interface TxtAIStructuredResult {
success: boolean;
result?: {
success?: boolean;
type?: string;
tables?: Array<{
headers?: string[];
rows?: string[][];
}>;
key_values?: Record<string, string>;
list_items?: string[];
summary?: string;
error?: string;
};
error?: string;
}
export interface TxtAIChartsResult {
success: boolean;
result?: {
success?: boolean;
charts?: {
histograms?: Array<any>;
bar_charts?: Array<any>;
box_plots?: Array<any>;
correlation?: any;
};
statistics?: {
numeric?: Record<string, any>;
categorical?: Record<string, any>;
};
distributions?: Record<string, any>;
row_count?: number;
column_count?: number;
key_statistics?: Array<{
name?: string;
value?: string;
trend?: string;
description?: string;
}>;
chart_suggestions?: Array<{
chart_type?: string;
title?: string;
data_source?: string;
}>;
error?: string;
};
error?: string;
}
// ==================== API wrappers ====================
export const backendApi = {
@@ -1061,6 +1153,120 @@ export const backendApi = {
}
},
// ==================== PDF conversion API ====================
/**
* Convert a file to PDF and download it directly (uses XHR so that IDM can intercept the download)
*/
async convertAndDownloadPdf(file: File): Promise<void> {
return new Promise((resolve, reject) => {
const xhr = new XMLHttpRequest();
xhr.open('POST', `${BACKEND_BASE_URL}/pdf/convert`);
xhr.onload = function() {
if (xhr.status >= 200 && xhr.status < 300) {
// 创建 blob 并触发下载
const blob = xhr.response;
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `${file.name.replace(/\.[^.]+$/, '')}.pdf`;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
resolve();
} else {
reject(new Error(`转换失败: ${xhr.status}`));
}
};
xhr.onerror = function() {
reject(new Error('网络错误'));
};
const formData = new FormData();
formData.append('file', file);
xhr.responseType = 'blob';
xhr.send(formData);
});
},
/**
* PDF转换返回Blob
*/
async convertToPdf(file: File): Promise<Blob> {
return new Promise((resolve, reject) => {
const xhr = new XMLHttpRequest();
xhr.open('POST', `${BACKEND_BASE_URL}/pdf/convert`);
xhr.onload = function() {
if (xhr.status >= 200 && xhr.status < 300) {
resolve(xhr.response);
} else {
reject(new Error(`转换失败: ${xhr.status}`));
}
};
xhr.onerror = function() {
reject(new Error('网络错误'));
};
const formData = new FormData();
formData.append('file', file);
xhr.responseType = 'blob';
xhr.send(formData);
});
},
/**
* 批量将文件转换为 PDF
*/
async batchConvertToPdf(files: File[]): Promise<Blob> {
const formData = new FormData();
files.forEach(file => formData.append('files', file));
const url = `${BACKEND_BASE_URL}/pdf/convert/batch`;
try {
const response = await fetch(url, {
method: 'POST',
body: formData,
});
if (!response.ok) {
const error = await response.json();
throw new Error(error.detail || '批量PDF转换失败');
}
return await response.blob();
} catch (error) {
console.error('批量PDF转换失败:', error);
throw error;
}
},
/**
* 获取支持的 PDF 转换格式
*/
async getPdfSupportedFormats(): Promise<{
success: boolean;
formats: string[];
}> {
const url = `${BACKEND_BASE_URL}/pdf/formats`;
try {
const response = await fetch(url);
if (!response.ok) throw new Error('获取支持的格式失败');
return await response.json();
} catch (error) {
console.error('获取支持的格式失败:', error);
return { success: false, formats: ['docx', 'xlsx', 'txt', 'md'] };
}
}
};
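`convertAndDownloadPdf` derives the download name by stripping the last extension from the uploaded file's name and appending `.pdf`. The sketch below mirrors that rule in Python so the edge cases are easy to see. The function name `pdf_download_name` is hypothetical, and the sketch only reproduces the frontend's regex; it says nothing about how the backend names its output.

```python
import re


def pdf_download_name(original_name: str) -> str:
    """Mirror of `file.name.replace(/\\.[^.]+$/, '') + '.pdf'` from the frontend."""
    # Remove only the final ".ext" segment; earlier dots are kept.
    stem = re.sub(r"\.[^.]+$", "", original_name)
    return f"{stem}.pdf"
```

Only the last extension is removed, so `archive.tar.gz` becomes `archive.tar.pdf`, and a name with no extension simply gains `.pdf`.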
// ==================== AI analysis API ====================
@@ -1095,11 +1301,19 @@ export const aiApi = {
    * Upload an Excel file and analyze it with AI
    */
   async analyzeExcel(
-    file: File,
-    options: AIAnalyzeOptions = {}
+    file: File | null,
+    options: AIAnalyzeOptions = {},
+    docId: string | null = null
   ): Promise<AIExcelAnalyzeResult> {
     const formData = new FormData();
-    formData.append('file', file);
+    if (docId) {
+      formData.append('doc_id', docId);
+    } else if (file) {
+      formData.append('file', file);
+    } else {
+      throw new Error('必须提供文件或文档ID');
+    }
     const params = new URLSearchParams();
     if (options.userPrompt) {
@@ -1176,7 +1390,9 @@ export const aiApi = {
     try {
       const response = await fetch(url);
       if (!response.ok) throw new Error('获取分析类型失败');
-      return await response.json();
+      const data = await response.json();
+      // Convert the backend response {excel_types: [], markdown_types: []} into the {types: []} shape the frontend expects
+      return { types: data.excel_types || [] };
     } catch (error) {
       console.error('获取分析类型失败:', error);
       throw error;
@@ -1187,15 +1403,21 @@ export const aiApi = {
    * Upload a Markdown file and analyze it with AI
    */
   async analyzeMarkdown(
-    file: File,
+    file: File | null,
     options: {
+      docId?: string;
       analysisType?: MarkdownAnalysisType;
       userPrompt?: string;
       sectionNumber?: string;
     } = {}
   ): Promise<AIMarkdownAnalyzeResult> {
     const formData = new FormData();
-    formData.append('file', file);
+    if (file) {
+      formData.append('file', file);
+    }
+    if (options.docId) {
+      formData.append('doc_id', options.docId);
+    }
     const params = new URLSearchParams();
     if (options.analysisType) {
@@ -1337,28 +1559,31 @@ export const aiApi = {
   },
   /**
-   * Upload a TXT file and analyze it with AI to extract structured data
+   * Upload a TXT file and analyze it with AI to extract structured data or generate charts
    */
   async analyzeTxt(
-    file: File
+    file: File | null,
+    docId: string | null = null,
+    analysisType: TxtAnalysisType = 'structured'
   ): Promise<{
     success: boolean;
     filename?: string;
-    structured_data?: {
-      table?: {
-        columns?: string[];
-        rows?: string[][];
-      };
-      summary?: string;
-      key_value_pairs?: Array<{ key: string; value: string }>;
-      numeric_data?: Array<{ name: string; value: number; unit?: string }>;
-    };
+    analysis_type?: string;
+    result?: any;
     error?: string;
   }> {
     const formData = new FormData();
-    formData.append('file', file);
+    if (file) {
+      formData.append('file', file);
+    }
+    if (docId) {
+      formData.append('doc_id', docId);
+    }
-    const url = `${BACKEND_BASE_URL}/ai/analyze/txt`;
+    const params = new URLSearchParams();
+    params.append('analysis_type', analysisType);
+    const url = `${BACKEND_BASE_URL}/ai/analyze/txt?${params.toString()}`;
     try {
       const response = await fetch(url, {
@@ -1480,28 +1705,35 @@ export const aiApi = {
// ==================== Word AI parsing ==================== // ==================== Word AI parsing ====================
/** /**
* Parse a Word document with AI to extract structured data * Parse a Word document with AI to extract structured data or generate charts
*/ */
async analyzeWordWithAI( async analyzeWordWithAI(
file: File, file: File | null,
userHint: string = '' docId: string | null = null,
userHint: string = '',
analysisType: WordAnalysisType = 'structured'
): Promise<{ ): Promise<{
success: boolean; success: boolean;
type?: string; filename?: string;
headers?: string[]; analysis_type?: string;
rows?: string[][]; result?: any;
key_values?: Record<string, string>;
list_items?: string[];
summary?: string;
error?: string; error?: string;
}> { }> {
const formData = new FormData(); const formData = new FormData();
formData.append('file', file); if (file) {
formData.append('file', file);
}
if (docId) {
formData.append('doc_id', docId);
}
if (userHint) { if (userHint) {
formData.append('user_hint', userHint); formData.append('user_hint', userHint);
} }
const url = `${BACKEND_BASE_URL}/ai/analyze/word`; const params = new URLSearchParams();
params.append('analysis_type', analysisType);
const url = `${BACKEND_BASE_URL}/ai/analyze/word?${params.toString()}`;
try { try {
const response = await fetch(url, { const response = await fetch(url, {
@@ -1687,5 +1919,6 @@ export const aiApi = {
console.error('获取会话列表失败:', error); console.error('获取会话列表失败:', error);
return { success: false, conversations: [] }; return { success: false, conversations: [] };
} }
} },
}; };
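
The three analyze functions changed above share one pattern: the file part is optional, `doc_id` is sent as a form field when present, and `analysis_type` travels in the query string. The sketch below models that pattern without touching `FormData` or `fetch`, so it can be checked in isolation. `buildAnalyzeRequest` and `AnalyzeRequest` are hypothetical names, and the up-front rejection of a request with neither source is an assumption that the diff itself does not make.

```typescript
type AnalysisType = 'structured' | 'charts';

interface AnalyzeRequest {
  url: string;                    // endpoint plus query string
  fields: Record<string, string>; // text fields destined for the multipart body
  attachFile: boolean;            // whether the caller should append the file part
}

function buildAnalyzeRequest(
  baseUrl: string,
  kind: 'txt' | 'word',
  hasFile: boolean,
  docId: string | null,
  analysisType: AnalysisType = 'structured',
  userHint: string = ''
): AnalyzeRequest {
  // Assumption: a request with neither a file nor a doc_id is rejected early.
  if (!hasFile && !docId) {
    throw new Error('Either a file or a docId is required');
  }
  const fields: Record<string, string> = {};
  if (docId) fields['doc_id'] = docId;          // same field name as in the diff
  if (userHint) fields['user_hint'] = userHint; // only sent when non-empty
  const url = `${baseUrl}/ai/analyze/${kind}?analysis_type=${encodeURIComponent(analysisType)}`;
  return { url, fields, attachFile: hasFile };
}
```

A caller would copy `fields` into a `FormData` instance and append the file only when `attachFile` is true, which mirrors the conditional `append` calls in the diff.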


@@ -41,7 +41,7 @@ const Assistant: React.FC = () => {
{ {
id: '1', id: '1',
role: 'assistant', role: 'assistant',
content: '您好!我是智联文档 AI 助手。您可以告诉我您想对文档进行的操作,例如:\n- "帮我列出最近上传的所有 docx 文档"\n- "从 2026 财报文档中提取出关键的利润数据"\n- "帮我创建一个汇总各部门报销单的填表任务"\n\n请问有什么我可以帮您的', content: '您好!我是表易智融 AI 助手。您可以告诉我您想对文档进行的操作,例如:\n- "帮我列出最近上传的所有 docx 文档"\n- "从 2026 财报文档中提取出关键的利润数据"\n- "帮我创建一个汇总各部门报销单的填表任务"\n\n请问有什么我可以帮您的',
created_at: new Date().toISOString() created_at: new Date().toISOString()
} }
]); ]);


@@ -89,7 +89,7 @@ const Dashboard: React.FC = () => {
<section className="flex flex-col md:flex-row md:items-center justify-between gap-4"> <section className="flex flex-col md:flex-row md:items-center justify-between gap-4">
<div className="space-y-1"> <div className="space-y-1">
<h1 className="text-3xl font-extrabold tracking-tight"> <h1 className="text-3xl font-extrabold tracking-tight">
使 <span className="text-primary"></span> 👋 使 <span className="text-primary"></span> 👋
</h1> </h1>
<p className="text-muted-foreground"></p> <p className="text-muted-foreground"></p>
</div> </div>


@@ -10,7 +10,7 @@ import {
ChevronDown, ChevronDown,
ChevronUp, ChevronUp,
FileSpreadsheet, FileSpreadsheet,
File, File as FileIcon,
Table, Table,
CheckCircle, CheckCircle,
AlertCircle, AlertCircle,
@@ -107,6 +107,15 @@ const Documents: React.FC = () => {
const [mdStreaming, setMdStreaming] = useState(false); const [mdStreaming, setMdStreaming] = useState(false);
const [mdStreamingContent, setMdStreamingContent] = useState(''); const [mdStreamingContent, setMdStreamingContent] = useState('');
// State for Word AI analysis
const [wordAnalysis, setWordAnalysis] = useState<any>(null);
const [wordAnalysisType, setWordAnalysisType] = useState<'structured' | 'charts'>('structured');
const [wordUserHint, setWordUserHint] = useState('');
// State for TXT AI analysis
const [txtAnalysis, setTxtAnalysis] = useState<any>(null);
const [txtAnalysisType, setTxtAnalysisType] = useState<'structured' | 'charts'>('structured');
// State for RAG vector search // State for RAG vector search
const [ragStatus, setRagStatus] = useState<{ vector_count: number; collections: string[] } | null>(null); const [ragStatus, setRagStatus] = useState<{ vector_count: number; collections: string[] } | null>(null);
const [ragSearchQuery, setRagSearchQuery] = useState(''); const [ragSearchQuery, setRagSearchQuery] = useState('');
@@ -114,6 +123,17 @@ const Documents: React.FC = () => {
const [ragResults, setRagResults] = useState<any[]>([]); const [ragResults, setRagResults] = useState<any[]>([]);
const [ragRebuilding, setRagRebuilding] = useState(false); const [ragRebuilding, setRagRebuilding] = useState(false);
// Details of the selected document
const [selectedDocument, setSelectedDocument] = useState<{
doc_id: string;
original_filename: string;
doc_type: string;
content?: string;
structured_data?: any;
metadata?: any;
} | null>(null);
const [loadingDocument, setLoadingDocument] = useState(false);
// Parse options // Parse options
const [parseOptions, setParseOptions] = useState({ const [parseOptions, setParseOptions] = useState({
parseAllSheets: false, parseAllSheets: false,
@@ -268,6 +288,33 @@ const Documents: React.FC = () => {
return { ...s, status: 'failed', progress: 0, error: fileResult?.error || '处理失败' }; return { ...s, status: 'failed', progress: 0, error: fileResult?.error || '处理失败' };
} }
})); }));
// Set uploadedFile to the first successfully processed file
const firstSuccessIdx = fileResults.findIndex((fr: any) => fr?.success);
if (firstSuccessIdx >= 0 && acceptedFiles[firstSuccessIdx]) {
const firstFile = acceptedFiles[firstSuccessIdx];
const firstResult = fileResults[firstSuccessIdx];
const ext = firstFile.name.split('.').pop()?.toLowerCase();
// Set uploadedFile
setUploadedFile(firstFile);
// For Excel files, fetch the parseResult
if (ext === 'xlsx' || ext === 'xls') {
// Call parseDocument to obtain the parseResult
if (firstResult?.file_path) {
try {
const parseResult = await backendApi.parseDocument(firstResult.file_path);
if (parseResult.success) {
setParseResult(parseResult as any);
}
} catch (parseErr) {
console.warn('获取 parseResult 失败:', parseErr);
}
}
}
}
loadDocuments(); loadDocuments();
return; return;
} else if (status.status === 'failure') { } else if (status.status === 'failure') {
@@ -425,11 +472,17 @@ const Documents: React.FC = () => {
setAnalysisCharts(null); setAnalysisCharts(null);
try { try {
const result = await aiApi.analyzeExcel(uploadedFile, { // Determine whether the source is a stored document or a local upload
userPrompt: aiOptions.userPrompt, const docId = selectedDocument?.doc_id && uploadedFile.size === 0 ? selectedDocument.doc_id : null;
analysisType: aiOptions.analysisType, const result = await aiApi.analyzeExcel(
parseAllSheets: aiOptions.parseAllSheetsForAI uploadedFile.size > 0 ? uploadedFile : null,
}); {
userPrompt: aiOptions.userPrompt,
analysisType: aiOptions.analysisType,
parseAllSheets: aiOptions.parseAllSheetsForAI
},
docId
);
if (result.success) { if (result.success) {
toast.success('AI 分析完成'); toast.success('AI 分析完成');
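
The handlers in this file decide between a local upload and a stored document by checking whether `uploadedFile` is a zero-byte placeholder. The sketch below restates that rule as a pure function. `resolveAnalysisSource`, `FileLike`, and `AnalysisSource` are hypothetical names introduced for illustration; only the `size === 0` convention comes from the diff.

```typescript
// Minimal stand-in for the browser File type; only size matters for the rule.
interface FileLike {
  name: string;
  size: number;
}

interface AnalysisSource<F extends FileLike> {
  file: F | null;       // real upload, if any
  docId: string | null; // stored document id, if the file is a placeholder
}

function resolveAnalysisSource<F extends FileLike>(
  uploadedFile: F,
  selectedDocId: string | null | undefined
): AnalysisSource<F> {
  // A zero-byte file is treated as the placeholder created for a stored document.
  const isPlaceholder = uploadedFile.size === 0;
  return {
    file: isPlaceholder ? null : uploadedFile,
    docId: isPlaceholder && selectedDocId ? selectedDocId : null,
  };
}
```

One consequence of this convention is that a genuinely empty local file with no selected document resolves to neither source, so the request would carry no file and no `doc_id`.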
@@ -446,24 +499,79 @@ const Documents: React.FC = () => {
// Generate charts from the AI analysis // Generate charts from the AI analysis
const handleGenerateCharts = async () => { const handleGenerateCharts = async () => {
if (!aiAnalysis || !aiAnalysis.success) { // Check whether any AI analysis result is available
const hasExcelAI = aiAnalysis?.success;
const hasMdAI = mdAnalysis?.success;
const hasWordAI = wordAnalysis?.success;
const hasTxtAI = txtAnalysis?.success;
if (!hasExcelAI && !hasMdAI && !hasWordAI && !hasTxtAI) {
toast.error('请先进行 AI 分析'); toast.error('请先进行 AI 分析');
return; return;
} }
// If the Markdown analysis already contains charts, display them directly
if (hasMdAI && mdAnalysis?.chart_data?.tables) {
setAnalysisCharts({
success: true,
charts: { tables: mdAnalysis.chart_data.tables },
statistics: mdAnalysis.chart_data.key_statistics
});
toast.success('图表生成完成');
return;
}
// If the Word analysis already contains charts, display them directly
if (hasWordAI && wordAnalysis?.result?.charts) {
setAnalysisCharts({
success: true,
charts: wordAnalysis.result.charts,
statistics: wordAnalysis.result.statistics
});
toast.success('图表生成完成');
return;
}
// If the TXT analysis already contains charts, display them directly
if (hasTxtAI && txtAnalysis?.result?.charts) {
setAnalysisCharts({
success: true,
charts: txtAnalysis.result.charts,
statistics: txtAnalysis.result.statistics
});
toast.success('图表生成完成');
return;
}
// Otherwise, extract text from whichever analysis result is available and generate charts from it
let analysisText = ''; let analysisText = '';
if (aiAnalysis.analysis?.analysis) { let fileType = 'unknown';
analysisText = aiAnalysis.analysis.analysis;
} else if (aiAnalysis.analysis?.sheets) { if (hasExcelAI) {
const sheets = aiAnalysis.analysis.sheets; if (aiAnalysis.analysis?.analysis) {
if (sheets && Object.keys(sheets).length > 0) { analysisText = aiAnalysis.analysis.analysis;
const firstSheet = Object.keys(sheets)[0]; fileType = 'excel';
analysisText = sheets[firstSheet]?.analysis || ''; } else if (aiAnalysis.analysis?.sheets) {
const sheets = aiAnalysis.analysis.sheets;
if (sheets && Object.keys(sheets).length > 0) {
const firstSheet = Object.keys(sheets)[0];
analysisText = sheets[firstSheet]?.analysis || '';
fileType = 'excel';
}
} }
} else if (hasMdAI && mdAnalysis?.analysis) {
analysisText = mdAnalysis.analysis;
fileType = 'markdown';
} else if (hasWordAI && wordAnalysis?.result?.summary) {
analysisText = wordAnalysis.result.summary;
fileType = 'word';
} else if (hasTxtAI && txtAnalysis?.result?.summary) {
analysisText = txtAnalysis.result.summary;
fileType = 'txt';
} }
if (!analysisText?.trim()) { if (!analysisText?.trim()) {
toast.error('无法获取 AI 分析结果'); toast.error('无法获取 AI 分析文本结果');
return; return;
} }
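
The text-selection chain in `handleGenerateCharts` follows a fixed priority: Excel, then Markdown, then Word, then TXT. A sketch of that priority as a standalone function is shown below. The interfaces are simplified assumptions about the result shapes, based only on the fields the diff reads, and `pickAnalysisText` is a hypothetical name.

```typescript
interface ExcelAnalysis {
  success?: boolean;
  analysis?: { analysis?: string; sheets?: Record<string, { analysis?: string }> };
}
interface MdAnalysis { success?: boolean; analysis?: string; }
interface SummaryAnalysis { success?: boolean; result?: { summary?: string }; }

interface PickedText {
  text: string;
  fileType: 'excel' | 'markdown' | 'word' | 'txt' | 'unknown';
}

function pickAnalysisText(
  excel: ExcelAnalysis | null | undefined,
  md: MdAnalysis | null | undefined,
  word: SummaryAnalysis | null | undefined,
  txt: SummaryAnalysis | null | undefined
): PickedText {
  // A successful Excel result takes precedence; later sources are not consulted.
  if (excel?.success) {
    const direct = excel.analysis?.analysis;
    if (direct) return { text: direct, fileType: 'excel' };
    const sheets = excel.analysis?.sheets;
    const first: string | undefined = sheets ? Object.keys(sheets)[0] : undefined;
    if (sheets && first !== undefined) {
      // Only the first sheet's analysis is used, as in the diff.
      return { text: sheets[first]?.analysis ?? '', fileType: 'excel' };
    }
    return { text: '', fileType: 'unknown' };
  }
  if (md?.success && md.analysis) return { text: md.analysis, fileType: 'markdown' };
  if (word?.success && word.result?.summary) return { text: word.result.summary, fileType: 'word' };
  if (txt?.success && txt.result?.summary) return { text: txt.result.summary, fileType: 'txt' };
  return { text: '', fileType: 'unknown' };
}
```

Since `handleSelectDocument` resets all four analysis states when a document is selected, at most one of these results is normally populated, and the priority order matters mainly as a safeguard.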
@@ -474,7 +582,7 @@ const Documents: React.FC = () => {
const result = await aiApi.extractAndGenerateCharts({ const result = await aiApi.extractAndGenerateCharts({
analysis_text: analysisText, analysis_text: analysisText,
original_filename: uploadedFile?.name || 'unknown', original_filename: uploadedFile?.name || 'unknown',
file_type: 'excel' file_type: fileType
}); });
if (result.success) { if (result.success) {
@@ -592,6 +700,9 @@ const Documents: React.FC = () => {
const result = await backendApi.deleteDocument(docId); const result = await backendApi.deleteDocument(docId);
if (result.success) { if (result.success) {
setDocuments(prev => prev.filter(d => d.doc_id !== docId)); setDocuments(prev => prev.filter(d => d.doc_id !== docId));
if (selectedDocument?.doc_id === docId) {
setSelectedDocument(null);
}
toast.success('文档已删除'); toast.success('文档已删除');
} }
} catch (err: any) { } catch (err: any) {
@@ -599,6 +710,101 @@ const Documents: React.FC = () => {
} }
}; };
const handleSelectDocument = async (docId: string) => {
setLoadingDocument(true);
// Reset all AI analysis results so the previous document's analysis is not shown
setAiAnalysis(null);
setAnalysisCharts(null);
setMdAnalysis(null);
setWordAnalysis(null);
setTxtAnalysis(null);
try {
const result = await backendApi.getDocument(docId);
if (result.success && result.document) {
setSelectedDocument(result.document);
const doc = result.document;
// Prefer calling parseDocument with file_path to get the full parse result
const filePath = doc.metadata?.file_path;
if (filePath) {
try {
const parseResult = await backendApi.parseDocument(filePath);
if (parseResult.success) {
setParseResult(parseResult as any);
const ext = doc.original_filename.split('.').pop()?.toLowerCase() || doc.doc_type;
const fakeFile = new File([], doc.original_filename, { type: getMimeType(ext) });
setUploadedFile(fakeFile);
toast.success('已加载文档: ' + doc.original_filename);
setLoadingDocument(false);
return;
} else {
console.warn('parseDocument returned success:false, using fallback');
}
} catch (parseErr) {
console.warn('parseDocument failed, fallback to structured_data:', parseErr);
}
}
// Fallback: build a parseResult from structured_data
const ext = doc.original_filename.split('.').pop()?.toLowerCase() || doc.doc_type;
const fakeFile = new File([], doc.original_filename, { type: getMimeType(ext) });
if (doc.structured_data) {
const mockParseResult: ExcelParseResult = {
success: true,
data: {},
metadata: {
filename: doc.filename,
original_filename: doc.original_filename,
extension: doc.doc_type,
doc_type: doc.doc_type as any,
file_size: doc.metadata?.file_size || 0,
}
};
if (doc.structured_data.tables && doc.structured_data.tables.length > 0) {
const firstTable = doc.structured_data.tables[0];
mockParseResult.data = {
columns: firstTable.headers || [],
rows: (firstTable.rows || []).map((row: string[]) => {
const obj: Record<string, any> = {};
(firstTable.headers || []).forEach((h: string, i: number) => {
obj[h] = row[i] || '';
});
return obj;
}),
row_count: firstTable.rows?.length || 0,
column_count: firstTable.headers?.length || 0,
};
}
if (doc.structured_data.sheets) {
mockParseResult.data.sheets = doc.structured_data.sheets;
}
setParseResult(mockParseResult);
} else if (doc.content) {
setParseResult({
success: true,
data: { content: doc.content },
metadata: {
filename: doc.filename,
original_filename: doc.original_filename,
extension: doc.doc_type,
doc_type: doc.doc_type as any,
file_size: doc.metadata?.file_size || 0,
}
});
}
setUploadedFile(fakeFile);
toast.success('已加载文档: ' + doc.original_filename);
} else {
toast.error(result.error || '获取文档详情失败');
}
} catch (err: any) {
toast.error(err.message || '获取文档详情失败');
} finally {
setLoadingDocument(false);
}
};
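
The fallback branch of `handleSelectDocument` turns a stored table (a `headers` array plus row arrays) into the row-object form that the data table consumes. A minimal sketch of that conversion follows. `tableToRecords` is a hypothetical helper name, and it uses `??` where the diff uses `||`, which behaves the same for string cells except that it only substitutes on missing values.

```typescript
type RowRecord = Record<string, string>;

// Convert positional rows into objects keyed by header name.
// Cells missing from a short row become empty strings.
function tableToRecords(headers: string[], rows: string[][]): RowRecord[] {
  return rows.map((row) => {
    const record: RowRecord = {};
    headers.forEach((header, i) => {
      record[header] = row[i] ?? '';
    });
    return record;
  });
}
```

Note that duplicate header names collapse into a single key, with the later column overwriting the earlier one. Whether stored tables can contain duplicate headers is not stated in the diff.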
const filteredDocs = documents.filter(doc => const filteredDocs = documents.filter(doc =>
doc.original_filename.toLowerCase().includes(search.toLowerCase()) doc.original_filename.toLowerCase().includes(search.toLowerCase())
); );
@@ -612,7 +818,7 @@ const Documents: React.FC = () => {
case 'doc': case 'doc':
return <FileText size={28} />; return <FileText size={28} />;
default: default:
return <File size={28} />; return <FileIcon size={28} />;
} }
}; };
@@ -632,11 +838,17 @@ const Documents: React.FC = () => {
setMdAnalysis(null); setMdAnalysis(null);
try { try {
const result = await aiApi.analyzeMarkdown(uploadedFile, { // Determine whether the source is a stored document or a local upload
analysisType: mdAnalysisType, const docId = selectedDocument?.doc_id && uploadedFile.size === 0 ? selectedDocument.doc_id : undefined;
userPrompt: mdUserPrompt, const result = await aiApi.analyzeMarkdown(
sectionNumber: mdSelectedSection || undefined uploadedFile.size > 0 ? uploadedFile : null,
}); {
docId: docId || undefined,
analysisType: mdAnalysisType,
userPrompt: mdUserPrompt,
sectionNumber: mdSelectedSection || undefined
}
);
if (result.success) { if (result.success) {
toast.success('Markdown AI 分析完成'); toast.success('Markdown AI 分析完成');
@@ -701,6 +913,71 @@ const Documents: React.FC = () => {
} }
}; };
// Word AI analysis handler
const handleWordAnalyze = async () => {
if (!uploadedFile || !isWordFile(uploadedFile.name)) {
toast.error('请先上传 Word 文件');
return;
}
setAnalyzing(true);
setWordAnalysis(null);
try {
// Determine whether the source is a stored document or a local upload
const docId = selectedDocument?.doc_id && uploadedFile.size === 0 ? selectedDocument.doc_id : null;
const result = await aiApi.analyzeWordWithAI(
uploadedFile.size > 0 ? uploadedFile : null,
docId,
wordUserHint,
wordAnalysisType
);
if (result.success) {
toast.success('Word AI 分析完成');
setWordAnalysis(result);
} else {
toast.error(result.error || 'AI 分析失败');
}
} catch (error: any) {
toast.error(error.message || 'AI 分析失败');
} finally {
setAnalyzing(false);
}
};
// TXT AI analysis handler
const handleTxtAnalyze = async () => {
if (!uploadedFile || !isTxtFile(uploadedFile.name)) {
toast.error('请先上传 TXT 文件');
return;
}
setAnalyzing(true);
setTxtAnalysis(null);
try {
// Determine whether the source is a stored document or a local upload
const docId = selectedDocument?.doc_id && uploadedFile.size === 0 ? selectedDocument.doc_id : null;
const result = await aiApi.analyzeTxt(
uploadedFile.size > 0 ? uploadedFile : null,
docId,
txtAnalysisType
);
if (result.success) {
toast.success('TXT AI 分析完成');
setTxtAnalysis(result);
} else {
toast.error(result.error || 'AI 分析失败');
}
} catch (error: any) {
toast.error(error.message || 'AI 分析失败');
} finally {
setAnalyzing(false);
}
};
const getMdAnalysisIcon = (type: string) => { const getMdAnalysisIcon = (type: string) => {
switch (type) { switch (type) {
case 'summary': return <FileText size={20} />; case 'summary': return <FileText size={20} />;
@@ -724,6 +1001,18 @@ const Documents: React.FC = () => {
return `${(bytes / Math.pow(k, i)).toFixed(2)} ${sizes[i]}`; return `${(bytes / Math.pow(k, i)).toFixed(2)} ${sizes[i]}`;
}; };
const getMimeType = (ext: string): string => {
const mimeTypes: Record<string, string> = {
'xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'xls': 'application/vnd.ms-excel',
'docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'doc': 'application/msword',
'md': 'text/markdown',
'txt': 'text/plain',
};
return mimeTypes[ext] || 'application/octet-stream';
};
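
The MIME lookup above is keyed by file extension, and the callers derive the extension with `filename.split('.').pop()`. For a name without a dot, that expression returns the whole name, so the `|| doc.doc_type` fallback in `handleSelectDocument` would rarely trigger. The sketch below shows one stricter way to extract the extension. `getExtension` and `mimeTypeFor` are hypothetical names, and the table repeats the entries from the diff.

```typescript
const MIME_TYPES: Record<string, string> = {
  xlsx: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  xls: 'application/vnd.ms-excel',
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  doc: 'application/msword',
  md: 'text/markdown',
  txt: 'text/plain',
};

// Return the lowercased extension, or '' when there is none.
function getExtension(filename: string): string {
  const dot = filename.lastIndexOf('.');
  // dot <= 0 covers names without a dot and dotfiles such as ".env";
  // a trailing dot means there is no extension either.
  if (dot <= 0 || dot === filename.length - 1) return '';
  return filename.slice(dot + 1).toLowerCase();
}

function mimeTypeFor(filename: string): string {
  return MIME_TYPES[getExtension(filename)] ?? 'application/octet-stream';
}
```

With an empty-string result for extension-less names, a caller can fall back to a stored document type with a plain `getExtension(name) || docType` expression.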
const getAnalysisIcon = (type: string) => { const getAnalysisIcon = (type: string) => {
switch (type) { switch (type) {
case 'general': return <FileText size={20} />; case 'general': return <FileText size={20} />;
@@ -739,6 +1028,16 @@ const Documents: React.FC = () => {
return ext === 'xlsx' || ext === 'xls'; return ext === 'xlsx' || ext === 'xls';
}; };
const isWordFile = (filename: string) => {
const ext = filename.split('.').pop()?.toLowerCase();
return ext === 'docx';
};
const isTxtFile = (filename: string) => {
const ext = filename.split('.').pop()?.toLowerCase();
return ext === 'txt';
};
return ( return (
<div className="space-y-8 pb-10"> <div className="space-y-8 pb-10">
<section className="flex flex-col md:flex-row md:items-center justify-between gap-4"> <section className="flex flex-col md:flex-row md:items-center justify-between gap-4">
@@ -1055,7 +1354,7 @@ const Documents: React.FC = () => {
<FileText size={12} className="mr-1" /> Markdown <FileText size={12} className="mr-1" /> Markdown
</Badge> </Badge>
<Badge variant="outline" className="bg-gray-500/10 text-gray-600 border-gray-200 text-xs"> <Badge variant="outline" className="bg-gray-500/10 text-gray-600 border-gray-200 text-xs">
<File size={12} className="mr-1" /> <FileIcon size={12} className="mr-1" />
</Badge> </Badge>
</div> </div>
</div> </div>
@@ -1064,6 +1363,38 @@ const Documents: React.FC = () => {
)} )}
</Card> </Card>
{/* Select from previously uploaded documents */}
{documents.length > 0 && (
<Card className="border-none shadow-md">
<CardHeader className="pb-4">
<CardTitle className="flex items-center gap-2">
<Clock className="text-primary" size={20} />
</CardTitle>
</CardHeader>
<CardContent className="space-y-3">
<Select
value=""
onValueChange={async (docId) => {
if (!docId) return;
await handleSelectDocument(docId);
}}
>
<SelectTrigger className="bg-background">
<SelectValue placeholder="选择历史文档..." />
</SelectTrigger>
<SelectContent>
{documents.slice(0, 20).map((doc) => (
<SelectItem key={doc.doc_id} value={doc.doc_id}>
{doc.original_filename}
</SelectItem>
))}
</SelectContent>
</Select>
</CardContent>
</Card>
)}
{/* Excel parse options */} {/* Excel parse options */}
{uploadedFile && isExcelFile(uploadedFile.name) && ( {uploadedFile && isExcelFile(uploadedFile.name) && (
<Card className="border-none shadow-md"> <Card className="border-none shadow-md">
@@ -1238,8 +1569,117 @@ const Documents: React.FC = () => {
</Card> </Card>
)} )}
{/* Word AI analysis options */}
{uploadedFile && isWordFile(uploadedFile.name) && (
<Card className="border-none shadow-md bg-gradient-to-br from-blue-500/5 to-cyan-500/5">
<CardHeader className="pb-4">
<CardTitle className="flex items-center gap-2">
<Sparkles className="text-blue-500" size={20} />
Word AI
</CardTitle>
</CardHeader>
<CardContent className="space-y-4">
<div className="space-y-2">
<Label htmlFor="word-analysis-type" className="text-sm"></Label>
<Select value={wordAnalysisType} onValueChange={(value: any) => setWordAnalysisType(value)}>
<SelectTrigger id="word-analysis-type" className="bg-background">
<SelectValue />
</SelectTrigger>
<SelectContent>
<SelectItem value="structured">
<div className="flex items-center gap-2">
<FileText size={16} />
<div className="flex flex-col">
<span className="font-medium"></span>
<span className="text-xs text-muted-foreground"></span>
</div>
</div>
</SelectItem>
<SelectItem value="charts">
<div className="flex items-center gap-2">
<TrendingUp size={16} />
<div className="flex flex-col">
<span className="font-medium"></span>
<span className="text-xs text-muted-foreground"></span>
</div>
</div>
</SelectItem>
</SelectContent>
</Select>
</div>
<div className="space-y-2">
<Label htmlFor="word-user-prompt" className="text-sm"></Label>
<Textarea
id="word-user-prompt"
placeholder="例如:请提取文档中的表格数据..."
value={wordUserHint}
onChange={(e) => setWordUserHint(e.target.value)}
className="bg-background resize-none"
rows={2}
/>
</div>
<Button
onClick={handleWordAnalyze}
disabled={analyzing}
className="w-full bg-gradient-to-r from-blue-500 to-cyan-600 hover:from-blue-500/90 hover:to-cyan-600/90"
>
{analyzing ? <><Loader2 className="mr-2 animate-spin" size={16} /> ...</> : <><Sparkles className="mr-2" size={16} /> AI </>}
</Button>
</CardContent>
</Card>
)}
{/* TXT AI analysis options */}
{uploadedFile && isTxtFile(uploadedFile.name) && (
<Card className="border-none shadow-md bg-gradient-to-br from-amber-500/5 to-orange-500/5">
<CardHeader className="pb-4">
<CardTitle className="flex items-center gap-2">
<Sparkles className="text-amber-500" size={20} />
TXT AI
</CardTitle>
</CardHeader>
<CardContent className="space-y-4">
<div className="space-y-2">
<Label htmlFor="txt-analysis-type" className="text-sm"></Label>
<Select value={txtAnalysisType} onValueChange={(value: any) => setTxtAnalysisType(value)}>
<SelectTrigger id="txt-analysis-type" className="bg-background">
<SelectValue />
</SelectTrigger>
<SelectContent>
<SelectItem value="structured">
<div className="flex items-center gap-2">
<FileText size={16} />
<div className="flex flex-col">
<span className="font-medium"></span>
<span className="text-xs text-muted-foreground"></span>
</div>
</div>
</SelectItem>
<SelectItem value="charts">
<div className="flex items-center gap-2">
<TrendingUp size={16} />
<div className="flex flex-col">
<span className="font-medium"></span>
<span className="text-xs text-muted-foreground"></span>
</div>
</div>
</SelectItem>
</SelectContent>
</Select>
</div>
<Button
onClick={handleTxtAnalyze}
disabled={analyzing}
className="w-full bg-gradient-to-r from-amber-500 to-orange-600 hover:from-amber-500/90 hover:to-orange-600/90"
>
{analyzing ? <><Loader2 className="mr-2 animate-spin" size={16} /> ...</> : <><Sparkles className="mr-2" size={16} /> AI </>}
</Button>
</CardContent>
</Card>
)}
{/* Data actions */} {/* Data actions */}
{parseResult?.success && ( {(parseResult?.success || aiAnalysis?.success || mdAnalysis?.success || wordAnalysis?.success || txtAnalysis?.success) && (
<Card className="border-none shadow-md bg-gradient-to-br from-emerald-500/5 to-blue-500/5"> <Card className="border-none shadow-md bg-gradient-to-br from-emerald-500/5 to-blue-500/5">
<CardHeader className="pb-4"> <CardHeader className="pb-4">
<CardTitle className="flex items-center gap-2"> <CardTitle className="flex items-center gap-2">
@@ -1248,7 +1688,7 @@ const Documents: React.FC = () => {
</CardTitle> </CardTitle>
</CardHeader> </CardHeader>
<CardContent className="space-y-3"> <CardContent className="space-y-3">
<Button onClick={handleGenerateCharts} disabled={!aiAnalysis?.success || analyzingForCharts} className="w-full bg-gradient-to-r from-primary to-purple-600 hover:from-primary/90 hover:to-purple-600/90"> <Button onClick={handleGenerateCharts} disabled={!(aiAnalysis?.success || mdAnalysis?.success || wordAnalysis?.success || txtAnalysis?.success) || analyzingForCharts} className="w-full bg-gradient-to-r from-primary to-purple-600 hover:from-primary/90 hover:to-purple-600/90">
{analyzingForCharts ? <><Loader2 className="mr-2 animate-spin" size={16} />...</> : <><Brain size={16} className="mr-2" />AI </>} {analyzingForCharts ? <><Loader2 className="mr-2 animate-spin" size={16} />...</> : <><Brain size={16} className="mr-2" />AI </>}
</Button> </Button>
<Button onClick={openExportDialog} variant="outline" className="w-full"> <Button onClick={openExportDialog} variant="outline" className="w-full">
@@ -1338,6 +1778,114 @@ const Documents: React.FC = () => {
</Card> </Card>
)} )}
{/* Word AI analysis result */}
{wordAnalysis && (
<Card className="border-none shadow-md border-l-4 border-l-blue-500">
<CardHeader>
<div className="flex items-center justify-between">
<div className="space-y-1">
<CardTitle className="flex items-center gap-2">
<Sparkles className="text-blue-500" size={20} />
Word AI
</CardTitle>
{wordAnalysis.filename && (
<CardDescription>
{wordAnalysis.filename} {wordAnalysis.analysis_type}
</CardDescription>
)}
</div>
</div>
</CardHeader>
<CardContent className="max-h-[500px] overflow-y-auto">
{wordAnalysis.analysis_type === 'charts' && wordAnalysis.result?.charts ? (
<AIChartDisplay
charts={wordAnalysis.result.charts}
statistics={wordAnalysis.result.statistics}
distributions={wordAnalysis.result.distributions}
/>
) : wordAnalysis.result?.success === false ? (
<p className="text-sm text-destructive">{wordAnalysis.result?.error || wordAnalysis.error || '分析失败'}</p>
) : wordAnalysis.result?.summary ? (
<Markdown content={wordAnalysis.result.summary} />
) : wordAnalysis.result?.headers && wordAnalysis.result?.rows ? (
<div className="space-y-2">
<p className="text-sm font-medium"></p>
<div className="border rounded-lg overflow-x-auto">
<TableComponent>
<TableHeader>
<TableRow>
{wordAnalysis.result.headers.map((header: string, idx: number) => (
<TableHead key={idx}>{header}</TableHead>
))}
</TableRow>
</TableHeader>
<TableBody>
{wordAnalysis.result.rows.slice(0, 20).map((row: string[], idx: number) => (
<TableRow key={idx}>
{row.map((cell: string, cidx: number) => (
<TableCell key={cidx}>{cell}</TableCell>
))}
</TableRow>
))}
</TableBody>
</TableComponent>
</div>
</div>
) : (
<p className="text-sm text-muted-foreground"></p>
)}
</CardContent>
</Card>
)}
{/* TXT AI analysis result */}
{txtAnalysis && (
<Card className="border-none shadow-md border-l-4 border-l-amber-500">
<CardHeader>
<div className="flex items-center justify-between">
<div className="space-y-1">
<CardTitle className="flex items-center gap-2">
<Sparkles className="text-amber-500" size={20} />
TXT AI
</CardTitle>
{txtAnalysis.filename && (
<CardDescription>
{txtAnalysis.filename} {txtAnalysis.analysis_type}
</CardDescription>
)}
</div>
</div>
</CardHeader>
<CardContent className="max-h-[500px] overflow-y-auto">
{txtAnalysis.analysis_type === 'charts' && txtAnalysis.result?.charts ? (
<AIChartDisplay
charts={txtAnalysis.result.charts}
statistics={txtAnalysis.result.statistics}
distributions={txtAnalysis.result.distributions}
/>
) : txtAnalysis.result?.success === false ? (
<p className="text-sm text-destructive">{txtAnalysis.result?.error || txtAnalysis.error || '分析失败'}</p>
) : txtAnalysis.result?.summary ? (
<Markdown content={txtAnalysis.result.summary} />
) : txtAnalysis.result?.key_values && Object.keys(txtAnalysis.result.key_values || {}).length > 0 ? (
<div className="space-y-2">
<p className="text-sm font-medium"></p>
<div className="grid grid-cols-2 gap-2">
{Object.entries(txtAnalysis.result.key_values || {}).map(([key, value]: [string, any]) => (
<div key={key} className="flex gap-2 p-2 bg-muted/30 rounded-lg">
<span className="font-medium text-sm">{key}:</span>
<span className="text-sm text-muted-foreground">{String(value)}</span>
</div>
))}
</div>
</div>
) : (
<p className="text-sm text-muted-foreground"></p>
)}
</CardContent>
</Card>
)}
{/* Chart display */} {/* Chart display */}
{analysisCharts && ( {analysisCharts && (
<Card className="border-none shadow-md border-l-4 border-l-indigo-500"> <Card className="border-none shadow-md border-l-4 border-l-indigo-500">
@@ -1482,6 +2030,95 @@ const Documents: React.FC = () => {
</CardContent> </CardContent>
</Card> </Card>
{/* Uploaded document details */}
{selectedDocument && (
<Card className="border-none shadow-md border-l-4 border-l-cyan-500">
<CardHeader>
<div className="flex items-center justify-between">
<div className="space-y-1">
<CardTitle className="flex items-center gap-2">
<FileText className="text-cyan-500" size={20} />
</CardTitle>
<CardDescription>
{selectedDocument.original_filename} {selectedDocument.doc_type.toUpperCase()}
</CardDescription>
</div>
<Button variant="ghost" size="sm" onClick={() => setSelectedDocument(null)}>
</Button>
</div>
</CardHeader>
<CardContent className="max-h-[500px] overflow-y-auto">
{loadingDocument ? (
<div className="flex items-center justify-center py-8">
<Loader2 className="animate-spin" size={24} />
<span className="ml-2">...</span>
</div>
) : (
<div className="space-y-4">
{selectedDocument.structured_data?.tables && selectedDocument.structured_data.tables.length > 0 && (
<div className="space-y-2">
<p className="text-sm font-medium"></p>
{selectedDocument.structured_data.tables.slice(0, 3).map((table: any, idx: number) => (
<div key={idx} className="border rounded-lg overflow-x-auto">
{table.headers && (
<TableComponent>
<TableHeader>
<TableRow>
{table.headers.map((header: string, hIdx: number) => (
<TableHead key={hIdx}>{header}</TableHead>
))}
</TableRow>
</TableHeader>
<TableBody>
{(table.rows || []).slice(0, 10).map((row: string[], rIdx: number) => (
<TableRow key={rIdx}>
{row.map((cell: string, cIdx: number) => (
<TableCell key={cIdx}>{cell}</TableCell>
))}
</TableRow>
))}
</TableBody>
</TableComponent>
)}
</div>
))}
</div>
)}
{selectedDocument.structured_data?.key_values && Object.keys(selectedDocument.structured_data.key_values || {}).length > 0 && (
<div className="space-y-2">
<p className="text-sm font-medium"></p>
<div className="grid grid-cols-2 gap-2">
{Object.entries(selectedDocument.structured_data.key_values || {}).map(([key, value]: [string, any]) => (
<div key={key} className="flex gap-2 p-2 bg-muted/30 rounded-lg">
<span className="font-medium text-sm">{key}:</span>
<span className="text-sm text-muted-foreground">{String(value)}</span>
</div>
))}
</div>
</div>
)}
{selectedDocument.content && (
<div className="space-y-2">
<p className="text-sm font-medium"></p>
<div className="p-3 bg-muted/30 rounded-lg max-h-[300px] overflow-y-auto">
<p className="text-sm whitespace-pre-wrap font-mono">
{selectedDocument.content.slice(0, 2000)}
{selectedDocument.content.length > 2000 && '...'}
</p>
</div>
</div>
)}
{!selectedDocument.content && !selectedDocument.structured_data?.tables && !selectedDocument.structured_data?.key_values && (
<p className="text-sm text-muted-foreground text-center py-4"></p>
)}
</div>
)}
</CardContent>
</Card>
)}
{/* Document list */} {/* Document list */}
<Card className="border-none shadow-md"> <Card className="border-none shadow-md">
<CardHeader> <CardHeader>
@@ -1509,7 +2146,14 @@ const Documents: React.FC = () => {
) : (filteredDocs?.length ?? 0) > 0 ? ( ) : (filteredDocs?.length ?? 0) > 0 ? (
<div className="space-y-3"> <div className="space-y-3">
{(filteredDocs || []).map(doc => ( {(filteredDocs || []).map(doc => (
<div key={doc.doc_id} className="flex items-center gap-4 p-4 rounded-xl border border-transparent hover:bg-muted/30 transition-all group"> <div
key={doc.doc_id}
className={cn(
"flex items-center gap-4 p-4 rounded-xl border border-transparent hover:bg-muted/30 transition-all group cursor-pointer",
selectedDocument?.doc_id === doc.doc_id && "bg-primary/5 border-primary/20"
)}
onClick={() => handleSelectDocument(doc.doc_id)}
>
<div className={cn( <div className={cn(
"w-10 h-10 rounded-lg flex items-center justify-center shrink-0", "w-10 h-10 rounded-lg flex items-center justify-center shrink-0",
doc.doc_type === 'xlsx' ? "bg-emerald-500/10 text-emerald-500" : "bg-blue-500/10 text-blue-500" doc.doc_type === 'xlsx' ? "bg-emerald-500/10 text-emerald-500" : "bg-blue-500/10 text-blue-500"
@@ -1522,7 +2166,10 @@ const Documents: React.FC = () => {
{doc.doc_type.toUpperCase()} {format(new Date(doc.created_at), 'yyyy-MM-dd HH:mm')} {doc.doc_type.toUpperCase()} {format(new Date(doc.created_at), 'yyyy-MM-dd HH:mm')}
</p> </p>
</div> </div>
<Button variant="ghost" size="icon" className="text-destructive hover:bg-destructive/10 opacity-0 group-hover:opacity-100" onClick={() => handleDelete(doc.doc_id)}> <Button variant="ghost" size="icon" className="text-destructive hover:bg-destructive/10 opacity-0 group-hover:opacity-100" onClick={(e) => {
e.stopPropagation();
handleDelete(doc.doc_id);
}}>
<Trash2 size={16} /> <Trash2 size={16} />
</Button> </Button>
</div> </div>
@@ -1629,39 +2276,57 @@ const Documents: React.FC = () => {
); );
}; };
// Data table component // Data table component - scrolling window style
const DataTable: React.FC<{ columns: string[]; rows: Record<string, any>[] }> = ({ columns, rows }) => { const DataTable: React.FC<{ columns: string[]; rows: Record<string, any>[] }> = ({ columns, rows }) => {
if (!columns.length || !rows.length) { if (!columns.length || !rows.length) {
return <div className="text-center py-8 text-muted-foreground text-sm"></div>; return <div className="text-center py-8 text-muted-foreground text-sm"></div>;
} }
const displayRows = rows.slice(0, 500); // Render at most 500 rows
return ( return (
<div className="rounded-lg border overflow-x-auto"> <div className="rounded-lg border overflow-hidden">
<TableComponent> {/* Table header - fixed */}
<TableHeader> <div className="overflow-x-auto">
<TableRow> <TableComponent>
<TableHead className="w-16 text-center text-muted-foreground">#</TableHead> <TableHeader>
{columns.map((col, idx) => ( <TableRow className="bg-muted/50">
<TableHead key={idx} className="whitespace-nowrap">{col || `<列${idx + 1}>`}</TableHead> <TableHead className="w-16 text-center text-muted-foreground">#</TableHead>
))} {columns.map((col, idx) => (
</TableRow> <TableHead key={idx} className="whitespace-nowrap">{col || `<列${idx + 1}>`}</TableHead>
</TableHeader>
<TableBody>
{rows.slice(0, 100).map((row, rowIdx) => (
<TableRow key={rowIdx}>
<TableCell className="text-center text-muted-foreground font-medium">{rowIdx + 1}</TableCell>
{columns.map((col, colIdx) => (
<TableCell key={colIdx} className="whitespace-nowrap">
{row[col] !== null && row[col] !== undefined ? String(row[col]) : '-'}
</TableCell>
))} ))}
</TableRow> </TableRow>
))} </TableHeader>
</TableBody> </TableComponent>
</TableComponent> </div>
{rows.length > 100 && ( {/* Table body - scrollable */}
<div
className="overflow-y-auto"
style={{ maxHeight: '400px' }}
>
<TableComponent>
<TableBody>
{displayRows.map((row, rowIdx) => (
<TableRow key={rowIdx}>
<TableCell className="text-center text-muted-foreground font-medium w-16">{rowIdx + 1}</TableCell>
{columns.map((col, colIdx) => (
<TableCell key={colIdx} className="whitespace-nowrap">
{row[col] !== null && row[col] !== undefined ? String(row[col]) : '-'}
</TableCell>
))}
</TableRow>
))}
</TableBody>
</TableComponent>
</div>
{rows.length > 500 && (
<div className="p-3 text-center text-sm text-muted-foreground bg-muted/30"> <div className="p-3 text-center text-sm text-muted-foreground bg-muted/30">
100 500 {rows.length}
</div>
)}
{rows.length > 100 && rows.length <= 500 && (
<div className="p-2 text-center text-xs text-muted-foreground bg-muted/20">
{rows.length}
</div> </div>
)} )}
</div> </div>
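
Review note: the new `DataTable` renders at most 500 rows and shows a notice when rows are hidden. The capping logic could be sketched as a pure helper so it can be checked apart from the JSX. The name `capRows` and the `CappedRows` type are hypothetical and do not appear in this diff.

```typescript
// Hypothetical helper (not part of the diff): mirrors rows.slice(0, 500) in DataTable.
interface CappedRows<T> {
  displayRows: T[];   // rows that will actually be rendered
  total: number;      // size of the full data set
  truncated: boolean; // true when some rows were left out
}

function capRows<T>(rows: T[], limit = 500): CappedRows<T> {
  const displayRows = rows.slice(0, limit);
  return { displayRows, total: rows.length, truncated: rows.length > limit };
}
```

With a helper like this, the component would only need to read `truncated` and `total` to decide which footer notice to show.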

@@ -45,7 +45,7 @@ const InstructionChat: React.FC = () => {
{ {
id: 'welcome', id: 'welcome',
role: 'assistant', role: 'assistant',
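content: `Hello! I am the 智联文档 AI assistant. content: `Hello! I am the 表易智融 AI assistant.
**📄 Smart document operations** **📄 Smart document operations**
- "Extract the number of hospitals and the number of beds from the document" - "Extract the number of hospitals and the number of beds from the document"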

@@ -0,0 +1,446 @@
/**
* PDF conversion page
* Converts Word, Excel, TXT, and Markdown files to PDF
*/
import React, { useState, useCallback } from 'react';
import { useDropzone } from 'react-dropzone';
import {
FileText,
Upload,
Download,
FileSpreadsheet,
File as FileIcon,
Loader2,
CheckCircle,
AlertCircle,
Trash2,
FileDown,
X,
Copy
} from 'lucide-react';
import { Button } from '@/components/ui/button';
import { Card, CardContent, CardHeader, CardTitle, CardDescription } from '@/components/ui/card';
import { Badge } from '@/components/ui/badge';
import { Label } from '@/components/ui/label';
import { toast } from 'sonner';
import { cn } from '@/lib/utils';
import { backendApi } from '@/db/backend-api';
type FileState = {
file: File;
status: 'pending' | 'converting' | 'success' | 'failed';
progress: number;
pdfBlob?: Blob;
error?: string;
};
const SUPPORTED_FORMATS = [
{ ext: 'docx', name: 'Word document', icon: FileText, color: 'blue' },
{ ext: 'xlsx', name: 'Excel spreadsheet', icon: FileSpreadsheet, color: 'emerald' },
{ ext: 'txt', name: 'Text file', icon: FileIcon, color: 'gray' },
{ ext: 'md', name: 'Markdown', icon: FileText, color: 'purple' },
];
const PdfConverter: React.FC = () => {
const [files, setFiles] = useState<FileState[]>([]);
const [converting, setConverting] = useState(false);
const [convertedCount, setConvertedCount] = useState(0);
const onDrop = useCallback((acceptedFiles: File[]) => {
const newFiles: FileState[] = acceptedFiles.map(file => ({
file,
status: 'pending',
progress: 0,
}));
setFiles(prev => [...prev, ...newFiles]);
}, []);
const { getRootProps, getInputProps, isDragActive } = useDropzone({
onDrop,
accept: {
'application/vnd.openxmlformats-officedocument.wordprocessingml.document': ['.docx'],
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': ['.xlsx'],
'application/vnd.ms-excel': ['.xls'],
'text/markdown': ['.md'],
'text/plain': ['.txt'],
},
multiple: true,
});
const handleConvert = async () => {
if (files.length === 0) {
toast.error('Please upload a file first');
return;
}
setConverting(true);
setConvertedCount(0);
const pendingFiles = files.filter(f => f.status === 'pending' || f.status === 'failed');
let successCount = 0;
for (const fileState of pendingFiles) {
// Match entries by File identity, not array index, so removing a row mid-run cannot shift the target
const patch = (update: Partial<FileState>) =>
setFiles(prev => prev.map(f => (f.file === fileState.file ? { ...f, ...update } : f)));
// Mark as converting and clear any error left over from a previous attempt
patch({ status: 'converting', progress: 10, error: undefined });
try {
// Fetch the PDF blob
const pdfBlob = await backendApi.convertToPdf(fileState.file);
// Trigger the download
const url = URL.createObjectURL(pdfBlob);
const a = document.createElement('a');
a.href = url;
a.download = `${fileState.file.name.replace(/\.[^.]+$/, '')}.pdf`;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
// Keep the blob so the file can be downloaded again later
patch({ status: 'success', progress: 100, pdfBlob });
successCount++;
setConvertedCount(successCount);
toast.success(`${fileState.file.name}: download started`);
} catch (error: any) {
patch({ status: 'failed', error: error.message || 'Conversion failed' });
}
}
setConverting(false);
// Report an error when nothing converted instead of always showing success
const summary = `Conversion finished: ${successCount}/${pendingFiles.length} files`;
if (successCount > 0) {
toast.success(summary);
} else {
toast.error(summary);
}
};
const handleDownload = (fileState: FileState) => {
if (!fileState.pdfBlob) return;
const url = URL.createObjectURL(fileState.pdfBlob);
const link = document.createElement('a');
link.href = url;
link.download = `${fileState.file.name.replace(/\.[^.]+$/, '')}.pdf`;
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
URL.revokeObjectURL(url);
};
const handleDownloadAll = async () => {
const successFiles = files.filter(f => f.status === 'success' && f.pdfBlob);
if (successFiles.length === 0) {
toast.error('No files available to download');
return;
}
if (successFiles.length === 1) {
handleDownload(successFiles[0]);
return;
}
// Multiple files: request a ZIP from the batch endpoint.
// Note: this re-sends the source files for conversion rather than reusing the stored blobs.
try {
const zipBlob = await backendApi.batchConvertToPdf(
successFiles.map(f => f.file)
);
const url = URL.createObjectURL(zipBlob);
const link = document.createElement('a');
link.href = url;
link.download = 'converted_pdfs.zip';
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
URL.revokeObjectURL(url);
toast.success('ZIP download started');
} catch (error: any) {
toast.error(error.message || 'Download failed');
}
};
const handleRemove = (index: number) => {
setFiles(prev => prev.filter((_, i) => i !== index));
};
const handleClear = () => {
setFiles([]);
setConvertedCount(0);
};
const getFileIcon = (filename: string) => {
const ext = filename.split('.').pop()?.toLowerCase();
const format = SUPPORTED_FORMATS.find(f => f.ext === ext);
if (!format) return FileIcon;
return format.icon;
};
const getFileColor = (filename: string) => {
const ext = filename.split('.').pop()?.toLowerCase();
const format = SUPPORTED_FORMATS.find(f => f.ext === ext);
return format?.color || 'gray';
};
const colorClasses: Record<string, string> = {
blue: 'bg-blue-500/10 text-blue-500',
emerald: 'bg-emerald-500/10 text-emerald-500',
purple: 'bg-purple-500/10 text-purple-500',
gray: 'bg-gray-500/10 text-gray-500',
};
return (
<div className="space-y-8 pb-10">
<section className="flex flex-col md:flex-row md:items-center justify-between gap-4">
<div className="space-y-1">
<h1 className="text-3xl font-extrabold tracking-tight">Convert to PDF</h1>
<p className="text-muted-foreground">Convert Word, Excel, text, and Markdown files to PDF</p>
</div>
{files.length > 0 && (
<div className="flex gap-2">
<Button variant="outline" onClick={handleClear}>
<Trash2 size={18} className="mr-2" />
</Button>
<Button onClick={handleDownloadAll} disabled={files.filter(f => f.status === 'success').length === 0}>
<Download size={18} className="mr-2" />
Download all ({files.filter(f => f.status === 'success').length})
</Button>
</div>
)}
</section>
<div className="grid grid-cols-1 lg:grid-cols-3 gap-6">
{/* Left: upload area */}
<div className="lg:col-span-1 space-y-6">
{/* Upload card */}
<Card className="border-none shadow-md">
<CardHeader className="pb-4">
<CardTitle className="flex items-center gap-2">
<Upload className="text-primary" size={20} />
</CardTitle>
<CardDescription></CardDescription>
</CardHeader>
<CardContent className="space-y-4">
<div
{...getRootProps()}
className={cn(
"border-2 border-dashed rounded-2xl p-8 transition-all duration-300 flex flex-col items-center justify-center text-center cursor-pointer group",
isDragActive ? "border-primary bg-primary/5" : "border-muted-foreground/20 hover:border-primary/50 hover:bg-primary/5",
converting && "opacity-50 pointer-events-none"
)}
>
<input {...getInputProps()} />
<div className="w-14 h-14 rounded-xl bg-primary/10 text-primary flex items-center justify-center mb-4 group-hover:scale-110 transition-transform">
{converting ? <Loader2 className="animate-spin" size={28} /> : <Upload size={28} />}
</div>
<p className="font-semibold text-sm">
{isDragActive ? 'Release to start uploading' : 'Click or drag files here'}
</p>
<div className="mt-4 flex flex-wrap justify-center gap-2">
{SUPPORTED_FORMATS.map(format => (
<Badge key={format.ext} variant="outline" className={cn("text-xs", colorClasses[format.color])}>
{format.name}
</Badge>
))}
</div>
</div>
{/* Convert button */}
{files.length > 0 && (
<Button
onClick={handleConvert}
disabled={converting || files.filter(f => f.status === 'pending' || f.status === 'failed').length === 0}
className="w-full bg-gradient-to-r from-primary to-purple-600 hover:from-primary/90 hover:to-purple-600/90"
>
{converting ? (
<>
<Loader2 className="mr-2 animate-spin" size={16} />
Converting... ({convertedCount}/{files.filter(f => f.status === 'pending' || f.status === 'failed').length})
</>
) : (
<>
<FileDown className="mr-2" size={16} />
Convert ({files.filter(f => f.status === 'pending' || f.status === 'failed').length})
</>
)}
</Button>
)}
</CardContent>
</Card>
{/* Supported formats */}
<Card className="border-none shadow-md">
<CardHeader className="pb-4">
<CardTitle className="flex items-center gap-2">
<FileText className="text-primary" size={20} />
</CardTitle>
</CardHeader>
<CardContent>
<div className="space-y-3">
{SUPPORTED_FORMATS.map(format => {
const Icon = format.icon;
return (
<div key={format.ext} className="flex items-center gap-3 p-2 rounded-lg hover:bg-muted/30 transition-colors">
<div className={cn("w-8 h-8 rounded flex items-center justify-center", colorClasses[format.color])}>
<Icon size={16} />
</div>
<div className="flex-1">
<p className="text-sm font-medium">.{format.ext.toUpperCase()}</p>
<p className="text-xs text-muted-foreground">{format.name}</p>
</div>
</div>
);
})}
</div>
</CardContent>
</Card>
</div>
{/* Right: file list */}
<div className="lg:col-span-2 space-y-6">
<Card className="border-none shadow-md">
<CardHeader>
<div className="flex items-center justify-between">
<div className="space-y-1">
<CardTitle className="flex items-center gap-2">
<FileIcon className="text-primary" size={20} />
</CardTitle>
<CardDescription>
{files.length} files, {files.filter(f => f.status === 'success').length} converted
</CardDescription>
</div>
</div>
</CardHeader>
<CardContent>
{files.length === 0 ? (
<div className="text-center py-12 text-muted-foreground">
<FileIcon size={48} className="mx-auto mb-4 opacity-30" />
<p></p>
</div>
) : (
<div className="space-y-3">
{files.map((fileState, index) => {
const Icon = getFileIcon(fileState.file.name);
const color = getFileColor(fileState.file.name);
return (
<div
key={index}
className="flex items-center gap-4 p-4 rounded-xl border bg-card hover:bg-muted/30 transition-colors"
>
<div className={cn("w-10 h-10 rounded-lg flex items-center justify-center shrink-0", colorClasses[color])}>
<Icon size={20} />
</div>
<div className="flex-1 min-w-0">
<p className="font-semibold truncate">{fileState.file.name}</p>
<div className="flex items-center gap-2">
<span className="text-xs text-muted-foreground">
{(fileState.file.size / 1024).toFixed(1)} KB
</span>
{fileState.status === 'pending' && (
<Badge variant="secondary" className="text-xs">Pending</Badge>
)}
{fileState.status === 'converting' && (
<Badge variant="default" className="text-xs bg-blue-500">Converting</Badge>
)}
{fileState.status === 'success' && (
<Badge variant="default" className="text-xs bg-emerald-500">Converted</Badge>
)}
{fileState.status === 'failed' && (
<Badge variant="destructive" className="text-xs">Failed</Badge>
)}
</div>
{fileState.status === 'converting' && (
<div className="mt-1 h-1 bg-muted rounded-full overflow-hidden">
<div
className="h-full bg-primary transition-all duration-300"
style={{ width: `${fileState.progress}%` }}
/>
</div>
)}
{fileState.error && (
<p className="text-xs text-destructive mt-1">{fileState.error}</p>
)}
</div>
<div className="flex items-center gap-2 shrink-0">
{fileState.status === 'success' && (
<>
<Button variant="ghost" size="icon" onClick={() => handleDownload(fileState)}>
<Download size={18} className="text-emerald-500" />
</Button>
<Button
variant="ghost"
size="icon"
onClick={() => {
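// Copy a download link to the clipboard.
// Note: this is a blob: URL, valid only in this page session, and it is never revoked.
if (fileState.pdfBlob) {
const url = URL.createObjectURL(fileState.pdfBlob);
navigator.clipboard.writeText(url);
toast.success('Link copied');
}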
}}
>
<Copy size={18} />
</Button>
</>
)}
{(fileState.status === 'pending' || fileState.status === 'failed') && (
<Button
variant="ghost"
size="icon"
onClick={() => handleRemove(index)}
className="text-destructive hover:bg-destructive/10"
>
<X size={18} />
</Button>
)}
</div>
</div>
);
})}
</div>
)}
</CardContent>
</Card>
{/* Usage instructions */}
<Card className="border-none shadow-md bg-gradient-to-br from-primary/5 to-purple-500/5">
<CardHeader className="pb-4">
<CardTitle className="flex items-center gap-2">
<FileText className="text-primary" size={20} />
Usage instructions
</CardTitle>
</CardHeader>
<CardContent>
<div className="space-y-3 text-sm text-muted-foreground">
<div className="flex gap-3">
<div className="w-6 h-6 rounded-full bg-primary/10 text-primary flex items-center justify-center shrink-0 text-xs font-bold">1</div>
<p>Upload Word (.docx), Excel (.xlsx), text (.txt), or Markdown (.md) files</p>
</div>
<div className="flex gap-3">
<div className="w-6 h-6 rounded-full bg-primary/10 text-primary flex items-center justify-center shrink-0 text-xs font-bold">2</div>
<p>Click the convert button to convert the files to PDF</p>
</div>
<div className="flex gap-3">
<div className="w-6 h-6 rounded-full bg-primary/10 text-primary flex items-center justify-center shrink-0 text-xs font-bold">3</div>
<p>Download the PDF files for use</p>
</div>
</div>
</CardContent>
</Card>
</div>
</div>
</div>
);
};
export default PdfConverter;
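
Review note: two small pieces of this page are easy to check in isolation. The download name comes from stripping the last extension with `/\.[^.]+$/`, and matching list entries by `File` identity keeps a status update on the right row even if the list changes while a conversion is in flight. The sketch below shows both as pure helpers. `toPdfName`, `patchByFile`, and the `Entry` type are hypothetical names that do not appear in this diff, and the dotfile guard is an added assumption about the desired behavior.

```typescript
// Hypothetical helpers, not part of the diff.

// Same extension-stripping regex as the page, plus a guard for dotfiles
// such as ".gitignore", where the regex would otherwise strip the whole name.
function toPdfName(filename: string): string {
  const base = filename.replace(/\.[^.]+$/, '');
  return `${base || filename}.pdf`;
}

interface Entry<F> {
  file: F;
  status: 'pending' | 'converting' | 'success' | 'failed';
  error?: string;
}

// Return a new list in which the entry holding the same `file` reference
// is merged with `patch`; all other entries are returned unchanged.
function patchByFile<F>(list: Entry<F>[], file: F, patch: Partial<Entry<F>>): Entry<F>[] {
  return list.map(e => (e.file === file ? { ...e, ...patch } : e));
}
```

In a component, `patchByFile` would typically be called inside a functional state update such as `setFiles(prev => patchByFile(prev, file, { status: 'success' }))`.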

@@ -4,6 +4,7 @@ import Documents from '@/pages/Documents';
import TemplateFill from '@/pages/TemplateFill'; import TemplateFill from '@/pages/TemplateFill';
import InstructionChat from '@/pages/InstructionChat'; import InstructionChat from '@/pages/InstructionChat';
import TaskHistory from '@/pages/TaskHistory'; import TaskHistory from '@/pages/TaskHistory';
import PdfConverter from '@/pages/PdfConverter';
import MainLayout from '@/components/layouts/MainLayout'; import MainLayout from '@/components/layouts/MainLayout';
export const routes = [ export const routes = [
@@ -31,6 +32,10 @@ export const routes = [
path: '/task-history', path: '/task-history',
element: <TaskHistory />, element: <TaskHistory />,
}, },
{
path: '/pdf-converter',
element: <PdfConverter />,
},
], ],
}, },
{ {

@@ -23,7 +23,6 @@
"noUnusedParameters": true, "noUnusedParameters": true,
"noFallthroughCasesInSwitch": true, "noFallthroughCasesInSwitch": true,
"noUncheckedSideEffectImports": true, "noUncheckedSideEffectImports": true,
"baseUrl": ".",
"paths": { "paths": {
"@/*": ["./src/*"] "@/*": ["./src/*"]
}, },

Binary file not shown. Size: 552 KiB