Extract PDF text

twtools-extract_pdf_text

General · MIT · 288 calls in the last 30 days

Extracts the text content of a PDF (pymupdf-based, born-digital only).

Args:
    source: A PDF URL, or base64-encoded PDF content.
    max_pages: Maximum number of pages to process (None = all pages).

Returns: text + per-page list + is_scanned flag.

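**Behavior notes (important, client agents please take note):**
- For born-digital PDFs (containing selectable text): returns text + pages as normal.
- For scanned / image-only PDFs: **this is not an error**. The tool degrades gracefully and returns
  `{is_scanned: true, reason: "OCR required, ..."}`, with no `error` field and
  no exception raised. OCR for scanned documents falls under the higher-tier call_agent plan; this tool does not handle it.
- A downstream agent that sees `is_scanned: true` should treat it as "this PDF is not suitable for this tool" rather than
  "the call failed", and can either switch to OCR or ask the user to supply a born-digital version.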
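
The behavior notes above can be sketched as a small downstream check. This is only an illustration: `classify_result` is a hypothetical helper, not part of the tool, and it assumes that a genuine failure is signalled by an `error` field in the result, which the notes imply but do not spell out.

```python
from typing import Any, Dict


def classify_result(result: Dict[str, Any]) -> str:
    """Classify an extract_pdf_text result as 'failed', 'scanned', or 'ok'.

    Hypothetical helper for illustration; field names follow the notes above.
    """
    # Assumption: a real failure carries an `error` field.
    if "error" in result:
        return "failed"
    # A scanned / image-only PDF is a graceful degradation, not a failure.
    if result.get("is_scanned"):
        return "scanned"
    # Otherwise the result should carry text + pages.
    return "ok"


if __name__ == "__main__":
    scanned = {"is_scanned": True, "reason": "OCR required, ..."}
    if classify_result(scanned) == "scanned":
        # Route to OCR or ask the user for a born-digital copy.
        print("PDF not suitable for this tool:", scanned["reason"])
```

Keeping "scanned" separate from "failed" lets an agent avoid retrying a call that will never yield text.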

Input schema

{
  "properties": {
    "source": {
      "title": "Source",
      "type": "string"
    },
    "max_pages": {
      "anyOf": [
        {
          "type": "integer"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Max Pages"
    }
  },
  "required": [
    "source"
  ],
  "title": "extract_pdf_textArguments",
  "type": "object"
}
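
As a rough sketch of how a caller might assemble arguments that satisfy this schema, the snippet below builds the `source` / `max_pages` dict in Python. `build_arguments` is a hypothetical helper, and encoding local bytes with standard base64 is an assumption about what "base64-encoded PDF content" expects.

```python
import base64
from typing import Any, Dict, Optional, Union


def build_arguments(
    source: Union[str, bytes], max_pages: Optional[int] = None
) -> Dict[str, Any]:
    """Build an arguments dict matching the input schema (hypothetical helper)."""
    if isinstance(source, bytes):
        # Raw PDF bytes are sent as a base64 string (assumed standard alphabet).
        source = base64.b64encode(source).decode("ascii")
    if not isinstance(source, str):
        raise TypeError("source must be a URL string or PDF bytes")
    # The schema allows an integer or null; reject bool, which is an int subclass.
    if max_pages is not None and (
        isinstance(max_pages, bool) or not isinstance(max_pages, int)
    ):
        raise TypeError("max_pages must be an integer or None")
    return {"source": source, "max_pages": max_pages}
```

Only `source` is required; leaving `max_pages` as `None` matches the schema default and processes every page.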

How to call

Once your MCP client is configured for Twinkle Hub, the agent will see twtools-extract_pdf_text. Just let it call the tool directly, for example:

# Ask Claude / any MCP client:
Please use twtools-extract_pdf_text to process "…".

# It will call:
twtools-extract_pdf_text(source="…")

Client not set up yet?

Claude Desktop takes about 3 minutes to connect: download the .mcpb file and double-click it, or see the docs for installation steps for each client.

See the user setup docs