Converting HTML to PDF with Python: Methods and Analysis

Updated: 2019-05-04 12:38:50 Author: Tacey Wong

This article shows how to convert HTML to PDF in Python. Using worked examples, it covers the techniques and caveats of generating PDF files from HTML with the pdfkit module.

The examples below demonstrate how to convert HTML to PDF in Python. The details are as follows:

The main tool is pdfkit, a Python wrapper around wkhtmltopdf.

Installation

1. Install python-pdfkit:

$ pip install pdfkit

2. Install wkhtmltopdf:

Debian/Ubuntu:

$ sudo apt-get install wkhtmltopdf

Red Hat/CentOS:

$ sudo yum install wkhtmltopdf

macOS:

$ brew install Caskroom/cask/wkhtmltopdf

Usage

A simple example:

import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')

You can also pass a list of URLs or file names:

pdfkit.from_url(['google.com', 'yandex.ru', 'engadget.com'], 'out.pdf')
pdfkit.from_file(['file1.html', 'file2.html'], 'out.pdf')

You can also pass an open file:

with open('file.html') as f:
  pdfkit.from_file(f, 'out.pdf')

If you want to process the generated PDF further, you can read it into a variable:

# Set the output path to False to get the result back in a variable
pdf = pdfkit.from_url('http://google.com', False)
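Before passing the returned data on, it can be worth checking that it really is a PDF. The following is a minimal sketch, assuming the call returns the PDF as bytes (as it does under Python 3); the helper names looks_like_pdf and render_to_bytes are hypothetical and not part of pdfkit.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data starts with the PDF file signature."""
    return data.startswith(b"%PDF-")


def render_to_bytes(url: str) -> bytes:
    """Render a URL to PDF bytes and sanity-check the result.

    Hypothetical helper: pdfkit is imported lazily so that this module
    can be loaded even where pdfkit/wkhtmltopdf are not installed.
    """
    import pdfkit

    pdf = pdfkit.from_url(url, False)  # False: return the PDF instead of writing a file
    if not looks_like_pdf(pdf):
        raise ValueError("wkhtmltopdf did not return PDF data")
    return pdf
```

For example, render_to_bytes('http://google.com') could be used to get data for attaching to an e-mail or sending in an HTTP response.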

You can specify any wkhtmltopdf option <http://wkhtmltopdf.org/usage/wkhtmltopdf.txt>. The leading '--' in option names can be dropped. If an option takes no value, use None, False, or '' as the dictionary value:

  options = {
    'page-size': 'Letter',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'no-outline': None
  }
  pdfkit.from_url('http://google.com', 'out.pdf', options=options)
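To make the mapping between the dictionary and the command line concrete, here is a rough sketch of how such a dictionary could be turned into wkhtmltopdf arguments. It is an illustration only, not pdfkit's actual implementation; options_to_args is a hypothetical name.

```python
def options_to_args(options):
    """Translate an options dict into a flat list of command-line arguments.

    Illustrative only: keys get a '--' prefix (any existing one is
    normalized), and None, False or '' mean "flag without a value".
    """
    args = []
    for name, value in options.items():
        args.append("--" + name.lstrip("-"))
        if value is None or value is False or value == "":
            continue  # flag-style option, nothing follows it
        args.append(str(value))
    return args


print(options_to_args({"page-size": "Letter", "no-outline": None}))
```

Under this sketch, 'no-outline': None contributes only the bare flag, while 'page-size': 'Letter' contributes the flag followed by its value.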

By default, PDFKit shows all wkhtmltopdf output. If you don't want to see it, pass the quiet option:

  options = {
    'quiet': ''
  }
  pdfkit.from_url('google.com', 'out.pdf', options=options)

Because of wkhtmltopdf's command-line syntax, the TOC and cover options must be specified separately:

  toc = {
    'xsl-style-sheet': 'toc.xsl'
  }
  cover = 'cover.html'
  pdfkit.from_file('file.html', 'out.pdf', options=options, toc=toc, cover=cover)

When converting a file or a string, you can specify external CSS files with the css option:

  # Single CSS file
  css = 'example.css'
  pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)
  # Multiple CSS files
  css = ['example.css', 'example2.css']
  pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)

You can also pass any option through meta tags in your HTML:

  body = """
    <html>
      <head>
        <meta name="pdfkit-page-size" content="Legal"/>
        <meta name="pdfkit-orientation" content="Landscape"/>
      </head>
      Hello World!
    </html>
    """
  pdfkit.from_string(body, 'out.pdf')  # with --page-size=Legal and --orientation=Landscape
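If the options are only known at run time, the meta tags can be generated instead of written by hand. The sketch below assumes the default pdfkit- prefix; with_pdfkit_meta is a hypothetical helper, not a pdfkit function.

```python
from html import escape


def with_pdfkit_meta(body_html, options, prefix="pdfkit-"):
    """Wrap an HTML fragment in a document whose head carries pdfkit meta tags.

    Hypothetical helper: option names and values are escaped for use in
    attributes; body_html is inserted as-is because it is already HTML.
    """
    metas = "\n".join(
        '<meta name="{}" content="{}"/>'.format(escape(prefix + name), escape(str(value)))
        for name, value in options.items()
    )
    return "<html>\n<head>\n{}\n</head>\n<body>\n{}\n</body>\n</html>".format(metas, body_html)


print(with_pdfkit_meta("Hello World!", {"page-size": "Legal", "orientation": "Landscape"}))
```

The resulting string can then be passed to pdfkit.from_string in the same way as the hand-written body above.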

Configuration

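Every API call accepts an optional configuration parameter. It should be an instance returned by the pdfkit.configuration() API call, which takes the configuration options as initialization parameters. The available options are: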

wkhtmltopdf: the location of the wkhtmltopdf binary. By default, pdfkit tries to locate it using which (on UNIX-like systems) or where (on Windows).
meta_tag_prefix: the prefix for pdfkit-specific meta tags. The default is pdfkit-

Example: when wkhtmltopdf is not on the system path (not in $PATH):

config = pdfkit.configuration(wkhtmltopdf='/opt/bin/wkhtmltopdf')
pdfkit.from_string(html_string, output_file, configuration=config)
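Rather than hard-coding a path, the binary can be looked up first and a configuration built only when it is found. This is a minimal sketch using the standard library; the function names and the fallback paths are assumptions chosen for illustration.

```python
import os
import shutil


def find_wkhtmltopdf(candidates=("/opt/bin/wkhtmltopdf", "/usr/local/bin/wkhtmltopdf")):
    """Return the path to a wkhtmltopdf executable, or None if none is found.

    Checks $PATH first, then a few fallback locations (assumed, adjust as needed).
    """
    found = shutil.which("wkhtmltopdf")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def make_configuration():
    """Build a pdfkit configuration for the located binary (hypothetical helper)."""
    import pdfkit  # imported lazily; requires pdfkit to be installed

    path = find_wkhtmltopdf()
    if path is None:
        raise FileNotFoundError("wkhtmltopdf executable not found")
    return pdfkit.configuration(wkhtmltopdf=path)
```

The object returned by make_configuration() can be passed as configuration=... to any of the pdfkit calls shown earlier.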

Troubleshooting

IOError: 'No wkhtmltopdf executable found':

Make sure wkhtmltopdf is on your system path ($PATH) or has been set through configuration (see the description above). Running where wkhtmltopdf on Windows or which wkhtmltopdf on Linux returns the exact location of the wkhtmltopdf binary.

IOError: 'Command Failed'

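This error means that PDFKit could not process an input. You can try running the command shown in the error message directly to see what caused the failure (some versions of wkhtmltopdf fail here because of a segmentation fault).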
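To get hold of the error message (and the command it mentions) without crashing the program, the conversion call can be wrapped in a small helper. This is a sketch only; try_convert is a hypothetical name, and it relies on IOError being an alias of OSError in Python 3.

```python
def try_convert(convert, *args, **kwargs):
    """Run a conversion callable; return None on success or the error text on failure.

    Hypothetical helper: 'convert' is any callable such as pdfkit.from_file.
    """
    try:
        convert(*args, **kwargs)
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        return str(exc)
    return None


# Intended usage (requires pdfkit and wkhtmltopdf):
#   import pdfkit
#   error = try_convert(pdfkit.from_file, 'test.html', 'out.pdf')
#   if error:
#       print(error)  # inspect the message and rerun the command by hand
```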

The PDF is generated normally, but Chinese text is garbled

Make sure of two things:

1) Chinese fonts are installed on your system

2) Your HTML includes <meta charset="UTF-8">
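The second point can be enforced in code before the HTML is handed to pdfkit. The following is a rough sketch based on regular expressions, which is good enough for simple documents but is not a full HTML parser; ensure_utf8_meta is a hypothetical helper.

```python
import re

CHARSET_TAG = '<meta charset="UTF-8">'


def ensure_utf8_meta(html):
    """Return html with a UTF-8 charset meta tag, adding one if it is missing."""
    # Leave the document alone if any meta tag already declares a charset.
    if re.search(r"<meta[^>]+charset", html, flags=re.IGNORECASE):
        return html
    # Otherwise insert the tag right after the opening <head> tag, if there is one.
    head = re.search(r"<head(?:\s[^>]*)?>", html, flags=re.IGNORECASE)
    if head:
        return html[:head.end()] + CHARSET_TAG + html[head.end():]
    # No <head> at all: prepend the tag.
    return CHARSET_TAG + html


print(ensure_utf8_meta("<html><head></head><body>衣服</body></html>"))
```

The font requirement in the first point still has to be met at the system level; this helper only addresses the encoding declaration.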

Here is a simple HTML table I put together:

<html>
<head><meta charset="UTF-8"></head>
<body>
<table width="400" border="1">
 <tr>
 <th align="left">Item....</th>
 <th align="right">1</th>
 </tr>
 <tr>
 <td align="left">衣服</td>
 <td align="right">$241.10</td>
 </tr>
 <tr>
 <td align="left">化妝品</td>
 <td align="right">$30.00</td>
 </tr>
 <tr>
 <td align="left">食物</td>
 <td align="right">$730.40</td>
 </tr>
 <tr>
 <th align="left">Total</th>
 <th align="right">$1001.50</th>
 </tr>
</table>
</body>
</html>

[Figure: screenshot of the generated PDF]