
Adding outline bookmarks to a PDF with Python

Updated: 2023-10-06 09:55:28   Author: 飛由于度

Scanned PDFs often come without bookmarks (an outline), which makes them awkward to read. This article builds a semi-automatic Python script that adds them, walking through the code step by step.

0. Choosing a library: pypdf

Reason: Python version support (pypdf >= 3.0 still runs on Python 3.6)

Python	3.11	3.10	3.9	3.8	3.7	3.6	2.7
pypdf>=3.0	YES	YES	YES	YES	YES	YES	-
PyPDF2>=2.0	YES	YES	YES	YES	YES	YES	-
PyPDF2 1.20.0 - 1.28.4	-	YES	YES	YES	YES	YES	YES
PyPDF2 1.15.0 - 1.20.0	-	-	-	-	-	-	YES

My versions:

Python=3.6.13

pypdf=3.16.2

1. Adding bookmarks: using the add_outline_item method

# https://zhuanlan.zhihu.com/p/603340639
import sys

import pypdf

wk_in_file_name = 'PythonTutorial.pdf'
input1 = open(wk_in_file_name, "rb")  # open the PDF that needs bookmarks
writer = pypdf.PdfWriter()  # create a PdfWriter instance
writer.append(input1)  # copy the PDF into the writer, then edit its bookmarks
# page_number is a 0-based page index, so 10 points at the 11th page of the file
writer.add_outline_item(title='10', page_number=10, parent=None)  # add the first bookmark
writer.add_outline_item(title='11', page_number=11, parent=None)  # add the second bookmark
# Write to an output PDF document
output = open('01_' + wk_in_file_name, "wb")  # the output file is created if it does not exist
writer.write(output)  # save the PDF with the bookmarks added
# Close file descriptors
writer.close()
output.close()
input1.close()
print('pypdf.__version__=', pypdf.__version__)
print('sys.version=', sys.version)

Result: 01_PythonTutorial.pdf contains two new top-level bookmarks, titled 10 and 11.

2. Adding child bookmarks: using the parent parameter

# https://zhuanlan.zhihu.com/p/603340639
import pypdf

wk_in_file_name = 'PythonTutorial.pdf'
writer = pypdf.PdfWriter()
input1 = open(wk_in_file_name, "rb")
writer.append(input1)
# add_outline_item returns the new bookmark; pass it as parent to nest another bookmark under it
parent_bookmark_0 = writer.add_outline_item(title='10', page_number=10, parent=None)  # first bookmark
writer.add_outline_item(title='10_1', page_number=11, parent=parent_bookmark_0)  # child of the first bookmark
parent_bookmark_1 = writer.add_outline_item(title='11', page_number=20, parent=None)  # second bookmark
writer.add_outline_item(title='11_1', page_number=21, parent=parent_bookmark_1)  # child of the second bookmark
# Write to an output PDF document
output = open('02_' + wk_in_file_name, "wb")
writer.write(output)
# Close file descriptors
writer.close()
output.close()
input1.close()

Result: 02_PythonTutorial.pdf contains the bookmarks 10 and 11, each with one child (10_1 and 11_1).
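The parent pattern extends to any nesting depth if the most recent bookmark at each level is remembered. The following is a minimal sketch rather than part of the original script: add_nested_outline and the (level, title, page_index) tuple layout are hypothetical names chosen for illustration, and the only pypdf call it relies on is add_outline_item(title=..., page_number=..., parent=...) as used above. RecordingWriter is a stand-in that lets the logic run without a PDF.

```python
def add_nested_outline(writer, entries):
    """Add (level, title, page_index) entries as a nested outline.

    Level 0 is a top-level bookmark; level n becomes a child of the most
    recent bookmark at level n - 1. `writer` is any object that offers
    add_outline_item(title=..., page_number=..., parent=...).
    """
    last_at_level = {}  # level -> bookmark most recently added at that level
    for level, title, page_index in entries:
        parent = last_at_level.get(level - 1)  # None for top-level entries
        item = writer.add_outline_item(title=title, page_number=page_index, parent=parent)
        last_at_level[level] = item
        # Bookmarks at deeper levels belong to the previous branch; forget them.
        for deeper in [k for k in last_at_level if k > level]:
            del last_at_level[deeper]
    return writer


class RecordingWriter:
    """Hypothetical stand-in for pypdf.PdfWriter that only records calls."""

    def __init__(self):
        self.calls = []

    def add_outline_item(self, title, page_number, parent=None):
        self.calls.append((title, page_number, parent))
        return title  # a real writer returns a bookmark reference; the title suffices here


demo = RecordingWriter()
add_nested_outline(demo, [(0, '10', 10), (1, '10_1', 11), (0, '11', 20), (1, '11_1', 21)])
for call in demo.calls:
    print(call)
```

Passing a real pypdf.PdfWriter instead of RecordingWriter should produce the same nesting as the script above, since the function makes the same calls in the same order.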

3. Reading the txt file

# https://blog.csdn.net/kobeyu652453/article/details/106876829
f = open('dir.txt', 'r', encoding='utf8')
# If the file is not UTF-8, try another encoding instead:
# f = open('dir.txt', encoding='gbk', errors='ignore')
# f = open('dir.txt', encoding='gb18030', errors='ignore')
# f.readline() reads one line at a time (suited to large files). It is not called
# here because it would consume the first entry before readlines() runs.
# https://blog.csdn.net/andyleo0111/article/details/87878784
lines = f.readlines()  # read all lines at once (fine for small files)
num_lines = len(lines)  # total number of headings
txt = []
for line in lines:
    txt.append(line.strip())  # strip() removes the trailing '\n'
    print(line.strip())
    # line.split(' ') splits the line on spaces
    # line.count('.') counts the dots: n dots means a level n+1 heading
print(txt)
f.close()  # close the file
print('f.closed=', f.closed)

Output:

D:\SoftProgram\JetBrains\anaconda3_202303\envs\py3_6_for_TimeSeries\python.exe E:\program\python\gitTemp\pdf\test\03_read_txt.py 
1 課前甜點 3
2 使用Python解釋器 5
2.1 調用解釋器 5
2.1.1 傳入參數 6
2.1.2 交互模式 6
2.2 解釋器的運行環境 6
2.2.1 源文件的字符編碼 6
3 Python的非正式介紹 9
3.1 Python作為計算器使用 9
3.1.1 數字 9
3.1.2 字符串 11
3.1.3 列表 14
3.2 走向編程的第一步 15
4 其他流程控制工具 17
4.1 if語句 17
4.2 for語句 17
4.3 range()函數 18
4.4 break和continue語句，以及循環中的else子句 19
4.5 pass 語句 20
4.6 定義函數 20
4.7 函數定義的更多形式 22
4.8 小插曲：編碼風格 29
['1 課前甜點 3', '2 使用Python解釋器 5', '2.1 調用解釋器 5', '2.1.1 傳入參數 6', '2.1.2 交互模式 6', '2.2 解釋器的運行環境 6', '2.2.1 源文件的字符編碼 6', '3 Python的非正式介紹 9', '3.1 Python作為計算器使用 9', '3.1.1 數字 9', '3.1.2 字符串 11', '3.1.3 列表 14', '3.2 走向編程的第一步 15', '4 其他流程控制工具 17', '4.1 if語句 17', '4.2 for語句 17', '4.3 range()函數 18', '4.4 break和continue語句，以及循環中的else子句 19', '4.5 pass 語句 20', '4.6 定義函數 20', '4.7 函數定義的更多形式 22', '4.8 小插曲：編碼風格 29']
f.closed= True
Process finished with exit code 0
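Each line of dir.txt follows the pattern "section-number title page". A minimal sketch of turning one such line into the three values the later steps need, assuming the page number is always the last whitespace-separated field; parse_toc_line is a hypothetical helper name, not something the original script defines:

```python
def parse_toc_line(line):
    """Split 'number title page' into (level, title, page)."""
    text = line.strip()
    head, page = text.rsplit(None, 1)  # the page number is the last field
    number = head.split(None, 1)[0]    # section number, e.g. '2.2.1'
    level = number.count('.')          # 0 dots -> level 0, 1 dot -> level 1, ...
    return level, head, int(page)


print(parse_toc_line('2.2.1 源文件的字符編碼 6\n'))  # (2, '2.2.1 源文件的字符編碼', 6)
print(parse_toc_line('4.5 pass 語句 20\n'))          # (1, '4.5 pass 語句', 20)
```

Splitting from the right keeps titles that contain spaces (such as "pass 語句") intact, and counting dots only in the section number avoids miscounting a title that itself contains a dot.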

4. Reading titles and page numbers from the txt file and writing them as PDF bookmarks

# https://blog.csdn.net/kobeyu652453/article/details/106876829
import pypdf

wk_in_file_name = 'PythonTutorial.pdf'
writer = pypdf.PdfWriter()
input1 = open(wk_in_file_name, "rb")
writer.append(input1)
f = open('dir.txt', 'r', encoding='utf8')
lines = f.readlines()  # read all lines
bookmark_parent_0 = None  # most recent first-level bookmark
bookmark_parent_1 = None  # most recent second-level bookmark
for line in lines:
    line = line.strip()  # drop the trailing '\n'
    if not line:
        continue  # skip blank lines
    pline = line.split()  # fields: section number, title words, page number
    level = pline[0].count('.')  # n dots in the section number means a level n+1 heading
    title = ' '.join(pline[:-1])  # everything except the trailing page number
    page_num = int(pline[-1])
    if level == 0:
        bookmark_parent_0 = writer.add_outline_item(title=title, page_number=page_num, parent=None)
    elif level == 1:
        bookmark_parent_1 = writer.add_outline_item(title=title, page_number=page_num, parent=bookmark_parent_0)
    else:
        writer.add_outline_item(title=title, page_number=page_num, parent=bookmark_parent_1)
# Write to an output PDF document
output = open('04_' + wk_in_file_name, "wb")
writer.write(output)
# Close file descriptors
writer.close()
output.close()
input1.close()
f.close()  # close the txt file
print('f.closed=', f.closed)

Result: 04_PythonTutorial.pdf has a three-level outline built from the entries in dir.txt.
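The loop attaches each entry to the most recent heading one level up, so it quietly depends on dir.txt never skipping a level (for example a 1.1.1 entry with no 1.1 before it). A minimal sketch of a pre-check, assuming the same "number title page" layout; find_level_jumps is a hypothetical helper and the sample lines are made up for illustration:

```python
def find_level_jumps(lines):
    """Return (line_number, text) for entries that skip a heading level."""
    problems = []
    previous_level = -1  # forces the first entry to be a top-level heading
    for line_number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        level = text.split()[0].count('.')  # dots in the section number
        if level > previous_level + 1:
            problems.append((line_number, text))
        previous_level = level
    return problems


sample = [
    '1 Intro 3\n',
    '1.1.1 Too deep 4\n',  # no '1.1' entry precedes this one
    '2 Next 5\n',
    '2.1 Fine 6\n',
    '2.1.1 Also fine 6\n',
]
print(find_level_jumps(sample))  # [(2, '1.1.1 Too deep 4')]
```

Running a check like this on the lines read from dir.txt before the bookmark loop makes it easier to fix the text file first instead of discovering a misplaced bookmark in the finished PDF.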

5. Adding a page offset

# https://blog.csdn.net/kobeyu652453/article/details/106876829
import pypdf

wk_in_file_name = 'PythonTutorial.pdf'
writer = pypdf.PdfWriter()
input1 = open(wk_in_file_name, "rb")
writer.append(input1)
f = open('dir.txt', 'r', encoding='utf8')
lines = f.readlines()  # read all lines
offset = 5  # difference between the page number listed in dir.txt and the PDF page index
bookmark_parent_0 = None  # most recent first-level bookmark
bookmark_parent_1 = None  # most recent second-level bookmark
for line in lines:
    line = line.strip()  # drop the trailing '\n'
    if not line:
        continue  # skip blank lines
    pline = line.split()  # fields: section number, title words, page number
    level = pline[0].count('.')  # n dots in the section number means a level n+1 heading
    page_title = ' '.join(pline[:-1])  # everything except the trailing page number
    page_num = int(pline[-1]) + offset  # shift the listed page number by the offset
    if level == 0:
        bookmark_parent_0 = writer.add_outline_item(title=page_title, page_number=page_num, parent=None)
    elif level == 1:
        bookmark_parent_1 = writer.add_outline_item(title=page_title, page_number=page_num, parent=bookmark_parent_0)
    else:
        writer.add_outline_item(title=page_title, page_number=page_num, parent=bookmark_parent_1)
    print(line)
# Write to an output PDF document
output = open('05_' + wk_in_file_name, "wb")
writer.write(output)
# Close file descriptors
writer.close()
output.close()
input1.close()
f.close()  # close the txt file
print('f.closed=', f.closed)

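Result: the bookmarks in 05_PythonTutorial.pdf point 5 pages further into the file than the page numbers listed in dir.txt.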
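The script hard-codes offset = 5. One way to derive such a value, sketched here under the assumption that page_number counts pages from 0 (as pypdf does), is to pick any page whose printed number is known and note where it sits in a PDF viewer. page_offset is a hypothetical helper and the numbers below are made up for illustration:

```python
def page_offset(printed_page, viewer_page):
    """Offset to add to a printed page number to get a 0-based page index.

    printed_page: the number printed on a page of the book (e.g. 3)
    viewer_page:  the position of that same page in a PDF viewer, counting from 1
    """
    return (viewer_page - 1) - printed_page


offset = page_offset(printed_page=3, viewer_page=9)
print(offset)      # 5
print(3 + offset)  # 8: the 0-based index of the page printed as "3"
```

The same offset then applies to every entry, provided the book's page numbering is continuous; front matter with separate roman numbering would need its own offset.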

6. When dir.txt contains no page numbers

# https://blog.csdn.net/kobeyu652453/article/details/106876829
import pypdf

wk_in_file_name = 'PythonTutorial.pdf'
writer = pypdf.PdfWriter()
input1 = open(wk_in_file_name, "rb")
writer.append(input1)
f = open('dir.txt', 'r', encoding='utf8')
lines = f.readlines()  # read all lines
offset = 5  # with no page numbers available, every bookmark points at this page index
bookmark_parent_0 = None  # most recent first-level bookmark
bookmark_parent_1 = None  # most recent second-level bookmark
for line in lines:
    line = line.strip()  # drop the trailing '\n'
    if not line:
        continue  # skip blank lines
    pline = line.split()  # fields: section number, title words
    level = pline[0].count('.')  # n dots in the section number means a level n+1 heading
    page_title = ' '.join(pline)  # the whole line is the title
    page_num = offset
    if level == 0:
        bookmark_parent_0 = writer.add_outline_item(title=page_title, page_number=page_num, parent=None)
    elif level == 1:
        bookmark_parent_1 = writer.add_outline_item(title=page_title, page_number=page_num, parent=bookmark_parent_0)
    else:
        writer.add_outline_item(title=page_title, page_number=page_num, parent=bookmark_parent_1)
    print(line)
# Write to an output PDF document
output = open('06_' + wk_in_file_name, "wb")
writer.write(output)
# Close file descriptors
writer.close()
output.close()
input1.close()
f.close()  # close the txt file
print('f.closed=', f.closed)

Result: every bookmark in 06_PythonTutorial.pdf points at the same page, the one at index offset (5 here).
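Sections 5 and 6 differ only in whether a line ends with a page number, so a single parsing step can cover both. A minimal sketch, assuming a trailing all-digit field is always a page number; split_title_and_page and its default_page parameter are hypothetical names, not part of the original script:

```python
def split_title_and_page(line, default_page):
    """Return (title, page) for a line that may or may not end in a page number."""
    fields = line.split()
    if len(fields) > 1 and fields[-1].isdecimal():
        # The last field is numeric: treat it as the page number.
        return ' '.join(fields[:-1]), int(fields[-1])
    # No page number present: keep the whole line as the title.
    return ' '.join(fields), default_page


print(split_title_and_page('3.1.2 字符串 11', default_page=5))  # ('3.1.2 字符串', 11)
print(split_title_and_page('3.1.2 字符串', default_page=5))     # ('3.1.2 字符串', 5)
```

The heuristic is ambiguous for a title without a page number that happens to end in a number, so a file mixing both kinds of lines is worth checking by eye.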