How to work with Word documents using the python-docx module

Updated: 2022-09-28 10:22:35  Author: 安替-AnTi

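This article explains how to work with Word documents using the python-docx module. It walks through the topic in detail and should be a useful reference for anyone who needs it.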

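Introduction

Getting started with python-docx is easy. Let's walk through the basics.

Opening a document

The first thing you need is a document to work on. The easiest way is: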

from docx import Document
document = Document()

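This opens a blank document based on the default "template". You can also open and work on an existing Word document with python-docx, but we'll keep things simple for now.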

Applying character formatting to body text (font, size, color)

from docx.oxml.ns import qn
from docx.shared import Pt, RGBColor

# Set the body font. Western text: Times New Roman; East Asian text: 宋體 (SimSun)
document.styles['Normal'].font.name = 'Times New Roman'
document.styles['Normal']._element.rPr.rFonts.set(qn('w:eastAsia'), u'宋體')
# Set the body font size
document.styles['Normal'].font.size = Pt(12)
# Set the body font color
document.styles['Normal'].font.color.rgb = RGBColor(0, 0, 0)

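Adding a heading

document.add_heading('The REAL meaning of the universe')

By default this adds a top-level heading, which appears in Word as "Heading 1". When you want a heading for a subsection, just specify the level you want as an integer between 1 and 9:

document.add_heading('The role of dolphins', level=2)

level is the heading level within the Word document. It starts at 0, and level 0 adds a "Title" paragraph rather than a numbered heading.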

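Working with paragraphs

Adding a paragraph

Paragraphs are fundamental in Word. They are used for body text, but also for headings and list items such as bullets.

Here is the simplest way to add one: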

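from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.shared import Inches, Pt

# Omit style to use the default paragraph style
paragraph = document.add_paragraph('床前明月光\n疑是地上霜\n舉頭望明月\n低頭思故鄉')
paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER  # center the text

# Indentation and spacing are set on the paragraph's paragraph_format
paragraph_format = paragraph.paragraph_format

# Left indent
paragraph_format.left_indent = Inches(0.3)

# First-line indent
paragraph_format.first_line_indent = Inches(0.3)

# Space before the paragraph
paragraph_format.space_before = Pt(18)

# Space after the paragraph
paragraph_format.space_after = Pt(12)

# Add a page break
document.add_page_break()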

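The style argument sets the paragraph style. Two commonly used list styles are:

① List Bullet: bulleted list
② List Number: numbered list

add_paragraph() returns a reference to the new paragraph, which is added at the end of the document.

You can also use an existing paragraph as a "cursor" and insert a new paragraph directly before it: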

prior_paragraph = paragraph.insert_paragraph_before('Lorem ipsum')

Deleting a paragraph

python-docx does not provide a delete() method. A workaround posted on GitHub is:

def delete_paragraph(paragraph):
    p = paragraph._element
    p.getparent().remove(p)
    # p._p = p._element = None
    paragraph._p = paragraph._element = None

In testing, this approach works for deleting paragraphs, tables, headings, and pictures:

from docx import Document

doc = Document('word_file.docx')

def delete_docx_prefix_description(doc):
    delete_paragraph(doc.tables[0])  # delete the first table in the document
    for p in doc.paragraphs:
        # Read the text before deleting: the paragraph is unusable afterwards
        is_keyword = ''.join(p.text.split(' ')).lower() == 'header_keyword'
        delete_paragraph(p)
        if is_keyword:
            break
    for p in doc.paragraphs:
        if p.text == '':  # delete blank paragraphs at the start of the document
            delete_paragraph(p)
        else:
            break

Replacing text

# Write the text to replace as a dictionary:
# replace_dict = {"string to be replaced": "new string"}
replace_dict = {
    "蘋果": "apple",
    "香蕉": "banana",
    "獼猴桃": "Kiwi fruit",
    "火龍果": "pitaya",
}

def check_and_change(document, replace_dict):
    """
    Walk every paragraph in the document and, wherever a run contains a key,
    replace it with the matching value (keys and values come from replace_dict).
    A key that is split across two runs is not matched.
    """
    for para in document.paragraphs:
        for run in para.runs:
            for key, value in replace_dict.items():
                if key in run.text:
                    print(key + "->" + value)
                    run.text = run.text.replace(key, value)
    return document

Setting paragraph alignment

from docx.enum.text import WD_ALIGN_PARAGRAPH
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER

This uses the CENTER member of the enumeration.

Font formatting

Here is the explanation from the official documentation:

In order to understand how bold and italic work, you need to understand a little about what goes on inside a paragraph. The short version is this:

A paragraph holds all the block-level formatting, like indentation, line height, tabs, and so forth.
Character-level formatting, such as bold and italic, are applied at the run level. All content within a paragraph must be within a run, but there can be more than one. So a paragraph with a bold word in the middle would need three runs, a normal one, a bold one containing the word, and another normal one for the text after.

When you add a paragraph by providing text to the .add_paragraph() method, it gets put into a single run. You can add more using the .add_run() method on the paragraph.

So start by creating the paragraph with the text of its first, normal run; that text can be empty.

P = document.add_paragraph('')
run = P.add_run("靜夜思")
run.font.color.rgb = RGBColor(255, 0, 0)
run.font.size = Pt(14)
run.bold = True
run.italic = False
P.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER

from docx.oxml.ns import qn
from docx.shared import Inches, Pt

paragraph = document.add_paragraph()

# Bold
paragraph.add_run(u'bold').bold = True

# Italic
paragraph.add_run(u'italic, ').italic = True

# Set a Chinese font; keep a reference to the new run so the font applies to it
run = paragraph.add_run(u'設置中文字體，')
run.font.name = u'宋體'
r = run._element
r.rPr.rFonts.set(qn('w:eastAsia'), u'宋體')

# Font size
paragraph.add_run(u'set the font size').font.size = Pt(24)

# Add a quotation
document.add_paragraph('Intense quote', style='Intense Quote')

# Add a numbered list
document.add_paragraph(
    u'Numbered list item 1', style='List Number'
)
document.add_paragraph(
    u'Numbered list item 2', style='List Number'
)

# Add a bulleted list
document.add_paragraph(
    u'Bulleted list item 1', style='List Bullet'
)
document.add_paragraph(
    u'Bulleted list item 2', style='List Bullet'
)

# Add a picture
document.add_picture('jdb.jpg', width=Inches(1.25))

# Add a table
table = document.add_table(rows=3, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = "Column 1"
hdr_cells[1].text = "Column 2"
hdr_cells[2].text = "Column 3"

hdr_cells = table.rows[1].cells
hdr_cells[0].text = '2'
hdr_cells[1].text = 'aerszvfdgx'
hdr_cells[2].text = 'abdzfgxfdf'

hdr_cells = table.rows[2].cells
hdr_cells[0].text = '3'
hdr_cells[1].text = 'cafdwvaef'
hdr_cells[2].text = 'aabs zfgf'

Adding a page break

Sometimes you want the text that follows to start on a separate page:

document.add_page_break()

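Adding a table

Content often lends itself to tabular presentation, lined up in neat rows and columns.

Here is how to add a table:

table = document.add_table(rows=2, cols=2)

Tables have several properties and methods you'll need in order to populate them. Accessing individual cells is probably a good place to start.

As a baseline, you can always access a cell by its row and column indices: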

cell = table.cell(0, 1)
cell.text = 'parrot, possibly dead'

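Often it is easier to access a row of cells at a time, for example when populating a table of variable length from a data source. The .rows property of a table provides access to individual rows, each of which has a .cells property. The .cells property on both Row and Column supports indexed access, like a list:

row = table.rows[1]
row.cells[0].text = 'Foo bar to you.'
row.cells[1].text = 'And a hearty foo bar to you too sir!'

The .rows and .columns collections on a table are iterable, so you can use them directly in a for loop.

The same goes for the .cells sequence on a row or column: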

for row in table.rows:
    for cell in row.cells:
        print(cell.text)

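If you want a count of the rows or columns in the table, just use len() on the sequence:

row_count = len(table.rows)
col_count = len(table.columns)

You can also add rows to a table incrementally, like so:

row = table.add_row()

This is very handy for the variable-length table scenario mentioned above: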

# get table data -------------
items = get_things_from_database_or_something()
 
# add table ------------------
table = document.add_table(1, 3)
 
# populate header row --------
heading_cells = table.rows[0].cells
heading_cells[0].text = 'Qty'
heading_cells[1].text = 'SKU'
heading_cells[2].text = 'Description'
 
# add a data row for each item
for item in items:
    cells = table.add_row().cells
    cells[0].text = str(item.qty)
    cells[1].text = item.sku
    cells[2].text = item.desc

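The same works for columns, although I have yet to see a use case for it. Word has a set of pre-formatted table styles you can pick from its table style gallery. You can apply one of those to the table like this:

table.style = 'LightShading-Accent1'

The style name is formed by removing all the spaces from the table style name. You can find the table style name by hovering your mouse over its thumbnail in Word's table style gallery.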

Adding a picture

Word lets you place an image in a document using a menu item. Here is how to do it in python-docx:

document.add_picture('image-filename.png')

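This example uses a path, which loads the image file from the local filesystem. You can also use a file-like object, essentially any object that acts like an open file. This can be handy if you are retrieving the image from a database or over a network and don't want to get the filesystem involved.

Image size

By default, the added image appears at its native size. This is often bigger than you want. Native size is calculated as pixels / dpi, so a 300×300 pixel image with 300 dpi resolution appears in a one-inch square. The problem is that most images don't contain a dpi property, and it defaults to 72 dpi. This would make the same image appear 4.167 inches on a side, somewhere around half the page.

To get the image the size you want, you can specify either its width or height in convenient units, like inches or centimeters: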

from docx.shared import Inches
document.add_picture('image-filename.png', width=Inches(1.0))

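You are free to specify both width and height, but usually you won't want to. If you specify only one, python-docx uses it to calculate the properly scaled value of the other. This way the aspect ratio is preserved and your picture doesn't look stretched.

The Inches and Cm classes are provided to let you specify measurements in handy units. Internally, python-docx uses English Metric Units, 914400 to the inch. So if you forget and just put something like width=2, you'll get an extremely small image :). You'll need to import them from the docx.shared sub-package. You can use them in arithmetic just as if they were integers, which in fact they are. So an expression like width = Inches(3) / thing_count works just fine.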

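Applying a paragraph style

If you don't know what a Word paragraph style is, you should definitely check it out. Basically it lets you apply a whole set of formatting options to a paragraph at once. It's a lot like CSS styles, if you know what those are.

You can apply a paragraph style right when you create a paragraph: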

document.add_paragraph('Lorem ipsum dolor sit amet.', style='ListBullet')

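This particular style causes the paragraph to appear as a bullet, a very handy thing. You can also apply a style afterward.

These two lines are equivalent to the one above: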

paragraph = document.add_paragraph('Lorem ipsum dolor sit amet.')
paragraph.style = 'ListBullet'

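In this example the style is specified using its style ID, "ListBullet". Generally, the style ID is formed by removing the spaces from the style name as it appears in the Word user interface (UI). So the style "List Number 3" would be specified as "ListNumber3". Note, however, that if you are using a localized version of Word, the style ID may be derived from the English style name and may not correspond exactly to its name in the Word UI.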

Applying bold and italic

paragraph = document.add_paragraph('Lorem ipsum ')
paragraph.add_run('dolor sit amet.')

A Run object has both a .bold and an .italic property that let you set their value for a run:

paragraph = document.add_paragraph('Lorem ipsum ')
run = paragraph.add_run('dolor')
run.bold = True
paragraph.add_run(' sit amet.')

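This produces text that looks like this: "Lorem ipsum dolor sit amet.", with "dolor" in bold.

Note that you can set bold or italic right on the result of .add_run() if you don't need it for anything else: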

paragraph.add_run('dolor').bold = True
 
# is equivalent to:
 
run = paragraph.add_run('dolor')
run.bold = True
 
# except you don't have a reference to `run` afterward

It's not necessary to provide text to the .add_paragraph() method.

This can make your code simpler if you're building the paragraph up from runs anyway:

paragraph = document.add_paragraph()
paragraph.add_run('Lorem ipsum ')
paragraph.add_run('dolor').bold = True
paragraph.add_run(' sit amet.')

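Applying a character style

In addition to paragraph styles, which specify a group of paragraph-level settings, Word has character styles, which specify a group of run-level settings. In general you can think of a character style as specifying a font, including its typeface, size, color, bold, italic, and so on.
Like paragraph styles, a character style must already be defined in the document you open with the Document() call (see Understanding Styles).

You can specify a character style when adding a new run: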

paragraph = document.add_paragraph('Normal text, ')
paragraph.add_run('text with emphasis.', 'Emphasis')

You can also apply a style to a run after it is created.

This code produces the same result as the lines above:

paragraph = document.add_paragraph('Normal text, ')
run = paragraph.add_run('text with emphasis.')
run.style = 'Emphasis'

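As with paragraph styles, the style ID is formed by removing the spaces from the name as it appears in the Word UI. So the style "Subtle Emphasis" would be specified as "SubtleEmphasis". Note that if you are using a localized version of Word, the style ID may be derived from the English style name and may not correspond to its name in the Word UI.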

document.add_page_break()

Specifying a path and saving the file

document.save(filename)

Practical examples

Example 1

config.py

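# -*- encoding: utf-8 -*-
# @File : config.py
# @Author : 安替
# @Time : 2022/9/26 11:45
# @Software : PyCharm
# Python version: 3.6.3

list_files = ["./test1.docx", "./test2.docx"]
final_path = "./test3.docx"

# Text that marks the insertion point
insert_text = "aaaa"

# Content to insert (placeholders, fill in your own text)
insert_content = ["", "", ""]

# Main title: {text to find: replacement title} (placeholder, fill in your own)
replace_text_title_dict = {"": ""}

# Second-level headings
replace_text_second_dict = {}

# Third-level headings
replace_text_third_dict = {}

# Fourth-level headings
replace_text_forth_dict = {}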

main.py

# -*- encoding: utf-8 -*-
# @File : test.py
# @Software : PyCharm
# Python version: 3.6.3

from config import *
from docx.oxml.ns import qn
from docx.shared import Pt
from docx import Document
from docx.shared import Inches
from docx.shared import RGBColor
from docxcompose.composer import Composer
from docx.enum.text import WD_ALIGN_PARAGRAPH


def combine_docx():
    """Merge the docx files in list_files into one document at final_path."""
    new_document = Document()
    composer = Composer(new_document)
    for fn in list_files:
        composer.append(Document(fn))
    composer.save(final_path)
    print("docx merge complete")


def delete_paragraph(paragraph):
    """
    Delete the given paragraph.

    Parameters
    ----------
    paragraph : the paragraph to remove
    """
    p = paragraph._element
    p.getparent().remove(p)
    paragraph._p = paragraph._element = None


def delete_docx_prefix_description(line, new_line, level, fontsize, alignment, isDelete = True):
    """
    Insert new_line, styled as `level`, before every paragraph containing `line`.
    If isDelete is true, the matched paragraph is removed afterwards.
    """
    docx = Document(final_path)
    for p in docx.paragraphs:
        if line in p.text:
            ss = p.insert_paragraph_before(new_line)
            ss.alignment = alignment
            ss.style = docx.styles[level]
            # First-line indent is a paragraph setting, not a run attribute
            ss.paragraph_format.first_line_indent = Inches(0.0)
            for run in ss.runs:
                run.font.size = Pt(fontsize)
                run.font.name = 'Times New Roman'  # font used for Western text
                run._element.rPr.rFonts.set(qn('w:eastAsia'), '宋體')  # font used for Chinese text
                run.font.color.rgb = RGBColor(100, 149, 237)
                run.italic = False
                run.bold = False
            if isDelete:
                delete_paragraph(p)
    docx.styles['Normal'].font.name = 'Times New Roman'
    docx.styles['Normal']._element.rPr.rFonts.set(qn('w:eastAsia'), u'宋體')
    docx.save(final_path)


def add_content():
    """Insert the configured content before the paragraph marked by insert_text."""
    for i in range(len(insert_content)):
        if i != 2:
            delete_docx_prefix_description(insert_text, insert_content[i], "Heading 1", 14, WD_ALIGN_PARAGRAPH.LEFT, False)
        else:
            delete_docx_prefix_description(insert_text, insert_content[i], "Heading 2", 13, WD_ALIGN_PARAGRAPH.LEFT, False)


def modify_styles():
    """Replace the configured headings and apply fonts and other styling."""
    for key, value in replace_text_title_dict.items():
        delete_docx_prefix_description(key, value, "Heading 1", 26, WD_ALIGN_PARAGRAPH.CENTER)
    for key, value in replace_text_second_dict.items():
        delete_docx_prefix_description(key, value, "Heading 2", 13, WD_ALIGN_PARAGRAPH.LEFT)
    for key, value in replace_text_third_dict.items():
        delete_docx_prefix_description(key, value, "Heading 3", 11, WD_ALIGN_PARAGRAPH.LEFT)
    for key, value in replace_text_forth_dict.items():
        delete_docx_prefix_description(key, value, "Heading 4", 11, WD_ALIGN_PARAGRAPH.LEFT)


def main():
    combine_docx()
    add_content()
    modify_styles()


if __name__ == '__main__':
    main()

Example 2

from docx import Document
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.oxml.ns import qn
from docx.shared import Pt, RGBColor
document = Document()

# Set the body font. Western text: Times New Roman; East Asian text: 宋體 (SimSun)
document.styles['Normal'].font.name = 'Times New Roman'
document.styles['Normal']._element.rPr.rFonts.set(qn('w:eastAsia'), u'宋體')
# Set the body font size
document.styles['Normal'].font.size = Pt(12)
# Set the body font color
document.styles['Normal'].font.color.rgb = RGBColor(0, 0, 0)

# The title is a special case and needs its own settings
P = document.add_paragraph('')
run = P.add_run("靜夜思")
run.font.color.rgb = RGBColor(255, 0, 0)
run.font.size = Pt(14)
run.bold = True
P.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER

# Add a paragraph
p_1 = document.add_paragraph('床前明月光\n疑是地上霜\n舉頭望明月\n低頭思故鄉')
# Center the paragraph
p_1.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER

filename = '靜夜思'
# Raw string so the backslashes in the Windows path are taken literally
document.save(r'D:\桌面\{}.docx'.format(filename))

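Appendix table 1: font size names and their point values

Appendix table 2: RGB values of common colors

Appendix table 3: Chinese and English names of Word's built-in character styles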
