A Detailed Guide to Creating Floating Pictures with python-docx

Updated: 2021-01-24 10:47:05  Author: Python中文社区 (Python Chinese Community)


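Most readers will be familiar with python-docx, a widely used library for working with docx documents. It supports inserting pictures as inline shapes (Inline Shape): the picture never overlaps the text and follows the flow layout. However, as of version 0.8.10, the latest at the time of writing, python-docx does not support inserting floating pictures. That is not enough for documents with richer layouts, so this article works out how to insert a floating picture on top of python-docx: analyze the XML, trace the source code, and arrive at the complete code.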

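The problem

While implementing PDF-to-docx conversion (pdf2docx: https://github.com/dothinking/pdf2docx, under development), the author ran into a requirement: given the exact position of a background image on a PDF page (for example, the coordinates of its top-left corner and the width and height of the image area), reproduce it at the corresponding position on the docx page. Because the background image overlaps the text, this calls for a precisely positioned floating picture, as in the article's example figure.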

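Settings in Word

Let us first solve the problem by hand in Office Word. Anyone with basic Word experience knows that a picture's floating behavior and exact position are controlled through its layout settings.

The text wrapping styles in the layout settings fall roughly into three categories: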

Category	Overlaps text	Free positioning	Style names
Inline	No	No	In line with text
Wrapping	No	Yes	Square, Tight, Through, Top and bottom
Fully floating	Yes	Yes	Behind text, In front of text

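Take the most common case, an inline picture: it occupies the whole line, can neither overlap the text nor be placed freely, and its position is determined automatically by the page layout. With a wrapping picture, text can enter the lines the picture occupies but cannot overlap it, and the picture can be dragged freely with the mouse. A fully floating picture can sit in front of or behind the text and can also be placed anywhere.

For precise positioning, use the Position tab of the picture layout dialog. It offers several positioning modes, for example absolute positioning, which places the picture by the horizontal and vertical offsets of its top-left corner from a reference. The reference can be the page (Page) itself, in which case (0, 0) is the top-left corner of the page, or the margin (Margin), in which case (0, 0) is the top-left corner of the body text area.

In short, we need a precisely positioned picture with the Behind text layout.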

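The XML behind docx

A docx document is backed by XML data, and python-docx reads and writes Word documents by manipulating that XML. So the next step is to create Word documents by hand and look at the XML for the picture.

For comparison, first create one document with an ordinary inline picture and one with a floating picture placed behind the text. Then inspect each: right-click the docx file | open the archive with 7-Zip | word | document.xml, copy the file content, and format the XML. This yields the picture-related fragments below. Some node attributes have been removed to make the comparison easier.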
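The same inspection can be scripted instead of done through 7-Zip. The following is a minimal sketch using only the Python standard library. The function name read_document_xml is an assumption made for illustration and is not part of python-docx.

```python
import zipfile
from xml.dom import minidom


def read_document_xml(docx_path):
    """Return the pretty-printed XML of word/document.xml inside a .docx file.

    `docx_path` may be a file path or a binary file-like object.
    (Hypothetical helper, not a python-docx function.)
    """
    # A .docx file is a zip archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(docx_path) as archive:
        raw = archive.read("word/document.xml")
    # minidom re-indents the XML so nested nodes are easy to compare by eye.
    return minidom.parseString(raw).toprettyxml(indent="    ")
```

Calling print(read_document_xml('output.docx')) on a saved document prints the formatted XML, in which the <w:drawing> nodes can then be located.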

Inline picture fragment:

<w:drawing>
    <wp:inline>
        <wp:extent cx="3297600" cy="2782800"/>
        <wp:effectExtent l="0" t="0" r="0" b="0"/>
        <wp:docPr id="1" name="Picture 1"/>
        <wp:cNvGraphicFramePr>
            <a:graphicFrameLocks/>
        </wp:cNvGraphicFramePr>
        <a:graphic>
            <a:graphicData>
                <pic:pic>
                    <!-- more pic content -->
                </pic:pic>
            </a:graphicData>
        </a:graphic>
    </wp:inline>
</w:drawing>

Floating picture fragment:

<w:drawing>
    <wp:anchor behindDoc="1" locked="0" layoutInCell="1" allowOverlap="1">
        <wp:simplePos x="0" y="0"/>
        <wp:positionH relativeFrom="page">
            <wp:posOffset>285750</wp:posOffset>
        </wp:positionH>
        <wp:positionV relativeFrom="page">
            <wp:posOffset>457200</wp:posOffset>
        </wp:positionV>
        <wp:extent cx="3297600" cy="2782800"/>
        <wp:effectExtent l="0" t="0" r="0" b="0"/>
        <wp:wrapNone/>
        <wp:docPr id="1" name="Picture 1"/>
        <wp:cNvGraphicFramePr>
            <a:graphicFrameLocks/>
        </wp:cNvGraphicFramePr>
        <a:graphic>
            <a:graphicData>
                <pic:pic>
                    <!-- more pic content -->
                </pic:pic>
            </a:graphicData>
        </a:graphic>
    </wp:anchor>
</w:drawing>

The comparison shows these similarities:

Both kinds of picture sit under a <w:drawing> node: <wp:inline> for inline pictures, <wp:anchor> for floating ones
Both contain the same content nodes: <wp:extent>, <wp:docPr>, <a:graphic>, and so on
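The distinction between the two containers can also be checked programmatically. The sketch below uses the standard library's ElementTree (python-docx itself builds on lxml) and reports which container each <w:drawing> holds. The function name drawing_kinds is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
}


def drawing_kinds(xml_text):
    """Return the local tag name of every direct child of each <w:drawing>.

    For picture drawings this is 'inline' or 'anchor'.
    (Hypothetical helper for illustration.)
    """
    root = ET.fromstring(xml_text)
    kinds = []
    for drawing in root.iter("{%s}drawing" % NS["w"]):
        for child in drawing:
            # ElementTree tags look like '{namespace}name'; keep only 'name'.
            kinds.append(child.tag.split("}", 1)[1])
    return kinds
```

Applied to a document containing one picture of each kind, in that order, it returns ['inline', 'anchor'].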

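Beyond that, the floating picture has some features of its own, whose meaning can be guessed from the names:

The behindDoc attribute of the <wp:anchor> node indicates the Behind text layout

The <wp:positionH> and <wp:positionV> nodes specify absolute horizontal and vertical positioning, where:

the relativeFrom attribute names the reference object for positioning
the child node <wp:posOffset> gives the actual coordinate value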
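The <wp:posOffset> values are expressed in EMU (English Metric Units), with 914400 EMU per inch and therefore 12700 EMU per point. As a rough sketch, the position of a floating picture can be read back and converted to points like this. The function name anchor_position is an assumption for illustration, and the code uses the standard library rather than python-docx.

```python
import xml.etree.ElementTree as ET

WP = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
EMU_PER_POINT = 12700  # 914400 EMU per inch / 72 points per inch


def anchor_position(anchor_xml):
    """Return {'H': (relative_from, points), 'V': (relative_from, points)}.

    `anchor_xml` is the text of a <wp:anchor> element that uses
    <wp:posOffset> positioning. (Hypothetical helper for illustration.)
    """
    anchor = ET.fromstring(anchor_xml)
    result = {}
    for axis in ("H", "V"):
        pos = anchor.find("{%s}position%s" % (WP, axis))
        offset_emu = int(pos.find("{%s}posOffset" % WP).text)
        result[axis] = (pos.get("relativeFrom"), offset_emu / EMU_PER_POINT)
    return result
```

For the floating picture fragment shown earlier, the offsets 285750 and 457200 correspond to 22.5 pt and 36 pt from the top-left corner of the page.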

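Starting from inline pictures

Judging by the XML comparison, a floating picture can be inserted by following python-docx's implementation of inline pictures. So we start from the code that inserts an inline picture: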

from docx import Document
from docx.shared import Pt
 
document = Document()
document.add_picture('image.jpg', width=Pt(200))
document.save('output.docx')

Searching the python-docx installation folder site-packages/docx for add_picture leads to the underlying definition, docx.text.run.Run.add_picture:

def add_picture(self, image_path_or_stream, width=None, height=None):
    inline = self.part.new_pic_inline(image_path_or_stream, width, height)
    self._r.add_drawing(inline)
    return InlineShape(inline)
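The folder search used here can be scripted as well. This is a small sketch with the standard library only. The helper name find_in_package is hypothetical, and the package directory has to be supplied by the caller.

```python
from pathlib import Path


def find_in_package(package_dir, needle):
    """Yield (path, line_number, stripped_line) for each .py line containing needle.

    (Hypothetical helper for illustration.)
    """
    for path in sorted(Path(package_dir).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()
```

With python-docx installed, the package directory can be obtained as Path(docx.__file__).parent after import docx, and searching for 'def add_picture' or 'def new_pic_inline' lists the definitions traced in this section.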

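Searching further for new_pic_inline leads to docx.parts.story.BaseStoryPart.new_pic_inline. Its docstring shows that it uses the CT_Inline class to create the <wp:inline> element, so the <wp:anchor> for a floating picture can later be created by adapting it.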

def new_pic_inline(self, image_descriptor, width, height):
    """Return a newly-created `w:inline` element.
    The element contains the image specified by *image_descriptor* and is scaled
    based on the values of *width* and *height*.
    """
    rId, image = self.get_or_add_image(image_descriptor)
    cx, cy = image.scaled_dimensions(width, height)
    shape_id, filename = self.next_id, image.filename
    return CT_Inline.new_pic_inline(shape_id, rId, filename, cx, cy)

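That brings us to the CT_Inline class (for brevity, the bodies of the first two class methods, new and new_pic_inline, are omitted), where the XML we examined at the start finally appears: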

class CT_Inline(BaseOxmlElement):
    """
    ``<w:inline>`` element, container for an inline shape.
    """
    @classmethod
    def new(cls, cx, cy, shape_id, pic):
        pass
 
    @classmethod
    def new_pic_inline(cls, shape_id, rId, filename, cx, cy):
        pass
 
    @classmethod
    def _inline_xml(cls):
        return (
            '<wp:inline %s>\n'
            '  <wp:extent cx="914400" cy="914400"/>\n'
            '  <wp:docPr id="666" name="unnamed"/>\n'
            '  <wp:cNvGraphicFramePr>\n'
            '    <a:graphicFrameLocks noChangeAspect="1"/>\n'
            '  </wp:cNvGraphicFramePr>\n'
            '  <a:graphic>\n'
            '    <a:graphicData uri="URI not set"/>\n'
            '  </a:graphic>\n'
            '</wp:inline>' % nsdecls('wp', 'a', 'pic', 'r')
        )

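A quick look at the three methods of CT_Inline shows how they fit together:

_inline_xml() provides the XML structure of the inline picture's <wp:inline>.
new() calls _inline_xml() and assigns values to child nodes such as <wp:extent> and <wp:docPr>.
new_pic_inline() calls new() and attaches the result of the CT_Picture class (the <pic:pic> node, i.e. the actual picture content) to the <a:graphicData> node.

Together these produce the complete XML structure of an inline picture.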
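The template, fill, and assemble pattern described above can be imitated in a few lines. The sketch below is a simplified illustration built on the standard library's ElementTree, not python-docx's actual implementation (which uses lxml and custom element classes). The names INLINE_TEMPLATE, new_inline, and new_pic_inline are stand-ins chosen for this example, and the <pic:pic> node is left empty.

```python
import xml.etree.ElementTree as ET

WP = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
PIC = "http://schemas.openxmlformats.org/drawingml/2006/picture"

# Step 1: a template with placeholder values, like _inline_xml().
INLINE_TEMPLATE = (
    '<wp:inline xmlns:wp="%s" xmlns:a="%s">'
    '<wp:extent cx="914400" cy="914400"/>'
    '<wp:docPr id="666" name="unnamed"/>'
    '<a:graphic><a:graphicData uri="URI not set"/></a:graphic>'
    '</wp:inline>' % (WP, A)
)


def new_inline(cx, cy, shape_id, pic):
    """Step 2: parse the template and overwrite the placeholders, like new()."""
    inline = ET.fromstring(INLINE_TEMPLATE)
    extent = inline.find("{%s}extent" % WP)
    extent.set("cx", str(cx))
    extent.set("cy", str(cy))
    doc_pr = inline.find("{%s}docPr" % WP)
    doc_pr.set("id", str(shape_id))
    doc_pr.set("name", "Picture %d" % shape_id)
    graphic_data = inline.find("{%s}graphic/{%s}graphicData" % (A, A))
    graphic_data.set("uri", PIC)
    graphic_data.append(pic)  # attach the picture content node
    return inline


def new_pic_inline(shape_id, cx, cy):
    """Step 3: build the picture node and hand it to new_inline()."""
    pic = ET.Element("{%s}pic" % PIC)  # empty stand-in for CT_Picture's output
    return new_inline(cx, cy, shape_id, pic)
```

The point of the layering is that only the template changes when a different container is needed, which is what makes an anchor variant straightforward to derive.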

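Inserting a floating picture

The XML comparison and python-docx's implementation of inline pictures suggest the following approach to creating a floating picture:

Initialize the <wp:anchor> structure, for example with behindDoc="1" to set the Behind text layout
Populate the <wp:anchor> element with similar code, in particular <wp:extent>, <wp:docPr>, and <pic:pic>
Fill in <wp:positionH> and <wp:positionV> to position the picture precisely

In practice there turned out to be one more key step: registering the XML tag name with its class, as is done for <wp:inline> and CT_Inline: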

# docx.oxml.__init__.py
register_element_cls('wp:inline', CT_Inline)
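Why registration matters can be illustrated with a toy registry. python-docx relies on lxml's custom element class lookup for this. The sketch below only imitates the idea with a plain dictionary and the standard library, and every name in it (register_wrapper, wrap, Wrapper, AnchorWrapper) is hypothetical.

```python
import xml.etree.ElementTree as ET

WP = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"

_REGISTRY = {}  # maps a namespace-qualified tag to a wrapper class


def register_wrapper(tag, cls):
    """Associate an XML tag with the class that should wrap it."""
    _REGISTRY[tag] = cls


class Wrapper:
    """Generic wrapper used for tags that have not been registered."""

    def __init__(self, element):
        self.element = element


class AnchorWrapper(Wrapper):
    """Wrapper exposing anchor-specific attributes."""

    @property
    def behind_doc(self):
        return self.element.get("behindDoc") == "1"


def wrap(element):
    """Wrap an element in its registered class, or in Wrapper by default."""
    return _REGISTRY.get(element.tag, Wrapper)(element)


register_wrapper("{%s}anchor" % WP, AnchorWrapper)
```

An unregistered tag falls back to the generic class and lacks the specialized attributes. By analogy, without the registration step a parsed <wp:anchor> element would not expose the extent, docPr, and graphic properties that the custom class declares.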

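Putting it all together, the complete code for inserting a floating picture with python-docx (behind text, positioned relative to the page) is as follows: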

# -*- coding: utf-8 -*-
 
# filename: add_float_picture.py
 
'''
Implement floating image based on python-docx.
- Text wrapping style: BEHIND TEXT <wp:anchor behindDoc="1">
- Picture position: top-left corner of PAGE `<wp:positionH relativeFrom="page">`.
Create a docx sample (Layout | Positions | More Layout Options) and explore the 
source xml (Open as a zip | word | document.xml) to implement other text wrapping
styles and position modes per `CT_Anchor._anchor_xml()`.
'''
 
from docx.oxml import parse_xml, register_element_cls
from docx.oxml.ns import nsdecls
from docx.oxml.shape import CT_Picture
from docx.oxml.xmlchemy import BaseOxmlElement, OneAndOnlyOne
 
# refer to docx.oxml.shape.CT_Inline
class CT_Anchor(BaseOxmlElement):
    """
    ``<wp:anchor>`` element, container for a floating image.
    """
    extent = OneAndOnlyOne('wp:extent')
    docPr = OneAndOnlyOne('wp:docPr')
    graphic = OneAndOnlyOne('a:graphic')
 
    @classmethod
    def new(cls, cx, cy, shape_id, pic, pos_x, pos_y):
        """
        Return a new ``<wp:anchor>`` element populated with the values passed
        as parameters.
        """
        anchor = parse_xml(cls._anchor_xml(pos_x, pos_y))
        anchor.extent.cx = cx
        anchor.extent.cy = cy
        anchor.docPr.id = shape_id
        anchor.docPr.name = 'Picture %d' % shape_id
        anchor.graphic.graphicData.uri = (
            'http://schemas.openxmlformats.org/drawingml/2006/picture'
        )
        anchor.graphic.graphicData._insert_pic(pic)
        return anchor
 
    @classmethod
    def new_pic_anchor(cls, shape_id, rId, filename, cx, cy, pos_x, pos_y):
        """
        Return a new `wp:anchor` element containing the `pic:pic` element
        specified by the argument values.
        """
        pic_id = 0  # Word doesn't seem to use this, but does not omit it
        pic = CT_Picture.new(pic_id, filename, rId, cx, cy)
        # new() already inserts `pic` into <a:graphicData>, so no second insert is needed
        return cls.new(cx, cy, shape_id, pic, pos_x, pos_y)

    @classmethod
    def _anchor_xml(cls, pos_x, pos_y):
        return (
            '<wp:anchor distT="0" distB="0" distL="0" distR="0" simplePos="0" relativeHeight="0" \n'
            '           behindDoc="1" locked="0" layoutInCell="1" allowOverlap="1" \n'
            '           %s>\n'
            '  <wp:simplePos x="0" y="0"/>\n'
            '  <wp:positionH relativeFrom="page">\n'
            '    <wp:posOffset>%d</wp:posOffset>\n'
            '  </wp:positionH>\n'
            '  <wp:positionV relativeFrom="page">\n'
            '    <wp:posOffset>%d</wp:posOffset>\n'
            '  </wp:positionV>\n'                    
            '  <wp:extent cx="914400" cy="914400"/>\n'
            '  <wp:wrapNone/>\n'
            '  <wp:docPr id="666" name="unnamed"/>\n'
            '  <wp:cNvGraphicFramePr>\n'
            '    <a:graphicFrameLocks noChangeAspect="1"/>\n'
            '  </wp:cNvGraphicFramePr>\n'
            '  <a:graphic>\n'
            '    <a:graphicData uri="URI not set"/>\n'
            '  </a:graphic>\n'
            '</wp:anchor>' % (nsdecls('wp', 'a', 'pic', 'r'), int(pos_x), int(pos_y))
        )


# refer to docx.parts.story.BaseStoryPart.new_pic_inline
def new_pic_anchor(part, image_descriptor, width, height, pos_x, pos_y):
    """Return a newly-created `wp:anchor` element.
    The element contains the image specified by *image_descriptor* and is scaled
    based on the values of *width* and *height*.
    """
    rId, image = part.get_or_add_image(image_descriptor)
    cx, cy = image.scaled_dimensions(width, height)
    shape_id, filename = part.next_id, image.filename
    return CT_Anchor.new_pic_anchor(shape_id, rId, filename, cx, cy, pos_x, pos_y)


# refer to docx.text.run.Run.add_picture
def add_float_picture(p, image_path_or_stream, width=None, height=None, pos_x=0, pos_y=0):
    """Add a floating picture to paragraph `p`, with its top-left corner offset
    by `pos_x` and `pos_y` (EMU, e.g. a docx.shared Length) from the top-left
    corner of the page.
    """
    run = p.add_run()
    anchor = new_pic_anchor(run.part, image_path_or_stream, width, height, pos_x, pos_y)
    run._r.add_drawing(anchor)


# refer to docx.oxml.__init__.py
register_element_cls('wp:anchor', CT_Anchor)

Example

Finally, an example to see the result:

from docx import Document
from docx.shared import Inches, Pt
from add_float_picture import add_float_picture
 
if __name__ == '__main__':
 
    document = Document()
 
    # add a floating picture
    p = document.add_paragraph()
    add_float_picture(p, 'test.png', width=Inches(5.0), pos_x=Pt(20), pos_y=Pt(30))
 
    # add text
    p.add_run('Hello World '*50)
 
    document.save('output.docx')
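To confirm the result without opening Word, the saved file can be inspected for the expected container element. This is a minimal standard-library sketch, and the function name count_drawing_containers is an assumption for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

WP = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def count_drawing_containers(docx_path):
    """Return (inline_count, anchor_count) for word/document.xml in a .docx.

    `docx_path` may be a file path or a binary file-like object.
    (Hypothetical helper for illustration.)
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    inline_count = sum(1 for _ in root.iter("{%s}inline" % WP))
    anchor_count = sum(1 for _ in root.iter("{%s}anchor" % WP))
    return inline_count, anchor_count
```

For the output.docx produced by the example above, which adds a single floating picture and no inline ones, this should report zero <wp:inline> elements and one <wp:anchor> element.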

Author: crazyhat, a Python and scientific computing enthusiast
