Reading JSON data with Python to reconstruct tables and batch-convert them to HTML

Updated: 2022-03-04 09:51:22 Author: kuokay

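This article shows how to use Python to read JSON data, reconstruct the table it describes, and batch-convert the result to HTML. The table-recognition output of an OCR system needed to be verified, and comparing the returned JSON files directly is tedious, so the recognition results in the JSON are restored to a table for checking. Details follow for readers who need them.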

{"row": "6", "col": "5", "start_row": 0, "start_column": 0, "end_row": 0, "end_column": 0, "data": "稱", "position": [51, 71, 168, 93], "org_position": [50, 60, 167, 62, 166, 84, 49, 82], "char_position": [[86, 83, 100, 100]], "lines": [{"text": "稱", "poly": [84, 73, 98, 73, 98, 90, 84, 90, 0.874], "score": 0.874, "char_centers": [[91, 82]], "char_polygons": [[84, 77, 98, 74, 98, 87, 84, 90]], "char_candidates": [["稱"]], "char_candidates_score": [[0.999]], "char_scores": [0.999]}]}

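The task is to generate the corresponding table from each cell's start and end row/column coordinates and its content.

I first planned to use JS, but having forgotten some of the syntax, I went with Python instead.
After some research I found that python-docx can generate tables automatically, but the output is a Word document, so a Word-to-HTML conversion step was added afterwards.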
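As a rough illustration of this step, the sketch below assumes the OCR result is a JSON array of cell records shaped like the sample above, with 0-based, inclusive end indices. The array wrapper and the helper names `table_shape` and `merged_spans` are assumptions for illustration, not something the original shows. It derives the table size from the largest `end_row`/`end_column` and lists the records that span more than one grid cell.

```python
import json


def table_shape(cells):
    """Return (rows, cols) needed to hold every record (indices are 0-based)."""
    rows = max(c["end_row"] for c in cells) + 1
    cols = max(c["end_column"] for c in cells) + 1
    return rows, cols


def merged_spans(cells):
    """Return records covering more than one grid cell as (r0, c0, r1, c1)."""
    return [
        (c["start_row"], c["start_column"], c["end_row"], c["end_column"])
        for c in cells
        if (c["start_row"], c["start_column"]) != (c["end_row"], c["end_column"])
    ]


# Hypothetical sample: two records, the second spans two columns.
sample = json.loads("""
[
  {"start_row": 0, "start_column": 0, "end_row": 0, "end_column": 0, "data": "A"},
  {"start_row": 1, "start_column": 0, "end_row": 1, "end_column": 1, "data": "B"}
]
""")

print(table_shape(sample))   # (2, 2)
print(merged_spans(sample))  # [(1, 0, 1, 1)]
```

If the records also carry reliable `row` and `col` totals, as the sample appears to, those could be used for the table size instead of scanning for the maximum.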

I. Hands-on

pip install python_docx

1. First, create a new document

from docx import Document
document = Document()

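Then use the add_table method of the Document class to add a table, where rows is the number of rows, cols is the number of columns, and style is the table style. See the official documentation for details: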

table = document.add_table(rows=37,cols=13,style='Table Grid')

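The code above inserts a table with 37 rows and 13 columns into the Word document (37 × 13 = 481 cells).

Every generated cell has a "coordinate": in the table above, the top-left cell is (0, 0) and the bottom-right cell is (36, 12).

The next step is to merge some of the cells to arrive at the table we actually need.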

table.cell(0,0).merge(table.cell(2,2))

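The code above merges all cells in the rectangle between cell(0,0) and cell(2,2) into a single cell.

Note that merged cells do not disappear: each original coordinate still exists. For example, after merging cells (0,0) and (0,1), the merged cell is effectively (0,0;0,1), reachable through either coordinate.

If there are many cells and the coordinates are hard to read off directly, the following code writes each cell's coordinate into the cell, which makes merging easier.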

document = Document()
table = document.add_table(rows=37,cols=13,style='Table Grid')

document.save('table-1.docx')

document1 = Document('table-1.docx')
table = document1.tables[0]
for row,obj_row in enumerate(table.rows):
    for col,cell in enumerate(obj_row.cells):
        cell.text = cell.text + "%d,%d " % (row,col)

document1.save('table-2.docx')

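2. Adding text

Once all the cells have been merged, text needs to be added to the merged cells.

The rows attribute of a table gives access to a row, and the row's cells attribute returns a list of all the cells in that row: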

hdr_cells0 = table.rows[0].cells

The code above gets all the cells of the first row after merging, so hdr_cells0[0] is the first cell of that row. Use the add_paragraph method to add text to a cell:

hdr_cells0[0].add_paragraph('Data text')

For other usage, see the module documentation: https://www.osgeo.cn/python-docx/
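Putting the merge and text steps together, a minimal sketch might look like the following. It assumes `cells` is a list of records shaped like the JSON sample and that `table` is a python-docx table already created with enough rows and columns. `fill_table` is a hypothetical helper name. It relies only on `table.cell()`, the cell's `merge()` method (which returns the merged cell) and the cell's `text` property.

```python
def fill_table(table, cells):
    """Merge each record's cell range and write its text into the result.

    `table` is expected to be a python-docx table, i.e. something offering
    table.cell(row, col) whose cells support .merge() and .text.
    """
    for rec in cells:
        first = table.cell(rec["start_row"], rec["start_column"])
        spans_many = (rec["start_row"], rec["start_column"]) != (
            rec["end_row"], rec["end_column"])
        if spans_many:
            # merge() returns the merged cell covering the whole rectangle
            last = table.cell(rec["end_row"], rec["end_column"])
            first = first.merge(last)
        # Assigning .text replaces whatever the cell contained before
        first.text = rec["data"]


# Usage with python-docx (assumed installed; sizes and file name are illustrative):
#   from docx import Document
#   document = Document()
#   table = document.add_table(rows=6, cols=5, style='Table Grid')
#   fill_table(table, cells)
#   document.save('table.docx')
```

This sketch sets `text` rather than calling add_paragraph, which avoids leaving the cell's initial empty paragraph in front of the content; either approach should work for verification purposes.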

II. Converting Word to HTML

1. Converting with pydocx

pip install pydocx

from pydocx import PyDocX
html = PyDocX.to_html("test.docx")
with open("test.html", 'w', encoding="utf-8") as f:
    f.write(html)
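For batch conversion, the same call can be applied to every .docx file in a folder. The sketch below is one possible way to do that; `convert_folder` is a hypothetical helper, and the converter is passed in as a function so that `PyDocX.to_html` (or any other converter that takes a path and returns HTML) can be plugged in.

```python
from pathlib import Path


def convert_folder(root, to_html):
    """Write an .html file next to every .docx found under `root`.

    `to_html` is any callable that takes a .docx path (str) and returns
    an HTML string, for example PyDocX.to_html.
    Returns the list of HTML paths that were written.
    """
    written = []
    for docx_path in sorted(Path(root).rglob("*.docx")):
        if docx_path.name.startswith("~$"):
            continue  # skip the temporary lock files Word creates
        html_path = docx_path.with_suffix(".html")
        html_path.write_text(to_html(str(docx_path)), encoding="utf-8")
        written.append(html_path)
    return written


# Usage with pydocx (assumed installed; "ocr_tables" is an illustrative folder):
#   from pydocx import PyDocX
#   convert_folder("ocr_tables", PyDocX.to_html)
```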

To upload Word documents through a web page and accept only docx files:

<form method="post" enctype="multipart/form-data">
<input type="file" name="file" accept="application/vnd.openxmlformats-officedocument.wordprocessingml.document">
</form>

2. Using the win32 module

pip3 install pypiwin32

from win32com import client as wc
import os

word = wc.Dispatch('Word.Application')


def wordsToHtml(dir):
    # Walk the directory tree and convert every .doc/.docx file to HTML
    for path, subdirs, files in os.walk(dir):
        for wordFile in files:
            # Word needs absolute paths
            wordFullName = os.path.abspath(os.path.join(path, wordFile))

            # Split the file name into base name and extension
            fileName, fileSuffix = os.path.splitext(wordFile)
            if not fileSuffix:
                print(wordFullName + "********************ERROR: no file extension found!")
                continue

            # Only open Word files, and skip Word's "~$" lock files
            if fileSuffix.lower() not in (".doc", ".docx") or wordFile.startswith("~$"):
                continue

            htmlFullName = os.path.abspath(os.path.join(path, fileName + ".html"))
            print("generate html:" + htmlFullName)
            doc = word.Documents.Open(wordFullName)
            try:
                doc.SaveAs(htmlFullName, 10)  # 10 = wdFormatFilteredHTML
            finally:
                doc.Close()

    word.Quit()
    print("")
    print("Finished!")


if __name__ == '__main__':
    import sys

    if len(sys.argv) != 2:
        print("Usage: python funcName.py rootdir")
        sys.exit(100)
    wordsToHtml(sys.argv[1])
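If the goal is only to eyeball the OCR result, one alternative is to skip Word entirely and render the records straight to an HTML table with rowspan and colspan. This is a swapped-in technique, not the approach used in this article. The sketch below uses only the standard library; `cells_to_html` is a hypothetical helper, and it assumes the records are shaped like the JSON sample and together cover the whole grid (a missing cell would shift the cells after it in that row).

```python
from html import escape


def cells_to_html(cells):
    """Render cell records as an HTML table using rowspan/colspan."""
    rows = max(c["end_row"] for c in cells) + 1
    # Group records by the row in which they start
    by_row = {r: [] for r in range(rows)}
    for c in cells:
        by_row[c["start_row"]].append(c)

    lines = ['<table border="1">']
    for r in range(rows):
        lines.append("  <tr>")
        for cell in sorted(by_row[r], key=lambda rec: rec["start_column"]):
            rowspan = cell["end_row"] - cell["start_row"] + 1
            colspan = cell["end_column"] - cell["start_column"] + 1
            attrs = ""
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            # Escape the text so characters like < and & display correctly
            text = escape(str(cell["data"]))
            lines.append(f"    <td{attrs}>{text}</td>")
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

The returned string can be written to an .html file and opened in a browser next to the scanned image for comparison.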
