
Example Code for Converting Word to HTML in Java

Updated: 2024-02-02 10:21:45  Author: 阿爾法哲

Business applications often need to preview Word documents in the browser or save them as HTML files. This article walks through how to convert Word to HTML in Java; hopefully it is useful to you.

Preface

If you need to preview Word documents in the browser, or save Word documents as HTML files, this article should help.

There are several ways to implement this: docx4j, POI, Free Spire.Doc for Java, OpenOffice, and jacob can all perform the conversion, but each has limitations. Here is a brief introduction to each so you can compare them.

docx4j

docx4j mainly works with docx files; the objects it operates on are Microsoft Open XML files.

It is a Java library for working with Office files (docx/xlsx/pptx).

POI

POI is an open-source library from the Apache Software Foundation. It provides APIs that let Java programs read and write files in Microsoft Office formats.

Structure:

HSSF - reads and writes Microsoft Excel format files.

XSSF - reads and writes Microsoft Excel OOXML format files.

HWPF - reads and writes Microsoft Word format files.

HSLF - reads and writes Microsoft PowerPoint format files.

HDGF - reads and writes Microsoft Visio format files.
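HWPF in the list above handles the binary .doc format; .docx (OOXML) files are read by POI's XWPF component, which the list does not mention. As a minimal sketch of how a caller might pick the right component from the file extension (the `WordFormat` enum and its `detect` method are hypothetical names for illustration, not part of POI):

```java
import java.util.Locale;

/** Hypothetical helper: maps a file name to the POI component that reads it. */
enum WordFormat {
    DOC("HWPF"),          // binary Word format (.doc)
    DOCX("XWPF"),         // OOXML Word format (.docx)
    UNSUPPORTED("none");  // anything else

    final String poiComponent;

    WordFormat(String poiComponent) {
        this.poiComponent = poiComponent;
    }

    /** Detects the format from the extension, ignoring case. */
    static WordFormat detect(String filename) {
        if (filename == null) {
            return UNSUPPORTED;
        }
        String lower = filename.toLowerCase(Locale.ROOT);
        // Check ".docx" first so the order of the checks stays obvious to readers
        if (lower.endsWith(".docx")) {
            return DOCX;
        }
        if (lower.endsWith(".doc")) {
            return DOC;
        }
        return UNSUPPORTED;
    }
}
```

Matching on the lower-cased name means an upload called `REPORT.DOCX` is still recognised, which a plain `endsWith(".docx")` check would miss.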

Free Spire.Doc for Java (powerful, but the full version may require payment)

Free Spire.Doc for Java is a free, professional Java Word component. Developers can use it to integrate creating, reading, editing, converting, and printing Word documents into their own Java applications. As a fully standalone component, Free Spire.Doc for Java does not require Microsoft Office to be installed.

Note: the free edition has size limits. When loading or saving a Word document, the document may not exceed 500 paragraphs and 25 tables. When converting a Word document to PDF, XPS, and similar formats, only the first three pages are converted.

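OpenOffice

1. Use jodconverter (built on the OpenOffice service) to convert files (.doc, .docx, .xls, .ppt) to HTML.

2. Use jodconverter (built on the OpenOffice service) to convert files (.doc, .docx, .xls, .ppt) to PDF. This requires Adobe Reader XI to be installed on the user's machine.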

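jacob (cannot be used on Linux)

The jacob.jar package must be added to the project, and the jar in turn calls jacob.dll. The jacob.dll file must be placed in the following three locations beforehand: the C:\Windows\System32 directory, the bin directory of the installed JDK, and the bin directory of the JRE (make sure these are the JDK and JRE your project actually runs on).

jacob allows COM automation components to be called from Java. It uses JNI (Java Native Interface) to make native calls into COM libraries. It runs on x86 and x64 environments and supports 32-bit and 64-bit Java virtual machines.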
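Because jacob depends on COM and a Windows DLL, code that may also run on Linux could guard the jacob path with an operating-system check. The sketch below is only an illustration under that assumption; `JacobSupport` is a hypothetical class name and is not part of jacob, and the check says nothing about whether jacob.dll or Microsoft Word is actually installed.

```java
import java.util.Locale;

/** Hypothetical guard: decides whether a COM-based converter such as jacob could be used. */
final class JacobSupport {

    private JacobSupport() {
    }

    /** Returns true only when the given os.name value denotes Windows. */
    static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Checks the operating system the current JVM is running on. */
    static boolean isUsableOnThisMachine() {
        return isWindows(System.getProperty("os.name"));
    }
}
```

A caller could fall back to the POI-based conversion described below whenever `JacobSupport.isUsableOnThisMachine()` returns false.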

This article uses POI for the conversion.

1. POI conversion

1.1 Adding the dependencies

<!-- WordToHtml .doc .docx (POI) -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.2</version>
</dependency>
 
<!-- Excel library; keep the versions of poi, poi-ooxml and poi-scratchpad consistent -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
 
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
 
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
    <version>2.0.2</version>
</dependency>
 
<!-- https://mvnrepository.com/artifact/fr.opensagres.xdocreport/fr.opensagres.xdocreport.converter.docx.xwpf -->
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
    <version>2.0.1</version>
</dependency>

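1.2 Utility class

POI conversion utility class

import fr.opensagres.poi.xwpf.converter.core.ImageManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import lombok.extern.slf4j.Slf4j;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.apache.poi.util.IOUtils;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.multipart.MultipartFile;
import org.w3c.dom.Document;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Utility class that converts Word documents to HTML with POI.
 */
@Slf4j
public class WordToHtml {

    // Directory where uploaded files are saved. The field must not be final,
    // and @Value only takes effect when this class is managed as a Spring bean.
    @Value(value = "${upload.path}")
    private String uploadPath = "";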
 
    // Saves the uploaded file to disk and converts it to HTML
    public File convert(MultipartFile file) {
        // Original file name; it may be null or have no extension
        String filename = file.getOriginalFilename();
        if (filename == null || !filename.contains(".")) {
            log.error("Unsupported file format!");
            return null;
        }
        // File extension, including the leading dot
        String suffix = filename.substring(filename.lastIndexOf("."));
        String newName = UUID.randomUUID().toString();
        // TODO: the file should be saved to a new location
        // A File is an abstract path; exists() and isDirectory() can be used to inspect it
        File convFile = new File(uploadPath + newName + suffix).getAbsoluteFile();
        // FileOutputStream creates the file; try-with-resources closes it even on failure
        try (FileOutputStream fos = new FileOutputStream(convFile)) {
            fos.write(file.getBytes());
        } catch (IOException ex) {
            log.error("Failed to save the uploaded file!", ex);
            return null;
        }

        // Directory containing the saved file, with a trailing separator
        // (File.separator instead of a hard-coded backslash, so this also works on Linux)
        String parentDirectory = convFile.getParent();
        if (!parentDirectory.endsWith(File.separator)) {
            parentDirectory = parentDirectory + File.separator;
        }

        if (".docx".equalsIgnoreCase(suffix)) {
            return docxConvert(parentDirectory, convFile.getAbsolutePath(), newName);
        } else if (".doc".equalsIgnoreCase(suffix)) {
            return docConvert(parentDirectory, convFile.getAbsolutePath(), newName);
        } else {
            log.error("Unsupported file format!");
            return null;
        }
    }
 
 
    /**
     * Post-processes the generated HTML file. It removes "width:595.3pt;" because the converted
     * page otherwise uses a fixed-width content area instead of adapting to the browser,
     * which puts the content in the wrong place.
     * @param parentDirectory directory containing the HTML file
     * @param filename name of the existing HTML file, without extension
     * @param newName name of the rewritten HTML file, without extension
     * @return the rewritten HTML file, or null on failure
     */
    private File htmlreplace(String parentDirectory, String filename, String newName) {
        File source = new File(parentDirectory + filename + ".html");
        File target = new File(parentDirectory + newName + ".html");
        InputStream inputStrem;
        // Read the generated HTML and apply the replacements
        try (FileInputStream inputStream = new FileInputStream(source)) {
            inputStrem = readInputStrem(inputStream);
        } catch (IOException e) {
            log.error("HTML conversion failed!", e);
            return null;
        }
        if (inputStrem == null) {
            log.error("HTML conversion failed: no content to write");
            return null;
        }
        // Clear the original file
        clearInfoForFile(source.getPath());
        // Write the processed content; try-with-resources closes the output stream
        try (OutputStream outStream = new FileOutputStream(target)) {
            byte[] buffer = new byte[inputStrem.available()];
            inputStrem.read(buffer);
            outStream.write(buffer);
            return target;
        } catch (IOException e) {
            log.error("HTML conversion failed!", e);
            return null;
        }
    }
 
    /**
     * Reads the HTML stream, removes "width:595.3pt;" and appends "word-break:break-all;"
     * to every "white-space:pre-wrap;" declaration.
     *
     * @param inputStream stream over the generated HTML (written as GBK by docxConvert)
     * @return a stream over the processed HTML, or null on failure
     */
    private static InputStream readInputStrem(InputStream inputStream) {
        // Text to match
        String regEx_special = "width:595.3pt;";

        String regEx_special2 = "white-space:pre-wrap;";

        // Replacement text
        String replace = "white-space:pre-wrap;word-break:break-all;";
        try {
            // <1> Byte buffer that collects everything read from the stream
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // <2> Read buffer
            byte[] buffer = new byte[1024]; // 1 KB
            // Number of bytes read in each pass
            int len;
            // <3> Read until read() returns -1, which signals the end of the stream
            while ((len = inputStream.read(buffer)) != -1) {
                baos.write(buffer, 0, len);
            }
            // <4> Decode with the charset the HTML was written in, not the platform default
            String content = baos.toString("GBK");
            // <5> Remove "width:595.3pt;". Pattern.LITERAL keeps "." from matching any character
            Pattern compile = Pattern.compile(regEx_special, Pattern.CASE_INSENSITIVE | Pattern.LITERAL);
            Matcher matcher = compile.matcher(content);
            String replaceAll = matcher.replaceAll("");
            // Extend every "white-space:pre-wrap;" with "word-break:break-all;"
            Pattern compile2 = Pattern.compile(regEx_special2, Pattern.CASE_INSENSITIVE | Pattern.LITERAL);
            Matcher matcher2 = compile2.matcher(replaceAll);
            String replaceAll2 = matcher2.replaceAll(replace);
            // <6> Turn the string back into an input stream and return it
            return getStringStream(replaceAll2);
        } catch (Exception e) {
            log.error("Error: {}", e.getMessage(), e);
            return null;
        }
    }

    /**
     * Converts a string to an input stream.
     * @param sInputString the string
     * @return a GBK-encoded stream, or null if the string is null or blank
     */
    public static InputStream getStringStream(String sInputString) {
        if (sInputString != null && !sInputString.trim().equals("")) {
            try {
                // Encode as GBK to match the charset used when the HTML was generated
                return new ByteArrayInputStream(sInputString.getBytes("GBK"));
            } catch (Exception e) {
                log.error("Error: {}", e.getMessage(), e);
            }
        }
        return null;
    }
 
    /**
     * Clears the content of a file, creating the file if it does not exist.
     * @param fileName path of the file
     */
    public static void clearInfoForFile(String fileName) {
        File file = new File(fileName);
        // Opening a FileWriter creates the file if needed and truncates it to zero length
        try (FileWriter fileWriter = new FileWriter(file)) {
            fileWriter.write("");
        } catch (IOException e) {
            log.error("Failed to clear file: {}", fileName, e);
        }
    }
 
    /**
     * Converts a .docx file. When the Word font is larger than size 5 (10.5 pt), text may wrap
     * irregularly, because the content area of the converted page is not the original HTML area.
     * @param parentDirectory directory for the HTML file (mainly used to manage images)
     * @param filename path of the Word file
     * @param newName name of the HTML file, without extension
     * @return the generated HTML file, or null on failure
     */
    private File docxConvert(String parentDirectory, String filename, String newName) {
        File htmlFile = new File(parentDirectory + newName + ".html");
        // 1) Load the Word document into an XWPFDocument.
        //    try-with-resources flushes and closes the writer before the HTML is post-processed.
        try (FileInputStream in = new FileInputStream(filename);
             XWPFDocument document = new XWPFDocument(in);
             // Custom output encoding
             OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream(htmlFile), "GBK")) {
            // Directory where extracted images are stored
            XHTMLOptions options = XHTMLOptions.create().setImageManager(new ImageManager(new File(parentDirectory), UUID.randomUUID().toString())).indent(4);
            // Generate the HTML
            XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
            xhtmlConverter.convert(document, writer, options);
        } catch (IOException ex) {
            log.error("Word conversion failed!", ex);
            return null;
        }
        // Apply the content replacements to the generated HTML
        return htmlreplace(parentDirectory, newName, newName);
    }
 
 
    /**
     * Converts a .doc file.
     * @param parentDirectory directory for the HTML file (mainly used to manage images)
     * @param filename path of the Word file
     * @param newName name of the HTML file, without extension
     * @return the generated HTML file, or null on failure
     */
    private File docConvert(String parentDirectory, String filename, String newName) {
        File htmlFile = new File(parentDirectory + newName + ".html");
        try (FileInputStream in = new FileInputStream(filename);
             HWPFDocument document = new HWPFDocument(in)) {
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                    DocumentBuilderFactory.newInstance().newDocumentBuilder()
                            .newDocument());

            // By default the converter does not handle images; each picture has to be
            // saved manually and is then referenced from the HTML by file name
            wordToHtmlConverter.setPicturesManager(new PicturesManager() {
                @Override
                public String savePicture(byte[] bytes, PictureType pictureType, String suggestedName,
                                          float widthInches, float heightInches) {
                    String identity = UUID.randomUUID().toString();
                    File imageFile = new File(parentDirectory, identity + suggestedName);
                    imageFile.getParentFile().mkdirs();
                    try (FileOutputStream out = new FileOutputStream(imageFile)) {
                        out.write(bytes);
                    } catch (IOException ex) {
                        log.error("Word conversion failed!", ex);
                    }
                    return imageFile.getName();
                }
            });

            wordToHtmlConverter.processDocument(document);
            Document htmlDocument = wordToHtmlConverter.getDocument();
            DOMSource domSource = new DOMSource(htmlDocument);

            // Output properties for the serializer
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer serializer = tf.newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "GBK");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");

            // Write the GBK bytes straight to the file, so they are not re-encoded
            // through the platform default charset
            try (OutputStream out = new FileOutputStream(htmlFile)) {
                serializer.transform(domSource, new StreamResult(out));
            }
        } catch (IOException | TransformerException | ParserConfigurationException ex) {
            log.error("Word conversion failed!", ex);
            return null;
        }
        return htmlFile;
    }
 
    /**
     * Converts an uploaded Word document to an HTML file and reports the outcome.
     *
     * @param file the uploaded Word document
     * @return a status message (the HTML content itself is not returned)
     */
    public String convertToHtml(MultipartFile file) {
        // Convert the Word file to HTML
        File file2 = convert(file);
        if (file2 != null) {
            return "File converted successfully";
        }
        return "File conversion failed";
    }
 
    /**
     * Converts a Word file on disk to an HTML file.
     * @param wordFilePath path of the Word file
     * @param htmlFilePath path of the HTML file to create
     * @return the generated HTML file, or null if the format is unsupported or the conversion reports a failure
     */
    public static File wordToHtml(String wordFilePath, String htmlFilePath) {
        // Word file name including its extension (File handles both "/" and "\" on Windows)
        String filename = new File(wordFilePath).getName();
        File convFile = new File(htmlFilePath).getAbsoluteFile();
        // HTML file name without its extension
        String htmlName = convFile.getName();
        int dot = htmlName.lastIndexOf(".");
        String newName = dot > 0 ? htmlName.substring(0, dot) : htmlName;
        // Output directory, created if missing, with a trailing separator
        convFile.getParentFile().mkdirs();
        String parentDirectory = convFile.getParent();
        if (!parentDirectory.endsWith(File.separator)) {
            parentDirectory = parentDirectory + File.separator;
        }

        if (filename.endsWith(".docx")) {
            return new WordToHtml().docxConvert(parentDirectory, wordFilePath, newName);
        } else if (filename.endsWith(".doc")) {
            return new WordToHtml().docConvert(parentDirectory, wordFilePath, newName);
        } else {
            log.error("Unsupported file format!");
            return null;
        }

    }
}
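The style fix-up at the heart of `readInputStrem` can also be looked at in isolation. The following is a minimal sketch of that one step as a pure string function; `HtmlStyleFixer` is a hypothetical name used only for illustration, and the two CSS fragments are the ones the utility class above targets.

```java
import java.util.regex.Pattern;

/** Hypothetical helper isolating the CSS fix-up applied to the generated HTML. */
final class HtmlStyleFixer {

    // Pattern.LITERAL treats "." as a plain dot instead of "any character"
    private static final Pattern FIXED_WIDTH =
            Pattern.compile("width:595.3pt;", Pattern.CASE_INSENSITIVE | Pattern.LITERAL);
    private static final Pattern PRE_WRAP =
            Pattern.compile("white-space:pre-wrap;", Pattern.CASE_INSENSITIVE | Pattern.LITERAL);

    private HtmlStyleFixer() {
    }

    /** Removes the fixed page width and lets long lines break inside the container. */
    static String fix(String html) {
        String withoutWidth = FIXED_WIDTH.matcher(html).replaceAll("");
        return PRE_WRAP.matcher(withoutWidth)
                .replaceAll("white-space:pre-wrap;word-break:break-all;");
    }
}
```

Keeping this step free of file I/O makes it easy to check against small HTML snippets before running it on a full converted document.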

1.3 Test class

import lombok.extern.slf4j.Slf4j;

import java.io.File;

/**
 * @Author: wk
 * @Create: 2022/4/21 15:10
 * @Description: Test class for WordToHtml (POI)
 * @Version: 1.0
 */
@Slf4j
public class WordToHtmlTest {

    public static void main(String[] args) {
        long timeMillis = System.currentTimeMillis();
        log.info("Starting conversion!");
        String wordFilePath = "src/main/resources/word/nc.docx";
        String htmlFilePath = "src/main/resources/html/nc5.html";
        File file = WordToHtml.wordToHtml(wordFilePath, htmlFilePath);
        // Check whether the HTML file was produced
        if (file != null) {
            log.info("Output file: {}", file.getPath());
            log.info("Conversion finished! Elapsed: {} ms", System.currentTimeMillis() - timeMillis);
            return;
        }
        log.error("File conversion failed!");
    }
 
}
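The test only logs where the HTML file was written. To inspect the generated markup as a string, the file has to be decoded with the same charset it was written in, which is GBK in the utility class above. A minimal sketch, assuming a hypothetical `HtmlFileReader` helper that is not part of the original code:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads a generated HTML file back into a string. */
final class HtmlFileReader {

    private HtmlFileReader() {
    }

    /** Decodes the whole file with the given charset instead of the platform default. */
    static String read(Path htmlFile, Charset charset) throws IOException {
        return new String(Files.readAllBytes(htmlFile), charset);
    }
}
```

For the files produced here, a call might look like `HtmlFileReader.read(file.toPath(), Charset.forName("GBK"))`. Passing the charset explicitly avoids garbled Chinese text on machines whose default encoding is UTF-8.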

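Test results (the actual output differs slightly). Because a single screenshot could not show the whole page, both the document and the page were adjusted accordingly.

.doc

.docx (the document is the same, so no screenshot is included here)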
