- How to Convert Docx File to PDF using Apache POI Library in Java
- TECHNOLOGIES USED IN THIS TUTORIAL
- PROJECT DIRECTORY STRUCTURE
- Project Dependencies
- Docx to PDF conversion implementation
- Java Convert .docx File to .pdf File using XDocReport
- Add XDocReport Converter DOCX XWPF Dependency to Java Project
- How to convert .docx file to .pdf file in Java
- How to Use FileConverter Class to convert Word to PDF File
- License
- yeokm1/docs-to-pdf-converter
- README.md
- About
- de-jcup/xdocreport-testapplication
- README.adoc
How to Convert Docx File to PDF using Apache POI Library in Java
This article shows how to convert a Word (.docx) file to a PDF (.pdf) file using Apache POI together with the XDocReport PDF converter that is built on top of it. The project is a simple Maven project and needs only one dependency.
TECHNOLOGIES USED IN THIS TUTORIAL
PROJECT DIRECTORY STRUCTURE
You can create the project directory structure manually or with Maven's "new simple project" option. The project's directory structure is shown below. For simplicity, the input file rdtschools-Docx2PdfConversion-word-sample.docx is kept in the project's home directory, and the PDF file is generated in the same place.
Docx2PdfConversion
├── pom.xml
├── rdtschools-Docx2PdfConversion-word-sample.docx
├── src
│   ├── main
│   │   └── java
│   │       └── com
│   │           └── rdtschools
│   │               └── Docx2PdfConversion.java
│   └── test
└── target
Project Dependencies
We only need to add one dependency to the project's pom file: the XDocReport PDF converter for Apache POI XWPF documents. After adding the dependency, the project's pom file will look like this.
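<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.rdtschools.docx2pdf</groupId>
    <artifactId>Docx2PdfConversion</artifactId>
    <version>0.0.1</version>
    <dependencies>
        <dependency>
            <groupId>fr.opensagres.xdocreport</groupId>
            <artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>
</project>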
Docx to PDF conversion implementation
As this class shows, converting a Word (.docx) document to a PDF (.pdf) takes only a few lines.
package com.rdtschools;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class Docx2PdfConversion {

    public static void main(String[] args) {
        // Both streams are closed automatically by try-with-resources
        try (InputStream is = new FileInputStream(new File("rdtschools-Docx2PdfConversion-word-sample.docx"));
             OutputStream out = new FileOutputStream(new File("rdtschools-Docx2PdfConverted_PDF_File.pdf"))) {
            long start = System.currentTimeMillis();

            // 1) Load DOCX into XWPFDocument
            XWPFDocument document = new XWPFDocument(is);

            // 2) Prepare Pdf options
            PdfOptions options = PdfOptions.create();

            // 3) Convert XWPFDocument to Pdf
            PdfConverter.getInstance().convert(document, out, options);

            System.out.println("rdtschools-Docx2PdfConversion-word-sample.docx was converted to a PDF file"
                    + " (rdtschools-Docx2PdfConverted_PDF_File.pdf) in :: "
                    + (System.currentTimeMillis() - start) + " milli seconds");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Once you build and run it as a Java application, the Word file is converted to a PDF file and saved in the project's root directory. The class includes a print statement that reports the conversion time, so the output will be as follows.
rdtschools-Docx2PdfConversion-word-sample.docx was converted to a PDF file (rdtschools-Docx2PdfConverted_PDF_File.pdf) in :: 1781 milli seconds
Note: I use an IDE for these projects, so to test or run this code I suggest importing it as a Maven project in Eclipse or STS.
For your convenience, I am uploading the project to the Git repository. You can download it from there. Other articles on using the Apache POI library can be found here.
Java Convert .docx File to .pdf File using XDocReport
In this Java tutorial we learn how to convert a Word file to PDF file in Java using the XDocReport library.
Add XDocReport Converter DOCX XWPF Dependency to Java Project
If you build the project with Gradle, add the following dependency to the build.gradle file.
implementation group: 'fr.opensagres.xdocreport', name: 'fr.opensagres.xdocreport.converter.docx.xwpf', version: '2.0.3'
If you build the project with Maven, add the following dependency to the pom.xml file.
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
    <version>2.0.3</version>
</dependency>
How to convert .docx file to .pdf file in Java
In Java, with a given Word file we can use the XDocReport API with the following steps to convert it to a PDF file.
- Step 1: Open the .docx file as an InputStream using FileInputStream.
- Step 2: Create new XWPFDocument object using the XWPFDocument(InputStream is) constructor.
- Step 3: Create new instance of PdfOptions using the PdfOptions.create() static method.
- Step 4: Write the .pdf file as an OutputStream using FileOutputStream.
- Step 5: Use the PdfConverter.getInstance().convert( XWPFDocument document, OutputStream out, T options ) method to convert the .docx file to .pdf file.
In the FileConverter Java class below, we implement a method with the steps above to convert .docx file to .pdf file with given file names.
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FileConverter {

    public void convertWordToPdf(String docxFileName, String pdfFileName) {
        // Both streams are closed automatically by try-with-resources
        try (InputStream inputStream = new FileInputStream(docxFileName);
             OutputStream outputStream = new FileOutputStream(pdfFileName)) {
            XWPFDocument document = new XWPFDocument(inputStream);
            PdfOptions options = PdfOptions.create();
            // Convert .docx file to .pdf file
            PdfConverter.getInstance().convert(document, outputStream, options);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
How to Use FileConverter Class to convert Word to PDF File
For example, suppose we have a sample Word file located at D:\SimpleSolution\Data\Document.docx.
In the following example Java program, we use the FileConverter class in the previous step to convert the sample Word file above to a PDF file.
public class ConvertDocxToPdfExample1 {

    public static void main(String... args) {
        String docxFileName = "D:\\SimpleSolution\\Data\\Document.docx";
        String pdfFileName = "D:\\SimpleSolution\\Data\\Document.pdf";

        FileConverter fileConverter = new FileConverter();
        fileConverter.convertWordToPdf(docxFileName, pdfFileName);
    }
}
After the Java application runs, the PDF file is generated at D:\SimpleSolution\Data\Document.pdf.
A standalone Java library/command line tool that converts DOC, DOCX, PPT, PPTX and ODT documents to PDF files.
yeokm1/docs-to-pdf-converter
README.md
(I’m not maintaining this code as I neither have personal resources nor am I still using this project. I’ll be happy to oblige if you have any pull requests or even if you wish to be a co-maintainer.)
A standalone Java library/command line tool that converts DOC, DOCX, PPT, PPTX and ODT documents to pdf files. (Requires JRE 7)
The v1.7 release has not been updated for about 2 years, although it seems quite reliable to me. In response to an issue requesting updated libraries, I have released the new v1.8. I now use Maven to manage the libraries in the pom.xml file.
I have not tested v1.8 much so if you face any issues, you can still use v1.7 in the Releases section.
I wanted a simple program that can convert Microsoft Office documents to PDF but without dependencies like LibreOffice or expensive proprietary solutions. Seeing as how code and libraries to convert each individual format is scattered around the web, I decided to combine all those solutions into one single program. Along the way, I decided to add ODT support as well since I encountered the code too.
java -jar doc-converter.jar -type "type" -input "path" -output "path" -verbose
java -jar doc-converter.jar -input test.doc
java -jar doc-converter.jar -i test.ppt -o ~\output.pdf
java -jar doc-converter.jar -i ~\no-extension-file -o ~\output.pdf -t docx

-inputPath (-i, -in, -input) "path" : specifies a path for the input file
-outputPath (-o, -out, -output) "path" : specifies a path for the output PDF, use input file directory and name.pdf if not specified (Optional)
-type (-t) [DOC | DOCX | PPT | PPTX | ODT] : Specifies doc converter. Leave blank to let program infer via file extension (Optional)
-verbose (-v) : To view intermediate processing messages. (Optional)
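The two optional defaults described above (inferring the converter from the file extension, and writing to the input file's directory as name.pdf) could be sketched roughly as follows. This is only an illustration using the Java standard library, not the tool's actual source; the class name ConverterArgsSketch, the enum DocType, and both method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of docs-to-pdf-converter.
final class ConverterArgsSketch {

    // The five document types listed for the -type option.
    enum DocType { DOC, DOCX, PPT, PPTX, ODT }

    // Infers the type from the file extension; empty if there is no
    // extension or the extension is not one of the supported types.
    static Optional<DocType> inferType(String inputPath) {
        String name = Paths.get(inputPath).getFileName().toString();
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return Optional.empty();
        }
        String ext = name.substring(dot + 1).toUpperCase(Locale.ROOT);
        try {
            return Optional.of(DocType.valueOf(ext));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    // Default output: same directory and base name as the input, with .pdf.
    static Path defaultOutputPath(String inputPath) {
        Path input = Paths.get(inputPath);
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return input.resolveSibling(base + ".pdf");
    }
}
```

With this sketch, an input without an extension yields an empty result from inferType, which corresponds to the case where -t has to be passed explicitly.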
- Drop the jar into your lib folder and add to build path.
- Choose the converter of your choice, they are named DocToPDFConverter, DocxToPDFConverter, PptToPDFConverter, PptxToPDFConverter and OdtToPDFConverter.
- Instantiate with 4 parameters
- InputStream inStream : Document source stream to be converted
- OutputStream outStream : Document output stream
- boolean showMessages : Whether to show intermediate processing messages to Standard Out (stdout)
- boolean closeStreamsWhenComplete : Whether to close input and output streams when complete
- Call the «convert()» method and wait.
Caveats and technical details:
This tool relies on Apache POI, xdocreport, docx4j and odfdom libraries. They are not 100% reliable and the output format may not always be what you desire.
DOC: Generally OK, but takes some time to convert. I notice that after conversion the paragraph spacing tends to increase, affecting the page layout. Conversion is done using docx4j to convert DOC to DOCX and then to PDF. (xdocreport cannot be used once the DOCX data is obtained, as the intermediate data structure is docx4j specific.)
DOCX: Very good results. Fast conversion too. Conversion is done using the xdocreport library, as it seems faster and more accurate than docx4j.
PPT and PPTX: The resulting file is a PDF comprising a PNG embedded in each page. It should be good enough for printing. This is a limitation of the Apache POI and docx4j libraries.
ODT: Quality and speed as good as DOCX. Conversion is done using odfdom of the Apache ODF Toolkit.
# If you don't already have Maven on your Mac
brew install maven
mvn clean package
The output jar file can be found in the target folder.
I’m using Eclipse Mars IDE Java EE with the M2Eclipse plugin. Simply create a workspace and import my project into it. Let Maven do its work in downloading all the necessary dependencies. Once everything is downloaded, you should be able to run the MainClass.
More details can be found in the Wiki section.
The MIT License (MIT) Copyright (c) 2013-2014 Yeo Kheng Meng
A simple demo application showing how to convert from DOCX to PDF using XDocReport
de-jcup/xdocreport-testapplication
README.adoc
This is just a simple demo Java application showing how to use XDocReport version 2.0.2 to convert an existing DOCX document to a PDF.
How to start test application from command line
git clone https://github.com/de-jcup/xdocreport-testapplication
cd xdocreport-testapplication
./gradlew run
After execution you will find in last line something like
Wrote converted PDF to:/tmp/example_xdocreport_docx2pdf12410894179209343094.pdf
This is the generated PDF file. The original DOCX file is inside src/main/resources.
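The output path above, with its long numeric part between a fixed prefix and the .pdf suffix, looks like what the JDK's Files.createTempFile produces. That is an assumption, not something the README states. Under that assumption, a target file for the converted PDF could be created like this; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not taken from xdocreport-testapplication.
final class TempPdfTargetSketch {

    // Creates an empty file in the system temp directory whose name starts
    // with the given prefix and ends with ".pdf"; the JDK fills in the
    // middle part of the name.
    static Path createTempPdfTarget(String prefix) {
        try {
            return Files.createTempFile(prefix, ".pdf");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned path can then be opened with Files.newOutputStream and passed to the converter as the output stream.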
How does it work — in a nutshell:
dependencies {
    // see https://github.com/opensagres/xdocreport/wiki/XDocReport200
    implementation group: 'fr.opensagres.xdocreport', name: 'fr.opensagres.xdocreport.document.docx', version: '2.0.2'
    implementation group: 'fr.opensagres.xdocreport', name: 'fr.opensagres.xdocreport.converter.docx.xwpf', version: '2.0.2'
}
link:src/main/java/de/jcup/examples/xdocreport/TestApplication.java[]